proposal: io/fs, filepath: add more efficient Walk alternative #41974

@rsc

There are at least three problems with filepath.Walk:

  • The os.FileInfo passed to the callback keeps Walk from using the new, more efficient ReadDir API (os: add ReadDir method for lightweight directory reading #41467).
  • The function signature of the callback is a bit hard to remember - I always have to look it up.
  • The error handling, and in particular SkipDir, is a bit unusual and error-prone.
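For reference, the callback style these points describe looks roughly like this with the existing filepath.Walk. This is a minimal sketch: listFiles, demo, and the directory name "skipme" are illustrative names, not part of the proposal.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// listFiles returns the regular files under root using the existing
// filepath.Walk API, pruning any directory whose name is skip.
func listFiles(root, skip string) ([]string, error) {
	var files []string
	err := filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			// info may be nil here, so err must be checked first.
			// Returning the error aborts the entire walk.
			return err
		}
		if info.IsDir() && info.Name() == skip {
			// Sentinel error: prune this directory. Returned for a
			// file, it would instead skip the file's remaining siblings.
			return filepath.SkipDir
		}
		if info.Mode().IsRegular() {
			files = append(files, path)
		}
		return nil
	})
	return files, err
}

// demo builds a small temporary tree and lists it (illustrative only).
func demo() ([]string, error) {
	root, err := os.MkdirTemp("", "walkdemo")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(root)
	for _, name := range []string{"a.txt", "skipme/b.txt", "sub/c.txt"} {
		p := filepath.Join(root, filepath.FromSlash(name))
		if err := os.MkdirAll(filepath.Dir(p), 0o755); err != nil {
			return nil, err
		}
		if err := os.WriteFile(p, []byte("x"), 0o644); err != nil {
			return nil, err
		}
	}
	return listFiles(root, "skipme")
}

func main() {
	files, err := demo()
	fmt.Println(len(files), err)
}
```

Every os.FileInfo handed to the callback has to be fully populated whether or not the callback looks at it, and the meaning of the SkipDir return value depends on whether the current entry is a file or a directory.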

@kr's Walker API takes a different approach that solves all these, by providing an explicit iterator object instead of a callback-based model. I propose to adapt this API to provide an alternative to filepath.Walk and then also adopt this for io/fs instead of filepath.Walk.

The new API is:

// A Walker iterates over the directory entries of a file system subtree.
// The entries are walked in lexical order, which makes the traversal
// predictable but means that very large directories must be read
// in their entirety before any entries can be processed.
// (Note that Walker is itself an implementation of os.DirEntry.)
//
// Each directory is visited twice in the traversal:
// once before processing and once after processing.
// The two cases are distinguished by the Exiting method.
type Walker struct {
	...
}

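// NewWalker returns a new Walker to walk the file system subtree rooted at root.
// Any problem with root is reported by Err before the initial call to Next.
func NewWalker(root string) *Walker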

// Next advances the Walker to visit the next result in the traversal.
// It returns true on success, or false if the entire tree has been visited.
// Next must be called for each step in the traversal, including the first.
func (w *Walker) Next() bool

// Path returns the path the walker is currently visiting.
func (w *Walker) Path() string

// Rel returns the relative path from the root of the tree being walked
// to the entry being visited. The path is either "." or else a sequence of
// Separator-separated path elements that does not contain "." or "..".
func (w *Walker) Rel() string

// Name returns the final path element of the path being visited.
func (w *Walker) Name() string

// IsDir reports whether the path being visited is a directory.
func (w *Walker) IsDir() bool

// Type reports the type of the path being visited.
func (w *Walker) Type() os.FileMode

// Info returns the file information for the path being visited.
// It may return information obtained when the path's parent
// directory was read or (when visiting the root) when NewWalker was called,
// or it may return information obtained during the call to Info.
func (w *Walker) Info() (os.FileInfo, error)

// Exiting reports whether this step in the traversal marks the
// exiting of a directory.
// Each directory is visited twice by the Walker:
// before and after visiting the directory's children.
// During the first visit, Exiting returns false; during the second, it returns true.
// If the path being visited is not a directory, Exiting always returns false.
func (w *Walker) Exiting() bool

// Err returns any error encountered trying to visit the current path.
// Err can be non-nil before the initial call to Next,
// indicating a problem with the argument passed to NewWalker.
// In this case, the initial call to Next will return false.
// Otherwise, Err can be non-nil when Exiting returns true,
// indicating a problem reading the directory being exited.
func (w *Walker) Err() error

// SkipDir instructs the walker to skip the remainder of the current directory.
// The subsequent call to Next will advance the walker to the next step
// in the traversal for which Exiting is true.
// That is:
//  - If IsDir() == true and Exiting() == false (entering a directory),
//     calling SkipDir skips over the children of that directory entirely.
//  - If IsDir() == false (visiting a file inside a directory),
//     calling SkipDir skips over the remaining siblings of the current file.
//  - If IsDir() == true and Exiting() == true (exiting a directory),
//     calling SkipDir skips over the remaining siblings of the current directory,
//     so that Next advances to exiting the parent of the current directory.
func (w *Walker) SkipDir()

An example:

w := NewWalker("/does/not/exist")
if err := w.Err(); err != nil {
	log.Print(err)
}
for w.Next() {
	if err := w.Err(); err != nil {
		log.Print(err)
	}
	if !w.Exiting() {
		fmt.Println(w.Rel())
	}
}
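To make the enter/exit steps and the SkipDir rules concrete, here is a minimal runnable sketch of one possible reading of those semantics. Everything in it is hypothetical: toyWalker, newToyWalker, step, and demo are illustrative names, the sketch omits Rel, Name, Type, and Info, and it assumes os.ReadDir as it shipped in Go 1.16. It is not the proposed implementation.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

type step struct { // one visit in the traversal
	path           string
	isDir, exiting bool
	err            error
}

// toyWalker is a hypothetical, simplified stand-in for the proposed Walker.
type toyWalker struct {
	pending []step // steps still to visit; the last element is visited next
	cur     step   // step currently being visited
	skip    bool   // set by SkipDir, consumed by the following Next
}

func newToyWalker(root string) *toyWalker {
	info, err := os.Lstat(root)
	if err != nil {
		return &toyWalker{cur: step{err: err}} // visible through Err before the first Next
	}
	return &toyWalker{pending: []step{{path: root, isDir: info.IsDir()}}}
}

func (w *toyWalker) Next() bool {
	if w.cur.isDir && !w.cur.exiting {
		// Entering a directory: schedule its exit step, then its children
		// in reverse, so that the lexically first child is visited first.
		exit := step{path: w.cur.path, isDir: true, exiting: true}
		var entries []os.DirEntry
		if !w.skip {
			entries, exit.err = os.ReadDir(w.cur.path) // sorted by file name
		}
		w.pending = append(w.pending, exit)
		for i := len(entries) - 1; i >= 0; i-- {
			child := filepath.Join(w.cur.path, entries[i].Name())
			w.pending = append(w.pending, step{path: child, isDir: entries[i].IsDir()})
		}
	}
	if w.skip {
		// Drop pending steps until an exit step is next.
		for n := len(w.pending); n > 0 && !w.pending[n-1].exiting; n-- {
			w.pending = w.pending[:n-1]
		}
		w.skip = false
	}
	if len(w.pending) == 0 {
		return false
	}
	w.cur, w.pending = w.pending[len(w.pending)-1], w.pending[:len(w.pending)-1]
	return true
}

func (w *toyWalker) Path() string  { return w.cur.path }
func (w *toyWalker) IsDir() bool   { return w.cur.isDir }
func (w *toyWalker) Exiting() bool { return w.cur.exiting }
func (w *toyWalker) Err() error    { return w.cur.err }
func (w *toyWalker) SkipDir()      { w.skip = true }

// demo walks a temporary tree (a/x, a/y, b), calling SkipDir on entering skipAt.
func demo(skipAt string) []string {
	root, err := os.MkdirTemp("", "walkdemo")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)
	for _, name := range []string{"a/x", "a/y", "b"} {
		p := filepath.Join(root, filepath.FromSlash(name))
		os.MkdirAll(filepath.Dir(p), 0o755) // a failure here surfaces in WriteFile
		if err := os.WriteFile(p, nil, 0o644); err != nil {
			panic(err)
		}
	}
	var out []string
	for w := newToyWalker(root); w.Next(); {
		kind := "file"
		if w.Exiting() {
			kind = "exit"
		} else if w.IsDir() {
			kind = "enter"
		}
		rel, _ := filepath.Rel(root, w.Path()) // error ignored for brevity
		out = append(out, kind+" "+filepath.ToSlash(rel))
		if !w.Exiting() && filepath.Base(w.Path()) == skipAt {
			w.SkipDir()
		}
	}
	return out
}

func main() {
	fmt.Println(demo("x")) // SkipDir at file a/x skips its sibling a/y
}
```

With SkipDir called at the file a/x, the sketch records `[enter . enter a file a/x exit a file b exit .]`: the sibling a/y is skipped, and the next step is the exit from a. Calling it while entering a instead yields enter a immediately followed by exit a.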

This API differs from github.com/kr/fs.Walker in that it reports both entry and exit from every directory. In contrast, kr/fs.Walker only reports an exit-directory step in case of a ReadDir error, and it provides no clear way to distinguish the second result (except implicitly that Info().IsDir() == true && Err() != nil indicates an “exiting” event, but that fact is undocumented).

The API also differs in that it provides all the os.DirEntry methods, where kr/fs.Walker provides only Info() os.FileInfo. A possible simplification of the API would be to provide DirEntry() os.DirEntry instead of Name, IsDir, Type, Info, but those are likely to be very commonly called, and forcing the user to write (say) w.DirEntry().IsDir() seems unnecessarily pedantic.
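As a rough illustration of the os.DirEntry methods in question, the sketch below lists one directory with os.ReadDir, assuming the API as it shipped in Go 1.16. The names describe and demo are made up for this example.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// describe lists dir with os.ReadDir. Name and IsDir come from the
// directory read itself; Info may need an extra system call per entry.
func describe(dir string) ([]string, error) {
	entries, err := os.ReadDir(dir) // sorted by file name
	if err != nil {
		return nil, err
	}
	var out []string
	for _, e := range entries {
		if e.IsDir() {
			out = append(out, e.Name()+" dir")
			continue
		}
		info, err := e.Info() // only called when the size is needed
		if err != nil {
			return nil, err
		}
		out = append(out, fmt.Sprintf("%s file %d", e.Name(), info.Size()))
	}
	return out, nil
}

// demo builds a temporary directory holding one file and one
// subdirectory, then describes it (illustrative only).
func demo() ([]string, error) {
	root, err := os.MkdirTemp("", "readdirdemo")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(root)
	if err := os.Mkdir(filepath.Join(root, "sub"), 0o755); err != nil {
		return nil, err
	}
	if err := os.WriteFile(filepath.Join(root, "a.txt"), []byte("hello"), 0o644); err != nil {
		return nil, err
	}
	return describe(root)
}

func main() {
	lines, err := demo()
	fmt.Println(lines, err)
}
```

Exposing the same four methods directly on the Walker would let loop bodies call w.IsDir() or w.Info() in the same way, without an intermediate DirEntry value.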

Rel is not strictly needed (nor included in kr/fs.Walker), but I've needed that result in almost every filepath.Walk call I've ever written. It can be derived from Path, but doing so is tricky.
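One way to derive the relative path today is filepath.Rel inside the filepath.Walk callback. The sketch below shows that approach; relPaths and demo are illustrative names, and the result is converted to forward slashes only for display.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// relPaths walks root with filepath.Walk and returns each visited path
// relative to root, using forward slashes for display.
func relPaths(root string) ([]string, error) {
	var rels []string
	err := filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		// Trimming root as a string prefix would yield "" for the root
		// itself and leave a leading separator on every child.
		rel, err := filepath.Rel(root, path)
		if err != nil {
			return err
		}
		rels = append(rels, filepath.ToSlash(rel))
		return nil
	})
	return rels, err
}

// demo builds a small temporary tree and returns its relative paths
// (illustrative only).
func demo() ([]string, error) {
	root, err := os.MkdirTemp("", "reldemo")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(root)
	for _, name := range []string{"a.txt", "sub/b.txt"} {
		p := filepath.Join(root, filepath.FromSlash(name))
		if err := os.MkdirAll(filepath.Dir(p), 0o755); err != nil {
			return nil, err
		}
		if err := os.WriteFile(p, nil, 0o644); err != nil {
			return nil, err
		}
	}
	return relPaths(root)
}

func main() {
	rels, err := demo()
	fmt.Println(rels, err)
}
```

Each callback needs an extra call and an extra error check just to obtain a value that the walker already knows, which is the boilerplate a Rel method would remove.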

And of course kr/fs.Walker's constructor is named Walk. We must use NewWalker instead because filepath.Walk is taken. For io/fs we may be able to call it Walk.
