-
Notifications
You must be signed in to change notification settings - Fork 17.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
proposal: io/fs, filepath: add more efficient Walk alternative #41974
Comments
NewWalker should return an error instead of overloading the Err method: less to explain in the docs, harder to miss. |
I think the Unfortunately, I don't have the bandwidth to look into less error-prone alternatives to suggest at the moment. |
I also think it's unfortunate that this The |
As far as I can tell the |
I'm not clear on why we need all of |
I've needed the depth before. It would be pretty easy to wrap the walker to collect and report that. The same could be said for Path and Rel, though. |
Giving this a bit more thought, I think the API I would prefer is something like: // NewWalker returns a Walker that walks the subtree of fsys rooted at path root.
func NewWalker(fsys FS, root string) *Walker
// A Walker iterates over the entries in a filesystem tree.
type Walker struct {
…
}
// Done reports whether the last valid entry returned was for
// the root of the walk.
func (w *Walker) Done() bool
// Next advances the walker to the next entry within the current directory
// and returns that entry.
// If no entries remain, Next returns a nil *WalkEntry and io.EOF.
// Otherwise, Next always returns a non-nil *WalkEntry.
//
// A new Walker includes only one entry, for the root of the walk itself.
// The first call to Next on a new Walker returns the root of the walk.
// If the root is not entered, the next call to Done will return true.
//
// If the root does not exist, the first call to Next returns a non-nil
// WalkEntry and an error for which os.IsNotExist returns true.
func (w *Walker) Next() (*WalkEntry, error)
// EnterDir causes the Walker to begin traversing the directory
// corresponding to the last entry returned by Next.
//
// EnterDir returns a non-nil error (and leaves the walker
// unchanged) if:
// the entries for the directory could not be identified,
// Next has not yet been called for the current directory,
// the last call to Next did not return a directory,
// or ExitDir was called after the last call to Next.
func (w *Walker) EnterDir() error
// ExitDir returns the WalkEntry for the current directory, and
// causes the walker to move back to its parent directory.
// The subsequent call to Next will return the next sibling (if any)
// of the returned entry.
//
// If the current directory was already the root of the walk,
// ExitDir returns a nil *WalkEntry with ok=false.
func (w *Walker) ExitDir() (parent *WalkEntry, ok bool)
// A WalkEntry is an entry within a directory encountered during a Walk.
type WalkEntry struct {
…
}
// IsDir reports whether the entry describes a subdirectory.
func (e *WalkEntry) IsDir() bool
// Name returns the name of the entry within its parent directory.
func (e *WalkEntry) Name() string
// Path returns the path by which the walker reached this entry.
func (e *WalkEntry) Path() string
// Rel returns the relative path from the root of the tree being walked
// to the entry. The path is either "." or else a sequence of
// Separator-separated path elements that does not contain "." or "..".
func (e *WalkEntry) Rel() string
// Info returns the FileInfo for the file or subdirectory described by the entry.
func (e *WalkEntry) Info() (FileInfo, error) The corresponding implementation of the example might look like: w, err := fs.Walk(os.FS(), "/does/not/exist")
for !w.Done() {
e, err := w.Next()
if err == io.EOF {
w.ExitDir()
continue
}
if err != nil {
log.Print(err)
}
log.Print(e.Rel())
if e.IsDir() {
if err := w.EnterDir(); err != nil {
log.Print(err)
}
}
} That API addresses the same issues as the original proposal, but in my opinion provides more robust error-handling (since the entry cannot be accessed without also binding the error to either a variable or |
FWIW, the walker API I wrote originally has the benefit of matching existing patterns for iterators, specifically sql.Rows and reflect.MapIter. |
That's true, but experiences with iterators like Also, |
Thinking about the error-handling a bit more: probably Dropping the error from |
Actually, I think that reinforces my concern about error-handling with the The current proposal says: // Otherwise, Err can be non-nil when Exiting returns true,
// indicating a problem reading the directory being exited. But that seems surprising to me: I would expect errors to be possible only when entering directories, not when exiting them. |
So, perhaps: // Walk begins walking the subtree of fsys rooted at path root.
//
// The returned Walker initially contains only the root.
// If the root does not exist, Walk returns an empty, non-nil Walker
// and an error for which os.IsNotExist returns true.
func Walk(fsys FS, root string) (*Walker, error)
// Next advances the walker to the next entry within the current directory
// and returns that entry.
// If no entries remain, Next returns a nil *WalkEntry and ok=false.
//
// A new Walker includes only one entry, for the root of the walk itself.
// The first call to Next on a new Walker returns the root of the walk.
// If the root is not entered, the next call to Done will return true.
func (w *Walker) Next() (*WalkEntry, bool) (with the remaining API per #41974 (comment).) I'm still not quite happy about |
I had assumed that the error exiting the directory was only if the directory that was being read from no longer existed and that if it tried to open a directory that didn't exist it would just skip it |
@jimmyfrasche, I would expect an error entering a directory to be possible due to a number of reasons, such as:
|
I guess in those cases the original api would enter the directory and on the next iteration immediately exit it and report the error? |
Or we could just keep the callback and do a more surgical fix that only replaces FileInfo with DirEntry; see #42027 for that. |
Change https://golang.org/cl/243916 mentions this issue: |
Is |
Supposing for the moment that Looking at the docs for And I think this would not make anything else already in this proposal impossible. Just an idea. |
I agree that this snippet is more intuitive than the alternative: if w.Entering() {
if matchesExcludePattern(w.Path()) {
w.SkipDir()
continue
}
}
// ... it's not a new directory, do normal processing below |
This is turning out a lot more complex than I thought it would - I thought it would be a simple matter of copying the kr/fs API. And the primary goal was to allow use of the lazy ReadDir, but that can be done more directly by adding a filepath.WalkDir and fs.WalkDir that pass a DirEntry to the WalkFunc (#42027). That approach also has the benefit that code will be able to more easily update from filepath.Walk to filepath.WalkDir - the body of the WalkFunc may not need to change at all. I'm retracting this proposal in favor of #42027. Thanks for all the feedback - it was very helpful. |
There are at least three problems with filepath.Walk:
@kr's Walker API takes a different approach that solves all these, by providing an explicit iterator object instead of a callback-based model. I propose to adapt this API to provide an alternative to filepath.Walk and then also adopt this for io/fs instead of filepath.Walk.
The new API is:
An example:
This API differs from github.com/kr/fs.Walker in that it reports both entry and exit from every directory. In contrast, kr/fs.Walker only reports an exit-directory step in case of a ReadDir error, and it provides no clear way to distinguish the second result (except implicitly that Info().IsDir() == false && Err() != nil indicates an “exiting” event, but that fact is undocumented).
The API also differs in that it provides all the os.DirEntry methods, where kr/fs.Walker provides only
Info() os.FileInfo
. A possible simplification of the API would be to provideDirEntry() os.DirEntry
instead of Name, IsDir, Type, Info, but those are likely to be very commonly called, and forcing the user to write (say)w.DirEntry().IsDir()
seems unnecessarily pedantic.Rel is not strictly needed (nor included in kr/fs.Walk), but I've needed that result in almost every filepath.Walk call I've ever written. It can be derived from Path, but doing so is tricky.
And of course kr/fs.Walker's constructor is named Walk. We must use NewWalker instead because filepath.Walk is taken. For io/fs we may be able to call it Walk.
The text was updated successfully, but these errors were encountered: