proposal: os: add iterator variant of File.ReadDir #70084
Comments
(Note that this is complementary to #64341, which goes recursively) |
You left out the most interesting part of the proposal: what should the new method be called? Also, should it return |
Or even EDIT: I was thinking about the global |
You should be able to iterate over entries in both directory order and sorted by filename. Having an iterator return an error can help you remember to handle errors, but it can also make the code a bit messier if error handling isn’t just about returning the error. That said, here’s my proposal:
// DirEntries returns a collection of the contents in the directory associated
// with file. It always returns a non-nil collection. If f is nil or not a
// directory, the iterator methods return an empty sequence, and a call to Err
// will return the previously occurred error.
func (f *File) DirEntries() *DirEntries
type DirEntries struct {
	// contains filtered or unexported fields
}
func (d *DirEntries) All() iter.Seq[DirEntry]
func (d *DirEntries) Sorted() iter.Seq[DirEntry]
func (d *DirEntries) Names() iter.Seq[string]
func (d *DirEntries) SortedNames() iter.Seq[string]
func (d *DirEntries) Err() error
and used as shown below:
entries := f.DirEntries()
for entry := range entries.All() {
	// ...
}
if err := entries.Err(); err != nil {
	// handle the error
} |
I personally do not like having a separate |
That is a very fair criticism, but I am writing my own iterating wrapper around fs.WalkFunc/filepath.WalkFunc, and I have found there's not a good alternative. My current API takes an error handler callback and has a method to return the last error, but it's not totally satisfying. It's also a fairly fiddly API with a lot of decision points, and the more I work on it, I'm not actually sure it makes sense to move it into the standard library as opposed to just letting third parties handle it. |
There are many places where it's possible to forget to check an error in Go. For example, if you have a |
I thought that the conclusion of #65236 was to introduce a vet check for this, but it does not seem to be added. |
We have entirely too many ways to list a directory's contents:
Inconsistencies abound: Some functions return the full directory listing, some are iterative and return a chunk, some sort results, some don't. And yet we don't seem to have enough functions, because there are gaps: Walk is less efficient than WalkDir because it calls lstat on each file; WalkDir is less efficient than Walk because it needs to read each directory into memory to sort its contents. Walk/WalkDir do a preorder traversal, but some operations (like RemoveAll) require a postorder traversal. This isn't necessarily an argument against adding an iterator variant of ReadDir, but I think that we need to have a clear understanding on how new directory listing functions fit into the existing mess. (It's incredibly tempting to try to propose One New API that subsumes all the existing ones--flat directory listing, tree walking, pre- or post-order traversal, sorted or unsorted, a traversal-resistant way to stat or open files, room for expansion with whatever we forgot.) |
An idea that comes to my mind, this kind of helper can be added to the
func ErrorAbsorber[T any](iter iter.Seq2[T, error], out *error) iter.Seq[T] {
	return func(yield func(T) bool) {
		for v, err := range iter {
			if err != nil {
				*out = err
				break
			}
			if !yield(v) {
				return
			}
		}
	}
}
Usage:
var err error
for _ = range ErrorAbsorber(ErrorIter(), &err) {
	// logic
}
if err != nil {
	// error handling logic
}
This might help in such cases to move the error handling logic out of the loop body. |
I think it'd have to be closer to the former. Because the implementation may involve multiple system calls at runtime in the middle of the stream, the file system might hit corruption and error out in the middle (after the latter signature already returned an iterator and a nil error), and you'd need some way to yield that to the iterating caller. That's assuming it's unsorted. And I am kinda assuming it'd need to be unsorted to be interesting because we already have https://pkg.go.dev/os#ReadDir which buffers it all up to sort. People can use that today if that's what they want. |
This brings you back to:
for _ = range it.Iter() {
	// logic
}
if err := it.Err(); err != nil {
	// error handling logic
}
Which follows the convention we have already in scanners. |
Per chat with some Go folk today, a few of us didn't like the general pattern of In this particular case, So we should probably do something like
type SomeStructWithErrorOrValue[T any] struct {
	Err error // exactly one of Err or V is set
	V   T     // only valid if Err is nil
}
... and then we went off into naming tangents and where such a type would live, etc. |
This can be solved by a vet check, see |
This adds a new generic result type (motivated by golang/go#70084) to try it out, and uses it in the lineread package, changing that package to return iterators: sometimes over []byte (when the input is all in memory), but sometimes iterators over results of []byte, if errors might happen at runtime. Updates #12912 Updates golang/go#70084 Change-Id: Iacdc1070e661b5fb163907b1e8b07ac7d51d3f83 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
We ended up with a "result" type (named after Rust/Swift's) in a Then we can write code that produces errors in the middle of an iterator. e.g. here's an iterator over lines of a file:
// File returns an iterator that reads lines from the named file.
func File(name string) iter.Seq[result.Of[[]byte]] {
	f, err := os.Open(name)
	return func(yield func(result.Of[[]byte]) bool) {
		if err != nil {
			yield(result.Error[[]byte](err))
			return
		}
		defer f.Close()
		bs := bufio.NewScanner(f)
		for bs.Scan() {
			if !yield(result.Value(bs.Bytes())) {
				return
			}
		}
		if err := bs.Err(); err != nil {
			yield(result.Error[[]byte](err))
		}
	}
}
And now callers can't so easily ignore errors, as is common with people using At least ignoring errors is obvious on the page now. |
This adds a new generic result type (motivated by golang/go#70084) to try it out, and uses it in the new lineutil package (replacing the old lineread package), changing that package to return iterators: sometimes over []byte (when the input is all in memory), but sometimes iterators over results of []byte, if errors might happen at runtime. Updates #12912 Updates golang/go#70084 Change-Id: Iacdc1070e661b5fb163907b1e8b07ac7d51d3f83 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
There is another solution:
This is strictly better than putting an |
@jba I suspect that a lot of APIs are going to return an iterator directly so that consumers can range with just a single function call inlined in the for-range statement. Since it was decided that it's not a compiler error to ignore the second value with Seq2, we are back where we started. I lean more towards something like Brad's result type because it seems more straightforward and Go-like. |
@jba Personally I would prefer making this behavior more explicit, with something like: #70084 (comment) (it could also return an |
I don't understand this. |
In the code
Passing a pointer to You could have a function that takes an That still leaves the problem of what to do if your iterator returns two values and an error. |
Would having it be an That being said, I feel that the code above using a The above example of an As a user doing some coding and lots of code review, I'd be happy with |
We can force errors to be handled / ignored explicitly with |
Not in every case: 😄
for range iter2 {
} |
Proposal Details
Today we have:
https://pkg.go.dev/os#File.ReadDir etc
That n feels so antiquated now that we have iterators! I propose that we add an iterator-based variant. 😄
/cc @neild