Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: os: add iterator variant of File.ReadDir #70084

Open
bradfitz opened this issue Oct 28, 2024 · 24 comments
Open

proposal: os: add iterator variant of File.ReadDir #70084

bradfitz opened this issue Oct 28, 2024 · 24 comments
Labels
Milestone

Comments

@bradfitz
Copy link
Contributor

Proposal Details

Today we have:

https://pkg.go.dev/os#File.ReadDir etc

func (f *File) ReadDir(n int) ([]DirEntry, error)

That n feels so antiquated now that we have iterators! I propose that we add an iterator-based variant. 😄

/cc @neild

@gopherbot gopherbot added this to the Proposal milestone Oct 28, 2024
@gabyhelp
Copy link

Related Issues and Documentation

(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)

@bradfitz
Copy link
Contributor Author

(Note that this is complementary to #64341, which goes recursively)

@ianlancetaylor ianlancetaylor moved this to Incoming in Proposals Oct 28, 2024
@ianlancetaylor
Copy link
Contributor

You left out the most interesting part of the proposal: what should the new method be called?

Also, should it return iter.Seq2[DirEntry, error] or should it return (iter.Seq[DirEntry], error) ?

@mateusz834
Copy link
Member

mateusz834 commented Oct 29, 2024

Also, should it return iter.Seq2[DirEntry, error] or should it return (iter.Seq[DirEntry], error) ?

Or even (iter.Seq2[DirEntry, error], error)

EDIT: I was thinking about the global os.ReadDir, this probably matters only there.

@gazerro
Copy link
Contributor

gazerro commented Oct 29, 2024

You should be able to iterate over entries in both directory order and sorted by filename.

Having an iterator return an error can help you remember to handle errors, but it can also make the code a bit messier if error handling isn’t just about returning the error.

That said, here’s my proposal:

// DirEntries returns a collection of the contents in the directory associated
// with file. It always returns a non-nil collection. If f is nil or not a
// directory, the iterator methods return an empty sequence, and a call to Err
// will return the previously occurred error.
func (f *File) DirEntries() *DirEntries

type DirEntries struct {
	// contains filtered or unexported fields
}

func (d *DirEntries) All() iter.Seq[DirEntry]

func (d *DirEntries) Sorted() iter.Seq[DirEntry]

func (d *DirEntries) Names() iter.Seq[string]

func (d *DirEntries) SortedNames() iter.Seq[string]

func (d *DirEntries) Err() error

and used as shown below:

entries := f.DirEntries()
for entry := range entries.All() {
        // ...
}
if err := entries.Err(); err != nil {
        // handle the error
}

@mateusz834
Copy link
Member

I personally do not like having a separate Err method, it is easy to forget to use it.

@earthboundkid
Copy link
Contributor

I personally do not like having a separate Err method, it is easy to forget to use it.

That is a very fair criticism, but I am writing my own iterating wrapper around fs.WalkFunc/filepath.WalkFunc, and I have found there's not a good alternative. My current API takes an error handler callback and has a method to return the last error, but it's not totally satisfying. It's also a fairly fiddly API with a lot of decision points, and the more I work on it, I'm not actually sure it makes sense to move it into the standard library as opposed to just letting third parties handle it.

@jimmyfrasche
Copy link
Member

iter.Seq2[T, error] makes sense when there can be an error per iteration and you can continue the loop after any such error. Otherwise, the error is only non-nil at most once and you need to write the error handling code that is logically after the loop inside the loop.

There are many places where it's possible to forget to check an error in Go. For example, if you have a iter.Seq2[T, error] you can write for t := range seq

@mateusz834
Copy link
Member

There are many places where it's possible to forget to check an error in Go. For example, if you have a iter.Seq2[T, error] you can write for t := range seq

I thought that the conclusion of #65236 was to introduce a vet check for this, but it does not seem to be added.

#65236 (comment)

@neild
Copy link
Contributor

neild commented Oct 29, 2024

We have entirely too many ways to list a directory's contents:

  • os.ReadDir
  • os.File.ReadDir
  • os.File.Readdir
  • os.File.Readdirnames
  • io/fs.ReadDir
  • io/fs.ReadDirFile.ReadDir
  • io/fs.ReadDirFS.ReadDir
  • io/fs.WalkDir
  • path/filepath.Walk
  • path/filepath.WalkDir

Inconsistencies abound: Some functions return the full directory listing, some are iterative and return a chunk, some sort results, some don't.

And yet we don't seem to have enough functions, because there are gaps: Walk is less efficient than WalkDir because it calls lstat on each file; WalkDir is less efficient than Walk because it needs to read each directory into memory to sort its contents. Walk/WalkDir do a preorder traversal, but some operations (like RemoveAll) require a postorder traversal.

This isn't necessarily an argument against adding an iterator variant of ReadDir, but I think that we need to have a clear understanding on how new directory listing functions fit into the existing mess.

(It's incredibly tempting to try to propose One New API that subsumes all the existing ones--flat directory listing, tree walking, pre- or post-order traversal, sorted or unsorted, a traversal-resistant way to stat or open files, room for expansion with whatever we forgot.)

@mateusz834
Copy link
Member

iter.Seq2[T, error] makes sense when there can be an error per iteration and you can continue the loop after any such error. Otherwise, the error is only non-nil at most once and you need to write the error handling code that is logically after the loop inside the loop.

That is a very fair criticism, but I am writing my own iterating wrapper around fs.WalkFunc/filepath.WalkFunc, and I have found there's not a good alternative. My current API takes an error handler callback and has a method to return the last error, but it's not totally satisfying.

An idea that comes to my mind, this kind of helper can be added to the iter/errors package:

func ErrorAbsorber[T any](iter iter.Seq2[T, error], out *error) iter.Seq[T] {
	return func(yield func(T) bool) {
		for v, err := range iter {
			if err != nil {
				*out = err
				break
			}
			if !yield(v) {
				return
			}
		}
	}
}

Usage:

var err error
for _ = range ErrorAbsorber(ErrorIter(), &err) {
        // logic
}
if err != nil {
        // error handling logic
}

This might help in such cases to move the error handling logic easily from the loop body.

@bradfitz
Copy link
Contributor Author

Also, should it return iter.Seq2[DirEntry, error] or should it return (iter.Seq[DirEntry], error)?

I think it'd have to be closer to the former. Because the implementation may involve multiple system calls at runtime in the middle of the stream, the file system might hit corruption and error out in the middle (after the latter signature already returned an iterator and a nil error), and you'd need some way to yield that to the iterating caller.

That's assuming it's unsorted. And I am kinda assuming it'd need to be unsorted to be interesting because we already have https://pkg.go.dev/os#ReadDir which buffers it all up to sort. People can use that today if that's what they want.

@jonlundy
Copy link

This might help in such cases to move the error handling logic easily from the loop body.

This brings you back to:

for _ = range it.Iter() {
        // logic
}
if err := it.Err(); err != nil {
        // error handling logic
}

Which follows the convention we have already in scanners.

@bradfitz
Copy link
Contributor Author

Per chat with some Go folk today, a few of us didn't like the general pattern of iter.Seq2[T, error] because it makes it too easy for callers to ignore errors.

In this particular case, iter.Seq2[DirEntry, error] is safer because DirEntry is an interface and a caller ignoring the error would panic using the nil interface, but if we adopted that pattern more broadly it's not safe, as many zero values are valid or easily ignorable. e.g. imagine a stream of integers in iter.Seq2[int, error] ... if the caller did for n := range seq, they'd get a zero at the end.... is that a real zero, or a zero with an error they discarded?

So we should probably do something like iter.Seq[SomeStructWithErrorOrValue[DirEntry]] ,

type SomeStructWithErrorOrValue[T any] struct {
    Err error // exactly one of Err or V is set
    V   T     // only valid if Err is nil
}

... and then we went off into naming tangents and where such a type would live, etc.

@mateusz834
Copy link
Member

Per chat with some Go folk today, a few of us didn't like the general pattern of iter.Seq2[T, error] because it makes it too easy for callers to ignore errors.

This can be solved by a vet check, see
#70084 (comment)

bradfitz added a commit to tailscale/tailscale that referenced this issue Nov 5, 2024
This adds a new generic result type (motivated by golang/go#70084)
to try it out, and uses it in the lineread package, changing that
package to return iterators: sometimes over []byte (when the input is
all in memory), but sometimes iterators over results of []byte, if
errors might happen at runtime.

Updates #12912
Updates golang/go#70084

Change-Id: Iacdc1070e661b5fb163907b1e8b07ac7d51d3f83
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
bradfitz added a commit to tailscale/tailscale that referenced this issue Nov 5, 2024
This adds a new generic result type (motivated by golang/go#70084)
to try it out, and uses it in the lineread package, changing that
package to return iterators: sometimes over []byte (when the input is
all in memory), but sometimes iterators over results of []byte, if
errors might happen at runtime.

Updates #12912
Updates golang/go#70084

Change-Id: Iacdc1070e661b5fb163907b1e8b07ac7d51d3f83
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
@bradfitz
Copy link
Contributor Author

bradfitz commented Nov 5, 2024

We ended up with a "result" type (named after Rust/Swift's) in a result package: https://go.dev/play/p/WzWDeZV42Qr

Then we can write code that produces errors in the middle of an iterator.

e.g. here's an iterator over lines of a file:

// File returns an iterator that reads lines from the named file.
func File(name string) iter.Seq[result.Of[[]byte]] {
	f, err := os.Open(name)
	return func(yield func(result.Of[[]byte]) bool) {
		if err != nil {
			yield(result.Error[[]byte](err))
			return
		}
		defer f.Close()
		bs := bufio.NewScanner(f)
		for bs.Scan() {
			if !yield(result.Value(bs.Bytes())) {
				return
			}
		}
		if err := bs.Err(); err != nil {
			yield(result.Error[[]byte](err))
		}
	}
}

And now callers can't so easily ignore errors, as is common with people using bufio.Scanner or database/sql.Rows, etc.

At least ignoring errors is obvious on the page now.

bradfitz added a commit to tailscale/tailscale that referenced this issue Nov 5, 2024
This adds a new generic result type (motivated by golang/go#70084)
to try it out, and uses it in the lineread package, changing that
package to return iterators: sometimes over []byte (when the input is
all in memory), but sometimes iterators over results of []byte, if
errors might happen at runtime.

Updates #12912
Updates golang/go#70084

Change-Id: Iacdc1070e661b5fb163907b1e8b07ac7d51d3f83
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
bradfitz added a commit to tailscale/tailscale that referenced this issue Nov 5, 2024
This adds a new generic result type (motivated by golang/go#70084)
to try it out, and uses it in the lineread package, changing that
package to return iterators: sometimes over []byte (when the input is
all in memory), but sometimes iterators over results of []byte, if
errors might happen at runtime.

Updates #12912
Updates golang/go#70084

Change-Id: Iacdc1070e661b5fb163907b1e8b07ac7d51d3f83
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
bradfitz added a commit to tailscale/tailscale that referenced this issue Nov 5, 2024
This adds a new generic result type (motivated by golang/go#70084)
to try it out, and uses it in the lineread package, changing that
package to return iterators: sometimes over []byte (when the input is
all in memory), but sometimes iterators over results of []byte, if
errors might happen at runtime.

Updates #12912
Updates golang/go#70084

Change-Id: Iacdc1070e661b5fb163907b1e8b07ac7d51d3f83
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
bradfitz added a commit to tailscale/tailscale that referenced this issue Nov 5, 2024
This adds a new generic result type (motivated by golang/go#70084) to
try it out, and uses it in the new lineutil package (replacing the old
lineread package), changing that package to return iterators:
sometimes over []byte (when the input is all in memory), but sometimes
iterators over results of []byte, if errors might happen at runtime.

Updates #12912
Updates golang/go#70084

Change-Id: Iacdc1070e661b5fb163907b1e8b07ac7d51d3f83
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
bradfitz added a commit to tailscale/tailscale that referenced this issue Nov 5, 2024
This adds a new generic result type (motivated by golang/go#70084) to
try it out, and uses it in the new lineutil package (replacing the old
lineread package), changing that package to return iterators:
sometimes over []byte (when the input is all in memory), but sometimes
iterators over results of []byte, if errors might happen at runtime.

Updates #12912
Updates golang/go#70084

Change-Id: Iacdc1070e661b5fb163907b1e8b07ac7d51d3f83
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
bradfitz added a commit to tailscale/tailscale that referenced this issue Nov 5, 2024
This adds a new generic result type (motivated by golang/go#70084) to
try it out, and uses it in the new lineutil package (replacing the old
lineread package), changing that package to return iterators:
sometimes over []byte (when the input is all in memory), but sometimes
iterators over results of []byte, if errors might happen at runtime.

Updates #12912
Updates golang/go#70084

Change-Id: Iacdc1070e661b5fb163907b1e8b07ac7d51d3f83
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
@jba
Copy link
Contributor

jba commented Nov 6, 2024

There is another solution:

entries, errf := f.Dirs()
for x := range entries {
    ...
}
if err := errf(); err != nil {
    return err
}

This is strictly better than putting an Err method on the iterator, because you can't forget about errf: the compiler will warn about an unused variable. For more detail, see #65236 (comment).

@jaloren
Copy link

jaloren commented Nov 6, 2024

@jba i suspect that a lot of apis are going to return an iterator directly so that consumers can range with just a single function call inlined in the for range statement. since it was decided that its not a compiler err to ignore the second value with seq2, we are back where we started.

I do lean more towards something like brads result type because it seems more straight forward and go like.

@mateusz834
Copy link
Member

@jba Personally i would prefer making this behavior more explicit, with something like: #70084 (comment) (it could also return an func() error instead).

@jba
Copy link
Contributor

jba commented Nov 6, 2024

since it was decided that its not a compiler err to ignore the second value with seq2, we are back where we started.

I don't understand this. File.Dirs returns (iter.Seq[T], func() error). You can't put that after range. Same with the return types (iter.Seq2[K, V], func() error).

@jba
Copy link
Contributor

jba commented Nov 6, 2024

Personally i would prefer making this behavior more explicit

In the code

var err error
for _ = range ErrorAbsorber(ErrorIter(), &err) {
        // logic
}
if err != nil {
        // error handling logic
}

Passing a pointer to err counts as using it, so the compiler won't give you an error if you omit the check at the bottom. My suggestion doesn't have that problem.

You could have a function that takes an iter.Seq2[T, error] and returns (iter.Seq[T], func() error), making it closer to my suggestion.

That still leaves the problem of what to do if your iterator returns two values and an error.

@AnomalRoil
Copy link

Would having it be an io.Closer be an option?
We should already be used to check error on Close, and it could use the extra go vet tooling to check we do. 😋
It does sound a bit weird since this isn't technically io, but it still is tangential to it and I don't feel it surprising to have to Close a directory...

That being said, I feel that the code above using a Result is not something I'd refer to someone saying "this is idiomatic Go". Having
The idea of returning an errf as @jba proposed is appealing, as it looks slightly more idiomatic, and prevents misuse by not encouraging people to "just plug it in a for i := range statement", which is nice.

The above example of an ErrorAbsorber also feels weird, first of all passing it an error as a reference just feels wrong and I wouldn't expect it to break on the first error from its name. Next even if the name was errors.UntilNext, its usage seems to involve some sort of loop, where error handling for these errors happens anyway and then execution resumes... So I'm not convinced it would actually simplify the said handling compared to just using a iter.Seq2[T, error], in which you could always choose to wrap all errors with errors.Join if you care about eventual errors only, or you could choose to do more complex handling...

As a user doing some coding and lots of code review, I'd be happy with iter.Seq2[T, error] being an io.Closer, with that being called out in its documentation, or maybe having an Err method, as long as we get some sort of go vet tooling around it... because an unused Err method is not so easy to spot in code review and not as typical as a missing Close.
But having a (iter.Seq2[T, error], func() Error) feels nicer since this would be very easy to catch in code review, while leaving a lot of flexibility for error handling to the users.

@seankhliao
Copy link
Member

We can force errors to be handled / ignored explicitly with iter.Seq2[error, DirEntry]

@mateusz834
Copy link
Member

We can force errors to be handled / ignored explicitly with iter.Seq2[error, DirEntry]

Not in every case: 😄

for range iter2 {
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Incoming
Development

No branches or pull requests