Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: os: make Readdir return lazy FileInfo implementations #41188

Open
rsc opened this issue Sep 2, 2020 · 46 comments
Open

proposal: os: make Readdir return lazy FileInfo implementations #41188

rsc opened this issue Sep 2, 2020 · 46 comments
Labels
Projects
Milestone

Comments

@rsc
Copy link
Contributor

@rsc rsc commented Sep 2, 2020

An os.File provides two ways to read a directory: Readdirnames returns a list of the names of the directory entries, and Readdir returns the names along with stat information.

On Plan 9 and Windows, Readdir can be implemented with only a directory read - the directory read operation provides the full stat information.

But many Go users use Unix systems.

On most Unix systems, the directory read does not provide full stat information. So the implementation of Readdir reads the names from the directory and then calls Lstat for each file. This is fairly expensive.

Much of the time, such as in the implementation of filepath.Glob and other file system walking, the only information the caller of Readdir really needs is the name and whether the name denotes a directory. On most Unix systems, that single bit of information—is this name a directory?—is available from the plain directory read, without an additional stat. If the caller is only using that bit, the extra Lstat calls are unnecessary and slow. (Goimports, for example, has its own directory walker to avoid this cost.)

Various people have proposed adding a third directory reading option of one form or another, to get names and IsDir bits. This would certainly address the slow directory walk issue on Unix systems, but it seems like overfitting to Unix.

Note that os.FileInfo is an interface. What if we make Readdir return a slice of lazily-filled os.FileInfo? That is, on Unix, Readdir would stop calling Lstat. Each returned FileInfo would already know the answer for its Name and IsDir methods. The first call to any of the other methods would incur an Lstat at that moment to find out the rest of the information. A directory walk that uses Readdir and then only calls Name and IsDir would have all its Lstat calls optimized away with no code changes in the caller.

The downside of this is that the laziness would be visible when you do the Readdir and wait a while before looking at the results. For example if you did Readdir, then touched one of the files in the list, then called the ModTime method on the os.FileInfo that Readdir retruned, you'd see the updated modification time. And then if you touched the file again and called ModTime again, you wouldn't see the further-updated modification time. That could be confusing. But I expect that the vast majority of uses of Readdir use the results immediately or at least before making changes to files listed in the results. I suspect the vast majority of users would not notice this change.

I propose we make this change—make Readdir return lazy os.FileInfo—soon, intending it to land in Go 1.16, but ready to roll back the change if the remainder of the Go 1.16 dev cycle or beta/rc testing turns up important problems with it.

/cc @robpike @bradfitz @ianthehat @kr

@rsc rsc added this to the Proposal milestone Sep 2, 2020
@gopherbot gopherbot added the Proposal label Sep 2, 2020
@bradfitz
Copy link
Contributor

@bradfitz bradfitz commented Sep 2, 2020

Last time this was proposed there was debate about what the behavior for ModTime and Mode and Size should be if the lazy Lstat fails later, as they don't return errors. Panic is bad. Logging is weird. Zero values I guess?

@bcmills
Copy link
Member

@bcmills bcmills commented Sep 2, 2020

I think this is likely to introduce subtle changes in behavior. Perhaps more importantly, I don't think this is the sort of change that we can reliably verify during a development cycle.

In my experience, very few users who are not either Go contributors or Googlers test Beta or RC releases of the Go toolchain, and changes in the os package are less likely to turn up during Google testing because a significant fraction of Google programs do most of their I/O without using the os package directly.

@tv42
Copy link

@tv42 tv42 commented Sep 2, 2020

os.FileInfo can't be lazy as-is, because it can't return an error. Returning a 0 size on transient errors is not acceptable.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Sep 2, 2020

See also #40352, which is about different approaches to efficiently uncover similar information.

@tv42
Copy link

@tv42 tv42 commented Sep 2, 2020

(Ignoring the POSIX API for a moment) NFS, and likely many other network filesystems, can do completely separate operations depending on whether the stat info is going to be needed (NFSv3 readdir vs readdirplus, NFSv4 "bulk LOOKUP", FUSE_READDIRPLUS).

There's also been a lot of talk about a Linux syscall that would fetch getdents+lstat info, for example https://lwn.net/Articles/606995/ -- they all seem to revolve around the idea of the client knowing beforehand whether it will be doing the lstat calls or not, and communicating that to the kernel.

These combined make me think the right path forward would be a Readdir method that takes arguments that inform it which os.FileInfo fields will be wanted; the rest could be zero values.

(That extended Readdir could also take a flag for whether to sort the results or not, removing one common cause of forks for performance reasons.)

@networkimprov
Copy link

@networkimprov networkimprov commented Sep 3, 2020

EDIT: This has a detailed proposal in #41265

I believe we need a new dirent abstraction.

After reviewing suggestions from the FS API discussion...

Let's consider a hybrid:

  1. ReadDir(path string, n int, opt uint64) ([]DirItem, error) - opt is fields to load and sorting (0 is OS defaults)
    (may return more fields than requested)
  2. (d *DirEntry) Load(fields uint64) error - (re-)loads the fields
    (returns error if inode doesn't match)
  3. (d *DirEntry) Has(fields uint64) bool - indicates whether the fields are loaded
  4. (d *DirEntry) Id() FileId - gives unix inode or winapi fileId; could take an argument re device info
  5. (d *DirEntry) Xyz() T - panics for any field not loaded (a programmer mistake)

That solves the .ModTime() etc issue with lazy-loading, and avoids an error check after every field access.

EDIT: The interface which DirEntry implements and ReadDir() returns:

type DirItem interface {
   Load(fields uint64) error
   Has(fields uint64) bool
   Name() string
   IsDir() bool
}

Rationale:
a) If you need certain fields for every item, request them in ReadDir().
b) If you need certain fields for some items, request them in DirEntry.Load().
c) If you need certain fields only when they're the OS default, check for them with DirEntry.Has().
d) If you need the latest data for an item, request it with DirEntry.Load().

@tv42
Copy link

@tv42 tv42 commented Sep 3, 2020

  1. (d *DirEntry) Id() FileId - gives unix inode or winapi fileId

Wasn't the Windows FileId 128-bit? (Seen somewhere on go issues around the greater topic in the last few days.)

Either way, the unix inode number isn't very useful without the device major:minor. For example, you can't filepath.Walk and expect inode alone to identify hardlinked files, because you may have crossed a mountpoint.

Also, inode number and such belong in a .Sys() style, platform-specific, extension point.

@networkimprov
Copy link

@networkimprov networkimprov commented Sep 3, 2020

I gave the .Id() type as FileId, which can be whatever (e.g. opaque array), and store major:minor. The fact that a FileId can't be compared across platforms isn't a reason to hide it. It's needed to replicate a tree containing multiple hard links for a file -- which I do.

Today on Windows, FileInfo.Sys() can't even provide the fileId! Adding it was debated and discarded.

Is there any practical value in .Sys() besides providing .Ino on unix?

Winapi fileId is 64-bit on NTFS, 128-bit on ReFS (an alternative for Windows Server):
https://docs.microsoft.com/en-us/windows/win32/api/fileapi/ns-fileapi-by_handle_file_information
https://docs.microsoft.com/en-us/windows-server/storage/refs/refs-overview

@benhoyt
Copy link
Contributor

@benhoyt benhoyt commented Sep 3, 2020

I really like the intent, but I agree with other commenters that the API just doesn't quite fit as is, because of potential errors returned by the lazy methods. We actually debated a very similar issue when designing the os.scandir / os.DirEntry API in Python. At first we wanted to make the DirEntry methods properties, like entry.stat without the function call parentheses. That works in Python, but it looks like a plain attribute access, and people aren't expecting to have to catch exceptions (errors) when accessing an attribute, so we made it a function call. Per the docs:

Because the os.DirEntry methods can make operating system calls, they may also raise OSError. If you need very fine-grained control over errors, you can catch OSError when calling one of the os.DirEntry methods and handle as appropriate.

I believe this decision was based on theoretical concerns, not from actual testing, but still, the logic seems sound. Especially in Go, where all error handling is super-explicit (we always want "find grained control over errors"). Panic-ing is not going to work, and silently returning a zero value is arguably worse.

@diamondburned
Copy link

@diamondburned diamondburned commented Sep 4, 2020

The downside of this is that the laziness would be visible when you do the Readdir and wait a while before looking at the results. For example if you did Readdir, then touched one of the files in the list, then called the ModTime method on the os.FileInfo that Readdir retruned, you'd see the updated modification time. And then if you touched the file again and called ModTime again, you wouldn't see the further-updated modification time. That could be confusing. But I expect that the vast majority of uses of Readdir use the results immediately or at least before making changes to files listed in the results. I suspect the vast majority of users would not notice this change.

I have always been under the impression that calling a method on os.FileInfo will fetch the information again until recently. After all, it's an interface, so there wouldn't really be a way to know that unless I read the documentation carefully.

This leads me to believe that saving those information should be up to the user, the caller, to decide when and what to store. The os.FileInfo interface could then just not store anything and do an Lstat every time it needs that piece of information.

Another solution would probably be to dirtily make a new os.InfoResetter interface to accommodate scenarios where an explicit renew should be done. For example:

// Reset the os.FileInfo to fetch new information.
if resetter, ok := fileInfo.(os.InfoResetter); ok {
    resetter.ResetInfo()
}

Although this solution still has the behavior that a ModTime would return a new value after a touch and before calling the method, it would give callers explicit control over whether or not future ModTime calls should return the new value.

@tv42
Copy link

@tv42 tv42 commented Sep 4, 2020

@benhoyt

I believe this decision was based on theoretical concerns, not from actual testing, but still, the logic seems sound.

There's nothing theoretical about lstat(2) returning errors. For example, the file may be removed between the readdir and the lstat. A network partition can occur after readdir, before lstat. A filesystem can have a bug. There's no justification for filling os.FileInfo with imaginary content, Go doesn't throw exceptions for such things, and os.FileInfo methods don't have error returns so it's too late to make syscalls then. (And having a os.FileInfo replacement with errors on every method would be 1) very annoying to use 2) not match the reality, you'd end up checking the error multiple times for one syscall.)

@diamondburned

The os.FileInfo interface could then just not store anything and do an Lstat every time it needs that piece of information.

Can't, it has no way to communicate errors.

Repeating earlier suggestion, more concretely:

os.File.ReadDir(ReadDirOptions{...})

type ReadDirOptions struct {
    Number int    // like current argument n
    // Bitmap of Fields to include in the response.
    // Zero value means all fields, pass InfoFieldName
    // to only get names. Fields not included here
    // will return zero values from FileInfo methods.
    Fields InfoField
    Unsorted bool
}

type InfoField int

const (
    InfoFieldName InfoField = 1 << iota
    InfoFieldSize
    InfoFieldMode
    InfoFieldModTime
    // IsDir is included in Mode. No, getdents d_type
    // is not enough to serve FileInfo.IsDir (or Mode),
    // because it can be DT_UNKNOWN. It can be
    // exposed separately, for callers that can handle it.
    InfoFieldSys // this might finer control, per-platform?
)

If this is going to be used for pluggable filesystems (#41190), then it would probably be nicer to find a way for each filesystem to be able to control FS-specific extensions to FileInfo, instead of using a bitmap with stdlib-decided values. That would serve InfoFieldSys above, too. Regardless, the above would be the information needed to be expressed by stdlib, as far as I understand things.

@diamondburned
Copy link

@diamondburned diamondburned commented Sep 4, 2020

@tv42 This API seems too convoluted comparing to the current API. It also introduces an inconsistency between Readdir and ReadDir, which doesn't make sense.

@tv42
Copy link

@tv42 tv42 commented Sep 4, 2020

@tv42
Copy link

@tv42 tv42 commented Sep 4, 2020

@networkimprov DirEntry.Load is okay for something that looks exactly like lstat(2), but there are many cases where the FS would benefit from knowing up front whether those attributes would be needed or not. Chasing a ReadDir with entry.Load(fields) does not avoid the seek and round-trip penalty, except perhaps by luck in caching; all the entry.Load does is let the current getdents+lstat world avoid an lstat, sometimes.

@diamondburned
Copy link

@diamondburned diamondburned commented Sep 4, 2020

I'll make a second attempt.

What if we make ReadDir return a new mini-interface derived from a section of FileInfo, or more specifically, the Name and IsDir methods? Concretely, it would be this interface:

// DirEnt describes a directory entity.
type DirEnt interface {
    Name() string
    IsDir() bool
    Lstat() (FileInfo, error)
}

var _ DirEnt = (*dirEnt)(nil)

func (f *File) ReadDir(n int) ([]DirEnt, error)

// Windows impl:
func (e *dirEnt) Lstat() (FileInfo, error) {
    return e.finfo, nil
}

// Unix impl:
func (e *dirEnt) Lstat() (FileInfo, error) {
    return lstat(e.name)
}

// Custom impl where getting a FileInfo wouldn't error:
func (e *dirEnt) Lstat() (FileInfo, error) {
    return e, nil
}

This new DirEnt mini-interface should accommodate both the error handling required for Lstat on Unix as well as being relatively cheap on Windows.

@networkimprov
Copy link

@networkimprov networkimprov commented Sep 4, 2020

@tv42 your fields argument is in my ReadDir() :-)

I've added a Rationale section to #41188 (comment) describing cases for the DirEntry API.

I drafted that API after a careful reading of all discussion on this issue, and believe it maximizes performance and usability.

@tv42
Copy link

@tv42 tv42 commented Sep 4, 2020

@networkimprov Your API suggestion chases readdir with reading the stats, my API suggestion gives readdir the information needed to do the right thing. I believe a FS should have the fields up front, to be able to fetch them during the directory reading, and doing that afterward will never be the same. I'm not sure I can use more words to describe that again, and you seem to have completely skipped over the difference.

@networkimprov
Copy link

@networkimprov networkimprov commented Sep 4, 2020

You misunderstood. #41188 (comment) allows both
a) ReadDir(path, n, a|b) to request fields a & b for n items (your plan), and
b) item.Load(a|b) to request fields for a specific item after ReadDir(path, n, 0) retrieves it.

Each approach performs better in certain cases, listed in Rationale. (Either way, all OS-default fields are provided, even if not requested, as it's no extra overhead.)

@tv42
Copy link

@tv42 tv42 commented Sep 4, 2020

@networkimprov Oh I missed the opt argument to ReadDir there. Apologies.

@networkimprov
Copy link

@networkimprov networkimprov commented Sep 7, 2020

I filed #41265. It offers a new ReadDir() API for io/fs.

@rsc
Copy link
Contributor Author

@rsc rsc commented Sep 11, 2020

I intend to do a survey of actual uses in the Go corpus, but I haven't finished it yet.

@rsc
Copy link
Contributor Author

@rsc rsc commented Sep 11, 2020

I grepped through my Go corpus for Readdir/ReadDir calls. That turned up about 55,000 calls. See readdirs.txt, attached.

I randomly sampled 200 calls from that set; 6 were false positives (comments, functions named ReadDir but not the one we are talking about, and so on), and I randomly sampled 6 more to get back to 200 real samples. I classified all 200 samples, looking at how the result of ReadDir was used.

Of the 200 uses of ReadDir, the breakdown is:

  • 35 never called any method on any returned FileInfo. They only cared about the number of results or whether an error was returned.
  • 90 called only the Name method on returned FileInfos.
  • 52 called only the Name and IsDir methods on returned FileInfos.
  • 23 called other methods as well.

Of the 200, none of them saved the FileInfo slice for some future use. They all looked at the results immediately, meaning they would not have a chance to observe file system modifications sequenced between the ReadDir and the method calls.

Also, 35+90+52 = 177/200 = 88% of the calls would end up with no delayed (lazy) Stat calls at all - they never call any methods other than Name and IsDir. Those would be made much faster (and also guaranteed never to see any kind of race or inconsistency, since they never call any of the “extra work” methods).

This gives an estimate, to within maybe 0.5%, that about 88% of programs would get faster with no possibility of noticing the laziness.

And about 12% would run about the same speed, invoking the lazy extra work. These would not see lazy inconsistencies caused by any of their own actions, but they might notice zeroed modification times or sizes if files are being deleted out from under them by other processes (likely rare).

Again, best estimate, to within about 0.5%:

  • 88% get much faster and are guaranteed not to see the difference.
  • 12% stay about the same and only see the difference if another process is doing racing modifications to the file system.
  • 0% of Go programs would ever see a difference they caused themselves.

If you want to see my work, see the #-prefixed notes in readdir200.txt, also attached.

Overall, it seems like a low-risk large win. I'm particularly happy about the number of programs that just start working much faster with no rewriting required.

readdir200.txt
readdirs.txt.gz

@networkimprov
Copy link

@networkimprov networkimprov commented Sep 11, 2020

125/200 = 63% of the cases would get faster by switching to os.Readdirnames(). Could go vet flag those?

Readdir Docs: "... returns a slice of ... FileInfo values, as would be returned by Lstat..." Doesn't the proposal break this contract?

If any path component of the directory itself is renamed after ReadDir(), wouldn't every FileInfo result produce bad metadata?

This is considered bad form: fi, _ := os.Lstat(name) Why is it acceptable here?

Is 0.36% a representative sample size?

@mpx
Copy link
Contributor

@mpx mpx commented Sep 11, 2020

I have code that stores os.FileInfo for later comparison via os.SameFile. Similar purpose to "tail" - it's used to detect if the file has been replaced at a later time. I agree this is an uncommon usage.

I think most people would find the deferred stat violates their default mental model for this API - especially based on past experience. It's doubly weird since their is no way to observe failures (eg, file not found). It would be a particularly awkward compromise/wart to add to such a core part of the standard library.

I've wished there was a faster common API for processing directory entries in the past. However, deferring stat would still be my last choice after:

  1. Providing a new API with a different name and better behaviour (maybe something like: ReadDirent(n int) ([]Dirent, error))
  2. Using/implementing a different package when performance is critical
  3. Accepting Readdir is slow and waiting until a better solution comes along, or maybe even a redesigned /v2 package

This seems like a case where it would be better to take time to see if there is a better solution?

@tv42
Copy link

@tv42 tv42 commented Sep 11, 2020

I want to be very clearly on record that I think the design of silently returning 0 size on errors is a horrible idea. None of the existing "faster than stdlib" things has that design, for a good reason. No amount of statistics will change that; it's a fundamentally incorrect design, no matter whether it's only possible to observe in X% of programs or not. This idea goes against the whole ethos of Go's explicit error handling, and I'm frankly shocked that it's considered viable.

All you need to make that safe and correct is for ReadDir to take an extra bit of "do you want the extra FileInfo fields or not". Then it can skip the lstat when it's safe. Or split it into ReadDirent and ReadDir, where only latter provides full FileInfos.

@bcmills
Copy link
Member

@bcmills bcmills commented Sep 11, 2020

@tv42, would it make you feel better if Size() returned -1 instead of 0? That would at least be distinguishable from a valid size.

(But there isn't really anything similar to be done for Mode, since we can't set each individual bit to an obviously-bogus value.)

@rasky
Copy link
Member

@rasky rasky commented Sep 11, 2020

I'd strongly prefer -1 rather than 0. We could also define a special modebit for "stale FileInfo".

I'll be on record to say this is a bad idea as well. It's not clear why we are designing a new FS abstraction and inheriting what is clearly a mistake of the past. The only reason I see is that this is rushed to have embed.Files in 1.16, and this lazy FileInfo is probably the only solution not to miss the 1.16 deadline. I'm not sure it's a good idea.

@diamondburned
Copy link

@diamondburned diamondburned commented Sep 11, 2020

I'd still say that the better solution would be the first option that @mpx has listed, that is to provide a new API that returns a list of Dirents. Existing code will not get that free performance boost regarding Readdir, however, but I think this should at least allow a more efficient filepath.Walk. It will also allow for much more explicit error handling.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Sep 11, 2020

I'm not sure how I feel about this idea overall, but we could indeed add an error return to Mode, by defining ModeError as a new bit.

@bcmills
Copy link
Member

@bcmills bcmills commented Sep 11, 2020

@ianlancetaylor, I'm not sure that a ModeError would help much — in order to check for it you would already need to know about it, and existing code that does not know about it is not likely to change its behavior when ModeError is set.

@networkimprov
Copy link

@networkimprov networkimprov commented Sep 12, 2020

#40352 (an impetus for this issue) requested a way to see fields from the native dirent struct. It proposed to optionally lazy-load the data for IsDir() when not present in dirent. Essentially, it described a better os.Readdirnames().

For io/fs, retrieving all metadata in ReadDir() is fine -- although FileInfo assumes too many fields, and lacks a way to discern which fields are valid.

For regular storage, os.Readdir() is slow, and os.FileInfo is simplistic, but you need them as-is for os.DirFS.

So we need a faster, richer os API, which I explore in #41265. But that's not a prerequisite for io/fs.

@tv42
Copy link

@tv42 tv42 commented Sep 14, 2020

@bcmills I'll argue FileInfo.Size returning -1 on rainy days is a violation of Go1 compatibility promise; -1 is not a "length in bytes", for a regular file seen by ReadDir. Same for Mode having an "error bit". Also, what would IsDir do, it returns a bool.

I'm, again, surprised these ideas are considered worth considering! Just add a way for me to tell the system, up front, that I don't care about the FileInfo data; call that ReadDirEntries or an argument for ReadDir or whatever, let the caller explicitly state intent. That removes all this ambiguity and "rainy day syndrome".

@networkimprov
Copy link

@networkimprov networkimprov commented Sep 15, 2020

Russ has posted to say (IIUC) that a new API to solve this problem is off the table: #41265 (comment)

Disappointing. I guess enough folks haven't publicly flagged the problem to date, alas.

@rsc
Copy link
Contributor Author

@rsc rsc commented Sep 15, 2020

I have code that stores os.FileInfo for later comparison via os.SameFile. Similar purpose to "tail" - it's used to detect if the file has been replaced at a later time. I agree this is an uncommon usage.

For what it's worth, I believe that SameFile might actually work fine with the lazy API, since one thing you do get from the directory read on Unix systems is the inode number. That said, it doesn't invalidate the general hypothetical, but I still have yet to see a concrete case that would break.

@rsc
Copy link
Contributor Author

@rsc rsc commented Sep 15, 2020

@networkimprov

Is 0.36% a representative sample size?

That's not the right metric for sampling: the total number of samples is what matters. Consider these two cases:

  1. Suppose you have a bucket with 1,000 balls and you take 1 out at random and find that it's not red.
    It would definitely be a mistake to say "there are probably not many red balls in this bucket".
    If as many as 10% of the balls were red, drawing one at random and finding a non-red has a 90% chance of happening.

  2. Suppose you have a bucket with 1,000,000 balls and you take out 1,000 at random and find that none are red.
    At that point you really can say that "there are probably not many red balls in this bucket".
    If even 1% of the balls were red, drawing 1,000 at random and finding no reds has a 0.004% chance of happening.
    Maybe you got very lucky with the random draws (maybe start playing the lottery!);
    it's much more likely that there just aren't that many red balls in the bucket.

In both cases, you only looked at 0.1% of the balls, but the conclusions you can draw are very different, because what matters is the number of samples, not the overall fraction of the total you were drawing from.

All this is assuming random sampling. If you are taking balls from the top of the bucket but all the red ones are on the bottom, none of this applies. That's why I had a program pick random samples for me.

@networkimprov
Copy link

@networkimprov networkimprov commented Sep 16, 2020

Did the analysis consider cases that simply return the results of os.Readdir() to the caller, which may invoke FileInfo methods?

For example, MacOS directories may include ".DS_Store" files created by the OS. In apps that support MacOS, I wrap os.Readdir() and return its results -- after eliding the unexpected file in darwin-specific code.

Thanks for the stats tutorial :-)

@rsc
Copy link
Contributor Author

@rsc rsc commented Sep 16, 2020

Did the analysis consider cases that simply return the results of os.Readdir() to the caller, which may invoke FileInfo methods?

Yes, in all the sampled case I read enough of the code to find out what was happening with the results. That included looking at callers as needed.

@rsc
Copy link
Contributor Author

@rsc rsc commented Sep 16, 2020

For the record, I understand that everyone's intuition, including mine, is that this seems like a problematic change.
That's why I started looking for data.
And the data seems to say the opposite: random sampling hasn't found even a single program that would break, while it did identify many programs—the vast majority, in fact—that would run faster.

@jimmyfrasche
Copy link
Member

@jimmyfrasche jimmyfrasche commented Sep 16, 2020

My major concern is old programs getting the new sentinel values used to mark errors and not processing them correctly.

I have no issue with a new api that makes these things explicit.

@tv42
Copy link

@tv42 tv42 commented Sep 16, 2020

This continues to mystify me.

#41188 (comment)

They all looked at the results immediately, meaning they would not have a chance to observe file system modifications sequenced between the ReadDir and the method calls.

Surely we all know there's no such thing as "immediately", and that the filesystem may change between ReadDir doing getdents and lstat. The existing stdlib code already handles that, even when it's "immediate":

go/src/os/file_unix.go

Lines 360 to 364 in 6b42016

if IsNotExist(lerr) {
// File disappeared between readdir + stat.
// Just treat it as if it didn't exist.
continue
}

In Russ's design, that race window is significantly larger, and simply cannot be handled right. Why are we discussing this?!

I apologize up front if the following comes across as too personal, but I feel like I have to say this.
Please don't be stubborn just because something was your idea.

@mpx
Copy link
Contributor

@mpx mpx commented Sep 17, 2020

I think this proposal trades correctness & simplicity for performance, as well as breaking the Go1 compatiblity promise.

In the past I've really appreciated that Go hasn't made this tradeoff and has found other ways of improving performance. This kind of behaviour is better suited to unsafe or other similarly out of the way places where people are expected to understand the risks.

All existing programs have (implicitly or explicitly) been written with the assumption that FileInfo methods cannot fail, and that FileInfo contains data that was accurate during the Readdir call. All failures are explicitly handled via Readdir.

random sampling hasn't found even a single program that would break

Is this assuming that Stat cannot fail after Readdir? If so, it's very likely programs will misbehave. Eg, race with unlink, race with file replacement, corruption, network/fuse failures,.. these failures are uncommon but they will definitely occur and they should be handled gracefully. As a trivial example, a "du" implementation might display a bogus size, or subtract 1 from the total size.

In future, programs would need to check the Readdir error and the sentinel values from some of the FileInfo methods to be correct.

In practice, many developers will not check the results from Size, Mode, ModTime since it mostly works, is easier, and they don't expect failures (mismatch with mental model). When the deferred stat fails the resulting misbehaviour may be hard to recognise or understand - especially since there is no concrete error. This would be a poor API prone to incorrect use and bugs.

Using it correctly would be extra hassle:

    sz := fi.Size()
    if sz < 0 {
        // Some unknown error occurred, retry the operation to obtain the error or obtain a valid FileInfo.
        fi, err = os.Lstat(filepath.Join(f.Name(), fi.Name()))
        if err != nil {
           // Error handling
        }
    }
    // Use size.

APIs that guarantee correctness without needing explicit error handling are extremely useful (eg, FileInfo). It would be disappointing to lose this property.

I want to use io/fs and embed as soon as is practical - but I wouldn't want to compromise correctness or ease of use. If os.File cannot be changed, then I'd strongly prefer we accept the current performance over deferred stat.

If adding a method to os.File was acceptible we could achieve the same performance objective by:

  • Creating a Dirent interface as a subset of FileInfo (Name, IsDir methods).
  • Create a os.File.ReadDirent(n int) ([]Dirent, error) method
  • Optionally de-emphasizing Readdir and Readdirnames in the io/fs proposal in favour of ReadDirent

This might be less controversial than deferring stat?

@tv42
Copy link

@tv42 tv42 commented Sep 17, 2020

@mpx I like what you said, but: Dirent.IsDir is impossible. Linux dirent d_type generally communicates more than just IsDir (DT_LNK etc), but the direntry type can be unknown. There's no way to write an func (Dirent) IsDir() bool that isn't forced to lie; the API needs to have a slightly different shape.

https://www.man7.org/linux/man-pages/man3/readdir.3.html

@diamondburned
Copy link

@diamondburned diamondburned commented Sep 17, 2020

but the direntry type can be unknown

For the sake of completeness, couldn't there be a fallback to lstat when DT_UNKNOWN is seen? Given my API, one could implement like so:

func (d *dirEnt) IsDir() bool {
    if d.stat != nil {
        return d.stat.isDir
    }

    return d.isDir
}

Although this no longer completely satisfies the goals of this issue (i.e. having a ReadDir API that would not call lstat many times), I would argue that this is a simpler API than some of the other verbose ones while still covering most of the use-cases.

I can see another problem with this API though: Lstat() now may or may not return a newer FileInfo, which is the same issue as the lazy-loaded FileInfo API.

@jimmyfrasche
Copy link
Member

@jimmyfrasche jimmyfrasche commented Sep 17, 2020

While I am generally in favor, discussing the specifics of a new api is premature when it hasn't been decided if it's necessary yet.

@rsc
Copy link
Contributor Author

@rsc rsc commented Sep 18, 2020

I certainly hear you all about the change being strange. I agree it's a bit odd.
For what it's worth, I don't even think this is my idea. Brad says this has been proposed before.

The reason I'm trying hard to find a path forward here is that I'm trying to balance a few different concerns:

  • filepath.Walk can be made much faster by using the IsDir bits that are present in Unix directory reads without the full stat info. Tools like goimports substitute their own implementation doing just that. They get speed by giving up generality. That's fine since the case is actually specific anyway (it's always an OS file system underneath).
  • If the io/fs proposal (#41190) is adopted with no changes here, it will be impossible to achieve the same speed in a general Walk for FS. A few people raised that concern and it seems worth addressing.
  • Changes to the Walk API to help speed, such as something along the lines of https://pkg.go.dev/github.com/kr/fs#Walk, are worth considering but are impossible without a faster general underlying directory read.
  • If we're going to address that problem, now is the time.
  • We want to keep APIs simple (but as always not too simple).

Allowing lazy Readdir elegantly solves almost all of this, at the cost of the lazy behavior that seems from direct code inspection not to matter as much as you'd initially think. If we don't fix this problem now, we will be stuck with programs like goimports having their own custom filepath.Walk, and worse there will be no way to write a custom filepath.Walk for the general FS implementations.

If there's not consensus on the lazy Readdir - as there does not seem to be - then it still seems worth trying to fix the problem another way. Whatever we do, it needs to be a limited change: a simple, Go-like API. I expanded @mpx's suggestion above into a separate proposal, #41467. Please see the description and comment over there. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
You can’t perform that action at this time.