Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os: add ReadDir method for lightweight directory reading #41467

Closed
rsc opened this issue Sep 18, 2020 · 114 comments
Closed

os: add ReadDir method for lightweight directory reading #41467

rsc opened this issue Sep 18, 2020 · 114 comments

Comments

@rsc
Copy link
Contributor

@rsc rsc commented Sep 18, 2020

os.File provides two ways to read a directory: Readdirnames returns a list of the names of the directory entries, and Readdir returns the names along with stat information.

On Plan 9 and Windows, Readdir can be implemented with only a directory read - the directory read operation provides the full stat information.

But many Go users use Unix systems.

On most Unix systems, the directory read does not provide full stat information. So the implementation of Readdir reads the names from the directory and then calls Lstat for each file. This is fairly expensive.

Much of the time, such as in the implementation of file system walking, the only information the caller of Readdir really needs is the name and whether the name denotes a directory. On most Unix systems, that single bit of information—is this name a directory?—is available from the plain directory read, without an additional stat. If the caller is only using that bit, the extra Lstat calls are unnecessary and slow. (Goimports, for example, has its own directory walker to avoid this cost.) In fact, a survey of existing Go code found that only about 10% of uses of ReadDir actually need more than names and is-directory bits.

It appears that a third way to read directories should be added, to let all this code be written more efficiently. Expanding on a suggestion by @mpx, I propose to add:

// ReadDir reads the contents of the directory associated with the file f
// and returns a slice of DirEntry values in directory order.
// Subsequent calls on the same file will yield later DirEntry records in the directory.
//
// If n > 0, ReadDir returns at most n DirEntry records.
// In this case, if ReadDir returns an empty slice, it will return an error explaining why.
// At the end of a directory, the error is io.EOF.
//
// If n <= 0, ReadDir returns all the DirEntry records remaining in the directory.
// When it succeeds, it returns a nil error (not io.EOF).
func (f *File) ReadDir(n int) ([]DirEntry, error) 

// A DirEntry is an entry read from a directory (using the ReadDir method).
type DirEntry interface {
	// Name returns the name of the file (or subdirectory) described by the entry.
	// This name is only the final element of the path, not the entire path.
	// For example, Name would return "hello.go" not "/home/gopher/hello.go".
	Name() string
	
	// IsDir reports whether the entry describes a subdirectory.
	IsDir() bool
	
	// Info returns the FileInfo for the file or subdirectory described by the entry.
	// The returned FileInfo may be from the time of the original directory read
	// or from the time of the call to Info. If the file has been removed or renamed
	// since the directory read, Info may return an error satisfying errors.Is(err, ErrNotExist).
	// If the entry denotes a symbolic link, Info reports the information about the link itself,
	// not the link's target.
	Info() (FileInfo, error)
}

The FS proposal would then adopt this ReadDir and ignore Readdir entirely.

In #41188 I wrote:

Various people have proposed adding a third directory reading option of one form or another, to get names and IsDir bits. This would certainly address the slow directory walk issue on Unix systems, but it seems like overfitting to Unix.

I still believe that, but the survey convinced me that nearly all existing Readdir uses fall into this category, so it's not quite so bad to provide an optimized path for Unix systems. The DirEntry.Info method specification above allows both the eager info loading of Plan 9/Windows and the lazy loading needed on Unix. In contrast to #41188, the laziness is explicitly allowed from the beginning, and failures of the lazy loading can be reported in the error result.

Thoughts?

Update: A few clarifications to common questions:

  • Info returns the result of Lstat on systems where symbolic links happen.
  • On file systems that do not return even IsDir bits in the directory listing, ReadDir will have to do the full lstat loop that Readdir does today. That seems OK: slow file systems are slow.
  • IsDir returns false for symlinks to directories, since Lstat's FileInfo's IsDir does too.
  • ReadDir vs Readdir corrects the spelling of the name but does not differ only in case. It also differs in signature, so mistakes will not silently misbehave; they will loudly miscompile. In any event, the motivation here is to set a good, short name for the FS proposal, and for a method many people will need to implement, ReadDir is better than the longer ReadDirEntries.

Update 2: A few changes were made along the way to acceptnce. See #41467 (comment) for the final version.

@benhoyt
Copy link
Contributor

@benhoyt benhoyt commented Sep 18, 2020

I like this a lot (it's similar to Python's os.scandir, but simpler). A few questions:

  1. On Unix systems, the d_type field can be DT_UNKNOWN if the file system doesn't support it. Docs says "Currently, only some file systems (among them: Btrfs, ext2, ext3, and ext4) have full support for returning the file type in d_type. All applications must properly handle a return of DT_UNKNOWN." So we need to support this case. Either ReadDir could call Lstat at the time of the read if it gets any d_type==DT_UNKNOWN values, or we need to change the signature to IsDir() (bool, error) to reflect the fact that it too may call Lstat realtime -- I vote keeping the signature as Russ has it and having ReadDir call Lstat in that case (should be rare, so falling back to slower but correct seems fine).
  2. The name ReadDir seems to similar to the existing Readdir -- differentiating just on the case of one letter seems like asking for trouble. What about ReadEntries?
  3. Presumably IsDir and Info return info about the entry itself, without following symlinks? In other words, it's like Lstat and not Stat? Does this need mentioning explicitly? Or maybe it's clear from "the entry"?
  4. What does IsDir return in the case of a symbolic link (to a directory)? Presumably false given the above.
@mpx
Copy link
Contributor

@mpx mpx commented Sep 18, 2020

This looks great.

I felt that there may be some advantage to making DirEntry a subset of FileInfo interface, hence I didn't include the Info method and started with the smaller interface. However, I can't convince myself there there is any tangible disadvantage to adding Info. There certainly needs to be a way of "upgrading" a DirEntry into a FileInfo.

One comment from #41188 regarding this suggestion was that d_type is not always available for IsDir. In those cases extra work needs to be done to provide that information. Name and IsDir are the minimum required to support implementing walk. Upgrading to FileInfo would be free on those systems, so it's no more expensive to do it this way.

It might be good to rename ReadDir due to the unfortunate similarity with Readdir. I understand it works in practice, but it might cause some confusion ("ReadDir" with the capital "D" or lowercase "d"?).

From my point of view, I think the ReadDir effectively replaces the need for Readdir and Readdirnames. If it existed from Go1 I wouldn't see the need for adding the other calls. Obviously the existing methods shouldn't be removed, but they may not be needed in io/fs.

I have a number tools that will benefit significantly from this proposal.

@mpx
Copy link
Contributor

@mpx mpx commented Sep 18, 2020

@benhoyt IsDir should return bool. As I mentioned on #41188 there is an advantage to providing interfaces that are guaranteed to work. Systems that cannot provide "IsDir" will need to do extra work up front. However, it will be effectively free for them to upgrade to FileInfo, so it should be no slower to "walk" a tree.

Good point regarding IsDir and symlinks. I think it needs to return false. Following a symlink will necessarily require another Stat to determine whether the target is a directory. This is probably worth documenting.

@richardwilkes
Copy link
Contributor

@richardwilkes richardwilkes commented Sep 18, 2020

Presumably IsDir and Info return info about the entry itself, without following symlinks? In other words, it's like Lstat and not Stat? Does this need mentioning explicitly? Or maybe it's clear from "the entry"?

Regardless of which way this ends up working (Lstat vs Stat), it should be mentioned explicitly in the comments/docs for the method, as what is "obvious" to one person will not be to another.

@diamondburned
Copy link

@diamondburned diamondburned commented Sep 18, 2020

I felt that there may be some advantage to making DirEntry a subset of FileInfo interface, hence I didn't include the Info method and started with the smaller interface. However, I can't convince myself there there is any tangible disadvantage to adding Info. There certainly needs to be a way of "upgrading" a DirEntry into a FileInfo.

I personally don't see a point in this. Implementations could simply just return an internal FileInfo or itself when Info() is called.

@rsc
Copy link
Contributor Author

@rsc rsc commented Sep 18, 2020

@benhoyt, added answers above but (1) I agree with you; (2) ReadEntries is the wrong long-term name, and in the long term Readdir will just be deprecated (but not removed) and fade away; (3) yes, Lstat; (4) yes, false.

@networkimprov
Copy link

@networkimprov networkimprov commented Sep 18, 2020

Russ, glad to see a new API proposed, thanks. Some Q's

This entails a per-item lstat() when the type is absent in native dirent. You don't want that if you don't need to check DirEntry.IsDir() for every result. Can it be optional?

Does DirEntry.Info() cache its result if it had to lstat(), or if ReadDir() had to?

Could we flush the FileInfo cache, so a subsequent .Info() reloads it?

Wouldn't the different behavior of DirEntry.Info() on unix & Windows cause trouble in cross-platform apps? Such apps would call os.File.Readdir() when they need FileInfo for all results, and xfs.File.ReadDir() otherwise. The mix of those two could be confusing.

There is no way to get the Windows fileId (analagous to inode) via FileInfo. Could a way be added to DirEntry?

Unix dirents include the inode -- needed for tree replication. Could DirEntry provide that?

For the latter two, I think you'd return a type FileId interface { ... }

EDIT: Background... I'm building a desktop app for Windows, MacOS, and Linux -- and a server app for Linux -- that make heavy use of the filesystem. Go has thrown me a fair number of curve balls, and I expect more :-/

@diamondburned
Copy link

@diamondburned diamondburned commented Sep 18, 2020

This entails a per-item lstat() when the type is absent in native dirent. You don't want that if you don't need to check DirEntry.IsDir() for every result. Can it be optional?

As far as I know, no, this cannot be optional. Perhaps it could be if a more optimized Readdirnames could be made, but personally, I think that should stay there.

Does DirEntry.Info() cache its result if it had to lstat(), or if ReadDir() had to?

Could we flush the FileInfo cache, so a subsequent .Info() reloads it?

Wouldn't the different behavior of DirEntry.Info() on unix & Windows cause trouble in cross-platform apps? Such apps would call os.File.Readdir() when they need FileInfo for all results, and xfs.File.ReadDir() otherwise. The mix of those two could be confusing.

If we're going with my idea of returning an internal FileInfo, then I can see the behavior differing a little bit: on Windows, a FileInfo can be returned, but in Linux, a new FileInfo would be obtained on every Info() call.

In my honest opinion, this behavior is fine. In fact, it would be preferable if FileInfo of the Linux implementation is cached. The reason being that if the caller wants the newest information, then they should call Lstat or Stat explicitly.

There is no way to get the Windows fileId (analagous to inode) via FileInfo. Could a way be added to DirEntry?

Unix dirents include the inode -- needed for tree replication. Could DirEntry provide that?

Worst case scenario, the DirEnt interface could perhaps be overloaded to accommodate for the above flaws, but I feel like that would be overdesigning and overcomplicating it.

@mpx
Copy link
Contributor

@mpx mpx commented Sep 18, 2020

Could we flush the FileInfo cache, so a subsequent .Info() reloads it?

Any caller needing to guarantee a fresh stat should call Lstat/Stat themselves.

There is no way to get the Windows fileId (analagous to inode) via FileInfo. Could a way be added to DirEntry?

I've wanted to retrieve unique file identifiers for both Linux (inode) and Windows (fileId) to improve performance for several applications, so I'd appreciate a solution too. However, I think DirEntry should be kept simple. Most callers don't need a unique identifier, and it would unnecessarily complicate many implementations which don't use/need it. I think this could be solved via a separate proposal after this "DirEntry" proposal is settled. Eg, perhaps it could be a provided via an optional interface FileID() uint64.

@rsc
Copy link
Contributor Author

@rsc rsc commented Sep 18, 2020

I agree with what @diamondburned and @mpx said but just to reply directly as well.

This entails a per-item lstat() when the type is absent in native dirent. You don't want that if you don't need to check DirEntry.IsDir() for every result. Can it be optional?

In practice, when is that type absent? I'm not too worried if, say, VFAT file systems are slower to access.

Does DirEntry.Info() cache its result if it had to lstat(), or if ReadDir() had to?

The doc says the Info is either from the time of the ReadDir or the time of the Info call. So a stat during ReadDir can be cached, but not a stat during an earlier Info.

Could we flush the FileInfo cache, so a subsequent .Info() reloads it?

Adding cache state & manipulation to the API is needlessly complex. Lstat still exists.

Wouldn't the different behavior of DirEntry.Info() on unix & Windows cause trouble in cross-platform apps?

There is no magic wand here. Unix and Windows are different. They will always be different.

The path separator alone causes trouble in cross-platform apps. The answer is to use APIs like filepath.Clean and filepath.Join as appropriate. If you don't do that, you have trouble. It's not our job to make trouble impossible, only avoidable.

The docs on ReadDir are very clear about what you can rely on and what you cannot. If you rely on more than that, again, you will have trouble, same as hard-coding use of / or \ in your program.

Such apps would call os.File.Readdir() when they need FileInfo for all results, and xfs.File.ReadDir() otherwise. The mix of those two could be confusing.

Pretty much all apps should probably move to calling ReadDir or, if it matters, Lstat/Stat directly.

What's confusing is having two different APIs, but we clearly need a new one, and can't remove the old one. So be it.

There is no way to get the Windows fileId (analagous to inode) via FileInfo. Could a way be added to DirEntry?
Unix dirents include the inode -- needed for tree replication. Could DirEntry provide that?
For the latter two, I think you'd return a type FileId interface { ... }

[Note that Go would spell it FileID not FileId (the file does not have a mind).]

The concept seems too special-purpose for a general interface, and a bit difficult to use correctly.
The problem is that Windows file IDs and Unix inode numbers are not really identifiers: they only identify a file within a particular file system. Another file system can have a file with the same file ID/inode number. So to use them properly you need to combine them with some kind of identifier for the file system itself (like fsid_t or dev_t on Unix). It gets messy fast.

Also, if you are writing "tree replication", then you are already stat'ing all the files to get the other metadata, and you're already writing very OS-specific code to preserve all the OS-specific attributes. That same code can easily grab the info you need as far as inode number and file system identifier.

It's not our job to union together all the possible APIs on all the possible systems. Our job is to find a simple API that is enough for the vast majority of Go programs, with an option for the rest to get at what they need (FileInfo.Sys in this case).

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Sep 18, 2020

What about Readdirnames?

If we expect all users of Readdirnames to switch to ReadDir over time, then I'm not quite convinced that IsDir should not return (bool, error). Neither AIX nor Solaris record the type of the file in the directory entry. Even on GNU/Linux there are various obscure file systems that do not record the dirent type.

The downside of IsDir returning (bool, error) seems fairly low to me.

@networkimprov
Copy link

@networkimprov networkimprov commented Sep 18, 2020

Ian, wouldn't the error you're suggesting be returned by ReadDir() when it tries lstat() after seeing entries without types?

Or are you suggesting that ReadDir() should never lstat()?

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Sep 18, 2020

I'm suggesting that ReadDir should never call lstat, just as the current Readdirnames never calls lstat.

@jimmyfrasche
Copy link
Member

@jimmyfrasche jimmyfrasche commented Sep 18, 2020

Using the same name with different returns would mean a single type couldn't be both a DirEntry and a FileInfo

@diamondburned
Copy link

@diamondburned diamondburned commented Sep 18, 2020

If we expect all users of Readdirnames to switch to ReadDir over time, then I'm not quite convinced that IsDir should not return (bool, error).

I don't think this should be the case, which is why I said "that should stay there." I think Readdirnames should use its own dirEnts to not require calling lstat or stat, which should allow people who don't need IsDir to skip having to call a few extra stats.

I also think that IsDir() (bool, error) would be a bit too verbose.

@networkimprov
Copy link

@networkimprov networkimprov commented Sep 19, 2020

[ reposted after edits for clarity ]

It will cause confusion and bugs if .Info() & .IsDir() return new data every time on some filesystems but cached data on others. We performed a read, so we'd expect them to return whatever was retrieved by ReadDir() (possibly nothing). To get new data there's Lstat() as you said.

You could attain either best performance or cross-filesystem consistency if you could ask ReadDir() to:

  • satisfy .Name() -- never lstat (Ian's suggestion)
  • satisfy .Name(), .IsDir() -- possibly lstat (Russ' suggestion)
  • satisfy .Name(), .IsDir(), .Info() -- always lstat except on Windows (os.File.Readdir)

Let's not sacrifice sanity for the sake of simplicity.

if you are writing "tree replication", then you are already stat'ing all the files to get the other metadata

Actually, you don't need to lstat() if the inode matches one you've already seen.

an option for the rest to get at what they need (FileInfo.Sys in this case).

To clarify, there is no way to get the Windows fileID from FileInfo.Sys. You have to patch the stdlib at present; see
https://groups.google.com/g/golang-dev/c/raE01Fa2Kmo/m/3Ah9JslnBgAJ

BTW Windows accepts / path separators; you only need Windows-native syntax in rare cases :-)

@rsc
Copy link
Contributor Author

@rsc rsc commented Sep 22, 2020

It will cause confusion and bugs if .Info() & .IsDir() return new data every time on some filesystems but cached data on others.

Sorry, but that's the design constraint here: it must be possible to either return info learned during ReadDir or info learned during Info. Otherwise you are overfitting to either Unix or non-Unix and penalizing the other.

@rsc
Copy link
Contributor Author

@rsc rsc commented Sep 22, 2020

Neither AIX nor Solaris record the type of the file in the directory entry.

Thanks for pointing this out. That's unfortunate but good to know.

@jimmyfrasche
Copy link
Member

@jimmyfrasche jimmyfrasche commented Sep 22, 2020

Sorry, but that's the design constraint here: it must be possible to either return info learned during ReadDir or info learned during Info. Otherwise you are overfitting to either Unix or non-Unix and penalizing the other.

I think the argument is about the second call to Info, not the first:

de.Info() // maybe cached depending on os/fs
de.Info() // always cached in all cases for this and all subsequent calls
@networkimprov
Copy link

@networkimprov networkimprov commented Sep 22, 2020

os.File.Readdirnames() yields predictable results & performance.
os.File.Readdir() yields predictable results, but unpredictable performance.

The proposed File.ReadDir() yields unpredictable results & performance. It won't supplant os.File.Readdir or os.File.Readdirnames -- we'll still need both.

We can fix that with a simple adjustment to the API:

func (f *File) ReadDir(k deKind, n int) ([]DirEntry, error) // k specifies cached results

type deKind int
const (
   KindName deKind = iota  // cache Name; possibly IsDir & Info
   KindIsdir               // cache Name & IsDir; possibly Info
   KindInfo                // cache all
)

type DirEntry interface {   // methods do not lstat()
   Name() string
   IsDir() (bool, error)    // return error if dirent type not cached
   Info() (FileInfo, error) // return error if FileInfo not cached
}

I think the argument is about the second call to Info, not the first:

The second .Info() call is a problem, but so is the first if it occurs after the directory (or any parent) is moved or renamed.

.
cc @tv42 @israel-lugo

@diamondburned
Copy link

@diamondburned diamondburned commented Sep 22, 2020

A new enum argument is too verbose for something that simple, in my opinion. Furthermore, needing both Readdir and Readdirnames is fine, or perhaps, this could be ReadDirEnt.

Why should ReadDirEnt replace Readdir and Readdirnames anyway? Readdir is for when you need the entire FileInfo, and Readdirnames is for when you only need the names of the directory. I see no reasons for ReadDirEnt to replace both of them other than trying to save a few lines of code in the implementation of those two functions.

I would also like to point out again that having IsDir() return an error is really verbose. I would much prefer an lstat fallback over having to handle errors from IsDir(). Not to mention, having that function call lstat later and then cache only makes it more unpredictable.

@networkimprov
Copy link

@networkimprov networkimprov commented Sep 22, 2020

After f.ReadDir(fs.KindIsdir, n) you don't have to check the .IsDir() error.
After f.ReadDir(fs.KindInfo, n) you don't have to check the .Info() or .IsDir() errors.

There is no caching by DirEntry methods. My amendment entails zero lstats by DirEntry methods.

Note: the proposed File.ReadDir would be the only directory API in io/fs.

EDIT:
Predictable = produce the same results for a raceless sequence of events on all filesystems.
My amendment provides that for all cases. (The following post is incorrect.)

@diamondburned
Copy link

@diamondburned diamondburned commented Sep 22, 2020

My amendment entails zero lstats by DirEntry methods.

IsDir() would need an lstat to get the boolean though. The proposal only ensures predictable behavior for some few use-cases that could be done with the old Readdir.

Now that I think about it, I think trying to guarantee a consistent behavior with this API is almost moot. There isn't a global file system lock to be acquired, so changes could still happen while Readdir is looping. I don't think there's a point in ensuring predictable behavior if we cannot fully get rid of it, since the cost of having a more verbose API is worse, in my opinion.

@networkimprov
Copy link

@networkimprov networkimprov commented Sep 23, 2020

I amended my previous comment to clarify.

@mpx
Copy link
Contributor

@mpx mpx commented Sep 23, 2020

Allowing/requiring a DirEntry to perform lstat on demand for IsDir would result in a poor API:

  • It may fail, and would require require an error return which complicates using the API
  • IsDir would have a different signature to FileInfo.IsDir (somewhat confusing). It will be desirable for some io/fs implementations to use the same type for DirEntry and FileInfo - we shouldn't prevent that. This will likely be common in many simpler implementations.
  • Implementations might choose caching the IsDir lstat response (more complicated non-immutable DirEntry), or discarding the response and causing duplicate lstat for callers via Info. This goes against the reason for this interface in the first place.

We should not require complex caching/reuse logic to be implemented for DirEntry. I'd strongly prefer encouraging immutable implementations for DirEntry: they are easier to implement correctly and harder to misuse inadvertantly. ReadDir should return an immutable value with all the details needed to satisfy the interface (IsDir).

How common is Readdirnames in performance code? I suspect there will be less overall performance impact since most platforms provide enough details to support walking a tree within their dirents, and that the current walk performance is a larger problem for most users. If so, keeping IsDir() bool in the DirEntry interface is probably still a better outcome overall.

@benhoyt
Copy link
Contributor

@benhoyt benhoyt commented Oct 11, 2020

FWIW, I'm fine with this proposal now that we have Type() -- let's not go round and round on this again. It's the same with the Python version -- it's implementation (and with DT_UNKNOWN, filesystem) dependent whether the likes of IsDir will call stat. But that's okay. You need that information, so it has to be called. The "internal lstat" is not transparent in the Python version, and that hasn't been a problem.

@networkimprov
Copy link

@networkimprov networkimprov commented Oct 11, 2020

Ben, it's not the same as the Python version; .IsDir() never calls lstat because f.ReadDir() does when it sees DT_UNKNOWN. That is baggage in the scenario @israel-lugo described.

@benhoyt
Copy link
Contributor

@benhoyt benhoyt commented Oct 11, 2020

@networkimprov Ah yes, you're right, sorry. Either way, I don't think it matters.

@rsc
Copy link
Contributor Author

@rsc rsc commented Oct 12, 2020

@israel-lugo, the rationale for DT_UNKNOWN handling is in #41467 (comment).

Part of the rationale for making ReadDir do the lstat calls is that (1) Readdir already does and (2) we have yet to identify a common case where the lstat is needed.

Older systems - notably AIX and Solaris - do not include a type byte in the dirent structure at all. Those will need lstat.

On other systems, it is technically possible to get type DT_UNKNOWN, but does it happen in any common cases?
If so, what are they? What operating systems, what file systems, and what additional conditions? Thanks.

@gopherbot
Copy link

@gopherbot gopherbot commented Oct 12, 2020

Change https://golang.org/cl/261540 mentions this issue: os: add File.ReadDir method and DirEntry type

@networkimprov
Copy link

@networkimprov networkimprov commented Oct 12, 2020

I've read that USB flash drive and CD-ROM filesystems often yield DT_UNKNOWN.

@rsc
Copy link
Contributor Author

@rsc rsc commented Oct 12, 2020

I've read that USB flash drive and CD-ROM filesystems often yield DT_UNKNOWN.

Assuming USB flash drive means FAT (VFAT, etc) and CD-ROM means ISO9660, both of those file systems lay out the file type bits right next to the file name in the physical storage. If a driver has loaded the name, it has the file type bits just sitting there in memory waiting to be used. If it insists on using DT_UNKNOWN instead, then the driver is written inefficiently. I don't believe we should make Go's APIs more complex just because inefficient drivers exist.

Even so, I looked into this a bit. Linux, FreeBSD, and OpenBSD all appear (from driver inspection) to set a proper type in their VFAT file system implementations and to leave DT_UNKNOWN in their ISO9660 implementations. Again, there's no reason they couldn't do the right thing for ISO9660 if users needed it. I can't find the source code for macOS's file system drivers.

@israel-lugo
Copy link

@israel-lugo israel-lugo commented Oct 12, 2020

@networkimprov
Copy link

@networkimprov networkimprov commented Oct 12, 2020

Isn't ISO9660 also a common disk image format?

@rsc
Copy link
Contributor Author

@rsc rsc commented Oct 13, 2020

@networkimprov, yes basically any .iso CD image you download to install an OS in a VM is an ISO9660 image.
But since it is a read-only (well, append-only) format, it's pretty rare to use for much other than installers.
Preserving a simpler API for Go outweighs that use case, especially since the kernels can easily fix any actual
inefficiency with a tiny patch to their drivers.

@hirochachacha
Copy link
Contributor

@hirochachacha hirochachacha commented Oct 13, 2020

@rsc What about ioutil.ReadDir? Are there any subsequent proposals?
I think this API is great and it can be a drop-in replacement for (*File).Readdir.
However, according to your survey result:

$ grep Readdir readdirs.txt | wc -l
15502
$ grep ioutil.ReadDir readdirs.txt | wc -l
47424

Most people prefer to use ioutil.ReadDir. They don't like boilerplate.

@hirochachacha
Copy link
Contributor

@hirochachacha hirochachacha commented Oct 13, 2020

Since this is for performance optimization, to postpone ioutil changes to go2 may be fine though.

@diamondburned
Copy link

@diamondburned diamondburned commented Oct 13, 2020

What about ioutil.ReadDir?

I don't think ioutil.ReadDir could benefit from this proposal; the reason being the API needs os.FileInfo returns, which would require a full Lstat call. As far as I can tell, the proposal is heading towards DirEntry, which is a completely new interface in a completely new API.

This is a bit on a tangent, but is Go 2 allowed to break existing code?

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Oct 13, 2020

This would not be a Go 2 issue as such, it be a os/v2 issue, so code that continues to use os would be unchanged (or swap os with io/ioutil if you like). But an os/v2 (or io/ioutil/v2) package is unplanned and unlikely.

@hirochachacha
Copy link
Contributor

@hirochachacha hirochachacha commented Oct 13, 2020

I wouldn't expect that it improve ioutil.ReadDir directly. This API may supersede (*File).Readdir. But what about ioutil.ReadDir? Are there any plans to introduce new API which supersede ioutil.ReadDir? The name ReadDir conflict with existing ioutil.ReadDir, so I thought there might be a plan to replace existing signature by new one.

This proposal is justified by the fact 90% code doesn't use more than name and isDir, but 75% code is using ioutil.ReadDir.
Providing a solution for 25% code is good enough? I just wondered. Sorry for the interruption.

@hirochachacha
Copy link
Contributor

@hirochachacha hirochachacha commented Oct 13, 2020

You mean, io/fs might want to use the new signature which is introduced here, thus we can use something like fs.ReadDir(os.DefaultFS, "dir") instead of ioutil.ReadDir("dir") in the future? Sounds good. Thank you for sharing.

@rsc
Copy link
Contributor Author

@rsc rsc commented Oct 14, 2020

If this proposal is accepted we can worry about ioutil next. The most likely answer is to put the helper ReadDir(dir string) ([]DirEntry, error) into os itself (part of completely deprecating ioutil).

@rsc
Copy link
Contributor Author

@rsc rsc commented Oct 14, 2020

No change in consensus, so accepted.

Update: this is what was accepted:

// ReadDir reads the contents of the directory associated with the file f
// and returns a slice of DirEntry values in directory order.
// Subsequent calls on the same file will yield later DirEntry records in the directory.
//
// If n > 0, ReadDir returns at most n DirEntry records.
// In this case, if ReadDir returns an empty slice, it will return an error explaining why.
// At the end of a directory, the error is io.EOF.
//
// If n <= 0, ReadDir returns all the DirEntry records remaining in the directory.
// When it succeeds, it returns a nil error (not io.EOF).
func (f *File) ReadDir(n int) ([]DirEntry, error) 

// A DirEntry is an entry read from a directory (using the ReadDir method).
type DirEntry interface {
	// Name returns the name of the file (or subdirectory) described by the entry.
	// This name is only the final element of the path, not the entire path.
	// For example, Name would return "hello.go" not "/home/gopher/hello.go".
	Name() string
	
	// IsDir reports whether the entry describes a subdirectory.
	IsDir() bool
	
	// Type returns the type bits for the entry.
	// The type bits are a subset of the usual FileMode bits, those returned by the FileMode.Type method.
	Type() os.FileMode
	
	// Info returns the FileInfo for the file or subdirectory described by the entry.
	// The returned FileInfo may be from the time of the original directory read
	// or from the time of the call to Info. If the file has been removed or renamed
	// since the directory read, Info may return an error satisfying errors.Is(err, ErrNotExist).
	// If the entry denotes a symbolic link, Info reports the information about the link itself,
	// not the link's target.
	Info() (FileInfo, error)
}

// Type returns the type bits (m & ModeType).
func (m FileMode) Type() FileMode { return m & ModeType }
@benhoyt
Copy link
Contributor

@benhoyt benhoyt commented Oct 14, 2020

@rsc I don't know what you normally do for proposals like this, but is it worth updating the description at the top with the final proposal (including Type)? It would make it easier for people to see what the final proposal is at a glance, without wading through 100+ comments.

@rsc
Copy link
Contributor Author

@rsc rsc commented Oct 16, 2020

@benhoyt, I added a note and link to the top comment. Thanks.

@rsc rsc modified the milestones: Proposal, Go1.16 Oct 17, 2020
@rsc rsc changed the title proposal: os: add ReadDir method for lightweight directory reading os: add ReadDir method for lightweight directory reading Oct 17, 2020
@gopherbot gopherbot closed this in a4ede9f Oct 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Proposals
Accepted
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
You can’t perform that action at this time.