path/filepath: Walk is slow compared to 'find' due to extra Stat calls #16399
Comments
The same on Linux. This isn't just about OS X. Actually on Linux it's even more pronounced:
Versus:
Walk sorts the file entries, but find doesn't. Not sure how much that affects the profiles though. |
@dgryski, that's a bit of it, but it's looking like the actual problem is that filepath.Walk does a Readdirnames followed by a bunch of Stat calls on each to figure out what's a directory, when the underlying kernel interface supports telling you which are directories in the same call where you read the names, but Go doesn't take advantage of that on Unix platforms. (And |
Actually, I realize now that the I think I'll move away from using |
@bradfitz out of curiosity, is your idea here to rewrite It looks like some OSs may have that information but Go is discarding in favor of |
@vinceprignano, yes, that's what I'm doing now. |
CL https://golang.org/cl/25001 mentions this issue. |
In my observations there were 2 problems:
I wrote an article (in russian, sorry for that) that describes how to achieve performance that is higher than find(1) has: https://habrahabr.ru/post/281382/ (google translate: https://translate.google.ru/translate?sl=ru&tl=en&js=y&prev=_t&hl=ru&ie=UTF-8&u=https%3A%2F%2Fhabrahabr.ru%2Fpost%2F281382%2F&edit-text=&act=url) |
I've run into a similar situation. I created For those specific needs, I ended up creating https://godoc.org/github.com/shurcooL/httpfs/vfsutil#WalkFiles which would pass both the

// WalkFiles walks the filesystem rooted at root, calling walkFn for each file or
// directory in the filesystem, including root. In addition to FileInfo, it passes an
// ReadSeeker to walkFn for each file it visits.
func WalkFiles(fs http.FileSystem, root string, walkFn WalkFilesFunc) error { ... }

// WalkFilesFunc is the type of the function called for each file or directory visited by Walk.
// It's like filepath.WalkFunc, except it provides an additional ReadSeeker parameter for file being visited.
type WalkFilesFunc func(path string, info os.FileInfo, rs io.ReadSeeker, err error) error

That worked well for my needs, but I like your idea of just not passing |
This brings goimports from 160ms to 100ms on my laptop, and under 50ms on my Linux machine.

Using cmd/trace, I noticed that filepath.Walk is inherently slow. See https://golang.org/issue/16399 for details.

Instead, this CL introduces a new (private) filepath.Walk implementation, optimized for speed and avoiding unnecessary work. In addition to avoiding an Lstat per file, it also reads directories concurrently. The old goimports code did that too, but now that logic is removed from goimports and the code is simplified.

This also adds some profiling command line flags to goimports that I found useful.

Updates golang/go#16367 (goimports is slow)
Updates golang/go#16399 (filepath.Walk is slow)

Change-Id: I708d570cbaad3fa9ad75a12054f5a932ee159b84
Reviewed-on: https://go-review.googlesource.com/25001
Reviewed-by: Andrew Gerrand <adg@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@hirochachacha It sounds like you are describing a bug in the syscall package, a bug that should be fixed in any case. The syscall package provides |
@cherrymui, can you run |
@hirochachacha, please file a separate issue for that. |
I'll remove my comments above and close the opened issue. Again, I'm sorry. |
@bradfitz, on MIPS64
Is the failure related? Should I go look into it? |
@cherrymui, those are old tests. I changed them to not rely on real GOROOT contents. Update & try again? |
I did a |
@cherrymui, oh, sorry... there are still those two tests which still use the machine's real $GOROOT. I didn't convert those yet I guess. In any case, you can ignore those errors. They should go away when you sync your $GOROOT. I'll fix those tests. But the real question was whether the Thanks! |
Ok, great. Thanks! |
Not sure what the current plan is to fix this, but given that |
@rasky, that doesn't work, as the |
We already had special cases for 0 and 1. Add 2 and 3 for now too. To be removed if the compiler is improved later (#6714).

This halves the number of allocations and total bytes allocated via common filepath.Join calls, improving filepath.Walk performance.

Noticed as part of investigating filepath.Walk in #16399.

Change-Id: If7b1bb85606d4720f3ebdf8de7b1e12ad165079d
Reviewed-on: https://go-review.googlesource.com/25005
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
I've written a few benchmarks:
Here they are: https://github.com/Tim15/golang-parallel-io. They're all slower than Hopefully this helps us figure this out! |
@tim15, we already figured it out over a year ago. See the comment I linked you to 3 days ago. The solution for goimports ended up being https://github.com/golang/tools/blob/master/imports/fastwalk.go but that's not applicable to the standard library. This isn't fixable with the current API. |
@bradfitz Cool, thanks for catching me up! |
@bradfitz Sorry to ask here but is it possible to use the |
@sirwindfield, no, I did not expose its implementation publicly. You can copy/paste it into your own package if you'd like, but it's not something I want to support at this time. |
For people looking to reuse the code in their own projects, there is now this third-party library that is basically a clone of |
I forked the repo and made an update script to clean things up with tags: https://github.com/s12chung/fastwalk |
@bradfitz Sorry for revisiting this old issue, but I have a few questions I was hoping you'd be able to answer. Your fastwalk implementation only provides |
The fastwalk callback gives you an os.FileMode, not an os.FileInfo. |
@bradfitz I guess my question is, is there any reason it couldn't provide an os.FileInfo with no additional overhead? Unless I'm mistaken, it providing os.FileMode was intended to remove the overhead filepath.Walk has with using lstat on each call, guaranteeing a full os.FileInfo, but it looks like your implementation already achieves this but throws the information away and provides only os.FileMode. I'm not suggesting that you change your implementation, but I guess I'm just confused as to why file.ReadDir and filepath.Walk both rely on additional lstat calls, rather than doing something similar to your implementation where a fallback is done if the full information isn't available in a single readdir call. |
@saracen, Size. See #16399 (comment) above. |
@bradfitz Thank you for the clarification, and sorry for adding to the noise there. For people stumbling across this issue needing a fast version that returns a full There are a few people here who are really familiar with the issue, so I'd really appreciate it if anybody is willing to check my solution. |
Years ago I started importing https://github.com/karrick/godirwalk into I am still open to a port of the (For the record, |
I think this issue is fixed by the new
So
…cgroup decoder. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
I think this can be closed now that we have the new |
On my Mac laptop (SSD, warm caches),
But with a basic use of filepath.Walk instead of find:
It's much slower:
This is the bulk of the goimports execution time. goimports actually does a slightly parallelized walk with goroutines (which helps on NFS filesystems), but it doesn't seem to matter. I'm just trying to get any Go program to be closer in performance to find.
Any clues?
Speed tracking bug for goimports is #16367
/cc @josharian @ianlancetaylor