Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

path/filepath: Walk is slow compared to 'find' due to extra Stat calls #16399

Open
bradfitz opened this issue Jul 17, 2016 · 42 comments

Comments

Projects
None yet
@bradfitz
Copy link
Member

commented Jul 17, 2016

On my Mac laptop (SSD, warm caches),

bradfitz@laptop ~$ time (find src/ | wc -l)
   42408

real    0m0.177s
user    0m0.035s
sys     0m0.144s
bradfitz@laptop ~$ time (find src/ | wc -l)
   42408

real    0m0.178s
user    0m0.036s
sys     0m0.144s
bradfitz@laptop ~$ time (find src/ | wc -l)
   42408

real    0m0.177s
user    0m0.035s
sys     0m0.144s

But with a basic use of filepath.Walk instead of find:

$ cat walk.go 
package main

import (
        "fmt"
        "log"
        "os"
        "path/filepath"
)

func main() {
        err := filepath.Walk("src", func(path string, fi os.FileInfo, err error) error {
                if err != nil {
                        return err
                }
                fmt.Println(path)
                return nil
        })
        if err != nil {
                log.Fatal(err)
        }
}

It's much slower:

bradfitz@laptop ~$ time (./walk  | wc -l)
   42408

real    0m0.447s
user    0m0.123s
sys     0m0.406s
bradfitz@laptop ~$ time (./walk  | wc -l)
   42408

real    0m0.434s
user    0m0.120s
sys     0m0.399s
bradfitz@laptop ~$ time (./walk  | wc -l)
   42408

real    0m0.435s
user    0m0.119s
sys     0m0.398s

This is the bulk of the goimports execution time. goimports actually does a slightly parallelized walk with goroutines (which helps on NFS filesystems), but it doesn't seem to matter. I'm just trying to get any Go program to be closer in performance to find.

Any clues?

Speed tracking bug for goimports is #16367

/cc @josharian @ianlancetaylor

@bradfitz bradfitz added this to the Go1.8Maybe milestone Jul 17, 2016

@bradfitz bradfitz changed the title path/filepath: Walk is slow on OS X path/filepath: Walk is slow Jul 17, 2016

@bradfitz

This comment has been minimized.

Copy link
Member Author

commented Jul 17, 2016

The same on Linux. This isn't just about OS X.

Actually on Linux it's even more pronounced:

bradfitz@dev-bradfitz-debian2:~$ time (find src/ | wc )
  97909   98025 5108045

real    0m0.146s
user    0m0.128s
sys     0m0.092s
bradfitz@dev-bradfitz-debian2:~$ time (find src/ | wc )
  97909   98025 5108045

real    0m0.147s
user    0m0.136s
sys     0m0.084s
bradfitz@dev-bradfitz-debian2:~$ time (find src/ | wc )
  97909   98025 5108045

real    0m0.150s
user    0m0.148s
sys     0m0.076s

Versus:

bradfitz@dev-bradfitz-debian2:~$ time (./walk | wc )
  97909   98025 5108044

real    0m0.530s
user    0m0.412s
sys     0m0.388s
bradfitz@dev-bradfitz-debian2:~$ time (./walk | wc )
  97909   98025 5108044

real    0m0.515s
user    0m0.344s
sys     0m0.432s
bradfitz@dev-bradfitz-debian2:~$ time (./walk | wc )
  97909   98025 5108044

real    0m0.528s
user    0m0.312s
sys     0m0.484s
@dgryski

This comment has been minimized.

Copy link
Contributor

commented Jul 17, 2016

Walk sorts the file entries, but find doesn't. Not sure how much that affects the profiles though.

@bradfitz

This comment has been minimized.

Copy link
Member Author

commented Jul 17, 2016

@dgryski, that's a bit of it, but it's looking like the actual problem is that filepath.Walk does a Readdirnames followed by a bunch of Stat calls on each to figure out what's a directory, when the underlying kernel interface supports telling you which are directories in the same call where you read the names, but Go doesn't take advantage of that on Unix platforms. (And path/filepath could do that on Windows, which Go's syscall and os package already supports, but it hard-codes the use of Readdirnames)

@bradfitz

This comment has been minimized.

Copy link
Member Author

commented Jul 17, 2016

Actually, I realize now that the filepath.Walk (at least with our definition of os.FileInfo) is inherently slow, as its API guarantees you get a complete FileInfo with each call, which requires us to do a stat per file, even if the user doesn't want it.

I think I'll move away from using filepath.Walk for goimports.

@vincepri

This comment has been minimized.

Copy link

commented Jul 17, 2016

@bradfitz out of curiosity, is your idea here to rewrite readdirnames for unix/linux to make use of the Type in Dirent (https://golang.org/src/syscall/syscall_linux.go?h=ParseDirent#L773) as you were mentioning above?

It looks like some OSs may have that information but Go is discarding in favor of lstat.

@bradfitz

This comment has been minimized.

Copy link
Member Author

commented Jul 17, 2016

@vinceprignano, yes, that's what I'm doing now.

@gopherbot

This comment has been minimized.

Copy link

commented Jul 18, 2016

CL https://golang.org/cl/25001 mentions this issue.

@YuriyNasretdinov

This comment has been minimized.

Copy link

commented Jul 18, 2016

In my observations there were 2 problems:

  1. GC-unfriendly API for reading directory (no ability to do just readdir and get file types without all other stat structures, a lot of slice and struct pointer allocations)
  2. That's it

I wrote an article (in russian, sorry for that) that describes how to achieve performance that is higher than find(1) has: https://habrahabr.ru/post/281382/ (google translate: https://translate.google.ru/translate?sl=ru&tl=en&js=y&prev=_t&hl=ru&ie=UTF-8&u=https%3A%2F%2Fhabrahabr.ru%2Fpost%2F281382%2F&edit-text=&act=url)

@dmitshur

This comment has been minimized.

Copy link
Member

commented Jul 18, 2016

Actually, I realize now that the filepath.Walk (at least with our definition of os.FileInfo) is inherently slow, as its API guarantees you get a complete FileInfo with each call, which requires us to do a stat per file, even if the user doesn't want it.

I think I'll move away from using filepath.Walk for goimports.

I've run into a similar situation.

I created filepath.Walk-like functionality for http.FileSystem in https://godoc.org/github.com/shurcooL/httpfs/vfsutil#Walk. In my use cases, I often worked with virtual filesystems where files did not exist on disk, but rather virtually after doing some computation. So the act of "opening" a file was an a relatively expensive action. I ran into the same problem, the filepath.Walk behavior was suboptimal. It would open a virtual file (potentially somewhat expensive), do a stat on it to pass its os.FileInfo to walk func, then close the file. Then, inside the walk func, I would often need to access the file contents, so I'd have to open it again (somewhat expensive).

For those specific needs, I ended up creating https://godoc.org/github.com/shurcooL/httpfs/vfsutil#WalkFiles which would pass both the os.FileInfo but also the file contents as an io.ReadSeeker:

// WalkFiles walks the filesystem rooted at root, calling walkFn for each file or
// directory in the filesystem, including root. In addition to FileInfo, it passes an
// ReadSeeker to walkFn for each file it visits.
func WalkFiles(fs http.FileSystem, root string, walkFn WalkFilesFunc) error { ... }

// WalkFilesFunc is the type of the function called for each file or directory visited by Walk.
// It's like filepath.WalkFunc, except it provides an additional ReadSeeker parameter for file being visited.
type WalkFilesFunc func(path string, info os.FileInfo, rs io.ReadSeeker, err error) error

That worked well for my needs, but I like your idea of just not passing os.FileInfo at all, since computing even that can be expensive and sometimes not neccessary for caller. So it'd be simpler, more general, and more performant to just give the file path to walk func and let it do what it needs (stat, read file, etc.).

gopherbot pushed a commit to golang/tools that referenced this issue Jul 19, 2016

cmd/goimports, imports: optimize directory scanning and other things
This brings goimports from 160ms to 100ms on my laptop, and under 50ms
on my Linux machine.

Using cmd/trace, I noticed that filepath.Walk is inherently slow.
See https://golang.org/issue/16399 for details.

Instead, this CL introduces a new (private) filepath.Walk
implementation, optimized for speed and avoiding unnecessary work.

In addition to avoid an Lstat per file, it also reads directories
concurrently. The old goimports code did that too, but now that logic
is removed from goimports and the code is simplified.

This also adds some profiling command line flags to goimports that I
found useful.

Updates golang/go#16367 (goimports is slow)
Updates golang/go#16399 (filepath.Walk is slow)

Change-Id: I708d570cbaad3fa9ad75a12054f5a932ee159b84
Reviewed-on: https://go-review.googlesource.com/25001
Reviewed-by: Andrew Gerrand <adg@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Jul 20, 2016

@hirochachacha It sounds like you are describing a bug in the syscall package, a bug that should be fixed in any case. The syscall package provides ReadDirent and ParseDirent functions on all GNU/Linux targets; do those functions work on mips64?

@bradfitz

This comment has been minimized.

Copy link
Member Author

commented Jul 20, 2016

@cherrymui, can you run go test golang.org/x/tools/imports on mips64?

@bradfitz

This comment has been minimized.

Copy link
Member Author

commented Jul 20, 2016

@hirochachacha, please file a separate issue for that.

@hirochachacha

This comment has been minimized.

Copy link
Contributor

commented Jul 20, 2016

x/sys/unix 's Getdents uses SYS_GETDENTS64.
So what I am saying is completely wrong.
I'm really sorry about confusing people.

I'll remove my comments above and close the opened issue.

Again, I'm sorry.

@mattn

This comment has been minimized.

Copy link
Member

commented Jul 20, 2016

ref #16435

@cherrymui

This comment has been minimized.

Copy link
Contributor

commented Jul 20, 2016

@bradfitz, on MIPS64

$ go test -v golang.org/x/tools/imports 
=== RUN   TestFastWalk_Basic
--- PASS: TestFastWalk_Basic (0.01s)
=== RUN   TestFastWalk_Symlink
--- PASS: TestFastWalk_Symlink (0.01s)
=== RUN   TestFastWalk_SkipDir
--- PASS: TestFastWalk_SkipDir (0.01s)
=== RUN   TestFastWalk_TraverseSymlink
--- PASS: TestFastWalk_TraverseSymlink (0.01s)
=== RUN   TestFixImports
--- PASS: TestFixImports (0.06s)
=== RUN   TestImportSymlinks
--- PASS: TestImportSymlinks (0.04s)
=== RUN   TestFixImportsVendorPackage
--- PASS: TestFixImportsVendorPackage (0.01s)
=== RUN   TestFindImportGoPath
--- PASS: TestFindImportGoPath (0.01s)
=== RUN   TestFindImportInternal
--- FAIL: TestFindImportInternal (0.03s)
    fix_test.go:1034: findImportGoPath("race", Acquire ...) = "", false; want "internal/race", false
=== RUN   TestFindImportRandRead
--- PASS: TestFindImportRandRead (0.00s)
=== RUN   TestFindImportVendor
--- PASS: TestFindImportVendor (0.01s)
=== RUN   TestProcessVendor
--- FAIL: TestProcessVendor (0.03s)
    fix_test.go:1127: Process("/home/a/src/git/go-official/src/math/x.go") did not add expected hpack import "golang_org/x/net/http2/hpack"; got:
        package http

        import "strings"

        func f() { strings.NewReader(); hpack.HuffmanDecode() }
=== RUN   TestFindImportStdlib
--- PASS: TestFindImportStdlib (0.00s)
=== RUN   TestRenameWhenPackageNameMismatch
--- PASS: TestRenameWhenPackageNameMismatch (0.03s)
=== RUN   TestOptimizationWhenInGoroot
--- PASS: TestOptimizationWhenInGoroot (0.03s)
=== RUN   TestIgnoreDocumentationPackage
--- PASS: TestIgnoreDocumentationPackage (0.03s)
=== RUN   TestImportPathToNameGoPathParse
--- PASS: TestImportPathToNameGoPathParse (0.03s)
=== RUN   TestIgnoreConfiguration
--- PASS: TestIgnoreConfiguration (0.03s)
=== RUN   TestSkipNodeModules
--- PASS: TestSkipNodeModules (0.03s)
=== RUN   TestPkgIsCandidate
--- PASS: TestPkgIsCandidate (0.00s)
FAIL
exit status 1
FAIL    golang.org/x/tools/imports  0.499s

$ go version
go version devel +1d2ca9e Mon Jul 18 21:07:34 2016 +0000 linux/mips64le

Is the failure related? Should I go look into it?

@bradfitz

This comment has been minimized.

Copy link
Member Author

commented Jul 20, 2016

@cherrymui, those are old tests. I changed them to not rely on real GOROOT contents. Update & try again?

@cherrymui

This comment has been minimized.

Copy link
Contributor

commented Jul 20, 2016

$ go get -u golang.org/x/tools/imports
$ go test -v golang.org/x/tools/imports
=== RUN   TestFastWalk_Basic
--- PASS: TestFastWalk_Basic (0.01s)
=== RUN   TestFastWalk_Symlink
--- PASS: TestFastWalk_Symlink (0.00s)
=== RUN   TestFastWalk_SkipDir
--- PASS: TestFastWalk_SkipDir (0.00s)
=== RUN   TestFastWalk_TraverseSymlink
--- PASS: TestFastWalk_TraverseSymlink (0.01s)
=== RUN   TestFixImports
--- PASS: TestFixImports (0.06s)
=== RUN   TestImportSymlinks
--- PASS: TestImportSymlinks (0.04s)
=== RUN   TestFixImportsVendorPackage
--- PASS: TestFixImportsVendorPackage (0.01s)
=== RUN   TestFindImportGoPath
--- PASS: TestFindImportGoPath (0.01s)
=== RUN   TestFindImportInternal
--- FAIL: TestFindImportInternal (0.03s)
    fix_test.go:1034: findImportGoPath("race", Acquire ...) = "", false; want "internal/race", false
=== RUN   TestFindImportRandRead
--- PASS: TestFindImportRandRead (0.00s)
=== RUN   TestFindImportVendor
--- PASS: TestFindImportVendor (0.01s)
=== RUN   TestProcessVendor
--- FAIL: TestProcessVendor (0.03s)
    fix_test.go:1127: Process("/home/a/src/git/go-official/src/math/x.go") did not add expected hpack import "golang_org/x/net/http2/hpack"; got:
        package http

        import "strings"

        func f() { strings.NewReader(); hpack.HuffmanDecode() }
=== RUN   TestFindImportStdlib
--- PASS: TestFindImportStdlib (0.00s)
=== RUN   TestRenameWhenPackageNameMismatch
--- PASS: TestRenameWhenPackageNameMismatch (0.03s)
=== RUN   TestOptimizationWhenInGoroot
--- PASS: TestOptimizationWhenInGoroot (0.03s)
=== RUN   TestIgnoreDocumentationPackage
--- PASS: TestIgnoreDocumentationPackage (0.03s)
=== RUN   TestImportPathToNameGoPathParse
--- PASS: TestImportPathToNameGoPathParse (0.03s)
=== RUN   TestIgnoreConfiguration
--- PASS: TestIgnoreConfiguration (0.03s)
=== RUN   TestSkipNodeModules
--- PASS: TestSkipNodeModules (0.03s)
=== RUN   TestPkgIsCandidate
--- PASS: TestPkgIsCandidate (0.00s)
FAIL
exit status 1
FAIL    golang.org/x/tools/imports  0.489s

I did a go get -u but see no difference...

@bradfitz

This comment has been minimized.

Copy link
Member Author

commented Jul 20, 2016

@cherrymui, oh, sorry... there are still those two tests which still use the machine's real $GOROOT. I didn't convert those yet I guess. In any case, you can ignore those errors. They should go away when you sync your $GOROOT. I'll fix those tests.

But the real question was whether the TestFastWalk_ tests passed, which they did.

Thanks!

@cherrymui

This comment has been minimized.

Copy link
Contributor

commented Jul 20, 2016

Ok, great. Thanks!

@rasky

This comment has been minimized.

Copy link
Member

commented Aug 1, 2016

Not sure what the current plan is to fix this, but given that os.FileInfo is an interface, wouldn't make sense to return an object that does a lazy Lstat only when required? It could answer Name() and IsDir() using only information returned by ParseDirent, and anything else would lazily call Lstat.

@bradfitz

This comment has been minimized.

Copy link
Member Author

commented Aug 1, 2016

@rasky, that doesn't work, as the Size() int64 method (or IsDir() bool) on the os.FileInfo has no means of returning an error if the lazy lstat fails.

gopherbot pushed a commit that referenced this issue Aug 16, 2016

strings: add special cases for Join of 2 and 3 strings
We already had special cases for 0 and 1. Add 2 and 3 for now too.
To be removed if the compiler is improved later (#6714).

This halves the number of allocations and total bytes allocated via
common filepath.Join calls, improving filepath.Walk performance.

Noticed as part of investigating filepath.Walk in #16399.

Change-Id: If7b1bb85606d4720f3ebdf8de7b1e12ad165079d
Reviewed-on: https://go-review.googlesource.com/25005
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>

@quentinmit quentinmit added the NeedsFix label Oct 10, 2016

@rsc rsc added this to the Unplanned milestone Oct 19, 2016

@c4milo

This comment has been minimized.

Copy link
Member

commented Mar 15, 2017

@bradfitz should we copy your fastwalk in the mean time? or this is something that would be merged or put on some of the experimental repos?

@bradfitz

This comment has been minimized.

Copy link
Member Author

commented Mar 15, 2017

The license permits copying if you want to copy it. I suppose I could put it on github or something.

@gopherbot

This comment has been minimized.

Copy link

commented Apr 8, 2017

CL https://golang.org/cl/40092 mentions this issue.

gopherbot pushed a commit to golang/tools that referenced this issue Apr 9, 2017

imports: wait for fastWalk workers to finish before returning
In some cases walkFn is being called after the fastWalk function has
returned. This often happens when an error was encountered early on in
scanning directories with many entries.

It is caused by fastWalk not waiting for its workers to complete their
work. A sync.WaitGroup is used to wait for all workers to finish when
the function returns.

Updates golang/go#16399

Change-Id: I695d30c18e4878b789520b9d8a650f9688d896ac
Reviewed-on: https://go-review.googlesource.com/40092
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@johansglock

This comment has been minimized.

Copy link

commented Apr 11, 2017

On macOS 10.12.4 with go version go1.8.1 darwin/amd64 goimports panics with a deadlock on this change.

After "go get -u golang.org/x/tools/cmd/goimports" this is the panic:


goroutine 1 [chan receive]:
golang.org/x/tools/imports.fixImports(0xc420015380, 0xc420094e80, 0x7fff5fbff0e2, 0x12, 0x329, 0x600, 0x132c770, 0xc420094e80, 0x0)
	/Users/johans/Go/src/golang.org/x/tools/imports/fix.go:237 +0x6ba
golang.org/x/tools/imports.Process(0x7fff5fbff0e2, 0x12, 0xc42016e000, 0x329, 0x600, 0x132c770, 0x0, 0x0, 0x0, 0xc420049da8, ...)
	/Users/johans/Go/src/golang.org/x/tools/imports/imports.go:58 +0x719
main.processFile(0x7fff5fbff0e2, 0x12, 0x0, 0x0, 0x132d620, 0xc42000c018, 0x1, 0x0, 0x0)
	/Users/johans/Go/src/golang.org/x/tools/cmd/goimports/goimports.go:136 +0x139
main.gofmtMain()
	/Users/johans/Go/src/golang.org/x/tools/cmd/goimports/goimports.go:273 +0x21c
main.main()
	/Users/johans/Go/src/golang.org/x/tools/cmd/goimports/goimports.go:189 +0x32

goroutine 5 [semacquire]:
sync.runtime_Semacquire(0xc4201bf26c)
	/usr/local/go/src/runtime/sema.go:47 +0x34
sync.(*WaitGroup).Wait(0xc4201bf260)
	/usr/local/go/src/sync/waitgroup.go:131 +0x7a
golang.org/x/tools/imports.fastWalk(0xc4201c6860, 0x14, 0xc4201bf250, 0x132daa0, 0xc4202ab608)
	/Users/johans/Go/src/golang.org/x/tools/imports/fastwalk.go:92 +0x7c7
golang.org/x/tools/imports.scanGoDirs(0xc420014200)
	/Users/johans/Go/src/golang.org/x/tools/imports/fix.go:568 +0x354
golang.org/x/tools/imports.scanGoPath()
	/Users/johans/Go/src/golang.org/x/tools/imports/fix.go:490 +0x26
sync.(*Once).Do(0x135f600, 0x1223060)
	/usr/local/go/src/sync/once.go:44 +0xbe
golang.org/x/tools/imports.findImportGoPath(0xc420011810, 0x8, 0xc4201bd380, 0x7fff5fbff0e2, 0x12, 0x0, 0x0, 0x0, 0x0, 0x0)
	/Users/johans/Go/src/golang.org/x/tools/imports/fix.go:723 +0x1cd
golang.org/x/tools/imports.fixImports.func2(0x7fff5fbff0e2, 0x12, 0xc420077320, 0xc420011810, 0x8, 0xc4201bd380)
	/Users/johans/Go/src/golang.org/x/tools/imports/fix.go:227 +0x7b
created by golang.org/x/tools/imports.fixImports
	/Users/johans/Go/src/golang.org/x/tools/imports/fix.go:233 +0x614

goroutine 7 [chan send]:
golang.org/x/tools/imports.(*walker).doWork(0xc4201bd4a0, 0xc4201bf260)
	/Users/johans/Go/src/golang.org/x/tools/imports/fastwalk.go:121 +0x190
created by golang.org/x/tools/imports.fastWalk
	/Users/johans/Go/src/golang.org/x/tools/imports/fastwalk.go:71 +0x273

goroutine 8 [chan send]:
golang.org/x/tools/imports.(*walker).doWork(0xc4201bd4a0, 0xc4201bf260)
	/Users/johans/Go/src/golang.org/x/tools/imports/fastwalk.go:121 +0x190
created by golang.org/x/tools/imports.fastWalk
	/Users/johans/Go/src/golang.org/x/tools/imports/fastwalk.go:71 +0x273

goroutine 10 [chan send]:
golang.org/x/tools/imports.(*walker).doWork(0xc4201bd4a0, 0xc4201bf260)
	/Users/johans/Go/src/golang.org/x/tools/imports/fastwalk.go:121 +0x190
created by golang.org/x/tools/imports.fastWalk
	/Users/johans/Go/src/golang.org/x/tools/imports/fastwalk.go:71 +0x273

goroutine 11 [chan send]:
golang.org/x/tools/imports.(*walker).doWork(0xc4201bd4a0, 0xc4201bf260)
	/Users/johans/Go/src/golang.org/x/tools/imports/fastwalk.go:121 +0x190
created by golang.org/x/tools/imports.fastWalk
	/Users/johans/Go/src/golang.org/x/tools/imports/fastwalk.go:71 +0x273

After doing:

cd ~/Go/src/golang.org/x/tools
git revert 4436e5475416d77a9352558d118d0b585b962ef1
go get golang.org/x/tools/cmd/goimports

goimports works as intended again and executes successfully.

Of course if there is any extra information needed, please let me know.

@johansglock

This comment has been minimized.

Copy link

commented Apr 11, 2017

Dug a little further, all works as expected using only one worker, the problem is that walk is blocking on writing to the error channel. Emptying the work and error channels on return resolved this.

This patch works for me:

diff --git a/imports/fastwalk.go b/imports/fastwalk.go
index c8a79490..0a15d898 100644
--- a/imports/fastwalk.go
+++ b/imports/fastwalk.go
@@ -64,7 +64,21 @@ func fastWalk(root string, walkFn func(path string, typ os.FileMode) error) erro
                // buffered for correctness & not leaking goroutines:
                resc: make(chan error, numWorkers),
        }
-       defer close(w.donec)
+       defer func() {
+               close(w.donec)
+
+       L:
+               for {
+                       select {
+                       case <-w.workc:
+                       case <-w.resc:
+                       default:
+                               break L
+                       }
+               }
+
+       }()
+
        // TODO(bradfitz): start the workers as needed? maybe not worth it.
        for i := 0; i < numWorkers; i++ {
                wg.Add(1)
@gopherbot

This comment has been minimized.

Copy link

commented Apr 11, 2017

CL https://golang.org/cl/40351 mentions this issue.

@jstemmer

This comment has been minimized.

Copy link
Contributor

commented Apr 11, 2017

That's interesting, I hadn't considered that. It think this happens when a worker received more input but then gets stuck on sending the result to w.resc here.

Another possible way of dealing with this that doesn't involve draining could be:

func (w *walker) doWork(wg *sync.WaitGroup) {
	defer wg.Done()
	for {
		select {
		case <-w.donec:
			return
		case it := <-w.workc:
			err := w.walk(it.dir, !it.callbackDone)
			select {
			case <-w.donec:
				return
			case w.resc <- err:
			}
		}
	}
}

I wonder if there's a better way of dealing with this.

@johansglock

This comment has been minimized.

Copy link

commented Apr 11, 2017

Confirmed that your approach works as well. I'm not happy with either solution either, but don't see any better way, so either way works from my perspective ;)

@bradfitz

This comment has been minimized.

Copy link
Member Author

commented Apr 11, 2017

@johansglock, thanks, and sorry. I've reverted the change in https://go-review.googlesource.com/c/40296/

gopherbot pushed a commit to golang/tools that referenced this issue Apr 11, 2017

Revert "imports: wait for fastWalk workers to finish before returning"
This reverts commit 4436e54.

Reason for revert: Breaks goimports. See:
golang/go#16399 (comment)

Change-Id: I3bda8f0fd32380d19d7daecf3489a24e51abfbe7
Reviewed-on: https://go-review.googlesource.com/40296
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@gopherbot

This comment has been minimized.

Copy link

commented Apr 25, 2017

CL https://golang.org/cl/41681 mentions this issue.

gopherbot pushed a commit to golang/tools that referenced this issue Apr 25, 2017

imports: wait for fastWalk workers to finish before returning (take 2)
This is Joël Stemmer's https://golang.org/cl/40092 again, but with
a fix to prevent workers from deadlocking on send if the caller had
already started to shut down. See:

golang/go#16399 (comment)

Updates golang/go#16399
Fixes golang/go#20109 (it looks like)

Change-Id: I3d1cf6f24563d02e1369a4496c2d37dcc1f5e5b8
Reviewed-on: https://go-review.googlesource.com/41681
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Joël Stemmer <jstemmer@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@alexkreidler

This comment has been minimized.

Copy link

commented Oct 29, 2017

Any updates on this?

@bradfitz

This comment has been minimized.

Copy link
Member Author

commented Nov 1, 2017

@tim15, there can't be any update on this due to the fundamental API design. See comment #16399 (comment) above.

I'll label this bug Go 2. Maybe we can change the walk interface to be more efficient later.

@bradfitz bradfitz added the Go2 label Nov 1, 2017

@alexkreidler

This comment has been minimized.

Copy link

commented Nov 4, 2017

I've written a few benchmarks:

  • a naive recursive using os.File.Readdir()
  • parallelized recursive version using goroutines and wait groups
  • and also included the one by @bradfitz up above using filepath.Walk()

Here they are: https://github.com/Tim15/golang-parallel-io.

They're all slower than find, probably due to the fact that os.File.Readdir() returns a complete os.FileInfo. I was surprised though that my parallel version was faster than filepath.Walk().

Hopefully this helps us figure this out!

@bradfitz

This comment has been minimized.

Copy link
Member Author

commented Nov 4, 2017

@tim15, we already figured it out over a year ago. See the comment I linked you to 3 days ago. The solution for goimports ended up being https://github.com/golang/tools/blob/master/imports/fastwalk.go but that's not applicable to the standard library. This isn't fixable with the current API.

@alexkreidler

This comment has been minimized.

Copy link

commented Nov 4, 2017

@bradfitz Cool, thanks for catching me up!

@SirWindfield

This comment has been minimized.

Copy link

commented Nov 28, 2017

@bradfitz Sorry to ask here but is it possible to use the fastwalk API in one of my applications? I couldn't find any information on that.
Looks like no method is exported to the outside (all are lower-cased).

@bradfitz

This comment has been minimized.

Copy link
Member Author

commented Nov 28, 2017

@SirWindfield, no, I did not expose its implementation publicly. You can copy/paste it into your own package if you'd like, but it's not something I want to support at this time.

@rasky

This comment has been minimized.

Copy link
Member

commented Jan 5, 2018

For people looking for reusing the code into their own projecst, there is now this third-party library that is basically a clone of fastwalk: https://github.com/karrick/godirwalk

@s12chung

This comment has been minimized.

Copy link

commented Aug 27, 2018

forked the repo and made an update script to clean things up with tags: https://github.com/s12chung/fastwalk

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.