Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/go: mod tidy takes excessive wall & CPU time statting irrelevant files #38791

Open
pwaller opened this issue May 1, 2020 · 11 comments
Open

cmd/go: mod tidy takes excessive wall & CPU time statting irrelevant files #38791

pwaller opened this issue May 1, 2020 · 11 comments

Comments

@pwaller
Copy link
Contributor

@pwaller pwaller commented May 1, 2020

Version: go1.14.2, Repro on latest release: Yes. OS/Arch: Ubuntu 18.04 and amd64.

What did you do?

Ran go mod tidy on a directory where go mod tidy had already been run.

(In the top level of a go module which also happens to be a deeply nested directory tree with millions of XML files)

What did you expect to see?

An instant result.

What did you see instead?

24s wall & 13s CPU.

Discussion

Running strace reveals that go mod tidy is doing a stat-like system call to every file under my directory tree, which includes very many (millions) of irrelevant xml files in a deeply nested directory tree within my go module. My expectation is that Go would ignore these files. Maybe it has to read the directory entry, to discover go files or so, but it shouldn't do more work than that.

This bug currently renders vscode unusable because it frequently runs go mod tidy, and the net result is that I end up with hundreds of go mod tidy processes all contending to see who amongst them can stat files the fastest.

My initial workaround was to run pkill -cf 'go mod tidy' in a loop, but @jayconrod informs me that go tooling will ignore directories beginning with _ or ., so I will probably use that workaround for now.

/cc @bcmills @matloob @jayconrod

@bcmills bcmills added ToolSpeed and removed Performance labels May 1, 2020
@bcmills bcmills added this to the Backlog milestone May 1, 2020
@jayconrod jayconrod added this to the Backlog milestone May 1, 2020
@bcmills bcmills added the help wanted label May 1, 2020
@jayconrod
Copy link
Contributor

@jayconrod jayconrod commented May 1, 2020

tidy walks the file system tree in the main module to find packages. Depending on the operating system, it may not need to stat every file during that walk, since we mostly just need to know whether something's a file or directory. There's probably opportunity for optimization here.

@pwaller
Copy link
Contributor Author

@pwaller pwaller commented May 1, 2020

Taking a stack dump with ctrl-\ shows the offending call to filepath.Walk, which obviously populates os.FileInfo using stat. I guess this would require some other walking logic.

filepath.Walk(root, func(path string, fi os.FileInfo, err error) error {

walkPkgs(root, modPrefix, prune)

modload.LoadALL()

@pwaller
Copy link
Contributor Author

@pwaller pwaller commented May 1, 2020

Following a conversation with @bcmills, he suggests https://godoc.org/golang.org/x/tools/internal/fastwalk which avoids all the stat calls.

I'm unlikely to be able to submit a patch for this, as much as I want to.

I did try out fastwalk quickly (protecting 'have' with a mutex), and found there is yet another thing which does a readdir:

infos, err := ioutil.ReadDir(dir)

if _, _, err := scanDir(path, tags); err != imports.ErrNoGo {

@pwaller
Copy link
Contributor Author

@pwaller pwaller commented May 1, 2020

OK, I think I succeeded at testing fastwalk and using fastwalk instead of readdir. It reduces the runtime to 1s wall + 1.5 CPU + 0.9s sys. GNU find by comparison takes 0.8s+0.3s+0.5s. So the CPU cost isn't fantastic but it's still better than it was.

I still care about the battery usage of so many go mod tidy instances in vscode, and given how many processes there are lying around to kill it seems like there is an appreciable amount of CPU time being used over time, even with my fix /cc @stamblerre on that point.

@mvdan
Copy link
Member

@mvdan mvdan commented May 1, 2020

I definitely think that commands like go mod tidy should be faster, but I also wonder if gopls could do better here given that it's a long-running daemon. Watching an entire module for file changes is tricky, but surely there is a way we can avoid walking the entire module over and over.

@bcmills
Copy link
Member

@bcmills bcmills commented May 1, 2020

go mod tidy should only be necessary if it is possible that the module has become untidy.

In general, we try to make all go commands tidiness-preserving, so if the module was already tidy to begin with, it should only have become untidy if either:

  1. some package has changed, in which case go list -deps -test on that package should suffice (rather than a full tree walk), or

  2. some dependency has changed (due to running go build or go test on a package outside the main module, an explicit user edit to the go.mod or go.sum file, or an aforementioned change in some package), all of which we can detect by watching for changes in go.mod and go.sum alone.

@bcmills
Copy link
Member

@bcmills bcmills commented May 1, 2020

(So, you may want to open a separate issue against gopls.)

@heschik
Copy link
Contributor

@heschik heschik commented May 1, 2020

If I remove the last dep on a module from my project, how would gopls use go list -deps -test to discover that the module is now unused?

@bcmills
Copy link
Member

@bcmills bcmills commented May 4, 2020

If I remove the last dep on a module from my project, how would gopls use go list -deps -test to discover that the module is now unused?

If you have a cached invocation of go list -deps -test for that package, then you could notice that that dependency is removed by a later edit, and (perhaps) also notice that that dependency is not present in the output of go list -deps -test for any other package in the module.

Or, you could use the change in package dependencies to trigger a go mod tidy run, but more conservatively than for the whole module on ~every edit.

@stamblerre
Copy link
Contributor

@stamblerre stamblerre commented May 4, 2020

Or, you could use the change in package dependencies to trigger a go mod tidy run, but more conservatively than for the whole module on ~every edit.

We run it on saved changes to the go.mod file and on changes to imports that are saved on disk. We can probably optimize it further, but I think the gopls issue is probably a cancellation bug.

@pwaller
Copy link
Contributor Author

@pwaller pwaller commented May 9, 2020

(That is now #38816)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
6 participants
You can’t perform that action at this time.