Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/tools/gopls: "stuck" at 100% CPU #34569

Closed
myitcv opened this issue Sep 27, 2019 · 24 comments
Closed

x/tools/gopls: "stuck" at 100% CPU #34569

myitcv opened this issue Sep 27, 2019 · 24 comments
Assignees

Comments

@myitcv
Copy link
Member

@myitcv myitcv commented Sep 27, 2019

What version of Go are you using (go version)?

$ go version
go version devel +1804bbab62 Tue Sep 24 18:20:52 2019 +0000 linux/amd64
$ go list -m golang.org/x/tools
golang.org/x/tools v0.0.0-20190926165942-a8d5d34286bd
$ go list -m golang.org/x/tools/gopls
golang.org/x/tools/gopls v0.1.8-0.20190926165942-a8d5d34286bd

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="on"
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/myitcv/.cache/go-build"
GOENV="/home/myitcv/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/myitcv/gostuff"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/myitcv/gos"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/myitcv/gos/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/myitcv/gostuff/src/github.com/myitcv/govim/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build780296000=/tmp/go-build -gno-record-gcc-switches"

What did you do?

There is clearly extra CPU required with staticcheck enabled, and I see that on startup. It's just that 1 time in 5 something gets stuck

A kill -SIGABRT stack trace and gopls log file can be found here:

https://gist.github.com/myitcv/4ef84fd1115d27f1305566647278f72e

Disabling staticcheck within gopls I no longer see any problems.

What did you expect to see?

Normal operation

What did you see instead?

Per above


cc @stamblerre @ianthehat @dominikh

@mvdan

This comment has been minimized.

Copy link
Member

@mvdan mvdan commented Sep 28, 2019

Did you investigate to see if it's really stuck somewhere? Maybe staticcheck is just spending a lot of time doing extra work for some reason.

@myitcv

This comment has been minimized.

Copy link
Member Author

@myitcv myitcv commented Sep 28, 2019

Not looked in detail as yet. Raised the issue to see whether others are seeing problems as much as anything.

@zchee

This comment has been minimized.

Copy link
Contributor

@zchee zchee commented Sep 28, 2019

@myitcv @mvdan I also repro this issue. I'll looking in detail soon.

@tj

This comment has been minimized.

Copy link

@tj tj commented Oct 1, 2019

Not sure if I'm doing something wrong, but gopls seems to get completely stuck using %400+ CPU every few minutes for me during regular development, I have to keep killing it. I'm using golint and VSCode

@dominikh

This comment has been minimized.

Copy link
Member

@dominikh dominikh commented Oct 1, 2019

What strikes me as odd are the many goroutines doing work in honnef.co/go/tools/ssa – how does gopls handle in-progress analyses when the user keeps on typing? Will only one instance of an analysis run until it finishes? Will new analyses be spawned, causing multiple instances to run at the same time?

My suspicion is that SSA construction is taking a fair amount of time, and gopls keeps spawning more of it, not waiting for completion. But that's just a guess.

@muirdm

This comment has been minimized.

Copy link

@muirdm muirdm commented Oct 2, 2019

I think @dominikh is right. Every file change triggers a new diagnostics run, cancelling any previous in-progress diagnostics. As discussed in slack, the rub is that go/analysis doesn't support cancellation, so you can end up with multiple running at once. gopls has its own go/analysis driver, so it could at least check the context before starting each pass.

@myitcv myitcv changed the title x/tools/gopls: stuck at 100% CPU with staticcheck enabled x/tools/gopls: "stuck" at 100% CPU Oct 7, 2019
@myitcv

This comment has been minimized.

Copy link
Member Author

@myitcv myitcv commented Oct 7, 2019

Adding my very rudimentary analysis from Slack:

So here's my current understanding after a brief analysis into why my machine was getting "hot" when gopls handled a steady stream of changes to a file:

  • the package containing the file I am changing takes ~200ms to type check
  • the analysis phase (with staticcheck turned off) takes a further ~50ms
  • the diagnostic calculations appear to run concurrently
  • it appears therefore that multiple staticcheck runs can happen simultaneously
  • there is no limit to the number of concurrent diagnostic calculations
  • there is no "collapsing" of diagnostic calculation runs - every changes triggers a recalculation of diagnostics
  • there appears to be a race whereby an earlier diagnostic run can return results after a later one

I've retitled to confirm this is not a problem with staticcheck per se. Rather that diagnostic calculations (of which staticcheck can be a part) back up after a steady stream of changes.

Could we separate the calculation of parse, type and analysis diagnostics? It seems that none of these categories should ever run concurrently in response to a change (i.e. shouldn't have two parse checks running concurrently, but ok for a parse and type check to be running concurrently), but they can (and should) run independently given the widely varying times taken to run each. Taking the example above, if staticcheck is added into the mix, the analysis phase will lengthen (significantly?). This would mean that parse errors are returned almost immediately (which is a good thing) with type errors following later, and analysis errors later still.

Thoughts?

@primalmotion

This comment has been minimized.

Copy link

@primalmotion primalmotion commented Oct 15, 2019

I have the same issue. gopls is constantly stuck between 100% and 600% while I do nothing. If I kill It, it starts over and after a second or two, back to battery draining. It has became totally unusable for me

@primalmotion

This comment has been minimized.

Copy link

@primalmotion primalmotion commented Oct 15, 2019

Actually, my issue seems to be caused (and fixed) by what's discussed in here : #34906 (comment)

@myitcv

This comment has been minimized.

Copy link
Member Author

@myitcv myitcv commented Oct 15, 2019

I suspect that #34906 and the fix in https://go-review.googlesource.com/c/tools/+/201098 will deal with part of the problem. However the CPU issue will likely remain because per #34569 (comment), type checking and analysis happens on every single change to gopls.

@myitcv

This comment has been minimized.

Copy link
Member Author

@myitcv myitcv commented Oct 15, 2019

I've done a bit more digging here because something really wasn't adding up. The ~200ms for type checking reported in #34569 (comment) is the time taken for a full type check, i.e. all transitive dependencies. It doesn't therefore make sense that after a single change to a file in said package that a full type check happens, because gopls is reusing unchanged packages via go/types.Importer.

The issue appears to be that making changes in any file in a package, say example.com/apiserver, where there are also test files, results in the test package example.com/apiserver [example.com/apiserver.test] being fully typed checked after each change.

If instead you make changes to any file in a package that does not have test files (i.e. by definition you're changing on of the package files) then the type check is instantaneous, ~5ms.

This is entirely reproducible on my machine, and visible via the Jaeger trace viewer.

I think this explains the discrepancy, however I've not looked into why this is happening within gopls.

@myitcv

This comment has been minimized.

Copy link
Member Author

@myitcv myitcv commented Oct 15, 2019

My previous comment is accurate as of f936694f27bff40964345764535d9ccfdf3eb98c.

@myitcv

This comment has been minimized.

Copy link
Member Author

@myitcv myitcv commented Oct 21, 2019

Further update to #34569 (comment) after some investigation with @stamblerre:

The package in question, example.com/apiserver, is rather large. It has 1.4MB of code (mostly generated). It turns out that the type checking of example.com/apiserver [example.com/apiserver.test] is using the cached results from previous type checking runs, and that ~200ms is spent type checking this 1.4MB of code.

Therefore I wonder whether, given the implementation of type checking/analysis stands, i.e. limits on ability to cancel work, and working on the assumption that type checking/analysis are both are as concurrent as they can be, we shouldn't simply limit the number of concurrent type checking/analysis runs to one each. i.e. this means we will potentially collapse down type checking/analysis runs when there are fast streams of updates (in larger packages).

@bmhatfield

This comment has been minimized.

Copy link

@bmhatfield bmhatfield commented Oct 22, 2019

I believe I am hitting this case as well. I find that it's specifically triggered when refactoring code - VSCode submitting "incomplete" code to gopls really sends it off the deep end. This makes gopls really rough to work with at the moment.

This has been my experience on multiple versions, including the most recent at the moment 774d2ec1, though as of that version, even though it goes off the deep end, it's not necessarily getting stuck "forever" anymore.

@ermik

This comment has been minimized.

Copy link

@ermik ermik commented Nov 6, 2019

Using VS Code remote development container, gopls at ~500% CPU (15'' MacBook Pro 2017) hangs Docker (hyperkit) indefinitely.

For regular environment best workaround seems to be routinely restarting the language server.

I am still amazed there aren't expletives all over this GitHub issue.

@stamblerre

This comment has been minimized.

Copy link
Contributor

@stamblerre stamblerre commented Nov 6, 2019

@ermik: What version of gopls are you using (gopls version)? You can try downloading the latest pre-release (cd in to a temporary directory and run go get golang.org/x/tools/gopls@v0.2.0-pre2) to see if that works better for you.

@stamblerre stamblerre self-assigned this Nov 7, 2019
@ermik

This comment has been minimized.

Copy link

@ermik ermik commented Nov 7, 2019

The VS Code tools update command ran recently (~2-3 days ago) has pulled in gopls@v1.7.0. The container in question is using the same workflow. I will try assigning

"go.toolsGopath": "${workspaceFolder}/.tools"

and upping the "insider" version of gopls for that folder. Will report back.

@ermik

This comment has been minimized.

Copy link

@ermik ermik commented Nov 8, 2019

From a day of using 0.2.0-pre2 in a VS Code multi-root workspace I can say that I have not seen it spike above expected high load at around 100% CPU.

The obvious failures came from import handling:

  • Modules in $GOPATH / vendor directories aren't guessed by name or added to imports at all when gopls runs on save. Not a big deal. Standard library imports seemed to have this function work flawlessly.
  • When new import is added manually, with a valid URI, all functionality may halt (observed multiple times throughout the day) until editor window is reloaded (no CPU spikes and no errors in the output).
  • When a new import is added with an invalid (unreachable URL, e.g. one that needs a local replace directive) the editor still does not provide feedback, blocking CPU at 100% or simply halting all editor feedback until URL is fixed (or replace is specified) and window is reloaded.
  • Invoking restart language server command seems to have no useful impact. Reloading editor worked better in all failure cases.

Thanks, Rebecca.

@stamblerre

This comment has been minimized.

Copy link
Contributor

@stamblerre stamblerre commented Nov 9, 2019

Thanks for the report, @ermik. If you are able to file an issue with a repro case and a log I can help you investigate the issues that you've noticed (see https://github.com/golang/tools/blob/master/gopls/doc/troubleshooting.md#capturing-logs for information on how to capture the logs).

@stamblerre stamblerre changed the title x/tools/gopls: "stuck" at 100% CPU x/tools/gopls: "stuck" at 100% CPU with staticcheck Dec 4, 2019
@stamblerre stamblerre modified the milestones: Unreleased, gopls unplanned Dec 4, 2019
@myitcv myitcv removed this from the gopls unplanned milestone Jan 16, 2020
@myitcv myitcv added this to the gopls/v1.0.0 milestone Jan 16, 2020
@myitcv

This comment has been minimized.

Copy link
Member Author

@myitcv myitcv commented Jan 16, 2020

Marking as v1.0.0 because this remains a big usability issue.

@stamblerre

This comment has been minimized.

Copy link
Contributor

@stamblerre stamblerre commented Jan 16, 2020

@myitcv: This is only with staticcheck on, right? I'm not sure if you were there when I said this, but staticcheck will still be off by default in gopls/v1.0.0. We don't plan to invest time into fixing CPU/memory usage with staticcheck enabled before 1.0 because it's too different from the underlying gopls architecture and smoothing the integration will take a lot of work.

@stamblerre stamblerre modified the milestones: gopls/v1.0.0, gopls unplanned Jan 16, 2020
@myitcv

This comment has been minimized.

Copy link
Member Author

@myitcv myitcv commented Jan 16, 2020

@stamblerre

This is only with staticcheck on, right

No, I ended up turning staticcheck off. Things were getting swamped simply because of the load of type checking with the 1.4MB of code in that package and many changes per second (as I typed to make changes, for example).

I'll retest this against the latest master once things stabilise and confirm where we stand.

@stamblerre

This comment has been minimized.

Copy link
Contributor

@stamblerre stamblerre commented Jan 16, 2020

Disabling staticcheck within gopls I no longer see any problems.

Sorry, based on this, I had assumed that this was only relevant to staticcheck so I changed the issue title. I think you can try it with master now, it should be stable enough to check.

@myitcv

This comment has been minimized.

Copy link
Member Author

@myitcv myitcv commented Jan 20, 2020

I've just done a test against 0cba7a3a with partial revert of CL 214586 applied on top, in the form of CL 215239, and things are massively improved. i.e. we don't appear to suffer anywhere near the level of analysis "floods" we did before. Therefore I'm going to close this issue; we can always re-open if there are specific cases that do cause problems.

Thanks for all the fixes as ever @stamblerre 😄

@myitcv myitcv closed this Jan 20, 2020
@stamblerre stamblerre changed the title x/tools/gopls: "stuck" at 100% CPU with staticcheck x/tools/gopls: "stuck" at 100% CPU Jan 31, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
You can’t perform that action at this time.