
x/tools/gopls: deepCompletionState.enqueue is an allocation hotspot #53798

Open
adonovan opened this issue Jul 11, 2022 · 3 comments
Labels: gopls/performance, gopls, Tools

Comments

adonovan (Member) commented Jul 11, 2022

I noticed while running TestBenchmarkConfiguredCompletion on k8s that deepCompletionState.enqueue

func (s *deepCompletionState) enqueue(cand candidate) {
	s.nextQueue = append(s.nextQueue, cand)
}

allocates 10GB of memory because it adds 5 million completion candidates to the queue, each an 18-word structure. This causes each call to env.Completion in the benchmark to take 10s. There are two calls: the warmup and the one in the b.N loop (which reaches only N=1).

This was the command. The .go file is my favorite large file in k8s.

$ go test -bench=TestBenchmarkConfiguredCompletion ./gopls/internal/regtest/bench -completion_workdir=$HOME/w/kubernetes -completion_file=../kubernetes/pkg/generated/openapi/zz_generated.openapi.go -completion_regexp=Get

Also, 2GB of allocation (and 3s of CPU) are spent on this call to Type.String in completer.methodsAndFields:

if typ.String() == "*testing.F" && addressable {
	// is that a sufficient test? (or is more care needed?)
	if c.fuzz(typ, mset, imp, cb, c.snapshot.FileSet()) {
		return
	}
}

@muirdm

@gopherbot gopherbot added Tools gopls labels Jul 11, 2022
@gopherbot gopherbot added this to the Unreleased milestone Jul 11, 2022
muirdm commented Jul 11, 2022

The queue was added when the candidate search was changed from DFS to BFS. I presume the benchmark disables the completion budget to avoid flakiness. In practice the budget is set to ~100ms, so the search gives up sooner. Improving this hotspot would allow gopls to search more candidates within its 100ms budget, but might not end up materially improving the completion results or reducing overall CPU usage (assuming it couldn't search 5 million candidates in 100ms anyway).

findleyr (Contributor) commented Jul 12, 2022

Ah, thanks @muirdm. Right: the benchmark takes 10s because IIRC the regtest environment sets a completion budget of 10s.

We should investigate whether reducing allocation allows us to search more candidates per second, as that might matter more for smaller repos.

adonovan (Member, Author) commented Jul 12, 2022

Thanks @muirdm for the insight. The running time of both calls (warmup and the first benchmark run) did seem curiously similar, not to mention suspiciously close to 10s.

So the benchmark is fundamentally flawed: it always measures the first 10s of a loop that never completes, and that window includes completion work alongside other work (parsing and type checking). Making the production code faster at the task the benchmark claims to measure therefore does not improve the benchmark numbers; it merely reduces the fraction of the constant 10s of wall time spent in completion. Consequently memory allocation appears to go through the roof (because parsing and type checking are so allocation-heavy). I don't know the code well enough to say how, but we should change the benchmark to measure a task that corresponds to the latency a user experiences in a completion request.

@suzmue suzmue modified the milestones: Unreleased, gopls/later Jul 12, 2022
@suzmue suzmue added the gopls/performance label Jul 12, 2022
gopherbot pushed a commit to golang/tools that referenced this issue Jul 13, 2022
Before this change, these methods of the Source interface used a cache built by buildASTCache, which constructed a Pos-keyed map for a whole file. The necessary algorithm, however, is essentially a binary search, which is fast enough that no cache is needed.

This change implements that algorithm and moves
both methods out of the interface into a single function,
source.FindDeclAndField.

--

I measured the duration of all calls to astCacheData (before)
and FindDeclAndField (after) occurring within this command:

  $ go test -bench=TestBenchmarkConfiguredCompletion -v ./gopls/internal/regtest/bench -completion_workdir=$HOME/w/kubernetes -completion_file=../kubernetes/pkg/generated/openapi/zz_generated.openapi.go -completion_regexp=Get

(The numbers reported by this benchmark are problematic,
which is why I measured call times directly; see
golang/go#53798.)

Results:
before (n=4727) max =  21ms, 90% = 4.4ms, median = 19us
after  (n=6282) max = 2.3ms, 90% = 25us,  median = 14us

The increased number of calls to the function after the change
is due to a longstanding bug in the benchmark: each iteration of
the b.N loop doesn't do a fixed amount of work; it does as much
as it can in 10s. Making the code faster thus simply causes
the benchmark to spend the same amount of time on other parts of
the program, such as the loop that calls FindDeclAndField.

See https://go-review.googlesource.com/c/tools/+/221021 for
background on the previous implementation.

Change-Id: I745ecc4e65378fbe97f456228cafba84105b7e49
Reviewed-on: https://go-review.googlesource.com/c/tools/+/416880
Auto-Submit: Alan Donovan <adonovan@google.com>
Run-TryBot: Alan Donovan <adonovan@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
gopls-CI: kokoro <noreply+kokoro@google.com>
Reviewed-by: Robert Findley <rfindley@google.com>