Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/go: get of module already present in proxy can be slow #50158

Open
myitcv opened this issue Dec 14, 2021 · 11 comments
Open

cmd/go: get of module already present in proxy can be slow #50158

myitcv opened this issue Dec 14, 2021 · 11 comments

Comments

@myitcv
Copy link
Member

@myitcv myitcv commented Dec 14, 2021

What version of Go are you using (go version)?

$ go version
go version devel go1.18-d34051bf16 Thu Dec 2 07:04:05 2021 +0000 linux/arm64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="arm64"
GOBIN=""
GOCACHE="/home/myitcv/.cache/go-build"
GOENV="/home/myitcv/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="arm64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/myitcv/gostuff/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/myitcv/gostuff"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/myitcv/gos"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/myitcv/gos/pkg/tool/linux_arm64"
GOVCS=""
GOVERSION="devel go1.18-d34051bf16 Thu Dec 2 07:04:05 2021 +0000"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/tmp/tmp.1s97wmUiwJ/go.mod"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build99146596=/tmp/go-build -gno-record-gcc-switches"

What did you do?

$ go get -x cuelang.org/go@v0.4.1-beta.4
# get https://proxy.golang.org/cuelang.org/@v/v0.4.1-beta.4.info
go: downloading cuelang.org/go v0.4.1-beta.4
# get https://proxy.golang.org/cuelang.org/go/@v/v0.4.1-beta.4.zip
# get https://proxy.golang.org/cuelang.org/go/@v/v0.4.1-beta.4.zip: 200 OK (0.202s)
# get https://proxy.golang.org/sumdb/sum.golang.org/supported
# get https://proxy.golang.org/sumdb/sum.golang.org/supported: 410 Gone (0.009s)
# get https://sum.golang.org/lookup/cuelang.org/go@v0.4.1-beta.4
# get https://sum.golang.org/lookup/cuelang.org/go@v0.4.1-beta.4: 200 OK (0.073s)
# get https://sum.golang.org/tile/8/0/x032/326
# get https://sum.golang.org/tile/8/2/000.p/127
# get https://sum.golang.org/tile/8/1/127.p/228
# get https://sum.golang.org/tile/8/1/126
# get https://sum.golang.org/tile/8/0/x032/740.p/38
# get https://sum.golang.org/tile/8/2/000.p/127: 200 OK (0.021s)
# get https://sum.golang.org/tile/8/0/x032/326: 200 OK (0.022s)
# get https://sum.golang.org/tile/8/1/126: 200 OK (0.023s)
# get https://sum.golang.org/tile/8/1/127.p/228: 200 OK (0.025s)
# get https://sum.golang.org/tile/8/0/x032/740.p/38: 200 OK (0.029s)
# get https://sum.golang.org/tile/8/0/x032/033
# get https://sum.golang.org/tile/8/0/x032/033: 200 OK (0.039s)
# get https://proxy.golang.org/cuelang.org/@v/v0.4.1-beta.4.info: 410 Gone (2.429s)
# get https://proxy.golang.org/cuelang.org/go/@v/v0.4.1-beta.4.mod
# get https://proxy.golang.org/cuelang.org/go/@v/v0.4.1-beta.4.mod: 200 OK (0.027s)
**********************
# get https://proxy.golang.org/cuelang.org/go/@v/list
# get https://proxy.golang.org/cuelang.org/go/@v/list: 200 OK (0.042s)
go: added cuelang.org/go v0.4.1-beta.4
go: added golang.org/x/exp v0.0.0-20210126221216-84987778548c
go: added golang.org/x/mod v0.3.1-0.20200828183125-ce943fd02449
go: added golang.org/x/tools v0.0.0-20200612220849-54c614fe050c
go: added golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543

The command stalled at the line ********************** for a good 15-20 secs.

What did you expect to see?

  • No delay; the proxy already knows about this version
  • The -x logs to indicate where the delay occurred; the 15-20 secs delay I experienced is not accounted for above

What did you see instead?

As abve.

Noting a subsequent call had output as follows:

$ go get -x cuelang.org/go@v0.4.1-beta.4
# get https://proxy.golang.org/cuelang.org/@v/v0.4.1-beta.4.info
# get https://proxy.golang.org/cuelang.org/@v/v0.4.1-beta.4.info: 410 Gone (2.159s)
# get https://proxy.golang.org/cuelang.org/go/@v/list
# get https://proxy.golang.org/cuelang.org/go/@v/list: 200 OK (0.045s)

Here the go get command ultimately took as long as it did (~2.5 secs) because of the long response time to:

# get https://proxy.golang.org/cuelang.org/@v/v0.4.1-beta.4.info: 410 Gone (2.159s)

Given go get already by this point got a successful response from:

# get https://proxy.golang.org/cuelang.org/go/@v/list: 200 OK (0.045s)

could cmd/go not optimise to simply "drop" the other requests, given the most specific one returned successfully?

cc @bcmills for cmd/go and @heschi in case there is any proxy-related issues here

@bcmills
Copy link
Member

@bcmills bcmills commented Dec 14, 2021

Yes, in theory we could return as soon as we find the package at the most precise path. That short-circuit isn't optimized because the shorter paths are supposed to return quickly — so the question is, why did the proxy take 2+ seconds to tell you about a version that trivially doesn't exist?

@myitcv
Copy link
Member Author

@myitcv myitcv commented Dec 14, 2021

so the question is, why did the proxy take 2+ seconds to tell you about a version that trivially doesn't exist?

Indeed. Plus the question above about the 15-20 secs delay we can't account for in the -x output.

@heschi
Copy link
Contributor

@heschi heschi commented Dec 14, 2021

Checking if a version exists or not takes some measurable amount of time, and the proxy's implementation creates some overhead. I'm not sure what "trivially doesn't exist" means, but if we have to run a go command and it has to talk to an origin server, 2 seconds does not seem unreasonable to me.

@myitcv
Copy link
Member Author

@myitcv myitcv commented Dec 14, 2021

@heschi

if we have to run a go command and it has to talk to an origin server, 2 seconds does not seem unreasonable to me.

Why would a go command call be necessary here? Because the proxy can't be sure that it's previously cached answer of "nothing there" is still valid or not?

@bcmills
Copy link
Member

@bcmills bcmills commented Dec 14, 2021

FWIW, the go command invocation in this case completes in ~100ms:

$ GOPROXY=direct time go list -m cuelang.org@v0.4.1-beta.4
go: cuelang.org@v0.4.1-beta.4: unrecognized import path "cuelang.org": parsing cuelang.org: XML syntax error on line 44: expected /> in element
Command exited with non-zero status 1
real    0:00.10
user    0.02
sys     0.01
rss     20944k

So that seems like 2+ seconds for container setup? Which I guess seems ok, but given the previous trace I would have expected the negative result to already be present in the cache.

@bcmills
Copy link
Member

@bcmills bcmills commented Dec 14, 2021

Indeed. Plus the question above about the 15-20 secs delay we can't account for in the -x output.

Agreed. If that happens again, could you kill the go command with SIGQUIT and post the goroutine dump (or send it to me)?

@heschi
Copy link
Contributor

@heschi heschi commented Dec 14, 2021

Because the proxy can't be sure that it's previously cached answer of "nothing there" is still valid or not?

Yes.

real 0:00.10

I don't believe this was done with an empty module cache. For me it takes anywhere from 300ms to 1.5s. But yes, there is quite a bit overhead proportional to the required work for something this small.

The behavior of the proxy is well within my expectations, so I'm going to drop out of the discussion now.

@myitcv
Copy link
Member Author

@myitcv myitcv commented Dec 14, 2021

Thanks, Heschi. Very much appreciate the clarifications.

Because the proxy can't be sure that it's previously cached answer of "nothing there" is still valid or not?

Yes.

@bcmills I think this points towards there being some merit in the short-circuit described earlier?

@myitcv
Copy link
Member Author

@myitcv myitcv commented Dec 15, 2021

@bcmills

could you kill the go command with SIGQUIT and post the goroutine dump (or send it to me)?

https://gist.github.com/myitcv/b7939fcc511a007af00297ceb39847b3

I triggered this about 7 seconds after the last line of output shown (didn't want to wait too much longer in case I missed the opportunity). What's interesting is that in this time the precise path hasn't returned.

What's even more interesting is that these calls shouldn't be necessary because I have this module in my module cache.

@bcmills
Copy link
Member

@bcmills bcmills commented Dec 15, 2021

Hmm. There isn't much actually running in that goroutine dump — one in a read syscall via modfetch.(*dbClient).ReadCache, and what appears to be a whole cluster of goroutines waiting on that same tile.

Nothing seems to have been blocked for a long time, so it isn't obvious to me whether we're bottlenecked on reading the module graph one time (or reading checksums one time), or reloading the graph (or reloading the subdb tiles) repeatedly for some reason.

A trace file from --debug-trace=go.trace might be interesting to see, but it might just tell us some functions that need more detailing trace instrumentation. 😅

@myitcv myitcv changed the title cmd/go: get of module already present in proxy can be very slow cmd/go: get of module already present in proxy can be slow Dec 17, 2021
@myitcv
Copy link
Member Author

@myitcv myitcv commented Dec 17, 2021

@bcmills - an update from my side. Turns out some of the issues I have been seeing are related to local disk errors (thankfully not actual disk errors, just problems with a NVM disk and the Linux kernel I am using).

So I'm going to attribute the unexplained long delays (15-20secs) to those disk errors. This is reasonable and consistent with other behaviour I was able to reliably reproduce.

That leaves two issues to my mind (hence I've retitled the issue):

  1. why go get cuelang.org/go@v0.4.1-beta.4 in a new module is hitting the network at all when that exact version is available in my local module cache
  2. whether cmd/go get should short-circuit when it gets a response from a more precise module path (per discussion above). Because when the proxy hasn't seen a request for cuelang.org/go "for some time" it ends up hitting the remote VCS to check whether a module exists at the path cuelang.org, and the client's cmd/go instance blocks until that returns. This delay can be measured in seconds, and exacerbates point 1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants