Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/go: TLS handshake timeout running in qemu #50584

Open
ppenguin opened this issue Jan 12, 2022 · 12 comments
Open

cmd/go: TLS handshake timeout running in qemu #50584

ppenguin opened this issue Jan 12, 2022 · 12 comments

Comments

@ppenguin
Copy link

@ppenguin ppenguin commented Jan 12, 2022

What version of Go are you using (go version)?

$ go version
go version go1.17.2 linux/arm
# in a build container based on debian stretch

Does this issue reproduce with the latest release?

untested

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="arm"
GOBIN=""
GOCACHE="/builds/r/rcnode/.gocache-arm/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="arm"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/builds/r/rcnode/.gomodcache-arm/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/builds/r/rcnode/.go-arm"
GOPRIVATE=""
GOPROXY="https://goproxy.io,https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_arm"
GOVCS=""
GOVERSION="go1.17.2"
GCCGO="gccgo"
GOARM="6"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -marm -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build3509342896=/tmp/go-build -gno-record-gcc-switches"

(Note the directories are in a container, and /builds is mounted into it from the project dir.)

What did you do?

I built a program with a largish number of dependencies.

The build consistently fails during getting these with go get -v -u ./..., notably mostly on

cloud.google.com/go/storage@v1.14.0: Get "https://goproxy.io/cloud.google.com/go/storage/@v/v1.14.0.mod": net/http: TLS handshake timeout

and

pkg/config/config.go:12:2: gopkg.in/yaml.v2@v2.2.1: Get "https://goproxy.io/gopkg.in/yaml.v2/@v/v2.2.1.zip": net/http: TLS handshake timeout

It is suspicious that this only happens with arm32 builds (likely only with with CGO=1), and it can indeed be seen that the qemu container process has a high load before go get fails.

My suspicion is that go get can't find a binary package for arm32 arch and starts building it automatically, which is of course good and expected behaviour. However: it seems that go get wrongly applies a networking timeout to this situation, where it should handle package builds from source with a different timeout (these usually take much longer, especially under qemu, which should be allowed).

What did you expect to see?

A successful build, independent of architecture. (As a reference: other non-native builds, e.g. arm64, in a container which is built in the same way except for the arch, do not have this issue, but they are also significantly faster, which leads to the suspicion that the issue is indeed the application of an unreasonable timeout).

What did you see instead?

Networking timeouts during go get, where the bottleneck appears to be not the network but a local package build (high processor load).

@ppenguin ppenguin changed the title affected/package: cmd/go affected/package: cmd/go (consistent networking timeout during go get for certain packages in qemu container build, suspected bug in timeout behaviour) Jan 12, 2022
@seankhliao
Copy link
Contributor

@seankhliao seankhliao commented Jan 12, 2022

what if you only download without building? -d

@ppenguin
Copy link
Author

@ppenguin ppenguin commented Jan 13, 2022

I tried with -d, and still see a high processor load (specifically qemu-arm-static /usr/local/go/bin/go mod tidy and later qemu-arm-static /usr/local/go/bin/go install -v) during go: downloading ..., during which I get the same mentioned net/http: TLS handshake timeout messages as before.

I guess the reason it's building in the first place is because for my arch (arm) the/some binary packages are not cached, so there wouldn't really be a choice to "not build" anyway.

@seankhliao
Copy link
Contributor

@seankhliao seankhliao commented Jan 13, 2022

go get -d ./... will only resolve and download the dependencies without building. If it still fails there, then it's a network error.
There are no such things as binary packages (anymore) all dependencies are downloaded as source and compiled locally.
Also, does it still occur if you use a different GOPROXY?

@ppenguin
Copy link
Author

@ppenguin ppenguin commented Jan 13, 2022

Ah, thanks for clarifying.
If I do a vanilla go get -v -d ./... without anything else, I still get those timeouts.

What I don't understand is the high processor load of the parallel qemu-arm-static /usr/local/go/bin/go get -v -d ./... and qemu-arm-static /usr/local/go/bin/go install -v processes if they are "just" downloading... (Also: keep in mind that this only happens for my non-native arm container build, which is by itself not network related?)

I have tried with different values for GOPROXY, (GOPROXY=direct to GOPROXY=https://goproxy.io,https://proxy.golang.org,direct and all variations in between), the same packages fail in all cases, but with a different message:

  • for GOPROXY=direct:
    github.com/go-delve/delve/pkg/config imports
         gopkg.in/yaml.v2: unrecognized import path "gopkg.in/yaml.v2": https fetch: Get "https://gopkg.in/yaml.v2?go-get=1": net/http: TLS handshake timeout
    
  • for other GOPROXY:
    github.com/go-delve/delve/pkg/config imports
         gopkg.in/yaml.v2: gopkg.in/yaml.v2@v2.2.1: Get "https://proxy.golang.org/gopkg.in/yaml.v2/@v/v2.2.1.zip": net/http: TLS handshake timeout
    

@seankhliao seankhliao changed the title affected/package: cmd/go (consistent networking timeout during go get for certain packages in qemu container build, suspected bug in timeout behaviour) cmd/go: TLS handshake timeout running in qemu Jan 13, 2022
@seankhliao
Copy link
Contributor

@seankhliao seankhliao commented Jan 13, 2022

go install does compile, but I haven't asked you to do that.
You say parallel, what do you mean by that?
How much resources are these processes given?

@ppenguin
Copy link
Author

@ppenguin ppenguin commented Jan 13, 2022

go install appears to be triggered by go get in more involved cases.
go get appears to behave exactly the same in terms of processor load as go mod tidy and go install processes I have observed for different variations of (part of) the container builds I have been testing on.

For go get specifically (in the simpler case where I did go get -d ./... in a clone of this one inside an arm build container) the load is between 30-50% (according to htop) during about 1m30s on 24 cores, or in summary around "800% sustained CPU load" during this time (each process qemu-arm-static /usr/local/go/bin/go get -d ./...):

root@4527a4c64888:/builds# time go get -d ./...
go: downloading gopkg.in/yaml.v2 v2.2.1
go: downloading golang.org/x/arch v0.0.0-20190927153633-4e8777c89be4
go: downloading go.starlark.net v0.0.0-20200821142938-949cc6f4b097
go: downloading golang.org/x/sys v0.0.0-20200625212154-ddb9806d33ae
github.com/go-delve/delve/pkg/config imports
        gopkg.in/yaml.v2: gopkg.in/yaml.v2@v2.2.1: Get "https://goproxy.io/gopkg.in/yaml.v2/@v/v2.2.1.zip": net/http: TLS handshake timeout
github.com/go-delve/delve/pkg/proc imports
        golang.org/x/arch/arm/armasm: golang.org/x/arch@v0.0.0-20190927153633-4e8777c89be4: Get "https://goproxy.io/golang.org/x/arch/@v/v0.0.0-20190927153633-4e8777c89be4.zip": net/http: TLS handshake timeout
github.com/go-delve/delve/pkg/proc imports
        golang.org/x/arch/arm64/arm64asm: golang.org/x/arch@v0.0.0-20190927153633-4e8777c89be4: Get "https://goproxy.io/golang.org/x/arch/@v/v0.0.0-20190927153633-4e8777c89be4.zip": net/http: TLS handshake timeout
github.com/go-delve/delve/pkg/proc imports
        golang.org/x/arch/x86/x86asm: golang.org/x/arch@v0.0.0-20190927153633-4e8777c89be4: Get "https://goproxy.io/golang.org/x/arch/@v/v0.0.0-20190927153633-4e8777c89be4.zip": net/http: TLS handshake timeout
github.com/go-delve/delve/pkg/proc/native imports
        golang.org/x/sys/unix: golang.org/x/sys@v0.0.0-20200625212154-ddb9806d33ae: Get "https://goproxy.io/golang.org/x/sys/@v/v0.0.0-20200625212154-ddb9806d33ae.zip": net/http: TLS handshake timeout
github.com/go-delve/delve/pkg/terminal/starbind imports
        go.starlark.net/resolve: go.starlark.net@v0.0.0-20200821142938-949cc6f4b097: Get "https://goproxy.io/go.starlark.net/@v/v0.0.0-20200821142938-949cc6f4b097.zip": net/http: TLS handshake timeout
github.com/go-delve/delve/pkg/terminal/starbind imports
        go.starlark.net/starlark: go.starlark.net@v0.0.0-20200821142938-949cc6f4b097: Get "https://goproxy.io/go.starlark.net/@v/v0.0.0-20200821142938-949cc6f4b097.zip": net/http: TLS handshake timeout
github.com/go-delve/delve/pkg/terminal/starbind imports
        go.starlark.net/syntax: go.starlark.net@v0.0.0-20200821142938-949cc6f4b097: Get "https://goproxy.io/go.starlark.net/@v/v0.0.0-20200821142938-949cc6f4b097.zip": net/http: TLS handshake timeout

real    1m26.730s
user    1m27.045s
sys     9m10.221s

That seems pretty fishy if it would be only network related...

@bcmills
Copy link
Member

@bcmills bcmills commented Jan 13, 2022

Note that we currently lack a QEMU builder (#1508), and per https://go.dev/wiki/PortingPolicy, supported platforms require builders. (User-mode QEMU is not supported, although it may work in some cases.)

Does this failure mode also reproduce on physical ARM hardware?

@ppenguin
Copy link
Author

@ppenguin ppenguin commented Jan 13, 2022

I only have "semi-embedded" arm hardware (which are my targets), on which I can't easily run full builds.

Interestingly though, the same repo with the same Makefile and the same build-image (arm) does build in a gitlab CI. I'm even using the rest of my go env (incl. e.g. GOMODCACHE) in the same way in my CI and for the local dev build, and the runner-host is also very similar to my dev host (both nixos based). So I'm really at a loss what is causing this...

I managed to consistently build relatively complex go arm binaries on that gitlab-runner with container builds, but they also consistently fail with the mentioned errors with "local" container builds.

For reference: for amd64, arm64, arm the duration of the CI is respectively a few minutes for a*64 vs. approx. >30min for arm. Sometimes (luckily seldom) it even times out above 90min.

@ppenguin
Copy link
Author

@ppenguin ppenguin commented Jan 13, 2022

@bcmills oh wait, I just checked and since linux/arm is a first class port, and I'm building within an qemu-arm container, isn't that completely transparent in that case (i.e. as if I'm running on arm hardware)? FYI: my build-image contains a linux/arm binary of go itself, so considering a "cross-builder" in this context would be moot? Or am I missing something?

@bcmills
Copy link
Member

@bcmills bcmills commented Jan 13, 2022

@ppenguin, QEMU sometimes differs in substantial ways from a native kernel running on native hardware (see, for example, #33746, #42080, #44572, #50188).

@ppenguin
Copy link
Author

@ppenguin ppenguin commented Jan 13, 2022

@bcmills thanks for the info, I guess I'm lucky that I'm getting away with it pretty ok, except for this strange issue (though note that my CI is also qemu-arm and works). The image is "native" pretty far down (down to ld and libc), but indeed anything directly kernel related would be different...

I'm wondering though whether it could make sense to somehow set the network timeout for go get to a much higher value, and then check whether the error disappears. That would at least eliminate the uncertainty whether (slow emulated) pkg build could be the actual cause of the network timeouts.

@jauderho
Copy link

@jauderho jauderho commented Jan 20, 2022

Commenting here since my issue was closed. linux/ppc64le is not a first class port AFAIK and neither is linux/s390x. But it seems strange to me that I only ever see issues with ppc64le but not s390x.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants