Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/go: occasional "failed to cache compiled Go files" when 'go list -compiled' runs in parallel #29667

Closed
cespare opened this issue Jan 10, 2019 · 18 comments

Comments

@cespare
Copy link
Contributor

commented Jan 10, 2019

What version of Go are you using (go version)?

Go 1.11.4

Does this issue reproduce with the latest release?

It does not reproduce with Go 1.12beta1. It reproduces at tip at a somewhat lower rate. See @dominikh's comment, below.

What operating system and processor architecture are you using (go env)?

linux/amd64

What did you do?

@dominikh wrote a small reproducer: https://play.golang.org/p/ghOtMwBnGu7

It may take a few runs to fail. Clean out your build cache or use GOCACHE=/tmp/empty to make it more reliably fail.

What did you expect to see?

Calling go list in parallel should succeed.

What did you see instead?

go build <some package>: failed to cache compiled Go files

I think this happens only when -compiled is provided to go list (which is new in Go 1.11?)

The context in which this comes up is multiple tools running golang.org/x/tools/go/packages.Load at the same time. I believe the issue is in go list but possibly the bug (or at least the workaround) should be in golang.org/x/tools/go/packages.

I also believe this is much more likely to happen with an empty GOCACHE. We've mainly been seeing this our CI environment. github.com/google/wire has also been seeing this in Travis: google/wire#66.

The effects of this issue tend to be pretty far away from the go list invocation. We've run into it via staticcheck (http://staticcheck.io/) as well as our own go/packages-using code.

I can't get this to reproduce in 1.12beta1. It would be helpful if someone can find the CL that fixed it. I think it would also be valuable if the fix could be backported to Go 1.11 in the meantime. (It does reproduce at tip.)

The ecosystem is rapidly converging on the new, go list-based way of writing tooling (typically via go/packages) so I expect this will arise often.

/cc @bcmills @dominikh @zombiezen @ianthehat

@cespare

This comment has been minimized.

Copy link
Contributor Author

commented Jan 10, 2019

Note that while this bug resembles #26794, it occurs when modules aren't involved (GOPATH mode).

(I haven't tried to reproduce it in conjunction with modules.)

@bcmills bcmills modified the milestones: Go1.12, Go1.11.5 Jan 11, 2019

@bcmills

This comment has been minimized.

Copy link
Member

commented Jan 11, 2019

I can't get this to reproduce in 1.12beta1.

Nor can I, and I really have no idea how it would have been fixed.

If someone wants to bisect we can at least consider a backport, but given that we can't reproduce it in 1.12, and that it occurs only rarely (and under unusual load) in 1.11, I can't justify putting a lot of effort into tracking it down.

@dominikh

This comment has been minimized.

Copy link
Member

commented Jan 11, 2019

I'll bisect it.

@dominikh

This comment has been minimized.

Copy link
Member

commented Jan 11, 2019

Actually, there is no need to bisect it. The issue still occurs as of go version devel +a2bb68de4d Fri Jan 11 16:08:59 2019 +0000 linux/amd64. When GOCACHE is on a tmpfs, the error occurs at a rate of roughly 1% on my system. It is easier to hit the issue on slower media, at a rate of anywhere from 53% to 93% on an NFS mount, which isn't very surprising if we assume that this is a race condition. It seems that 1.11.4 has vastly more instances of this error at a rate of 80% when on NFS, versus tip at 52%

@bcmills bcmills modified the milestones: Go1.11.5, Go1.13 Jan 11, 2019

@bcmills

This comment has been minimized.

Copy link
Member

commented Jan 11, 2019

Good to know — thanks for the investigation!

CC @jayconrod

@jayconrod jayconrod added the GoCommand label Jan 11, 2019

@jayconrod jayconrod self-assigned this Jan 11, 2019

@cespare

This comment has been minimized.

Copy link
Contributor Author

commented Jan 11, 2019

@dominikh thanks for that info. I updated the original description too.

@fho

This comment has been minimized.

Copy link

commented Jan 15, 2019

We are using Go 1.11.2 in our CI system and running parallel jobs that are accessing the same GOCACHE dir by the command:

go list -e -json -compiled -test=false -export=false -deps=true -- ./...

Recently we implemented to boot additional CI workers on demand. The worker are started with an empty GOCACHE directory.
Since this change we run into the failed to cache compiled Go files multiples times a day.

mariash added a commit to cloudfoundry/diego-release that referenced this issue Jan 18, 2019

fix for GOCACHE failing in parallel
golang/go#29667

Co-authored-by: Maria Shaldibina <mshaldibina@pivotal.io>
@cespare

This comment has been minimized.

Copy link
Contributor Author

commented Feb 2, 2019

This bug has been biting us a lot as we run more and more tools in CI environments which use go/packages. In case it helps other people, the workaround we recently started using is a shim binary that passes its arguments to the go tool. If it sees that it is being called as go list, it uses a global lock file to prevent simultaneous go list executions. In CI and script environments where the workaround is needed, the fake go is put in a directory at the front of the $PATH.

I also looked into writing a GOPACKAGESDRIVER tool which would apply the same lockfile-based mutual exclusion around go list but it looked like that would require copy-pasting a lot of code out of go/packages.

@nmiyake

This comment has been minimized.

Copy link

commented Mar 8, 2019

I'm still seeing this failure even after upgrading to Go 1.12 -- we're encountering this error when running checks in parallel in a CI environment. The CI environment uses Go 1.12 and the checks are compiled with Go 1.12 as well. The projects use GOPATH mode (it is not a module project).

Is there anything that we can do to help with the investigation or fix for this issue?

@jayconrod

This comment has been minimized.

Copy link
Contributor

commented Mar 8, 2019

@nmiyake If you could describe your CI environment file system, operating system, architecture and any other platform information relevant to file locking, that would be a useful data point.

@nmiyake

This comment has been minimized.

Copy link

commented Mar 8, 2019

This is on an enterprise deployment of CircleCI 2. The Docker image used to run the CI is the official golang:1.12 image.

Output of go env:

root@1965526c4de7:/go# go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build212534544=/tmp/go-build -gno-record-gcc-switches"
@viveklak

This comment has been minimized.

Copy link

commented Mar 19, 2019

Also seeing this locally as well as on CircleCI (cloud hosted offering). Running on official Go 1.12.1 image.

@ConradKurth

This comment has been minimized.

Copy link

commented May 24, 2019

Having this same problem. Our CI/CD is running golangci-lint, which underlying is the go list. Strange part is when I ssh into the box, i can run it fine, but when the program runs on its normal workflow it works

env is

GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/go/src/github.com/HeadspaceMeditation/go-service-template/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build389910482=/tmp/go-build -gno-record-gcc-switches"
@jwm

This comment has been minimized.

Copy link

commented Jun 11, 2019

We're seeing this with go 1.11.5 and 1.12.5 (both the official Docker images).

After reading @cespare's comment about putting a mutex on go list, we added this go wrapper to our build process, which works around the problem nicely, letting us perform parallel builds.

#!/bin/sh

set -e

for i in "$@"; do
	if [ "$i" = list ]; then
		set -o noclobber
		while ! echo "$$" >/tmp/go-list-lock; do
			sleep 0.25
		done
		trap 'rm -f /tmp/go-list-lock' EXIT
	fi
done

/usr/local/go/bin/go "$@"
@bcmills

This comment has been minimized.

Copy link
Member

commented Jun 28, 2019

@andybons andybons modified the milestones: Go1.13, Go1.14 Jul 8, 2019

@gopherbot

This comment has been minimized.

Copy link

commented Jul 30, 2019

Change https://golang.org/cl/188157 mentions this issue: cmd/go/internal/cache: avoid ioutil.WriteFile for writing cache entries

@gopherbot

This comment has been minimized.

Copy link

commented Jul 30, 2019

Change https://golang.org/cl/188020 mentions this issue: cmd/go/internal/cache: verify that timestamp is positive

@gopherbot

This comment has been minimized.

Copy link

commented Jul 30, 2019

Change https://golang.org/cl/188021 mentions this issue: cmd/go/internal/cache: include detail in errors from (*cache).Get

@gopherbot gopherbot closed this in 2fc7574 Jul 30, 2019

gopherbot pushed a commit that referenced this issue Jul 30, 2019

cmd/go/internal/cache: verify that timestamp is positive
An apparent typo caused us to re-check size instead.

Updates #29667

Change-Id: Icc1fb9cef1ba77b8490c8fbe6c343be06ff48d8d
Reviewed-on: https://go-review.googlesource.com/c/go/+/188020
Run-TryBot: Bryan C. Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Jay Conrod <jayconrod@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
You can’t perform that action at this time.