Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: cmd/compile/internal/gc.buildssa panic #38463

Open
howardjohn opened this issue Apr 15, 2020 · 19 comments
Open

cmd/compile: cmd/compile/internal/gc.buildssa panic #38463

howardjohn opened this issue Apr 15, 2020 · 19 comments

Comments

@howardjohn
Copy link

@howardjohn howardjohn commented Apr 15, 2020

What version of Go are you using (go version)?

$ go version
go version go1.14.2 linux/amd64

Does this issue reproduce with the latest release?

Yes, this was done on 1.14.2

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="on"
GOARCH="amd64"
GOBIN="/gobin"
GOCACHE="/gocache"
GOENV="/home/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build048816231=/tmp/go-build -gno-record-gcc-switches"

This was built inside a docker image: gcr.io/istio-testing/build-tools:master-2020-04-10T20-55-56.

What did you do?

Tried to build our application.

What did you expect to see?

Build succeeds

What did you see instead?

STATIC=0 GOOS=windows LDFLAGS='-extldflags -static -s -w' common/scripts/gobuild.sh /tmp/tmp.W9NXZv7L5L/build/work/src/istio.io/istio/out/linux_amd64/release/istioctl-win.exe ./istioctl/cmd/istioctl
go: downloading github.com/inconshreveable/mousetrap v1.0.0
# k8s.io/apimachinery/pkg/util/runtime
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x74 pc=0xce6f4f]
goroutine 8 [running]:
cmd/compile/internal/gc.buildssa(0x0, 0x3, 0x0)
	/usr/local/go/src/cmd/compile/internal/gc/ssa.go:297 +0xdf
cmd/compile/internal/gc.compileSSA(0x0, 0x3)
	/usr/local/go/src/cmd/compile/internal/gc/pgen.go:298 +0x5d
cmd/compile/internal/gc.compileFunctions.func2(0xc0000ef4a0, 0xc000018e10, 0x3)
	/usr/local/go/src/cmd/compile/internal/gc/pgen.go:363 +0x49
created by cmd/compile/internal/gc.compileFunctions
	/usr/local/go/src/cmd/compile/internal/gc/pgen.go:361 +0x128

Full CI log: https://prow.istio.io/view/gcs/istio-prow/pr-logs/pull/istio_istio/22958/release-test_istio/11141
Rerunning the job with the same code the build succeeds: https://prow.istio.io/view/gcs/istio-prow/pr-logs/pull/istio_istio/22958/release-test_istio/11153

The referenced gobuild.sh script is https://github.com/istio/istio/blob/master/common/scripts/gobuild.sh.

I think the fact its in the compile code path means its a golang bug and not an issue with our code, but maybe not. Let me know if more info is needed.

@randall77
Copy link
Contributor

@randall77 randall77 commented Apr 15, 2020

Definitely looks like a bug in our compiler. The pointer to the function to be compiled is nil.

Is this reproducible? Does it happen every time, or is it intermittent?

Did this happen in earlier versions of Go? (1.4.{0,1}, and 1.13.10 would be good ones to try if you haven't yet.)

I'll try to reproduce this myself.

@randall77
Copy link
Contributor

@randall77 randall77 commented Apr 15, 2020

How do you populate ./istioctl/cmd/istioctl? I don't see anything in the log that makes that directory tree.

@howardjohn
Copy link
Author

@howardjohn howardjohn commented Apr 15, 2020

@randall77 I have only see this a single time in our CI, and rerunning on the exact same job it did not occur. So its intermittent, I am not sure if its occurred more than once since we don't have a way to query through all of our test logs and I have only seen it once.

We upgrading our CI to go 1.14.2 yesterday and previously were using 1.13.10, so its possible its related to that change - I'll update this if we see this again.

The CI job is calling this make target: https://github.com/istio/istio/blob/e33092b5c59d1c13fb11d86ab6345f45194133d3/Makefile.core.mk#L403 which calls https://github.com/istio/istio/blob/e33092b5c59d1c13fb11d86ab6345f45194133d3/Makefile.core.mk#L385 which calls gobuild.sh ./istioctl/cmd/istioctl which calls go build ./istioctl/cmd/istioctl with some flags

@randall77
Copy link
Contributor

@randall77 randall77 commented Apr 15, 2020

Sorry, I was asking about the source code. Presumably there are some .go files in an istioctl/cmd/istioctl/ directory somewhere. Where can I get that?

If this is intermittent, it is going to be hard to debug. There aren't any obvious ways a nil pointer might make its way into a the compile queue.

@cuonglm
Copy link
Contributor

@cuonglm cuonglm commented Apr 15, 2020

@randall77 Maybe this's a sign for #33485

@howardjohn
Copy link
Author

@howardjohn howardjohn commented Apr 15, 2020

Ah I see. That would be https://github.com/istio/istio/tree/master/istioctl/cmd/istioctl. I'll keep an eye out for any re-occurrences as well, totally understand this isn't a lot of info to go on.

@randall77 randall77 modified the milestones: Unplanned, Go1.16 Aug 27, 2020
@mdempsky
Copy link
Member

@mdempsky mdempsky commented Aug 27, 2020

Based on my comments in #33485, I wonder if it's related to the call to dtypesym in gc/ssa.go? In particular, could calling dtypesym be causing functions to get pushed onto compilequeue, which would be bad since that slice is being concurrently accessed by the goroutine responsible for distributing functions to the worker goroutines?

Edit: Maybe not. There's already a check for Siggen just before the call to dtypesym.

@toothrot
Copy link
Contributor

@toothrot toothrot commented Sep 10, 2020

@randall77 @mdempsky Checking in on this as a 1.16 release blocker. Has anybody managed to reproduce this yet?

@randall77
Copy link
Contributor

@randall77 randall77 commented Sep 10, 2020

Nope. If I had, I would post it here.
Any ideas are welcome. This is something of a stumper.

@cuonglm
Copy link
Contributor

@cuonglm cuonglm commented Sep 12, 2020

This happens again in https://build.golang.org/log/e062e1dff6ab680d17246672205cecd010d98e83

##### ../misc/reboot
Building Go cmd/dist using /workdir/go. (devel 2c95e3a6a8377ca9c72608c25b4cf2506baf782f linux/amd64)
Building Go toolchain1 using /workdir/go.
# bootstrap/cmd/link/internal/riscv64
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x70 pc=0xabac1c]

goroutine 12 [running]:
cmd/compile/internal/gc.buildssa(0x0, 0x1, 0x0)
	/workdir/go/src/cmd/compile/internal/gc/ssa.go:315 +0xfc
cmd/compile/internal/gc.compileSSA(0x0, 0x1)
	/workdir/go/src/cmd/compile/internal/gc/pgen.go:317 +0x5d
cmd/compile/internal/gc.compileFunctions.func2(0xc000767380, 0xc0005054a0, 0x1)
	/workdir/go/src/cmd/compile/internal/gc/pgen.go:382 +0x4d
created by cmd/compile/internal/gc.compileFunctions
	/workdir/go/src/cmd/compile/internal/gc/pgen.go:380 +0x129
go tool dist: FAILED: /workdir/go/bin/go install -gcflags=-l -tags=math_big_pure_go compiler_bootstrap bootstrap/cmd/...: exit status 2
--- FAIL: TestRepeatBootstrap (17.64s)
    reboot_test.go:50: exit status 2
FAIL
@gopherbot
Copy link

@gopherbot gopherbot commented Sep 14, 2020

Change https://golang.org/cl/254617 mentions this issue: cmd/compile: make funccompile un-reentrant

gopherbot pushed a commit that referenced this issue Sep 15, 2020
Currently, there's awkward reentrancy issue with funccompile:

    funccompile -> compile -> dtypesym -> geneq/genhash/genwrapper -> funccompile

Though it's not a problem at this moment, some attempts by @mdempsky to
move order/walk/instrument into buildssa was failed, due to SSA cache
corruption.

This commit fixes that reentrancy issue, by making generated functions
to be pumped through the same compile workqueue that normal functions
are compiled. We do this by adding them to xtop, instead of calling
funccompile directly in geneq/genhash/genwrapper. In dumpdata, we look
for uncompiled functions in xtop instead of compilequeue, then finish
compiling them.

Updates #38463
Fixes #33485

Change-Id: Ic9f0ce45b56ae2ff3862f17fd979253ddc144bb5
Reviewed-on: https://go-review.googlesource.com/c/go/+/254617
Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Keith Randall <khr@golang.org>
Trust: Cuong Manh Le <cuong.manhle.vn@gmail.com>
@randall77
Copy link
Contributor

@randall77 randall77 commented Sep 25, 2020

If this was caused by some general corruption problem in the compiler, it might be related to #41257.

@bancek
Copy link

@bancek bancek commented Oct 1, 2020

I've also triggered this with an Go 1.14.4 in CI (go build running in Docker). I was not able to reproduce this because rerunning fixed it.

$ go version
go version go1.14.4 linux/amd64

$ go build main.go
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x74 pc=0xce6fff]

goroutine 7 [running]:
cmd/compile/internal/gc.buildssa(0x0, 0x2, 0x0)
	/usr/local/go/src/cmd/compile/internal/gc/ssa.go:297 +0xdf
cmd/compile/internal/gc.compileSSA(0x0, 0x2)
	/usr/local/go/src/cmd/compile/internal/gc/pgen.go:298 +0x5d
cmd/compile/internal/gc.compileFunctions.func2(0xc007104240, 0xc0011ab390, 0x2)
	/usr/local/go/src/cmd/compile/internal/gc/pgen.go:363 +0x49
created by cmd/compile/internal/gc.compileFunctions
	/usr/local/go/src/cmd/compile/internal/gc/pgen.go:361 +0x128
@toothrot
Copy link
Contributor

@toothrot toothrot commented Oct 15, 2020

@randall77 Should we remove the release-blocker label if this is not reproducible?

@randall77
Copy link
Contributor

@randall77 randall77 commented Oct 15, 2020

@toothrot No, I think we've seen enough instances that this is definitely something worth figuring out.

@cuonglm
Copy link
Contributor

@cuonglm cuonglm commented Oct 16, 2020

@toothrot No, I think we've seen enough instances that this is definitely something worth figuring out.

Can we check if this still happens in builds after CL 254617 above?

@aclements
Copy link
Member

@aclements aclements commented Oct 29, 2020

@randall77 , would it be worth stress testing the compiler under the race detector to see if this is a race?

@randall77
Copy link
Contributor

@randall77 randall77 commented Oct 29, 2020

I've already tried running the compiler under the race detector and didn't get anything. I didn't really stress it though, maybe worth banging on it harder.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
10 participants
You can’t perform that action at this time.