runtime: fatal error "runqsteal: runq overflow" #39518

Closed
shawn-xdji opened this issue Jun 11, 2020 · 6 comments

@shawn-xdji shawn-xdji commented Jun 11, 2020

What version of Go are you using (go version)?

$ go version

synced to 7b872b6d955d3e749ea62dbfced68ab5c61eae91

Does this issue reproduce with the latest release?

It seems to be a random issue. I am not able to reproduce it locally; it was observed in several runs that executed only the benchmarks of the 'std' packages.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env


GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/xiaji01/.cache/go-build"
GOENV="/home/xiaji01/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/xiaji01/.go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/xiaji01/.go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/xiaji01/util/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/xiaji01/util/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/xiaji01/util/go/src/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build372041418=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Ran "go test -count 3 -timeout 180m -run=^$ -bench=. std" under GOROOT/src.

What did you expect to see?

No errors and crashes.

What did you see instead?

A crash. Here is part of the output; the full log is attached.

BenchmarkReset-32 317536 3796 ns/op
BenchmarkReset-32 306330 3622 ns/op
BenchmarkReset-32 323361 3759 ns/op
BenchmarkSleep-32 1203 843502 ns/op
BenchmarkSleep-32 1442 827265 ns/op
BenchmarkSleep-32 fatal error: runqsteal: runq overflow

runtime stack:
runtime.throw(0x5e822b, 0x18)
	/home/ent-user/ci-scripts/golang/src/runtime/panic.go:1116 +0x72
runtime.runqsteal(0xc000096000, 0xc000091000, 0x0, 0x0)
	/home/ent-user/ci-scripts/golang/src/runtime/proc.go:5300 +0xe5
runtime.findrunnable(0xc000096000, 0x0)
	/home/ent-user/ci-scripts/golang/src/runtime/proc.go:2227 +0xf7
runtime.schedule()
	/home/ent-user/ci-scripts/golang/src/runtime/proc.go:2636 +0x2d7
runtime.goexit0(0xc004f40780)
	/home/ent-user/ci-scripts/golang/src/runtime/proc.go:2963 +0x1d6
runtime.mcall(0x0)
	/home/ent-user/ci-scripts/golang/src/runtime/asm_amd64.s:318 +0x5b

@shawn-xdji shawn-xdji commented Jun 11, 2020

@ianlancetaylor ianlancetaylor commented Jun 11, 2020

There are certainly a lot of goroutines running BenchmarkSleep.func1.1: I count 9743 in the crash dump. It also looks like your system has 32 CPUs (or GOMAXPROCS is set to 32). Is that correct?
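
For anyone who wants to repeat this kind of count on their own log, the following is a minimal sketch rather than the method actually used here. It assumes the usual layout of a Go crash dump, where each goroutine's stack starts with a `goroutine N [...]:` line and stacks are separated by blank lines. The helper name `countGoroutines` and the sample dump in `main` are made up for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// countGoroutines scans a Go crash dump and returns how many goroutine
// stacks contain at least one frame whose text includes fn.
// Hypothetical helper: it assumes stacks begin with "goroutine " and are
// separated by blank lines.
func countGoroutines(dump, fn string) int {
	count := 0
	inGoroutine := false // currently inside a "goroutine N [...]:" block
	matched := false     // the current block has already been counted
	sc := bufio.NewScanner(strings.NewReader(dump))
	for sc.Scan() {
		line := sc.Text()
		switch {
		case strings.HasPrefix(line, "goroutine "):
			inGoroutine, matched = true, false
		case strings.TrimSpace(line) == "":
			inGoroutine = false
		case inGoroutine && !matched && strings.Contains(line, fn):
			count++
			matched = true
		}
	}
	return count
}

func main() {
	// Synthetic sample, not taken from the attached log.
	dump := `goroutine 1 [running]:
main.main()
	/path/to/main.go:1

goroutine 2 [sleep]:
example.BenchmarkSleep.func1.1()
	/path/to/sleep_test.go:1

goroutine 3 [sleep]:
example.BenchmarkSleep.func1.1()
	/path/to/sleep_test.go:1
`
	fmt.Println(countGoroutines(dump, "BenchmarkSleep.func1.1"))
}
```

To use it on a real log, the file contents would be read into a string (for example with `os.ReadFile`) and passed in place of the sample.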

I have not been able to reproduce the problem. But then the number of goroutines that run on my system is not as high:

BenchmarkSleep
BenchmarkSleep-32                    	    7806	    184198 ns/op
BenchmarkSleep-32                    	    6654	    177551 ns/op
BenchmarkSleep-32                    	    8204	    173919 ns/op

Still, I don't see how this can happen. Can you reproduce it by running just the time package benchmarks, or does it only happen when running all the standard library benchmarks?
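
For readers unfamiliar with the check behind this message, here is a toy, single-threaded sketch of a fixed-size run queue with half-stealing. It is not the runtime's code: the type `runq` and its methods `put`, `steal`, and `size` are hypothetical names, and the real implementation in runtime/proc.go is lock-free. The capacity of 256 entries matches the runtime's per-P local queue. Under the assumption that the thief's queue is empty when it steals, taking half of a victim's queue can never exceed capacity, which is one way to read why the throw is surprising.

```go
package main

import "fmt"

// runqSize mirrors the 256-entry local run queue of the Go runtime.
const runqSize = 256

// runq is a simplified model of a per-P run queue: a ring buffer indexed
// by free-running head and tail counters. Hypothetical type for illustration.
type runq struct {
	head, tail uint32
	buf        [runqSize]int // goroutine IDs stand in for goroutine pointers
}

// size returns the number of queued entries.
func (q *runq) size() uint32 { return q.tail - q.head }

// put appends id and reports false when the queue is full.
func (q *runq) put(id int) bool {
	if q.size() == runqSize {
		return false
	}
	q.buf[q.tail%runqSize] = id
	q.tail++
	return true
}

// steal moves half of victim's entries (rounded up) into q and returns
// how many were moved. It panics if the result would not fit, which is
// the model's analogue of the "runq overflow" condition.
func (q *runq) steal(victim *runq) uint32 {
	n := victim.size()
	n -= n / 2
	if q.size()+n > runqSize {
		panic("steal: runq overflow")
	}
	for i := uint32(0); i < n; i++ {
		q.buf[(q.tail+i)%runqSize] = victim.buf[(victim.head+i)%runqSize]
	}
	victim.head += n
	q.tail += n
	return n
}

func main() {
	var thief, victim runq
	for i := 0; i < runqSize; i++ {
		victim.put(i)
	}
	n := thief.steal(&victim)
	fmt.Println(n, thief.size(), victim.size()) // prints "128 128 128"
}
```

In this model the overflow panic is only reachable when the thief already holds entries before stealing, for example 200 queued entries plus 128 stolen ones.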

@ianlancetaylor ianlancetaylor added this to the Go1.15 milestone Jun 11, 2020

@ianlancetaylor ianlancetaylor commented Jun 11, 2020

@shawn-xdji shawn-xdji commented Jun 11, 2020

@ianlancetaylor Yes, the machine has 32 physical cores and 32 threads. So far it has only been observed when running all the benchmarks of the standard library. I failed to reproduce it locally, but my local machine is a different model; let me see whether I can narrow it down.

@shawn-xdji shawn-xdji commented Jun 16, 2020

Hi @ianlancetaylor, I'm afraid it was a transient breakage and is no longer reproducible. I would suggest closing the issue. Thanks.

@shawn-xdji shawn-xdji closed this Jun 16, 2020

@ianlancetaylor ianlancetaylor commented Jun 17, 2020

Thanks for following up on this.
