runtime: divide-by-zero panic in (*randomOrder).start #49689

Closed
bcmills opened this issue Nov 19, 2021 · 7 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@bcmills
Member

bcmills commented Nov 19, 2021

```
panic: runtime.errorString("integer divide by zero")
fatal error: panic on system stack

runtime stack:
runtime.throw({0x667014?, 0x7aa3f0?})
	/tmp/workdir/go/src/runtime/panic.go:992 +0x71 fp=0x7fffde1eec70 sp=0x7fffde1eec40 pc=0x478531
panic({0x638720, 0x7aa3f0})
	/tmp/workdir/go/src/runtime/panic.go:764 +0x6f0 fp=0x7fffde1eed30 sp=0x7fffde1eec70 pc=0x478430
runtime.panicdivide()
	/tmp/workdir/go/src/runtime/panic.go:199 +0x45 fp=0x7fffde1eed50 sp=0x7fffde1eed30 pc=0x476805
runtime.(*randomOrder).start(...)
	/tmp/workdir/go/src/runtime/proc.go:6313
runtime.stealWork(0x7fffde1eee48?)
	/tmp/workdir/go/src/runtime/proc.go:3018 +0x357 fp=0x7fffde1eedd0 sp=0x7fffde1eed50 pc=0x480877
runtime.findrunnable()
	/tmp/workdir/go/src/runtime/proc.go:2773 +0x20c fp=0x7fffde1eeec0 sp=0x7fffde1eedd0 pc=0x47fccc
runtime.schedule()
	/tmp/workdir/go/src/runtime/proc.go:3361 +0x239 fp=0x7fffde1eef08 sp=0x7fffde1eeec0 pc=0x481239
runtime.park_m(0xc0001831e0?)
	/tmp/workdir/go/src/runtime/proc.go:3510 +0x14d fp=0x7fffde1eef38 sp=0x7fffde1eef08 pc=0x48176d
runtime.mcall()
	/tmp/workdir/go/src/runtime/asm_amd64.s:433 +0x43 fp=0x7fffde1eef48 sp=0x7fffde1eef38 pc=0x4a8743
```

```
greplogs --dashboard -md -l -e \(\?m\)runtime\\.panicdivide.\*\\n\\t.\*\\nruntime\\.\\\(\\\*randomOrder --since=2021-01-01
```

2021-11-12T03:54:29-ea63613-b954f58/freebsd-amd64-race
2021-11-10T05:08:25-51b60fd-17980df/freebsd-amd64-race
2021-11-09T22:14:19-cb908f1/freebsd-amd64-race
2021-11-04T15:43:59-036812b-6ba68a0/freebsd-amd64-race
2021-09-16T23:50:23-1a7ca93-4efdaa7/freebsd-amd64-race
2021-06-22T11:31:57-d040287-63daa77/freebsd-amd64-race
2021-04-08T02:17:19-89ca1ce/plan9-arm
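
For readers unfamiliar with the escaping: once the shell backslashes are removed, the greplogs expression is a three-line signature: a `runtime.panicdivide` frame, its tab-indented source line, then a `runtime.(*randomOrder` frame. The sketch below checks that reading against the trace from this issue. It assumes greplogs interprets the pattern with Go's `regexp` syntax; the helper name `matchesTrace` is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// pattern is the greplogs expression with the shell escaping removed.
// (?m) enables multi-line mode; "." does not match a newline, so each
// ".*" stays on its own line of the log.
var pattern = regexp.MustCompile(`(?m)runtime\.panicdivide.*\n\t.*\nruntime\.\(\*randomOrder`)

// matchesTrace (hypothetical helper) reports whether a log contains the
// panicdivide frame immediately followed by a randomOrder frame.
func matchesTrace(log string) bool {
	return pattern.MatchString(log)
}

func main() {
	// Three lines copied from the stack trace in this issue.
	log := "runtime.panicdivide()\n" +
		"\t/tmp/workdir/go/src/runtime/panic.go:199 +0x45 fp=0x7fffde1eed50 sp=0x7fffde1eed30 pc=0x476805\n" +
		"runtime.(*randomOrder).start(...)\n"
	fmt.Println(matchesTrace(log))
}
```

Anchoring on two adjacent frames keeps the search from matching unrelated divide-by-zero panics elsewhere in the runtime.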

@bcmills
Member Author

bcmills commented Nov 19, 2021

This panic has mostly been observed on the freebsd-amd64-race builder, but the line where the panic occurs is not freebsd-specific.

Marking as release-blocker at least until we can determine whether other platforms are affected.

(CC @jeremyfaller)

bcmills added the NeedsInvestigation and release-blocker labels Nov 19, 2021
bcmills added this to the Go1.18 milestone Nov 19, 2021
@ianlancetaylor
Contributor

ianlancetaylor commented Nov 19, 2021

This suggests that something has somehow zeroed out runtime.stealOrder.count. That should be impossible.

@jeremyfaller
Contributor

jeremyfaller commented Nov 19, 2021

CCing @prattmic because he's seeing zeroing in #49453

@mknyszek
Contributor

mknyszek commented Nov 30, 2021

Maybe a new failure mode of #46272?

@aclements
Member

aclements commented Dec 1, 2021

2021-11-12T03:54:29-ea63613-b954f58 is from release-branch.go1.17, so I think this doesn't actually qualify as a release-blocker because it's not new. The other failures are from the master branch. We also haven't seen any failures since the 2021-11-12T03:54:29-ea63613-b954f58 one.

It's really weird that most of the failures happened in a cluster over a few days, but on both the 1.17 and master branches. This suggests an external factor is involved.

5 of the failures are in cmd/go. One is in the cmd/fix tests. Apropos of nothing, cmd/go does a ton of forking, and it turns out the cmd/fix tests do some forking, too.

Because this is a race build, it requires cgo, so to build the binary:

```
git checkout b954f58e9db73853b05839363b3fbe4d1d0d8f54
VM=$(gomote create freebsd-amd64-race)
gomote push $VM
gomote run $VM go/src/make.bash
gomote run $VM go/bin/go build -race -o /tmp/workdir/x cmd/go
gomote run $VM /bin/cat x > /tmp/x
```

There's nothing subtle in the machine code:

```
MOVL	runtime.stealOrder(SB), R8
...
TESTL	R8, R8
JE	runtime.stealWork+0x445(SB)
...
0x445:
CALL	runtime.panicdivide(SB)
```

The most likely explanation is that runtime.stealOrder.count really is 0. In theory this should only happen very early in runtime bootstrap, and we're clearly well past runtime bootstrap in all of the failures.
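
To make the failure mode concrete, here is a minimal standalone sketch of why a zero `count` turns into a divide-by-zero. It is modeled loosely on the runtime's `randomOrder`, not copied from it: the `count` field comes from this thread, while `coprimes`, `reset`, the return shape of `start`, and the `tryStart` helper are assumptions for illustration.

```go
package main

import "fmt"

// randomOrder is a simplified stand-in for the runtime type of the same
// name. Only the count field is confirmed by this issue; the rest is an
// assumption made for illustration.
type randomOrder struct {
	count    uint32
	coprimes []uint32
}

func gcd(a, b uint32) uint32 {
	for b != 0 {
		a, b = b, a%b
	}
	return a
}

// reset records count and precomputes the values in [1, count] that are
// coprime to it. Until reset runs, count is the zero value.
func (ord *randomOrder) reset(count uint32) {
	ord.count = count
	ord.coprimes = ord.coprimes[:0]
	for i := uint32(1); i <= count; i++ {
		if gcd(i, count) == 1 {
			ord.coprimes = append(ord.coprimes, i)
		}
	}
}

// start derives a starting position and stride from i. The modulus by
// ord.count is the operation that panics when count is 0.
func (ord *randomOrder) start(i uint32) (pos, inc uint32) {
	pos = i % ord.count
	inc = ord.coprimes[i/ord.count%uint32(len(ord.coprimes))]
	return pos, inc
}

// tryStart (hypothetical helper) converts the panic into an error so the
// example can print it instead of crashing.
func tryStart(ord *randomOrder, i uint32) (pos uint32, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("%v", r)
		}
	}()
	pos, _ = ord.start(i)
	return pos, nil
}

func main() {
	var ord randomOrder // count == 0, as before bootstrap or after zeroing
	_, err := tryStart(&ord, 7)
	fmt.Println("count == 0:", err)

	ord.reset(4)
	pos, err := tryStart(&ord, 7)
	fmt.Println("count == 4:", pos, err)
}
```

In ordinary Go code the panic can be recovered, as above. In the trace from this issue it happens on the system stack, which is why it surfaces as `fatal error: panic on system stack` instead.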

@ianlancetaylor
Contributor

ianlancetaylor commented Jan 29, 2022

Does not seem to have happened since 2021-11-12. Optimistically closing.

@aclements
Member

aclements commented Jan 31, 2022

I believe this was a dup of #46272, and just got missed when we were closing that batch of related issues.

5 participants