
runtime: test timeout with truncated stack trace on linux-ppc64le-power9osu #43175

bcmills opened this issue Dec 14, 2020 · 4 comments

@bcmills bcmills commented Dec 14, 2020

This looks suspiciously like a deadlock in the runtime's stack-dumping code: the test times out after 3 minutes, but hasn't finished dumping stacks after 4m, and appears to have been interrupted in the middle of a GC.

CC @danscales @mknyszek @aclements


panic: test timed out after 3m0s

goroutine 92156 [running]:
panic(0x10209800, 0xc0003ce000)
	/workdir/go/src/runtime/panic.go:1060 +0x540 fp=0xc00052af58 sp=0xc00052ae78 pc=0x10039da0
	/workdir/go/src/testing/testing.go:1701 +0xcc fp=0xc00052afc0 sp=0xc00052af58 pc=0x100ef2dc
	/workdir/go/src/runtime/asm_ppc64x.s:869 +0x4 fp=0xc00052afc0 sp=0xc00052afc0 pc=0x100774f4
created by time.goFunc
	/workdir/go/src/time/sleep.go:169 +0x44

goroutine 1 [chan receive, 2 minutes, locked to thread]:
runtime.gopark(0x10277560, 0xc00016e118, 0x170e, 0x2)
	/workdir/go/src/runtime/proc.go:336 +0xec fp=0xc0000409b0 sp=0xc000040980 pc=0x1003cc9c
runtime.chanrecv(0xc00016e0c0, 0x0, 0xc00030e201, 0xc000080101)
	/workdir/go/src/runtime/chan.go:576 +0x388 fp=0xc000040a50 sp=0xc0000409b0 pc=0x10007ab8
runtime.chanrecv1(0xc00016e0c0, 0x0)
	/workdir/go/src/runtime/chan.go:439 +0x24 fp=0xc000040a90 sp=0xc000040a50 pc=0x100076d4
	/workdir/go/src/testing/testing.go:1160 +0x230 fp=0xc000040b90 sp=0xc000040a90 pc=0x100eee40
testing.tRunner(0xc000084480, 0xc000040cc8)
	/workdir/go/src/testing/testing.go:1198 +0x108 fp=0xc000040bf0 sp=0xc000040b90 pc=0x100ea8a8
testing.runTests(0xc0000a4048, 0x103eaee0, 0x15f, 0x15f, 0xbfedfe69b91e7fd5, 0x29e8e5d854, 0x10404ee0, 0x71ad96470680)
	/workdir/go/src/testing/testing.go:1510 +0x2b4 fp=0xc000040ce8 sp=0xc000040bf0 pc=0x100ec0e4
testing.(*M).Run(0xc0000d6000, 0x0)
	/workdir/go/src/testing/testing.go:1418 +0x1a0 fp=0xc000040e28 sp=0xc000040ce8 pc=0x100eb1b0
	/workdir/go/src/runtime/crash_test.go:27 +0x2c fp=0xc000040e88 sp=0xc000040e28 pc=0x1016579c
	_testmain.go:1245 +0x134 fp=0xc000040f50 sp=0xc000040e88 pc=0x101e7774
	/workdir/go/src/runtime/proc.go:225 +0x270 fp=0xc000040fc0 sp=0xc000040f50 pc=0x1003c830
	/workdir/go/src/runtime/asm_ppc64x.s:869 +0x4 fp=0xc000040fc0 sp=0xc000040fc0 pc=0x100774f4

goroutine 2 [force gc (idle), 2 minutes]:
runtime.gopark(0x10277918, 0x10404600, 0x1411, 0x1)
	/workdir/go/src/runtime/proc.go:336 +0xec fp=0xc000032f80 sp=0xc000032f50 pc=0x1003cc9c
	/workdir/go/src/runtime/proc.go:276 +0xcc fp=0xc000032fc0 sp=0xc000032f80 pc=0x1003cb0c
	/workdir/go/src/runtime/asm_ppc64x.s:869 +0x4 fp=0xc000032fc0 sp=0xc000032fc0 pc=0x100774f4
created by runtime.init.5
	/workdir/go/src/runtime/proc.go:264 +0x34

goroutine 3 [GC sweep wait]:
runtime.gopark(0x10277918, 0x10404aa0, 0x140c, 0x1)
	/workdir/go/src/runtime/proc.go:336 +0xec fp=0xc000033778 sp=0xc000033748 pc=0x1003cc9c
	/workdir/go/src/runtime/mgcsweep.go:182 +0x188 fp=0xc0000337b8 sp=0xc000033778 pc=0x10027028
	/workdir/go/src/runtime/asm_ppc64x.s:869 +0x4 fp=0xc0000337b8 sp=0xc0000337b8 pc=0x100774f4
created by runtime.gcenable
	/workdir/go/src/runtime/mgc.go:217 +0x58

goroutine 4 [GC scavenge wait]:
runtime.gopark(0x10277918, 0x10404be0, 0x140d, 0x1)
	/workdir/go/src/runtime/proc.go:336 +0xec fp=0xc000041f38 sp=0xc000041f08 pc=0x1003cc9c
	/workdir/go/src/runtime/mgcscavenge.go:314 +0x29c fp=0xc000041fb8 sp=0xc000041f38 pc=0x1002535c
	/workdir/go/src/runtime/asm_ppc64x.s:869 +0x4 fp=0xc000041fb8 sp=0xc000041fb8 pc=0x100774f4
created by runtime.gcenable
	/workdir/go/src/runtime/mgc.go:218 +0x78

goroutine 18 [finalizer wait]:
runtime.gopark(0x10277918, 0x10438350, 0xc000011410, 0x1)
	/workdir/go/src/runtime/proc.go:336 +0xec fp=0xc000270f18 sp=0xc000270ee8 pc=0x1003cc9c
	/workdir/go/src/runtime/mfinal.go:175 +0xc8 fp=0xc000270fc0 sp=0xc000270f18 pc=0x1001c158
	/workdir/go/src/runtime/asm_ppc64x.s:869 +0x4 fp=0xc000270fc0 sp=0xc000270fc0 pc=0x100774f4
created by runtime.createfing
	/workdir/go/src/runtime/mfinal.go:156 +0x90

goroutine 161 [GC worker (idle), 2 minutes]:
runtime.gopark(0x10277670, 0xc00001c080, 0x10071418, 0x0)
	/workdir/go/src/runtime/proc.go:336 +0xec fp=0xc000034718 sp=0xc0000346e8 pc=0x1003cc9c
	/workdir/go/src/runtime/mgc.go:1911 +0xfc fp=0xc0000347c0 sp=0xc000034718 pc=0x1001ff1c
	/workdir/go/src/runtime/asm_ppc64x.s:869 +0x4 fp=0xc0000347c0 sp=0xc0000347c0 pc=0x100774f4
created by runtime.gcBgMarkStartWorkers
	/workdir/go/src/runtime/mgc.go:1835 +0x38

goroutine 92151 [runnable]:
syscall.Syscall(0x3, 0xa, 0xc0012c6071, *** Test killed: ran too long (4m0s).
@bcmills bcmills added this to the Backlog milestone Dec 14, 2020

@mknyszek mknyszek commented Dec 14, 2020

Unfortunately, there is a known possibility of deadlock in the stack-dumping code. (See #42062 (comment).)

The static lock checker has known about this for a while. @prattmic has a fix which is basically to say "if we're going down anyway, it's probably OK to just race; better than a deadlock, anyway."

I think this is a duplicate of #42669 as a result.



@bcmills bcmills commented Dec 14, 2020

Ah, so now we have a new data point for this part of the comment!

> I don't think in practice (2) even hurts diagnostics for us because in the worst case it causes a deadlock, and eventually we'll try to dump everything much more forcefully without acquiring locks (e.g. someone sends the process a SIGABRT) since we'll freeze the world (or try and dump it all anyway... maybe this is only true with GOTRACEBACK=crash).

I think this is a case where the deadlock does hurt diagnostics for us: the test process itself is not in a position to dump everything more forcefully, and the go command sends SIGKILL (via cmd.Process.Kill), not SIGABRT or any other signal that the runtime can handle. (It assumes that if the test process has deadlocked hard enough that its self-signal doesn't work, another catchable signal won't help.)



@gopherbot gopherbot commented Jan 5, 2021

Change mentions this issue: runtime: don't take allglock in tracebackothers



@prattmic prattmic commented Jan 5, 2021

Since we think this is caused by the allglock deadlock fixed by the change above, I'm going to close this for now. We can reopen if we find this still occurring.


@prattmic prattmic closed this Jan 5, 2021
gopherbot pushed a commit that referenced this issue Jan 5, 2021
tracebackothers is called from fatal throw/panic.

A fatal throw may be taken with allglock held (notably in the allocator
when allglock is held), which would cause a deadlock in tracebackothers
when we try to take allglock again. Locking allglock here is also often
a lock order violation w.r.t. the locks held when throw was called.

Avoid the deadlock and ordering issues by skipping locking altogether.
It is OK to miss concurrently created Gs (which are generally avoided by
freezetheworld(), and which were possible previously anyways if created
after the loop).

Fatal throw/panic call freezetheworld(), which should freeze other threads
that may be racing to modify allgs. However, freezetheworld() does _not_
guarantee that it stops all other threads, so we can't simply drop the lock.

Fixes #42669
Updates #43175

Change-Id: I657aec46ed35fd5d1b3f1ba25b500128ab26b088
Reviewed-by: Michael Knyszek <>
Trust: Michael Pratt <>