Skip to content

runtime: hang in runtime test #13642

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
rsc opened this issue Dec 16, 2015 · 7 comments
Closed

runtime: hang in runtime test #13642

rsc opened this issue Dec 16, 2015 · 7 comments
Milestone

Comments

@rsc
Copy link
Contributor

rsc commented Dec 16, 2015

From all.bash on my Mac at 61eb705 (Dec 14). Killed runtime.test after 30+ minutes.

##### GOMAXPROCS=2 runtime -cpu=1,2,4
SIGABRT: abort
PC=0x63bbb m=0

goroutine 0 [idle]:
runtime.mach_semaphore_wait(0x1003, 0xe, 0x45c340, 0x23a99, 0x45f040, 0x45c340, 0x56d29, 0xffffffffffffffff, 0x0, 0x7fff5fbff514, ...)
    /Users/rsc/go/src/runtime/sys_darwin_amd64.s:407 +0xb
runtime.semasleep1(0xffffffffffffffff, 0x0)
    /Users/rsc/go/src/runtime/os1_darwin.go:392 +0xdf
runtime.semasleep.func1()
    /Users/rsc/go/src/runtime/os1_darwin.go:408 +0x29
runtime.systemstack(0x7fff5fbff518)
    /Users/rsc/go/src/runtime/asm_amd64.s:307 +0xab
runtime.semasleep(0xffffffffffffffff, 0x0)
    /Users/rsc/go/src/runtime/os1_darwin.go:409 +0x36
runtime.notesleep(0x45c5c8)
    /Users/rsc/go/src/runtime/lock_sema.go:166 +0xe8
runtime.stopm()
    /Users/rsc/go/src/runtime/proc.go:1515 +0x101
runtime.findrunnable(0xc82001c000, 0x0)
    /Users/rsc/go/src/runtime/proc.go:1941 +0x713
runtime.schedule()
    /Users/rsc/go/src/runtime/proc.go:2040 +0x245
runtime.goexit0(0xc8203c3380)
    /Users/rsc/go/src/runtime/proc.go:2172 +0x1a2
runtime.mcall(0x7fff5fbff690)
    /Users/rsc/go/src/runtime/asm_amd64.s:233 +0x5b

goroutine 1 [chan receive]:
testing.RunTests(0x35db98, 0x458840, 0xa2, 0xa2, 0x229c01)
    /Users/rsc/go/src/testing/testing.go:576 +0x8b9
testing.(*M).Run(0xc820039f08, 0xd49c2)
    /Users/rsc/go/src/testing/testing.go:508 +0x81
main.main()
    runtime/_test/_testmain.go:924 +0x117

goroutine 17 [syscall, 31 minutes, locked to thread]:
runtime.goexit()
    /Users/rsc/go/src/runtime/asm_amd64.s:1998 +0x1

goroutine 232349 [sleep]:
time.Sleep(0x4c4b40)
    /Users/rsc/go/src/runtime/time.go:59 +0xf9
runtime_test.TestPeriodicGC(0xc82009a360)
    /Users/rsc/go/src/runtime/gc_test.go:220 +0xb1
testing.tRunner(0xc82009a360, 0x458d38)
    /Users/rsc/go/src/testing/testing.go:466 +0x98
created by testing.RunTests
    /Users/rsc/go/src/testing/testing.go:575 +0x879
@rsc rsc added this to the Go1.6 milestone Dec 16, 2015
@aclements
Copy link
Member

The sleep in goroutine 232349 is only 5 ms (it's in a loop, but we limit it to 20 iterations), so while it's perfectly legitimate for the system to not have anything to do, the M blocked in timerproc's semasleep should have woken up and kicked the system back in to action.

This smells like a scheduler issue to me, maybe related to the recent spinning changes? @dvyukov

It's unfortunate we don't have tracebacks from the other Ms. I'm working on reproducing this, but I don't have high hopes.

@rsc
Copy link
Contributor Author

rsc commented Dec 16, 2015 via email

@aclements
Copy link
Member

No luck reproducing this yet.

It's interesting that the test didn't time itself out after 5 minutes. That suggests that the testing package's alarm is stuck in the same limbo as the test's 5ms sleep.

There have been a few recent instances of this test timing out on the dashboard, but they're all 5 minute time-outs from the testing package:
2015-11-03T19:12:10-b6c0934/dragonfly-amd64
2015-11-24T17:07:45-a150c4f/dragonfly-amd64
2015-11-25T17:04:43-3f6b91b/plan9-386
2015-12-11T16:57:53-62226fa/plan9-386
2015-12-15T20:13:19-01baf13/linux-amd64-nocgo

The most recent instances of the build system timing out this test were several months ago:
2015-08-18T17:41:44-d9dc2f2/netbsd-386-minux
2015-08-19T03:28:41-5457563/netbsd-386-minux
2015-09-08T22:59:09-0ec72b6/linux-arm-arm5
2015-09-11T06:02:39-f5c53e0/linux-arm-arm5
2015-09-11T20:50:46-0726931/linux-arm-arm5

@rsc
Copy link
Contributor Author

rsc commented Dec 17, 2015 via email

@rsc
Copy link
Contributor Author

rsc commented Dec 17, 2015

One output with GOTRACEBACK=crash. Normally a run takes about 200 seconds. This one timed out (on its own) after 300 (edit: no, 1000). Because it wasn't a signal-induced exit the thread stacks were not dumped.

##### GOMAXPROCS=2 runtime -cpu=1,2,4
panic: test timed out after 5m0s

goroutine 232209 [running]:
runtime.gopanic(0x2293e0, 0xc820010300)
    /Users/rsc/go/src/runtime/panic.go:464 +0x3e6 fp=0xc82003bf28 sp=0xc82003bea8
testing.startAlarm.func1()
    /Users/rsc/go/src/testing/testing.go:717 +0x132 fp=0xc82003bfb0 sp=0xc82003bf28
runtime.goexit()
    /Users/rsc/go/src/runtime/asm_amd64.s:1998 +0x1 fp=0xc82003bfb8 sp=0xc82003bfb0
created by time.goFunc
    /Users/rsc/go/src/time/sleep.go:129 +0x3a

goroutine 1 [chan receive, 1 minutes]:
runtime.gopark(0x35e1a0, 0xc82049c0b8, 0x2ebdc0, 0xc, 0x17, 0x3)
    /Users/rsc/go/src/runtime/proc.go:257 +0x163 fp=0xc82003ab70 sp=0xc82003ab48
runtime.goparkunlock(0xc82049c0b8, 0x2ebdc0, 0xc, 0x509a17, 0x3)
    /Users/rsc/go/src/runtime/proc.go:263 +0x54 fp=0xc82003aba8 sp=0xc82003ab70
runtime.chanrecv(0x21c6c0, 0xc82049c060, 0xc82003ad88, 0xc8202b5c01, 0x80000)
    /Users/rsc/go/src/runtime/chan.go:470 +0x49f fp=0xc82003ac30 sp=0xc82003aba8
runtime.chanrecv1(0x21c6c0, 0xc82049c060, 0xc82003ad88)
    /Users/rsc/go/src/runtime/chan.go:355 +0x2b fp=0xc82003ac60 sp=0xc82003ac30
testing.RunTests(0x35db98, 0x458840, 0xa2, 0xa2, 0x229c01)
    /Users/rsc/go/src/testing/testing.go:576 +0x8b9 fp=0xc82003ade0 sp=0xc82003ac60
testing.(*M).Run(0xc82003af08, 0xd49c2)
    /Users/rsc/go/src/testing/testing.go:508 +0x81 fp=0xc82003ae68 sp=0xc82003ade0
main.main()
    runtime/_test/_testmain.go:924 +0x117 fp=0xc82003af60 sp=0xc82003ae68
runtime.main()
    /Users/rsc/go/src/runtime/proc.go:183 +0x2b0 fp=0xc82003afb0 sp=0xc82003af60
runtime.goexit()
    /Users/rsc/go/src/runtime/asm_amd64.s:1998 +0x1 fp=0xc82003afb8 sp=0xc82003afb0

goroutine 17 [syscall, 16 minutes, locked to thread]:
runtime.goexit()
    /Users/rsc/go/src/runtime/asm_amd64.s:1998 +0x1 fp=0xc820038fb8 sp=0xc820038fb0

goroutine 2 [runnable]:
runtime.gopark(0x35e1a0, 0x45bab0, 0x2ecbc0, 0xf, 0x100000000000014, 0x1)
    /Users/rsc/go/src/runtime/proc.go:257 +0x163 fp=0xc820024758 sp=0xc820024730
runtime.goparkunlock(0x45bab0, 0x2ecbc0, 0xf, 0xc820000114, 0x1)
    /Users/rsc/go/src/runtime/proc.go:263 +0x54 fp=0xc820024790 sp=0xc820024758
runtime.forcegchelper()
    /Users/rsc/go/src/runtime/proc.go:224 +0xb8 fp=0xc8200247c0 sp=0xc820024790
runtime.goexit()
    /Users/rsc/go/src/runtime/asm_amd64.s:1998 +0x1 fp=0xc8200247c8 sp=0xc8200247c0
created by runtime.init.4
    /Users/rsc/go/src/runtime/proc.go:213 +0x2b

goroutine 3 [GC sweep wait]:
runtime.gopark(0x35e1a0, 0x45bce0, 0x2e88f0, 0xd, 0x56514, 0x1)
    /Users/rsc/go/src/runtime/proc.go:257 +0x163 fp=0xc820024f48 sp=0xc820024f20
runtime.goparkunlock(0x45bce0, 0x2e88f0, 0xd, 0x14, 0x1)
    /Users/rsc/go/src/runtime/proc.go:263 +0x54 fp=0xc820024f80 sp=0xc820024f48
runtime.bgsweep(0xc82001a070)
    /Users/rsc/go/src/runtime/mgcsweep.go:79 +0x14d fp=0xc820024fb8 sp=0xc820024f80
runtime.goexit()
    /Users/rsc/go/src/runtime/asm_amd64.s:1998 +0x1 fp=0xc820024fc0 sp=0xc820024fb8
created by runtime.gcenable
    /Users/rsc/go/src/runtime/mgc.go:191 +0x53

goroutine 4 [finalizer wait, 1 minutes]:
runtime.gopark(0x35e1a0, 0x476670, 0x2ec7e0, 0xe, 0x14, 0x1)
    /Users/rsc/go/src/runtime/proc.go:257 +0x163 fp=0xc8202d5718 sp=0xc8202d56f0
runtime.goparkunlock(0x476670, 0x2ec7e0, 0xe, 0x2000000014, 0x1)
    /Users/rsc/go/src/runtime/proc.go:263 +0x54 fp=0xc8202d5750 sp=0xc8202d5718
runtime.runfinq()
    /Users/rsc/go/src/runtime/mfinal.go:158 +0xaa fp=0xc8202d57c0 sp=0xc8202d5750
runtime.goexit()
    /Users/rsc/go/src/runtime/asm_amd64.s:1998 +0x1 fp=0xc8202d57c8 sp=0xc8202d57c0
created by runtime.createfing
    /Users/rsc/go/src/runtime/mfinal.go:139 +0x60

goroutine 5 [timer goroutine (idle)]:
runtime.gopark(0x35e1a0, 0x45bdc0, 0x31da20, 0x16, 0x14, 0x1)
    /Users/rsc/go/src/runtime/proc.go:257 +0x163 fp=0xc820025f00 sp=0xc820025ed8
runtime.goparkunlock(0x45bdc0, 0x31da20, 0x16, 0x14, 0x1)
    /Users/rsc/go/src/runtime/proc.go:263 +0x54 fp=0xc820025f38 sp=0xc820025f00
runtime.timerproc()
    /Users/rsc/go/src/runtime/time.go:202 +0x123 fp=0xc820025fc0 sp=0xc820025f38
runtime.goexit()
    /Users/rsc/go/src/runtime/asm_amd64.s:1998 +0x1 fp=0xc820025fc8 sp=0xc820025fc0
created by runtime.addtimerLocked
    /Users/rsc/go/src/runtime/time.go:116 +0x11f

goroutine 208 [GC worker (idle)]:
runtime.gopark(0x35dfe8, 0xc82002b400, 0x312a20, 0x10, 0xc820233714, 0x0)
    /Users/rsc/go/src/runtime/proc.go:257 +0x163 fp=0xc820233748 sp=0xc820233720
runtime.gcBgMarkWorker(0xc82001c000)
    /Users/rsc/go/src/runtime/mgc.go:1379 +0x12c fp=0xc8202337b8 sp=0xc820233748
runtime.goexit()
    /Users/rsc/go/src/runtime/asm_amd64.s:1998 +0x1 fp=0xc8202337c0 sp=0xc8202337b8
created by runtime.gcBgMarkStartWorkers
    /Users/rsc/go/src/runtime/mgc.go:1326 +0x93

goroutine 232206 [runnable]:
runtime.gcMarkDone()
    /Users/rsc/go/src/runtime/mgc.go:1037 fp=0xc8202d2f48 sp=0xc8202d2f40
runtime.gcBgMarkWorker(0xc820095500)
    /Users/rsc/go/src/runtime/mgc.go:1456 +0x4d8 fp=0xc8202d2fb8 sp=0xc8202d2f48
runtime.goexit()
    /Users/rsc/go/src/runtime/asm_amd64.s:1998 +0x1 fp=0xc8202d2fc0 sp=0xc8202d2fb8
created by runtime.gcBgMarkStartWorkers
    /Users/rsc/go/src/runtime/mgc.go:1326 +0x93

goroutine 232345 [semacquire]:
runtime.gopark(0x35e1a0, 0x464600, 0x2efc70, 0xa, 0x19, 0x4)
    /Users/rsc/go/src/runtime/proc.go:257 +0x163 fp=0xc82031f0c8 sp=0xc82031f0a0
runtime.goparkunlock(0x464600, 0x2efc70, 0xa, 0x31a19, 0x4)
    /Users/rsc/go/src/runtime/proc.go:263 +0x54 fp=0xc82031f100 sp=0xc82031f0c8
runtime.semacquire(0x44213c, 0x0)
    /Users/rsc/go/src/runtime/sema.go:104 +0x2f7 fp=0xc82031f160 sp=0xc82031f100
runtime.stopTheWorld(0x2eefc0, 0xe)
    /Users/rsc/go/src/runtime/proc.go:867 +0x28 fp=0xc82031f178 sp=0xc82031f160
runtime.ReadMemStats(0xc82031f1e8)
    /Users/rsc/go/src/runtime/mstats.go:162 +0x2c fp=0xc82031f1a0 sp=0xc82031f178
runtime_test.TestPeriodicGC(0xc820189dd0)
    /Users/rsc/go/src/runtime/gc_test.go:223 +0xbf fp=0xc820321f48 sp=0xc82031f1a0
testing.tRunner(0xc820189dd0, 0x458d38)
    /Users/rsc/go/src/testing/testing.go:466 +0x98 fp=0xc820321f80 sp=0xc820321f48
runtime.goexit()
    /Users/rsc/go/src/runtime/asm_amd64.s:1998 +0x1 fp=0xc820321f88 sp=0xc820321f80
created by testing.RunTests
    /Users/rsc/go/src/testing/testing.go:575 +0x879

goroutine 209934 [GC worker (idle)]:
runtime.gopark(0x35dfe8, 0xc820526000, 0x312a20, 0x10, 0xc8202d1f14, 0x0)
    /Users/rsc/go/src/runtime/proc.go:257 +0x163 fp=0xc8202d1f48 sp=0xc8202d1f20
runtime.gcBgMarkWorker(0xc82001d500)
    /Users/rsc/go/src/runtime/mgc.go:1379 +0x12c fp=0xc8202d1fb8 sp=0xc8202d1f48
runtime.goexit()
    /Users/rsc/go/src/runtime/asm_amd64.s:1998 +0x1 fp=0xc8202d1fc0 sp=0xc8202d1fb8
created by runtime.gcBgMarkStartWorkers
    /Users/rsc/go/src/runtime/mgc.go:1326 +0x93

goroutine 232311 [GC worker (idle)]:
runtime.gopark(0x35dfe8, 0x45c4c0, 0x312a20, 0x10, 0xc8202d3714, 0x0)
    /Users/rsc/go/src/runtime/proc.go:257 +0x163 fp=0xc8202d3748 sp=0xc8202d3720
runtime.gcBgMarkWorker(0xc820094000)
    /Users/rsc/go/src/runtime/mgc.go:1379 +0x12c fp=0xc8202d37b8 sp=0xc8202d3748
runtime.goexit()
    /Users/rsc/go/src/runtime/asm_amd64.s:1998 +0x1 fp=0xc8202d37c0 sp=0xc8202d37b8
created by runtime.gcBgMarkStartWorkers
    /Users/rsc/go/src/runtime/mgc.go:1326 +0x93
FAIL    runtime 985.660s

goroutine 232206 is calling gcMarkDone and has been preempted on entry to the function.
goroutine 232345 is calling ReadMemStats which is waiting on worldsema and won't get it until the GC finishes.
The GC won't finish until gcMarkDone moves things along.

I can't tell if things had been stuck like this for long. It's possible this is just the state at this point, but it's also possible that regular (non-timer-triggered) goroutines were not being run at all.

@rsc
Copy link
Contributor Author

rsc commented Dec 17, 2015

Current theory is that the various hardware resets changed the Mac sleep settings. Now it goes to sleep even when plugged in. When it gets woken up, lots of time has gone by and the test "times out". Or the test gets stuck, although I can't explain that part. Anyway, closing.

@rsc rsc closed this as completed Dec 17, 2015
@rsc
Copy link
Contributor Author

rsc commented Dec 17, 2015

There may be a real problem. Discussion of non-rsc-broken-Mac failures contineus on #13645.

@golang golang locked and limited conversation to collaborators Dec 29, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants