runtime: livelock near runtime.sysmon #11019

rhysh · 2015-06-01T15:44:15Z

$ go version
go version devel +8cd191b Sat May 30 12:21:56 2015 +0000 linux/amd64
$ uname -a | awk '$2="host"'
Linux host 3.13.0-52-generic #86~precise1-Ubuntu SMP Tue May 5 18:08:21 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

I have a process that receives data over a few hundred concurrent TCP connections and writes them to files. It's been locking up on recent versions of tip (it was stable on 1.4.1). When it locks up, top shows it consuming 10-30% of a core, perf top doesn't show it at all, and strace shows it repeatedly calling select(2) with a NULL fd list (maybe used just as a sleep?). SIGABRT causes a stack dump which starts with a g0 stack (followed by goroutines from my app that have been blocked for many minutes).

$ strace -ttt -f -p 47135
[snip]
[pid 47156] 1433139204.018227 <... select resumed> ) = 0 (Timeout)
[pid 47156] 1433139204.018256 select(0, NULL, NULL, NULL, {0, 100} <unfinished ...>
[pid 47155] 1433139204.018400 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
[pid 47155] 1433139204.018432 clock_gettime(CLOCK_MONOTONIC, {437662, 793520114}) = 0
[pid 47156] 1433139204.018494 <... select resumed> ) = 0 (Timeout)
[pid 47155] 1433139204.018508 clock_gettime(CLOCK_MONOTONIC,  <unfinished ...>
[pid 47156] 1433139204.018525 select(0, NULL, NULL, NULL, {0, 100} <unfinished ...>
[pid 47155] 1433139204.018541 <... clock_gettime resumed> {437662, 793595479}) = 0
[pid 47155] 1433139204.018575 futex(0xa7ee70, FUTEX_WAIT, 0, {0, 100000} <unfinished ...>
[pid 47156] 1433139204.018772 <... select resumed> ) = 0 (Timeout)
[pid 47156] 1433139204.018803 select(0, NULL, NULL, NULL, {0, 100} <unfinished ...>
[pid 47155] 1433139204.018821 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
[pid 47155] 1433139204.018854 clock_gettime(CLOCK_MONOTONIC, {437662, 793941489}) = 0
[pid 47155] 1433139204.018914 clock_gettime(CLOCK_MONOTONIC, {437662, 794001909}) = 0
[pid 47155] 1433139204.018974 futex(0xa7ee70, FUTEX_WAIT, 0, {0, 100000} <unfinished ...>
[snip]

SIGABRT: abort
PC=0x463393 m=1

goroutine 0 [idle]:
runtime.futex(0xa7ee80, 0x0, 0xc208047ee0, 0x0, 0x0, 0x4631ec, 0x3c, 0x0, 0x416994, 0xa7ee80, ...)
    /usr/local/go/src/runtime/sys_linux_amd64.s:283 +0x23
runtime.futexsleep(0xa7ee80, 0x0, 0xdf8475800)
    /usr/local/go/src/runtime/os1_linux.go:56 +0xf0
runtime.notetsleep_internal(0xa7ee80, 0xdf8475800, 0xc2080003c0)
    /usr/local/go/src/runtime/lock_futex.go:167 +0x134
runtime.notetsleep(0xa7ee80, 0xdf8475800, 0x45d964b800)
    /usr/local/go/src/runtime/lock_futex.go:187 +0x78
runtime.sysmon()
    /usr/local/go/src/runtime/proc1.go:2954 +0x480
runtime.mstart1()
    /usr/local/go/src/runtime/proc1.go:703 +0xfa
runtime.mstart()
    /usr/local/go/src/runtime/proc1.go:673 +0x72

goroutine 1 [chan receive, 794 minutes]:
main.main()
    /redacted.go:230 +0xe7b

goroutine 18 [chan send, 741 minutes]:
redacted(0xc2080c8000)
    /redacted.go:199 +0x1f0
created by redacted.init.1
    /redacted.go:184 +0x6e

[snip]

The text was updated successfully, but these errors were encountered:

rsc · 2015-06-08T06:14:23Z

I believe sysmon was the unlucky recipient of the signal but otherwise uninvolved.

The only places with consistent 100µs sleeps are the scheduler stealing and the GC work coordination (getfull). One of these seems to be running away / waiting for something that's not going to happen.

@aclements @RLH

rhysh · 2015-06-23T05:30:40Z

I haven't seen this on recent versions, I think this is resolved. Thanks!

sagar8192 · 2016-03-06T10:49:51Z

I am having the same issue.

golang version:
go version go1.5.2 linux/amd64

OS version:
Linux host 3.13.0-65-generic #106-Ubuntu SMP Fri Oct 2 22:08:27 UTC 2015 x86_64 GNU/Linux

[pid 1052205] 1457261183.644962 <... select resumed> ) = 0 (Timeout)
[pid 1052205] 1457261183.644990 futex(0x780f60, FUTEX_WAIT, 0, {60, 0} <unfinished ...>
[pid 1052206] 1457261183.645062 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
[pid 1052206] 1457261183.645081 futex(0x780f50, FUTEX_WAIT, 0, {0, 100000}) = -1 ETIMEDOUT (Connection timed out)
[pid 1052206] 1457261183.645269 futex(0x780f50, FUTEX_WAIT, 0, {0, 100000}) = -1 ETIMEDOUT (Connection timed out)
[pid 1052206] 1457261183.645456 futex(0x780f50, FUTEX_WAIT, 0, {0, 100000}) = -1 ETIMEDOUT (Connection timed out)
[pid 1052206] 1457261183.645643 futex(0x780f50, FUTEX_WAIT, 0, {0, 100000}) = -1 ETIMEDOUT (Connection timed out)
[pid 1052206] 1457261183.645829 futex(0x780f50, FUTEX_WAIT, 0, {0, 100000}) = -1 ETIMEDOUT (Connection timed out)
[pid 1052206] 1457261183.646017 futex(0x780f50, FUTEX_WAIT, 0, {0, 100000}) = -1 ETIMEDOUT (Connection timed out)
[pid 1052206] 1457261183.646203 futex(0x780f50, FUTEX_WAIT, 0, {0, 100000}) = -1 ETIMEDOUT (Connection timed out)
[pid 1052206] 1457261183.646392 futex(0x780f50, FUTEX_WAIT, 0, {0, 100000}) = -1 ETIMEDOUT (Connection timed out)
[pid 1052206] 1457261183.646577 futex(0x780f50, FUTEX_WAIT, 0, {0, 100000}) = -1 ETIMEDOUT (Connection timed out)
[pid 1052206] 1457261183.646765 futex(0x780f50, FUTEX_WAIT, 0, {0, 100000}) = -1 ETIMEDOUT (Connection timed out)
[pid 1052206] 1457261183.646951 futex(0x780f50, FUTEX_WAIT, 0, {0, 100000}) = -1 ETIMEDOUT (Connection timed out)
[pid 1052206] 1457261183.647138 futex(0x780f50, FUTEX_WAIT, 0, {0, 100000}) = -1 ETIMEDOUT (Connection timed out)
[pid 1052206] 1457261183.647326 futex(0x780f50, FUTEX_WAIT, 0, {0, 100000}) = -1 ETIMEDOUT (Connection timed out)
[pid 1052206] 1457261183.647512 futex(0x780f50, FUTEX_WAIT, 0, {0, 100000}) = -1 ETIMEDOUT (Connection timed out)
[pid 1052206] 1457261183.647699 futex(0x780f50, FUTEX_WAIT, 0, {0, 100000}) = -1 ETIMEDOUT (Connection timed out)
[pid 1052206] 1457261183.647886 futex(0x780f50, FUTEX_WAIT, 0, {0, 100000}) = -1 ETIMEDOUT (Connection timed out)
[pid 1052206] 1457261183.648073 futex(0x780f50, FUTEX_WAIT, 0, {0, 100000}) = -1 ETIMEDOUT (Connection timed out)
[pid 1052206] 1457261183.648259 futex(0x780f50, FUTEX_WAIT, 0, {0, 100000}) = -1 ETIMEDOUT (Connection timed out)
[pid 1052206] 1457261183.648445 futex(0x780f50, FUTEX_WAIT, 0, {0, 100000}) = -1 ETIMEDOUT (Connection timed out)
[pid 1052206] 1457261183.648633 futex(0x780f50, FUTEX_WAIT, 0, {0, 100000}) = -1 ETIMEDOUT (Connection timed out)
[pid 1052206] 1457261183.648818 futex(0x780f50, FUTEX_WAIT, 0, {0, 100000}) = -1 ETIMEDOUT (Connection timed out)

bradfitz · 2016-03-06T15:49:00Z

@sagar8192, this issue is closed. We don't re-use old bugs.

Please file a new issue if you found a bug. Please use Go 1.6 or master. You can reference this bug in your new bug report.

Thanks.

sagar8192 · 2016-03-06T18:35:06Z

@bradfitz Lemme try my code with go 1.6. If it fails with the same error then I will open a new issue. Thanks!

ianlancetaylor added this to the Go1.5 milestone Jun 3, 2015

rhysh closed this as completed Jun 23, 2015

sagar8192 mentioned this issue Mar 6, 2016

Go process is getting stuck. #14673

Closed

golang locked and limited conversation to collaborators Mar 13, 2017

gopherbot added the FrozenDueToAge label Mar 13, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

runtime: livelock near runtime.sysmon #11019

runtime: livelock near runtime.sysmon #11019

rhysh commented Jun 1, 2015

rsc commented Jun 8, 2015

rhysh commented Jun 23, 2015

sagar8192 commented Mar 6, 2016

bradfitz commented Mar 6, 2016

sagar8192 commented Mar 6, 2016

runtime: livelock near runtime.sysmon #11019

runtime: livelock near runtime.sysmon #11019

Comments

rhysh commented Jun 1, 2015

rsc commented Jun 8, 2015

rhysh commented Jun 23, 2015

sagar8192 commented Mar 6, 2016

bradfitz commented Mar 6, 2016

sagar8192 commented Mar 6, 2016