Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: livelock near runtime.sysmon #11019

Closed
rhysh opened this issue Jun 1, 2015 · 5 comments
Closed

runtime: livelock near runtime.sysmon #11019

rhysh opened this issue Jun 1, 2015 · 5 comments
Milestone

Comments

@rhysh
Copy link
Contributor

rhysh commented Jun 1, 2015

$ go version
go version devel +8cd191b Sat May 30 12:21:56 2015 +0000 linux/amd64
$ uname -a | awk '$2="host"'
Linux host 3.13.0-52-generic #86~precise1-Ubuntu SMP Tue May 5 18:08:21 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

I have a process that receives data over a few hundred concurrent TCP connections and writes them to files. It's been locking up on recent versions of tip (it was stable on 1.4.1). When it locks up, top shows it consuming 10-30% of a core, perf top doesn't show it at all, and strace shows it repeatedly calling select(2) with a NULL fd list (maybe used just as a sleep?). SIGABRT causes a stack dump which starts with a g0 stack (followed by goroutines from my app that have been blocked for many minutes).

$ strace -ttt -f -p 47135
[snip]
[pid 47156] 1433139204.018227 <... select resumed> ) = 0 (Timeout)
[pid 47156] 1433139204.018256 select(0, NULL, NULL, NULL, {0, 100} <unfinished ...>
[pid 47155] 1433139204.018400 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
[pid 47155] 1433139204.018432 clock_gettime(CLOCK_MONOTONIC, {437662, 793520114}) = 0
[pid 47156] 1433139204.018494 <... select resumed> ) = 0 (Timeout)
[pid 47155] 1433139204.018508 clock_gettime(CLOCK_MONOTONIC,  <unfinished ...>
[pid 47156] 1433139204.018525 select(0, NULL, NULL, NULL, {0, 100} <unfinished ...>
[pid 47155] 1433139204.018541 <... clock_gettime resumed> {437662, 793595479}) = 0
[pid 47155] 1433139204.018575 futex(0xa7ee70, FUTEX_WAIT, 0, {0, 100000} <unfinished ...>
[pid 47156] 1433139204.018772 <... select resumed> ) = 0 (Timeout)
[pid 47156] 1433139204.018803 select(0, NULL, NULL, NULL, {0, 100} <unfinished ...>
[pid 47155] 1433139204.018821 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
[pid 47155] 1433139204.018854 clock_gettime(CLOCK_MONOTONIC, {437662, 793941489}) = 0
[pid 47155] 1433139204.018914 clock_gettime(CLOCK_MONOTONIC, {437662, 794001909}) = 0
[pid 47155] 1433139204.018974 futex(0xa7ee70, FUTEX_WAIT, 0, {0, 100000} <unfinished ...>
[snip]
SIGABRT: abort
PC=0x463393 m=1

goroutine 0 [idle]:
runtime.futex(0xa7ee80, 0x0, 0xc208047ee0, 0x0, 0x0, 0x4631ec, 0x3c, 0x0, 0x416994, 0xa7ee80, ...)
    /usr/local/go/src/runtime/sys_linux_amd64.s:283 +0x23
runtime.futexsleep(0xa7ee80, 0x0, 0xdf8475800)
    /usr/local/go/src/runtime/os1_linux.go:56 +0xf0
runtime.notetsleep_internal(0xa7ee80, 0xdf8475800, 0xc2080003c0)
    /usr/local/go/src/runtime/lock_futex.go:167 +0x134
runtime.notetsleep(0xa7ee80, 0xdf8475800, 0x45d964b800)
    /usr/local/go/src/runtime/lock_futex.go:187 +0x78
runtime.sysmon()
    /usr/local/go/src/runtime/proc1.go:2954 +0x480
runtime.mstart1()
    /usr/local/go/src/runtime/proc1.go:703 +0xfa
runtime.mstart()
    /usr/local/go/src/runtime/proc1.go:673 +0x72

goroutine 1 [chan receive, 794 minutes]:
main.main()
    /redacted.go:230 +0xe7b

goroutine 18 [chan send, 741 minutes]:
redacted(0xc2080c8000)
    /redacted.go:199 +0x1f0
created by redacted.init.1
    /redacted.go:184 +0x6e

[snip]
@ianlancetaylor ianlancetaylor added this to the Go1.5 milestone Jun 3, 2015
@rsc
Copy link
Contributor

rsc commented Jun 8, 2015

I believe sysmon was the unlucky recipient of the signal but otherwise uninvolved.

The only places with consistent 100µs sleeps are the scheduler stealing and the GC work coordination (getfull). One of these seems to be running away / waiting for something that's not going to happen.

@aclements @RLH

@rhysh
Copy link
Contributor Author

rhysh commented Jun 23, 2015

I haven't seen this on recent versions, I think this is resolved. Thanks!

@rhysh rhysh closed this as completed Jun 23, 2015
@sagar8192
Copy link

I am having the same issue.

golang version:
go version go1.5.2 linux/amd64

OS version:
Linux host 3.13.0-65-generic #106-Ubuntu SMP Fri Oct 2 22:08:27 UTC 2015 x86_64 GNU/Linux

[pid 1052205] 1457261183.644962 <... select resumed> ) = 0 (Timeout)
[pid 1052205] 1457261183.644990 futex(0x780f60, FUTEX_WAIT, 0, {60, 0} <unfinished ...>
[pid 1052206] 1457261183.645062 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
[pid 1052206] 1457261183.645081 futex(0x780f50, FUTEX_WAIT, 0, {0, 100000}) = -1 ETIMEDOUT (Connection timed out)
[pid 1052206] 1457261183.645269 futex(0x780f50, FUTEX_WAIT, 0, {0, 100000}) = -1 ETIMEDOUT (Connection timed out)
[pid 1052206] 1457261183.645456 futex(0x780f50, FUTEX_WAIT, 0, {0, 100000}) = -1 ETIMEDOUT (Connection timed out)
[pid 1052206] 1457261183.645643 futex(0x780f50, FUTEX_WAIT, 0, {0, 100000}) = -1 ETIMEDOUT (Connection timed out)
[pid 1052206] 1457261183.645829 futex(0x780f50, FUTEX_WAIT, 0, {0, 100000}) = -1 ETIMEDOUT (Connection timed out)
[pid 1052206] 1457261183.646017 futex(0x780f50, FUTEX_WAIT, 0, {0, 100000}) = -1 ETIMEDOUT (Connection timed out)
[pid 1052206] 1457261183.646203 futex(0x780f50, FUTEX_WAIT, 0, {0, 100000}) = -1 ETIMEDOUT (Connection timed out)
[pid 1052206] 1457261183.646392 futex(0x780f50, FUTEX_WAIT, 0, {0, 100000}) = -1 ETIMEDOUT (Connection timed out)
[pid 1052206] 1457261183.646577 futex(0x780f50, FUTEX_WAIT, 0, {0, 100000}) = -1 ETIMEDOUT (Connection timed out)
[pid 1052206] 1457261183.646765 futex(0x780f50, FUTEX_WAIT, 0, {0, 100000}) = -1 ETIMEDOUT (Connection timed out)
[pid 1052206] 1457261183.646951 futex(0x780f50, FUTEX_WAIT, 0, {0, 100000}) = -1 ETIMEDOUT (Connection timed out)
[pid 1052206] 1457261183.647138 futex(0x780f50, FUTEX_WAIT, 0, {0, 100000}) = -1 ETIMEDOUT (Connection timed out)
[pid 1052206] 1457261183.647326 futex(0x780f50, FUTEX_WAIT, 0, {0, 100000}) = -1 ETIMEDOUT (Connection timed out)
[pid 1052206] 1457261183.647512 futex(0x780f50, FUTEX_WAIT, 0, {0, 100000}) = -1 ETIMEDOUT (Connection timed out)
[pid 1052206] 1457261183.647699 futex(0x780f50, FUTEX_WAIT, 0, {0, 100000}) = -1 ETIMEDOUT (Connection timed out)
[pid 1052206] 1457261183.647886 futex(0x780f50, FUTEX_WAIT, 0, {0, 100000}) = -1 ETIMEDOUT (Connection timed out)
[pid 1052206] 1457261183.648073 futex(0x780f50, FUTEX_WAIT, 0, {0, 100000}) = -1 ETIMEDOUT (Connection timed out)
[pid 1052206] 1457261183.648259 futex(0x780f50, FUTEX_WAIT, 0, {0, 100000}) = -1 ETIMEDOUT (Connection timed out)
[pid 1052206] 1457261183.648445 futex(0x780f50, FUTEX_WAIT, 0, {0, 100000}) = -1 ETIMEDOUT (Connection timed out)
[pid 1052206] 1457261183.648633 futex(0x780f50, FUTEX_WAIT, 0, {0, 100000}) = -1 ETIMEDOUT (Connection timed out)
[pid 1052206] 1457261183.648818 futex(0x780f50, FUTEX_WAIT, 0, {0, 100000}) = -1 ETIMEDOUT (Connection timed out)

@bradfitz
Copy link
Contributor

bradfitz commented Mar 6, 2016

@sagar8192, this issue is closed. We don't re-use old bugs.

Please file a new issue if you found a bug. Please use Go 1.6 or master. You can reference this bug in your new bug report.

Thanks.

@sagar8192
Copy link

@bradfitz Lemme try my code with go 1.6. If it fails with the same error then I will open a new issue. Thanks!

@golang golang locked and limited conversation to collaborators Mar 13, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants