
runtime: "signal: segmentation fault (core dumped)" on several builders #35248

Closed
eliasnaur opened this issue Oct 30, 2019 · 12 comments

@eliasnaur eliasnaur changed the title runtime: "signal: segmentation fault (core dumped)" on several builder runtime: "signal: segmentation fault (core dumped)" on several builders Oct 30, 2019
@mvdan mvdan added this to the Go1.14 milestone Oct 30, 2019
@mvdan (Member) commented Oct 30, 2019

I was able to reproduce on the first try on my laptop:

$ go version
go version devel +f4e32aeed1 Wed Oct 30 08:17:29 2019 +0000 linux/amd64
$ GOMAXPROCS=2 go test runtime -cpu=1,2,4 -quick
signal: segmentation fault (core dumped)
FAIL    runtime 36.302s
FAIL

I'll continue digging, and I'll post again when I find something useful.

Edit: Second and third attempts also resulted in a segfault, happening at 190s and 120s respectively.

@thanm (Member) commented Oct 30, 2019

I've done a little experimenting with this as well. My suspicion is that the problem is with the runtime's "TestSignalM" test, which was added relatively recently -- when I do repeated runs in parallel of

GOMAXPROCS=2 go test -test.v runtime -cpu=1,2,4 -quick

that's the last testpoint mentioned before the crash. On the other hand, when I run

go test -i -o runtime.test
stress ./runtime.test -test.run=TestSignalM -test.cpu=10

I don't see any failures, so it's possible that my theory isn't valid.
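
(Aside, not from the original comment: the stress tool here is golang.org/x/tools/cmd/stress, which reruns a command in parallel and reports any failures. An assumed install step for a pre-modules GOPATH setup:

go get -u golang.org/x/tools/cmd/stress)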

@bcmills (Member) commented Oct 30, 2019 (comment minimized)

@danscales commented Oct 30, 2019

I tried, but I haven't been able to reproduce at all on my workstation (using the commands above).

@mvdan Did you actually get a core that might have a stack trace? (I see '(core dumped)' above).

@mvdan (Member) commented Oct 30, 2019

Yes, systemd has multiple of these core dumps, but I don't know what to do with them. If anyone has the magic coredumpctl command I can run to get a stack trace, I'm happy to run it. coredumpctl info just gives a stack trace with no source information:

Stack trace of thread 50334:
#0  0x000000000046898c n/a (/tmp/go-build494200099/b001/runtime.test)
#1  0x0000000000466311 n/a (/tmp/go-build494200099/b001/runtime.test)
#2  0x00000000004482f0 n/a (/tmp/go-build494200099/b001/runtime.test)
#3  0x0000000000447923 n/a (/tmp/go-build494200099/b001/runtime.test)
#4  0x000000000046a673 n/a (/tmp/go-build494200099/b001/runtime.test)
#5  0x00007fba8808f930 __restore_rt (libpthread.so.0)
#6  0x000000000046a8d3 n/a (/tmp/go-build494200099/b001/runtime.test)
#7  0x000000000040d17e n/a (/tmp/go-build494200099/b001/runtime.test)
#8  0x00000000008221a0 n/a (/tmp/go-build494200099/b001/runtime.test)
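
(Aside: one way to get a symbolized trace out of systemd-coredump is to hand the dump straight to gdb. The invocation below is a suggestion, not something run in the thread:

coredumpctl gdb runtime.test   # attach gdb to the newest dump whose process name matches
(gdb) bt                       # then print the backtrace with symbols)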
@aclements (Member) commented Oct 31, 2019

I haven't been able to reproduce any TestSignalM failures locally either, but the theory seems fairly likely. If you get a SIGSEGV while in a signal handler, this is exactly what you would see.

@mvdan, you should be able to build that exact runtime.test binary with go test -c runtime and then resolve those PCs yourself by pasting them into addr2line -Cfipe runtime.test.
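
Concretely, that would look something like the following, with the addresses copied from the dump above (output omitted, since it depends on the exact build):

go test -c runtime
addr2line -Cfipe runtime.test 0x46898c 0x466311 0x4482f0

Go also ships an equivalent tool, go tool addr2line runtime.test, which reads hex addresses on standard input and prints the function name and file:line for each.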

@bcmills (Member) commented Oct 31, 2019 (comment minimized)

@cuonglm commented (comment minimized)

@cuonglm commented (comment minimized)

@bcmills (Member) commented Nov 1, 2019

@cuonglm, I suspect that one is related to #35272.

@aclements (Member) commented Nov 2, 2019

Reproduced. The problem is the write barrier in the testSigusr1 callback, which runs on the signal stack and thus may not have a P. If GC is active and the signal arrives on a thread without a P, the write barrier crashes. The fix is easy, since all we really care about is the M's ID.

Program terminated with signal SIGSEGV, Segmentation fault.
#0  runtime.gcWriteBarrier ()
    at /usr/local/google/home/austin/go.dev/src/runtime/asm_amd64.s:1407
1407            MOVQ    (p_wbBuf+wbBuf_next)(R13), R14
[Current thread is 1 (Thread 0x7fd9a90c6700 (LWP 193547))]
Loading Go Runtime support.
(gdb) bt
#0  runtime.gcWriteBarrier ()
    at /usr/local/google/home/austin/go.dev/src/runtime/asm_amd64.s:1407
#1  0x0000000000466e31 in runtime.WaitForSigusr1.func1 (gp=0xc0015fb680, 
    ~r1=<optimized out>)
    at /usr/local/google/home/austin/go.dev/src/runtime/export_unix_test.go:44
#2  0x0000000000448e55 in runtime.sighandler (sig=10, info=0xc0000f7bf0, 
    ctxt=0xc0000f7ac0, gp=0xc0015fb680)
    at /usr/local/google/home/austin/go.dev/src/runtime/signal_unix.go:508
#3  0x0000000000448453 in runtime.sigtrampgo (sig=10, info=0xc0000f7bf0, 
    ctx=0xc0000f7ac0)
    at /usr/local/google/home/austin/go.dev/src/runtime/signal_unix.go:421
#4  0x000000000046b343 in runtime.sigtramp ()
    at /usr/local/google/home/austin/go.dev/src/runtime/sys_linux_amd64.s:384
#5  <signal handler called>
#6  runtime.futex ()
    at /usr/local/google/home/austin/go.dev/src/runtime/sys_linux_amd64.s:563
#7  0x000000000042fc24 in runtime.futexsleep (
    addr=0x828200 <runtime.waitForSigusr1>, val=0, ns=1000000000)
    at /usr/local/google/home/austin/go.dev/src/runtime/os_linux.go:50
#8  0x000000000040d0ae in runtime.notetsleep_internal (
    n=0x828200 <runtime.waitForSigusr1>, ns=1000000000, ~r2=<optimized out>)
    at /usr/local/google/home/austin/go.dev/src/runtime/lock_futex.go:193
#9  0x000000000040d20c in runtime.notetsleepg (
    n=0x828200 <runtime.waitForSigusr1>, ns=1000000000, ~r2=<optimized out>)
    at /usr/local/google/home/austin/go.dev/src/runtime/lock_futex.go:228
#10 0x00000000004621e9 in runtime.WaitForSigusr1 (
    ready={void (runtime.m *)} 0xc001603780, timeoutNS=1000000000, 
    ~r2=<optimized out>, ~r3=<optimized out>)
    at /usr/local/google/home/austin/go.dev/src/runtime/export_unix_test.go:47
#11 0x00000000005c983b in runtime_test.TestSignalM.func1 (ready=0xc0003188a0, 
    &want=0xc000016a68, &got=0xc000016a70, &wg=0xc000016a80)
    at /usr/local/google/home/austin/go.dev/src/runtime/crash_unix_test.go:321
#12 0x0000000000469471 in runtime.goexit ()
    at /usr/local/google/home/austin/go.dev/src/runtime/asm_amd64.s:1375
#13 0x000000c0003188a0 in ?? ()
#14 0x000000c000016a68 in ?? ()
#15 0x000000c000016a70 in ?? ()
#16 0x000000c000016a80 in ?? ()
#17 0x0000000000000000 in ?? ()
(gdb) print $r13
$1 = 0
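
To illustrate the diagnosis with a minimal sketch (hypothetical types, not the real runtime internals or the eventual fix): the compiler inserts a write barrier on any pointer store the GC might need to see, and that barrier reads a per-P buffer -- the nil R13 above -- so it faults on a thread that has no P. Storing a plain integer needs no barrier, which is why recording only the M's ID is safe from a signal handler:

package sketch

// mrec is a hypothetical stand-in for the runtime's m struct.
type mrec struct{ id int64 }

var waitingM *mrec   // pointer slot: stores into it emit a write barrier
var waitingMID int64 // integer slot: stores into it are barrier-free

// onSigusr1 mimics the shape of the signal-handler callback.
func onSigusr1(m *mrec) {
	// waitingM = m  // unsafe here: the write barrier assumes the thread has a P
	waitingMID = m.id // safe: recording the non-pointer ID needs no barrier
}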
@gopherbot commented Nov 2, 2019

Change https://golang.org/cl/204620 mentions this issue: runtime: remove write barrier in WaitForSigusr1

@FiloSottile FiloSottile added the NeedsFix label Nov 4, 2019
@gopherbot gopherbot closed this in b824048 Nov 6, 2019