New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: crash on Plan 9 since CL 46033 #22227

Open
0intro opened this Issue Oct 12, 2017 · 14 comments

Comments

Projects
None yet
4 participants
@0intro
Member

0intro commented Oct 12, 2017

Since the runtime.LockOSThread change in CL 46033, Go programs are crashing on Plan 9 when running commands with os/exec.

fatal error: unexpected signal during runtime execution
[signal sys: trap: fault read code=0x0 addr=0x0 pc=0x5ef]

runtime stack:
runtime.throw(0x36dca9, 0x2a)
	/tmp/workdir/go/src/runtime/panic.go:616 +0x6a
runtime.sigpanic()
	/tmp/workdir/go/src/runtime/os_plan9.go:79 +0x240

goroutine 1 [chan receive]:
syscall.startProcess(0x1087b310, 0x7, 0x1085db60, 0x4, 0x4, 0x107b7290, 0x10, 0x1087bc80, 0x107b7290)
	/tmp/workdir/go/src/syscall/exec_plan9.go:514 +0xac
syscall.StartProcess(0x1087b310, 0x7, 0x1085db60, 0x4, 0x4, 0x107b7290, 0x2, 0x4, 0x0, 0x0)
	/tmp/workdir/go/src/syscall/exec_plan9.go:525 +0x48
os.startProcess(0x1087b310, 0x7, 0x1085db60, 0x4, 0x4, 0x108a1c3c, 0x1084ca00, 0x3d, 0x3d)
	/tmp/workdir/go/src/os/exec_plan9.go:33 +0x191
os.StartProcess(0x1087b310, 0x7, 0x1085db60, 0x4, 0x4, 0x108a1c3c, 0x0, 0x0, 0x41dbde8)
	/tmp/workdir/go/src/os/exec.go:94 +0x48
os/exec.(*Cmd).Start(0x107c0580, 0x1086b201, 0x1076f7a0)
	/tmp/workdir/go/src/os/exec/exec.go:363 +0x3c1
os/exec.(*Cmd).Run(0x107c0580, 0x1076f7a0, 0x1086b858)
	/tmp/workdir/go/src/os/exec/exec.go:286 +0x20
os/exec.(*Cmd).CombinedOutput(0x107c0580, 0x1, 0x1, 0x1086b4a0, 0x3c, 0x3c)
	/tmp/workdir/go/src/os/exec/exec.go:506 +0xed
cmd/go/internal/work.(*Builder).gccSupportsFlag(0x107e2230, 0x10766ea8, 0x1, 0x1, 0x3661a1, 0x18, 0x0)
	/tmp/workdir/go/src/cmd/go/internal/work/build.go:3514 +0x437
cmd/go/internal/work.(*Builder).compilerCmd(0x107e2230, 0x35d3b1, 0x2, 0x35d387, 0x2, 0x35d2b1, 0x1, 0x0, 0x0, 0x302ce0, ...)
	/tmp/workdir/go/src/cmd/go/internal/work/build.go:3454 +0x61b
cmd/go/internal/work.(*Builder).GccCmd(0x107e2230, 0x35d2b1, 0x1, 0x0, 0x0, 0x107c29a0, 0xb, 0x16)
	/tmp/workdir/go/src/cmd/go/internal/work/build.go:3390 +0x64
cmd/go/internal/envcmd.MkEnv(0x1086a000, 0x3b, 0x3b)
	/tmp/workdir/go/src/cmd/go/internal/envcmd/env.go:76 +0x31a
main.main()
	/tmp/workdir/go/src/cmd/go/main.go:122 +0x479

goroutine 5 [syscall]:
os/signal.signal_recv(0x0, 0x0)
	/tmp/workdir/go/src/runtime/sigqueue_plan9.go:109 +0x69
os/signal.loop()
	/tmp/workdir/go/src/os/signal/signal_plan9.go:27 +0x19
created by os/signal.init.0
	/tmp/workdir/go/src/os/signal/signal_plan9.go:22 +0x36

goroutine 13 [runnable, locked to thread]:
syscall.Close(0x7, 0x1085dbe0, 0x5)
	/tmp/workdir/go/src/syscall/zsyscall_plan9_386.go:245 +0x86
syscall.forkExec(0x1087b310, 0x7, 0x1085db60, 0x4, 0x4, 0x107b7290, 0x108159d0, 0x0, 0x0)
	/tmp/workdir/go/src/syscall/exec_plan9.go:434 +0x4c0
syscall.startProcess.func1(0x1087b310, 0x7, 0x1085db60, 0x4, 0x4, 0x107b7290, 0x1084f600)
	/tmp/workdir/go/src/syscall/exec_plan9.go:488 +0x51
created by syscall.startProcess
	/tmp/workdir/go/src/syscall/exec_plan9.go:484 +0x80
2017/10/11 15:05:51 Failed: exit status: 'go 2462: 2'

See https://build.golang.org/log/c14eefc1c70d14ad2d0f768ff9f13222fa8d435c

@0intro 0intro added the OS-Plan9 label Oct 12, 2017

@0intro 0intro added this to the Go1.10 milestone Oct 12, 2017

@0intro 0intro self-assigned this Oct 12, 2017

@0intro

This comment has been minimized.

Member

0intro commented Oct 12, 2017

After disabling the use of template thread in newm, the build passes. However, I get another issue when running the net tests:

fatal error: runtime·lock: lock count
fatal error: runtime·unlock: lock count
fatal error: fatal error: systemstack called from unexpected goroutineruntime·lock: lock count

This issue seems to disappear once I add a call to runtime.UnlockOSThread in newAsyncIO. Why is it necessary?

Also, why the template thread doesn't work on Plan 9?

@aclements

@0intro 0intro added the help wanted label Oct 12, 2017

@gopherbot

This comment has been minimized.

gopherbot commented Oct 12, 2017

Change https://golang.org/cl/70231 mentions this issue: runtime: disable use of template thread on Plan 9

@gopherbot

This comment has been minimized.

gopherbot commented Oct 12, 2017

Change https://golang.org/cl/70450 mentions this issue: internal/poll: call UnlockOSThread in Plan 9 netpoller

@0intro

This comment has been minimized.

Member

0intro commented Oct 12, 2017

I propose CL 70231 and CL 70450 which should get the Plan 9 builders back to "ok".

@0intro

This comment has been minimized.

Member

0intro commented Oct 13, 2017

The "lock count" panics are related to the change to terminate locked OS threads in CL 46038 (see goexit0).

As I said earlier, this issue can be worked around by calling runtime.UnlockOSThread in newAsyncIO (see CL 70450), so the OS threads are reused instead of being terminated.

I'm not sure why I get "lock count" panics when locked OS threads are terminated.

@0intro

This comment has been minimized.

Member

0intro commented Oct 13, 2017

Any help would be kindly appreciated.

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Oct 13, 2017

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Oct 13, 2017

The "lock count" panic means that the lock count of the m is negative. That clearly shouldn't happen, so the question is why it does happen. Can you get a stack trace of where it happens?

@0intro

This comment has been minimized.

Member

0intro commented Oct 13, 2017

I couldn't manage to get a proper stack trace. It looks like the stacks are corrupted and it usually ends with a recursive panic.

The output usually looks like:

fatal error: runtime·lock: lock count
panic during panic
fatal error: systemstack called from unexpected goroutine

runtime stack:

stack trace unavailable
fatal error: systemstack called from unexpected goroutine
fatal error: runtime·lock: lock count
fatal error: systemstack called from unexpected goroutine
fatal error: systemstack called from unexpected goroutine
[...]
fatal error: systemstack called from unexpected goroutine
fatal error: systemstack called from unexpected goroutine
runtime: newstack called from g=0x1061ee00
	m=0x10620000 m->curg=0x10746fc0 m->g0=0x1061e0e0 m->gsignal=0x1061e000
runtime.recordForPanic(0x1ae92f, 0xd, 0xd)
	/usr/go/src/runtime/print.go:40 +0x1d
runtime.gwrite(0x1ae92f, 0xd, 0xd)
	/usr/go/src/runtime/print.go:89 +0x3c
runtime.printstring(0x1ae92f, 0xd)
	/usr/go/src/runtime/print.go:236 +0x6c
runtime.throw(0x1b7412, 0x2c)
	/usr/go/src/runtime/panic.go:610 +0x1e
runtime.badsystemstack()
	/usr/go/src/runtime/stubs.go:57 +0x2a
runtime.systemstack(0x1bb398)
	/usr/go/src/runtime/asm_386.s:371 +0x2e
runtime.startpanic()
	/usr/go/src/runtime/panic.go:589 +0x11
runtime.throw(0x1b7412, 0x2c)
	/usr/go/src/runtime/panic.go:615 +0x5e
[...]
fatal error: runtime: wrong goroutine in newstack

It looks like there is an issue in the way OS threads are terminated on Plan 9 and the stacks end being corrupted.

@golang golang deleted a comment Oct 14, 2017

@gopherbot

This comment has been minimized.

gopherbot commented Oct 16, 2017

Change https://golang.org/cl/71230 mentions this issue: runtime: don't terminate locked OS threads on Plan 9

@0intro

This comment has been minimized.

Member

0intro commented Oct 16, 2017

I've replaced CL 70450 by CL 71230 so it makes more clear what is the problem worked around in this change.

I still don't know why termination of OS threads isn't working properly on Plan9. I'm investigating.

@0intro

This comment has been minimized.

Member

0intro commented Oct 16, 2017

To sum it up, there are two issues on Plan 9 caused by the recent runtime changes.

  • CL 46033, which implements "template threads", lead to fault read/write panic when running os.StartProcess (worked around in CL 70231).
  • CL 46037 and CL 46038, which implement termination of locked OS thread, lead to lock count panic when using runtime.LockOSThread without calling runtime.UnlockOSThread (notably in the net tests) (worked around in CL 71230).

The first issue prevents Go to be compiled because cmd/go uses os.StartProcess, which uses runtime.LockOSThread. Once the first issue worked around, the second issues prevents Go to complete the net tests, because implementation of network timelines use runtime.LockOSThread.

gopherbot pushed a commit that referenced this issue Oct 17, 2017

runtime: disable use of template thread on Plan 9
CL 46033 added a "template thread" mechanism to
allow creation of thread with a known-good state
from a thread of unknown state.

However, we are experiencing issues on Plan 9
with programs using the os/exec and net package.
These package are relying on runtime.LockOSThread.

Updates #22227.

Change-Id: I85b71580a41df9fe8b24bd8623c064b6773288b0
Reviewed-on: https://go-review.googlesource.com/70231
Run-TryBot: David du Colombier <0intro@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>

gopherbot pushed a commit that referenced this issue Oct 17, 2017

runtime: don't terminate locked OS threads on Plan 9
CL 46037 and CL 46038 implemented termination of
locked OS threads when the goroutine exits.

However, this behavior leads to crashes of Go programs
using runtime.LockOSThread on Plan 9. This is notably
the case of the os/exec and net packages.

This change disables termination of locked OS threads
on Plan 9.

Updates #22227.

Change-Id: If9fa241bff1c0b68e7e9e321e06e5203b3923212
Reviewed-on: https://go-review.googlesource.com/71230
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>

@rsc rsc modified the milestones: Go1.10, Go1.11 Nov 22, 2017

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Jul 9, 2018

@0intro Do you know what the status of this issue is? Thanks.

@0intro

This comment has been minimized.

Member

0intro commented Jul 10, 2018

The issue has been worked around with CL 70231 and CL 71230. Unfortunately, I've not been able to find a better solution yet.

Maybe we can postpone this issue to Go 1.12.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment