runtime: signal arriving during needm/dropm invokes Go signal handler in bad state #12277

Closed
anacrolix opened this Issue Aug 23, 2015 · 13 comments

Comments

Projects
None yet
8 participants
Contributor

anacrolix commented Aug 23, 2015

While running go-fuzz, I sent it SIGINT, and this happened:

runtime: newstack called from g=859541700608
    m=0xc8201e2000 m->curg=0xc8202f2f00 m->g0=0xc820ad8180 m->gsignal=0xc820ad8000
runtime.sigtrampgo(0x63e70, 0x20000001e, 0xc82018ef88, 0xc82018eff0)
    /Users/anacrolix/src/go.master/src/runtime/signal_darwin.go:68 +0xc1
runtime.sigtramp(0xf, 0x84, 0x40000000e, 0xc82026a360, 0x0, 0xc82026a360, 0x6, 0x32, 0xc82026a360, 0x0, ...)
    /Users/anacrolix/src/go.master/src/runtime/sys_darwin_amd64.s:237 +0x23
fatal error: runtime: wrong goroutine in newstack

runtime stack:
runtime.throw(0x511be0, 0x24)
    /Users/anacrolix/src/go.master/src/runtime/panic.go:527 +0x90
runtime.newstack()
    /Users/anacrolix/src/go.master/src/runtime/stack1.go:675 +0x266
runtime.morestack()
    /Users/anacrolix/src/go.master/src/runtime/asm_amd64.s:330 +0x7f

$ go version
go version go1.5 darwin/amd64

@ianlancetaylor ianlancetaylor added this to the Go1.5.1 milestone Aug 25, 2015

Contributor

ianlancetaylor commented Aug 25, 2015

Member

minux commented Aug 25, 2015

Contributor

randall77 commented Aug 25, 2015

Maybe, but then would all code called from sigtramp need to be marked nosplit? Just hunting for an example, tracebacktrap is not marked nosplit as well.

Contributor

rsc commented Aug 25, 2015

sighandler runs on the signal stack with g = the signal goroutine. If the handler is running off the signal stack, there is a bug. We should not mark things nosplit; that will just hide the bug.

Contributor

rsc commented Aug 25, 2015

Can you reproduce this?
Was this in go-fuzz itself or in a binary go-fuzz built?
Can you share steps to reproduce? Or the binary that broke?

If this is in a go-fuzz-built binary, not go-fuzz itself, I wonder if go-fuzz is trying to instrument package runtime and broke the signal handlers somehow.

The decimal g= below really is 0xc820ad8000 aka m->gsignal, so that part is right. But obviously the stack bounds are not. That means that somehow the code is either on the wrong stack or overflowed the signal stack. The latter could happen if the signal handler somehow caused a new signal to arrive and be handled. I thought we had signals masked off such that that isn't allowed and would just be a plain OS-level process crash, but maybe some signals can leak through. If it happened recursively repeatedly, you'd blow out the signal stack.

@dvyukov

Member

dvyukov commented Aug 25, 2015

go-fuzz does not instument the following list of packages:

    ignore := map[string]bool{
        "runtime":       true, // lots of non-determinism and irrelevant code paths (e.g. different paths in mallocgc, chans and maps)
        "unsafe":        true, // nothing to see here (also creates import cycle with go-fuzz-dep)
        "errors":        true, // nothing to see here (also creates import cycle with go-fuzz-dep)
        "syscall":       true, // creates import cycle with go-fuzz-dep (and probably nothing to see here)
        "sync":          true, // non-deterministic and not interesting (also creates import cycle with go-fuzz-dep)
        "sync/atomic":   true, // not interesting (also creates import cycle with go-fuzz-dep)
        "time":          true, // creates import cycle with go-fuzz-dep
        "runtime/cgo":   true, // why would we instrument it?
        "runtime/pprof": true, // why would we instrument it?
        "runtime/race":  true, // why would we instrument it?
    }
    if runtime.GOOS == "windows" {
        // Cross-compilation is not implemented.
        ignore["unicode/utf16"] = true                     // syscall depends on unicode/utf16
        ignore["internal/syscall/windows/registry"] = true // time depends on this
        ignore["io"] = true                                // internal/syscall/windows/registry depends on this
    }
Contributor

anacrolix commented Aug 26, 2015

I think go-fuzz was slow to terminate, and I sent several SIGINTs (ctrl+C) in quick succession. I think I was go-fuzzing github.com/anacrolix/torrent/bencode on OSX, though there's nothing unusual about that package. I don't currently have time to try and reproduce, I can revisit this in a week if the issue hasn't progressed.

Contributor

jnjackins commented Aug 29, 2015

I am getting this bug, without using go-fuzz.

go version devel +805e56e Sat Aug 29 02:43:57 2015 +0000 darwin/amd64

Darwin Kernel Version 14.4.0 (rsc's pprof patch applied)

To reproduce:

  1. pull latest golang.org/x/exp (commit 574adff1580fca7a18de551f4102549e9e2ba3f4)
  2. make the following changes:
$ git diff
diff --git a/shiny/example/goban/main.go b/shiny/example/goban/main.go
index d4a1df8..bac3ec1 100644
--- a/shiny/example/goban/main.go
+++ b/shiny/example/goban/main.go
@@ -20,6 +20,8 @@ import (
        stdDraw "image/draw"
        "log"
        "math/rand"
+       "os"
+       "runtime/pprof"
        "time"

        "golang.org/x/exp/shiny/driver"
@@ -41,6 +43,16 @@ func main() {
        rand.Seed(int64(time.Now().Nanosecond()))
        board := NewBoard(9, *scale)

+       f, _ := os.Create("prof.out")
+       if err := pprof.StartCPUProfile(f); err == nil {
+               go func() {
+                       time.Sleep(1 * time.Minute)
+                       pprof.StopCPUProfile()
+                       os.Exit(0)
+               }()
+       } else {
+               log.Fatal(err)
+       }
        driver.Main(func(s screen.Screen) {
                w, err := s.NewWindow(nil)
                if err != nil {
  1. run:
$ go run board.go main.go xy.go 
2015/08/29 13:03:41 sendLifecycle: StageAlive
2015/08/29 13:03:41 sendLifecycle: StageVisible
2015/08/29 13:03:41 sendLifecycle: StageFocused
2015/08/29 13:03:51 sendLifecycle: StageVisible
runtime: newstack called from g=0xc820000300
    m=0xc82002e000 m->curg=0xc820000600 m->g0=0xc820000480 m->gsignal=0xc820000300
runtime.sigtrampgo(0x4044ac0, 0x1b0000001e, 0xb06b7970, 0xb06b79d8)
    /Users/jnj/src/go/src/runtime/signal_darwin.go:68 +0xc1
runtime.sigtramp(0x0, 0x0, 0x0, 0x200003500000085, 0x114c32000, 0x0, 0xb06b7aa8, 0xb06b7a90, 0x0, 0xb06b7aa8, ...)
    /Users/jnj/src/go/src/runtime/sys_darwin_amd64.s:237 +0x23
fatal error: runtime: wrong goroutine in newstack

runtime stack:
runtime.throw(0x420cfe0, 0x24)
    /Users/jnj/src/go/src/runtime/panic.go:527 +0x90
runtime.newstack()
    /Users/jnj/src/go/src/runtime/stack1.go:676 +0x260
runtime.morestack()
    /Users/jnj/src/go/src/runtime/asm_amd64.s:330 +0x7f

goroutine 17 [syscall, locked to thread]:
runtime.goexit()
    /Users/jnj/src/go/src/runtime/asm_amd64.s:1739 +0x1 fp=0xc82003cfb8 sp=0xc82003cfb0

goroutine 1 [syscall, locked to thread]:
golang.org/x/exp/shiny/driver/gldriver._Cfunc_startDriver()
    golang.org/x/exp/shiny/driver/gldriver/_obj/_cgo_gotypes.go:140 +0x31
golang.org/x/exp/shiny/driver/gldriver.main(0xc821edaf50, 0x0, 0x0)
    /Users/jnj/src/golang.org/x/exp/shiny/driver/gldriver/cocoa.go:72 +0x193
golang.org/x/exp/shiny/driver/gldriver.Main(0xc821edaf50)
    /Users/jnj/src/golang.org/x/exp/shiny/driver/gldriver/gldriver.go:26 +0x30
golang.org/x/exp/shiny/driver.main(0xc821edaf50)
    /Users/jnj/src/golang.org/x/exp/shiny/driver/driver_darwin.go:15 +0x21
golang.org/x/exp/shiny/driver.Main(0xc821edaf50)
    /Users/jnj/src/golang.org/x/exp/shiny/driver/driver.go:24 +0x21
main.main()
    /Users/jnj/src/golang.org/x/exp/shiny/example/goban/main.go:122 +0x14e

goroutine 34 [syscall]:
runtime.CPUProfile(0x0, 0x0, 0x0)
    /Users/jnj/src/go/src/runtime/cpuprof.go:418 +0x34
runtime/pprof.profileWriter(0x47a4210, 0xc820090070)
    /Users/jnj/src/go/src/runtime/pprof/pprof.go:599 +0x1c
created by runtime/pprof.StartCPUProfile
    /Users/jnj/src/go/src/runtime/pprof/pprof.go:593 +0x145

goroutine 35 [sleep]:
time.Sleep(0xdf8475800)
    /Users/jnj/src/go/src/runtime/time.go:59 +0xf9
main.main.func1()
    /Users/jnj/src/golang.org/x/exp/shiny/example/goban/main.go:49 +0x26
created by main.main
    /Users/jnj/src/golang.org/x/exp/shiny/example/goban/main.go:52 +0x10f

goroutine 36 [chan receive]:
main.main.func2(0x59108d0, 0x42c9360)
    /Users/jnj/src/golang.org/x/exp/shiny/example/goban/main.go:74 +0x35e
golang.org/x/exp/shiny/driver/gldriver.driverStarted.func1()
    /Users/jnj/src/golang.org/x/exp/shiny/driver/gldriver/cocoa.go:79 +0x47
created by golang.org/x/exp/shiny/driver/gldriver.driverStarted
    /Users/jnj/src/golang.org/x/exp/shiny/driver/gldriver/cocoa.go:81 +0x2b

goroutine 4 [select]:
golang.org/x/exp/shiny/driver/internal/pump.(*Pump).run(0xc82000e1a0)
    /Users/jnj/src/golang.org/x/exp/shiny/driver/internal/pump/pump.go:62 +0x4df
created by golang.org/x/exp/shiny/driver/internal/pump.Make
    /Users/jnj/src/golang.org/x/exp/shiny/driver/internal/pump/pump.go:15 +0x116

goroutine 5 [select, locked to thread]:
golang.org/x/exp/shiny/driver/gldriver.drawLoop(0xc82006c0e0, 0x4402a60)
    /Users/jnj/src/golang.org/x/exp/shiny/driver/gldriver/cocoa.go:124 +0x2a6
created by golang.org/x/exp/shiny/driver/gldriver.(*screenImpl).NewWindow
    /Users/jnj/src/golang.org/x/exp/shiny/driver/gldriver/screen.go:103 +0x28e

goroutine 50 [syscall, locked to thread]:
runtime.goexit()
    /Users/jnj/src/go/src/runtime/asm_amd64.s:1739 +0x1

goroutine 51 [syscall, locked to thread]:
runtime.goexit()
    /Users/jnj/src/go/src/runtime/asm_amd64.s:1739 +0x1
exit status 2

It seems to take a random amount of time for the bug to manifest (up to 10 seconds).

@ianlancetaylor ianlancetaylor modified the milestones: Go1.5.2, Go1.5.1 Sep 8, 2015

Contributor

rsc commented Nov 13, 2015

@jnjackins, thanks for the goban example!

CL https://golang.org/cl/16915 mentions this issue.

@rsc rsc changed the title from runtime: fatal error: wrong goroutine in newstack to runtime: signal arriving during needm/dropm invokes Go signal handler in bad state Nov 17, 2015

@rsc rsc closed this in f8e6418 Nov 18, 2015

CL https://golang.org/cl/17072 mentions this issue.

rsc added a commit that referenced this issue Nov 20, 2015

runtime: make asmcgocall work without a g
Solaris needs to make system calls without a g,
and Solaris uses asmcgocall to make system calls.
I know, I know.

I hope this makes CL 16915, fixing #12277, work on Solaris.

Change-Id: If988dfd37f418b302da9c7096f598e5113ecea87
Reviewed-on: https://go-review.googlesource.com/17072
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Aram Hăvărneanu <aram@mgk.ro>
Run-TryBot: Russ Cox <rsc@golang.org>

CL https://golang.org/cl/17141 mentions this issue.

CL https://golang.org/cl/17129 mentions this issue.

rsc added a commit that referenced this issue Nov 23, 2015

[release-branch.go1.5] runtime: make asmcgocall work without a g
Solaris needs to make system calls without a g,
and Solaris uses asmcgocall to make system calls.
I know, I know.

I hope this makes CL 16915, fixing #12277, work on Solaris.

Change-Id: If988dfd37f418b302da9c7096f598e5113ecea87
Reviewed-on: https://go-review.googlesource.com/17072
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Aram Hăvărneanu <aram@mgk.ro>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-on: https://go-review.googlesource.com/17129

rsc added a commit that referenced this issue Nov 23, 2015

[release-branch.go1.5] runtime: fix bad signal stack when using cgo-c…
…reated threads and async signals

Cgo-created threads transition between having associated Go g's and m's and not.
A signal arriving during the transition could think it was safe and appropriate to
run Go signal handlers when it was in fact not.
Avoid the race by masking all signals during the transition.

Fixes #12277.

Change-Id: Ie9711bc1d098391d58362492197a7e0f5b497d14
Reviewed-on: https://go-review.googlesource.com/16915
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-on: https://go-review.googlesource.com/17141

@gopherbot gopherbot locked and limited conversation to collaborators Nov 27, 2016

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.