runtime: "morestack on g0" in TestSegv on darwin-amd64 builders #39457

Open
bcmills opened this issue Jun 8, 2020 · 6 comments

@bcmills
Member

@bcmills bcmills commented Jun 8, 2020

2020-06-08T17:59:37-2603d9a/darwin-amd64-race

--- FAIL: TestSegv (0.00s)
    --- FAIL: TestSegv/Segv (0.02s)
        crash_test.go:105: /var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/tmp/go-build172134279/testprogcgo.exe SegvInCgo exit status: exit status 2
        crash_cgo_test.go:569: fatal: morestack on g0
            SIGTRAP: trace trap
            PC=0x406b702 m=0 sigcode=1
            
            goroutine 0 [idle]:
            runtime.abort()
            	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/runtime/asm_amd64.s:860 +0x2
            runtime.morestack()
            	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/runtime/asm_amd64.s:416 +0x25
            
            goroutine 19 [syscall]:
            runtime.cgocall(0x4123600, 0xc00003a7c0, 0x4123600)
            	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/runtime/cgocall.go:133 +0x5b fp=0xc00003a790 sp=0xc00003a758 pc=0x400503b
            main._Cfunc_nop()
            	_cgo_gotypes.go:329 +0x45 fp=0xc00003a7c0 sp=0xc00003a790 pc=0x411a2a5
            main.SegvInCgo.func1(0xc00008e120)
            	/private/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/runtime/testdata/testprogcgo/segv.go:46 +0x30 fp=0xc00003a7d8 sp=0xc00003a7c0 pc=0x41224b0
            runtime.goexit()
            	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/runtime/asm_amd64.s:1374 +0x1 fp=0xc00003a7e0 sp=0xc00003a7d8 pc=0x406b8e1
            created by main.SegvInCgo
            	/private/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/runtime/testdata/testprogcgo/segv.go:43 +0x5c
            
            goroutine 1 [sleep]:
            time.Sleep(0x3b9aca00)
            	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/runtime/time.go:188 +0xbf
            main.SegvInCgo()
            	/private/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/runtime/testdata/testprogcgo/segv.go:55 +0x9c
            main.main()
            	/private/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/runtime/testdata/testprogcgo/main.go:34 +0x1da
            
            rax    0x17
            rbx    0xc00003a710
            rcx    0x4265d40
            rdx    0x0
            rdi    0x2
            rsi    0xc00003a6b0
            rbp    0xc00003a780
            rsp    0xc00003a738
            r8     0x4265d40
            r9     0x0
            r10    0xc00003a710
            r11    0x202
            r12    0xf1
            r13    0x0
            r14    0x418de44
            r15    0x0
            rip    0x406b702
            rflags 0x202
            cs     0x2b
            fs     0x0
            gs     0x0
            
        crash_cgo_test.go:571: expected crash from signal
FAIL
FAIL	runtime	69.144s

CC @aclements @ianlancetaylor @cherrymui

@bcmills bcmills added this to the Backlog milestone Jun 8, 2020
@golang golang deleted a comment Jun 10, 2020
@cherrymui
Contributor

@cherrymui cherrymui commented Jun 17, 2020

On darwin/amd64, to work around a kernel issue, we rewrite an SI_USER SIGSEGV to look kernel-generated: https://tip.golang.org/src/runtime/signal_darwin_amd64.go#L72 . So, in this case, an actual user-sent SIGSEGV will be treated as a kernel-generated signal and cause the runtime to inject a sigpanic. If the signal lands at a bad time, e.g. when we're right in the middle of a stack switch, where the g and the stack don't match, bad things will happen.
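The effect of that rewrite can be sketched as a small standalone model. This is not the runtime's code: `sigInfo`, `fixSigcode`, `dispatch`, the value assumed for `siUser`, and the placeholder address are all hypothetical names and values chosen for illustration. It only shows why a user-sent SIGSEGV ends up on the sigpanic path once its code has been rewritten.

```go
package main

import "fmt"

// Hypothetical stand-ins for the runtime's signal-code constants.
const (
	siUser   = 0                 // assumed code for "sent by a user process"
	fakeAddr = 0xb01dfacedebac1e // illustrative malformed fault address
)

// sigInfo is a simplified model of the fields the handler inspects.
type sigInfo struct {
	code int32
	addr uint64
}

// fixSigcode models the workaround: a SIGSEGV that arrives marked as
// user-sent is rewritten so that it looks kernel-generated.
func fixSigcode(si sigInfo) sigInfo {
	if si.code == siUser {
		si.code = siUser + 1
		si.addr = fakeAddr
	}
	return si
}

// dispatch models the decision made on the (possibly rewritten) info:
// user-sent signals crash the program, everything else becomes a sigpanic.
func dispatch(si sigInfo) string {
	if si.code == siUser {
		return "crash: user-sent SIGSEGV"
	}
	return "inject sigpanic"
}

func main() {
	userSent := sigInfo{code: siUser}
	fmt.Println("without workaround:", dispatch(userSent))
	fmt.Println("with workaround:", dispatch(fixSigcode(userSent)))
}
```

In this model the two cases (malformed address and a real `kill -SEGV`) become indistinguishable after `fixSigcode`, which is the ambiguity the rest of this comment is about.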

I'm not sure what the best solution is. A few possibilities:

  • do nothing (maybe skip/relax the test). It isn't too bad in that it will crash the program anyway (sigpanic will throw for this particular bad address), unless PanicOnFault is set.
  • remove the workaround (at least the sigcode part, we could still change the faulting address). A malformed address will be treated as user-sent SIGSEGV, which will crash the program now. PanicOnFault is still a problem.

Not sure what to do with PanicOnFault. Due to the kernel issue, it seems we cannot distinguish malformed address vs. user-sent SIGSEGV. We have to make both recoverable or non-recoverable...

(The workaround was added for OS X 10.9. The kernel issue still seems to be there in macOS 10.15...)

@cherrymui
Contributor

@cherrymui cherrymui commented Jun 17, 2020

Another possibility: when switching from the user stack to the system stack (e.g. in systemstack, asmcgocall, etc.), we always (1) set the user g's g.throwsplit to true, (2) change SP, (3) change the g register to g0, and do the same steps in the opposite order when switching back. This might solve the immediate SIGSEGV-landing-in-stack-switch problem. Not sure if there is any other problem. Seems pretty complicated, though.
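A toy model of that ordering, to make the intended invariant concrete. Everything here is hypothetical (the `g` and `cpu` types, the addresses, the helper names); the real switch happens in assembly. The sketch only checks that, at every intermediate step of a round trip, either SP lies within the current g's stack bounds or the user g has throwsplit set.

```go
package main

import "fmt"

// g is a simplified goroutine descriptor: stack bounds plus throwsplit.
type g struct {
	name       string
	lo, hi     uint64
	throwsplit bool
}

// cpu models the two pieces of state that change during a stack switch.
type cpu struct {
	sp   uint64
	curg *g // the "g register"
}

// mismatch reports whether SP lies outside the current g's stack bounds.
func (c *cpu) mismatch() bool {
	return c.sp < c.curg.lo || c.sp >= c.curg.hi
}

// switchRoundTrip performs the proposed ordering and calls observe after
// every step, as if a signal could land at that point.
func switchRoundTrip(c *cpu, user, g0 *g, observe func(step string)) {
	userSP := c.sp
	user.throwsplit = true // (1)
	observe("set throwsplit")
	c.sp = g0.hi - 8 // (2)
	observe("SP -> g0 stack")
	c.curg = g0 // (3)
	observe("g -> g0")
	// Switching back: the same steps in the opposite order.
	c.curg = user
	observe("g -> user")
	c.sp = userSP
	observe("SP -> user stack")
	user.throwsplit = false
	observe("clear throwsplit")
}

// countUnguarded counts observation points where SP and g disagree
// while throwsplit is not set on the user g.
func countUnguarded() (steps, unguarded int) {
	user := &g{name: "user", lo: 0x1000, hi: 0x2000}
	g0 := &g{name: "g0", lo: 0x8000, hi: 0x9000}
	c := &cpu{sp: 0x1800, curg: user}
	switchRoundTrip(c, user, g0, func(step string) {
		steps++
		if c.mismatch() && !user.throwsplit {
			unguarded++
		}
	})
	return
}

func main() {
	steps, unguarded := countUnguarded()
	fmt.Printf("%d steps observed, %d unguarded mismatches\n", steps, unguarded)
}
```

The model says nothing about whether throwsplit alone makes the sigpanic path safe in the real runtime; it only illustrates why the order of the three steps matters.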

@bcmills bcmills changed the title runtime: "morestack on g0" in TestSegv on darwin-amd64-race builder runtime: "morestack on g0" in TestSegv on darwin-amd64 builders Aug 31, 2020
@bcmills
Member Author

@bcmills bcmills commented Aug 31, 2020

Hmm... Why do we ignore user-generated SIGSEGV signals in the first place? If I explicitly send a program SIGSEGV on the command line, I would generally expect to get a core dump (since that is the SIG_DFL behavior of the signal to begin with).

@ianlancetaylor
Contributor

@ianlancetaylor ianlancetaylor commented Aug 31, 2020

We don't ignore user-generated SIGSEGV signals. That's the point of the test. I'm not sure what you are saying here.

The test failure logs suggest that the problem is that we somehow think that we are out of stack space while handling a signal. I'm not sure how that could happen.

@cherrymui
Contributor

@cherrymui cherrymui commented Aug 31, 2020

In my experience, "morestack on g0" usually does not mean that we are actually running out of stack space on g0, but that somehow the SP and the G (and so the stack bounds) don't match. My comment above mentioned some possibilities, e.g. the signal lands right in the middle of a stack switch.

As @ianlancetaylor said, we don't ignore user-generated SIGSEGV (we did in the past, but not now). What is special about darwin is that we treat a user-generated SIGSEGV (which should crash the runtime) as kernel-generated (which causes a panic), due to a kernel bug ( https://tip.golang.org/src/runtime/signal_darwin_amd64.go#L72 ). Because of that, we inject a sigpanic call instead of just throwing, and somewhere down the panic path there are non-nosplit functions that check stack bounds. If the G and stack bounds don't match, it could crash like this.
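To illustrate the mismatch, here is a rough model of the stack-bounds check, assuming a simplified prologue that compares SP against the current g's guard. The `g` type, `prologue`, and all stack bounds are hypothetical; only the SP value is taken from the register dump in the log above (rsp 0xc00003a738, just below goroutine 19's cgocall frame at sp=0xc00003a758).

```go
package main

import "fmt"

// g is a simplified goroutine descriptor with only the guard a
// function prologue would compare SP against.
type g struct {
	name  string
	guard uint64
}

// prologue models the check done by a non-nosplit function: if SP is at
// or below the guard, the function asks for more stack, and that request
// is fatal when the current g is g0.
func prologue(sp uint64, cur, g0 *g) string {
	if sp > cur.guard {
		return "ok"
	}
	if cur == g0 {
		return "fatal: morestack on g0"
	}
	return "grow " + cur.name + " stack"
}

func main() {
	// Hypothetical guards: the user goroutine's stack lives in the Go
	// heap range, g0's stack is assumed to be the OS thread stack.
	user := &g{name: "goroutine 19", guard: 0xc00003a400}
	g0 := &g{name: "g0", guard: 0x7ffeefb00000}

	sp := uint64(0xc00003a738) // rsp from the register dump

	// SP and g agree: plenty of room on the user stack.
	fmt.Println("g = user:", prologue(sp, user, g0))
	// SP still on the user stack, but the g register already says g0.
	fmt.Println("g = g0:  ", prologue(sp, user2g0(g0), g0))
}

// user2g0 exists only to make the second call read as "same SP,
// different g"; it returns its argument unchanged.
func user2g0(g0 *g) *g { return g0 }
```

Under these assumptions the same SP passes the check against the user g's guard and fails it against g0's, without g0 actually being out of stack.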

3 participants