Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: unexpected return pc for runtime.sigpanic #23576

Closed
josharian opened this issue Jan 27, 2018 · 15 comments

Comments

Projects
None yet
6 participants
@josharian
Copy link
Contributor

commented Jan 27, 2018

This is probably a duplicate of some other recent runtime issues, but it happens to be very easily reproduced using Go 1.10rc1. I'm sure it can be minimized further, but I am almost out of dev time for several days, so I wanted to get it up here ASAP in case it was helpful for @aclements.

Repro: Grab https://gist.github.com/josharian/cec3dbe91b2cffaf7419f53757cce3d3 and run it from inside any git repo. (See https://github.com/libgit2/git2go if there are questions about how to install git2go.) Pasted code:

package main

import (
	"os"

	git "gopkg.in/libgit2/git2go.v26"
)

func main() {
	pwd, err := os.Getwd()
	check(err)

	repo, err := git.OpenRepositoryExtended(pwd, 0, "")
	check(err)

	rw, err := repo.Walk()
	check(err)

	rw.Push(nil)
}

func check(err error) {
	if err != nil {
		panic(err)
	}
}

The resulting crash is obviously programmer error (rw.Push expects a non-nil parameter), but the "unexpected return pc" part shouldn't be present.

Result:

$ go run x.go
fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x42cafcd]

runtime stack:
runtime: unexpected return pc for runtime.sigpanic called from 0x42cafcd
stack: frame={sp:0x7ffeefbff510, fp:0x7ffeefbff560} stack=[0x7ffeefb80350,0x7ffeefbff7d0)
00007ffeefbff410:  00007ffeefbff480  000000000402aeec <runtime.dopanic_m+540> 
00007ffeefbff420:  000000000402a991 <runtime.throw+129>  00007ffeefbff4f0 
00007ffeefbff430:  0000000000000000  0000000004214f80 
00007ffeefbff440:  010000000402b448  0000000000000004 
00007ffeefbff450:  000000000000001f  00000000042cafcd 
00007ffeefbff460:  0000000000000000  0000000000000001 
00007ffeefbff470:  000000000415e881  000000c420000180 
00007ffeefbff480:  00007ffeefbff4a8  000000000404f7bc <runtime.dopanic.func1+60> 
00007ffeefbff490:  0000000004214f80  000000000402a991 <runtime.throw+129> 
00007ffeefbff4a0:  00007ffeefbff4f0  00007ffeefbff4e0 
00007ffeefbff4b0:  000000000402a8ba <runtime.dopanic+74>  00007ffeefbff4c0 
00007ffeefbff4c0:  000000000404f780 <runtime.dopanic.func1+0>  0000000004214f80 
00007ffeefbff4d0:  000000000402a991 <runtime.throw+129>  00007ffeefbff4f0 
00007ffeefbff4e0:  00007ffeefbff500  000000000402a991 <runtime.throw+129> 
00007ffeefbff4f0:  0000000000000000  000000000000002a 
00007ffeefbff500:  00007ffeefbff550  000000000403d39e <runtime.sigpanic+654> 
00007ffeefbff510: <0000000004160a85  000000000000002a 
00007ffeefbff520:  0000000000000000  0000000000000000 
00007ffeefbff530:  0000000000000000  0000000004214f80 
00007ffeefbff540:  0000000000000000  0000000000000000 
00007ffeefbff550:  00007ffeefbff560 !00000000042cafcd 
00007ffeefbff560: >00007ffeefbff600  00000000042c7287 
00007ffeefbff570:  0000000000000000  0000000000000000 
00007ffeefbff580:  0000000000000000  0000000000000000 
00007ffeefbff590:  0000000000000000  0000000000000000 
00007ffeefbff5a0:  0000000000000000  0000000004611750 
00007ffeefbff5b0:  0000000000000000  0000020100000280 
00007ffeefbff5c0:  0000000000000000  0000000004611758 
00007ffeefbff5d0:  909597a7fa2e00c2  00000000046369a0 
00007ffeefbff5e0:  0000000004611730  0000000000000000 
00007ffeefbff5f0:  0000000000000000  00007ffeefbff658 
00007ffeefbff600:  00007ffeefbff630  00000000042c71fb 
00007ffeefbff610:  0000000000000000  0000000000000028 
00007ffeefbff620:  0000000000000000  00007ffeefbff6d8 
00007ffeefbff630:  00007ffeefbff6b0  00000000042c568e 
00007ffeefbff640:  00000000ffffffff  00000000046369a0 
00007ffeefbff650:  fffffffeefbff6a0  0000000000000000 
runtime.throw(0x4160a85, 0x2a)
	/Users/josh/go/1.10/src/runtime/panic.go:619 +0x81
runtime: unexpected return pc for runtime.sigpanic called from 0x42cafcd
stack: frame={sp:0x7ffeefbff510, fp:0x7ffeefbff560} stack=[0x7ffeefb80350,0x7ffeefbff7d0)
00007ffeefbff410:  00007ffeefbff480  000000000402aeec <runtime.dopanic_m+540> 
00007ffeefbff420:  000000000402a991 <runtime.throw+129>  00007ffeefbff4f0 
00007ffeefbff430:  0000000000000000  0000000004214f80 
00007ffeefbff440:  010000000402b448  0000000000000004 
00007ffeefbff450:  000000000000001f  00000000042cafcd 
00007ffeefbff460:  0000000000000000  0000000000000001 
00007ffeefbff470:  000000000415e881  000000c420000180 
00007ffeefbff480:  00007ffeefbff4a8  000000000404f7bc <runtime.dopanic.func1+60> 
00007ffeefbff490:  0000000004214f80  000000000402a991 <runtime.throw+129> 
00007ffeefbff4a0:  00007ffeefbff4f0  00007ffeefbff4e0 
00007ffeefbff4b0:  000000000402a8ba <runtime.dopanic+74>  00007ffeefbff4c0 
00007ffeefbff4c0:  000000000404f780 <runtime.dopanic.func1+0>  0000000004214f80 
00007ffeefbff4d0:  000000000402a991 <runtime.throw+129>  00007ffeefbff4f0 
00007ffeefbff4e0:  00007ffeefbff500  000000000402a991 <runtime.throw+129> 
00007ffeefbff4f0:  0000000000000000  000000000000002a 
00007ffeefbff500:  00007ffeefbff550  000000000403d39e <runtime.sigpanic+654> 
00007ffeefbff510: <0000000004160a85  000000000000002a 
00007ffeefbff520:  0000000000000000  0000000000000000 
00007ffeefbff530:  0000000000000000  0000000004214f80 
00007ffeefbff540:  0000000000000000  0000000000000000 
00007ffeefbff550:  00007ffeefbff560 !00000000042cafcd 
00007ffeefbff560: >00007ffeefbff600  00000000042c7287 
00007ffeefbff570:  0000000000000000  0000000000000000 
00007ffeefbff580:  0000000000000000  0000000000000000 
00007ffeefbff590:  0000000000000000  0000000000000000 
00007ffeefbff5a0:  0000000000000000  0000000004611750 
00007ffeefbff5b0:  0000000000000000  0000020100000280 
00007ffeefbff5c0:  0000000000000000  0000000004611758 
00007ffeefbff5d0:  909597a7fa2e00c2  00000000046369a0 
00007ffeefbff5e0:  0000000004611730  0000000000000000 
00007ffeefbff5f0:  0000000000000000  00007ffeefbff658 
00007ffeefbff600:  00007ffeefbff630  00000000042c71fb 
00007ffeefbff610:  0000000000000000  0000000000000028 
00007ffeefbff620:  0000000000000000  00007ffeefbff6d8 
00007ffeefbff630:  00007ffeefbff6b0  00000000042c568e 
00007ffeefbff640:  00000000ffffffff  00000000046369a0 
00007ffeefbff650:  fffffffeefbff6a0  0000000000000000 
runtime.sigpanic()
	/Users/josh/go/1.10/src/runtime/signal_unix.go:372 +0x28e

goroutine 1 [syscall, locked to thread]:
runtime.cgocall(0x410adb0, 0xc420067eb8, 0x4162d70)
	/Users/josh/go/1.10/src/runtime/cgocall.go:128 +0x64 fp=0xc420067e88 sp=0xc420067e50 pc=0x4004b64
gopkg.in/libgit2/git2go%2ev26._Cfunc_git_revwalk_push(0x4636760, 0x0, 0x0)
	_cgo_gotypes.go:6303 +0x4d fp=0xc420067eb8 sp=0xc420067e88 pc=0x40fcd3d
gopkg.in/libgit2/git2go%2ev26.(*RevWalk).Push.func1(0x4636760, 0x0, 0xc42000e030)
	/Users/josh/src/gopkg.in/libgit2/git2go.v26/walk.go:44 +0x60 fp=0xc420067ef0 sp=0xc420067eb8 pc=0x4102dd0
gopkg.in/libgit2/git2go%2ev26.(*RevWalk).Push(0xc4200105a0, 0x0, 0x0, 0x0)
	/Users/josh/src/gopkg.in/libgit2/git2go.v26/walk.go:44 +0x65 fp=0xc420067f18 sp=0xc420067ef0 pc=0x4101ee5
main.main()
	/Users/josh/go/tip/src/x.go:19 +0xe3 fp=0xc420067f88 sp=0xc420067f18 pc=0x4104023
runtime.main()
	/Users/josh/go/1.10/src/runtime/proc.go:198 +0x212 fp=0xc420067fe0 sp=0xc420067f88 pc=0x402c202
runtime.goexit()
	/Users/josh/go/1.10/src/runtime/asm_amd64.s:2361 +0x1 fp=0xc420067fe8 sp=0xc420067fe0 pc=0x4053491
exit status 2

System details

go version go1.10rc1 darwin/amd64
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/josh/Library/Caches/go-build"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/josh"
GORACE=""
GOROOT="/Users/josh/go/1.10"
GOTMPDIR=""
GOTOOLDIR="/Users/josh/go/1.10/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/1t/n61cbvls5bl293bbb0zyypqw0000gn/T/go-build301881037=/tmp/go-build -gno-record-gcc-switches -fno-common"
GOROOT/bin/go version: go version go1.10rc1 darwin/amd64
GOROOT/bin/go tool compile -V: compile version go1.10rc1
uname -v: Darwin Kernel Version 17.3.0: Thu Nov  9 18:09:22 PST 2017; root:xnu-4570.31.3~1/RELEASE_X86_64
ProductName:	Mac OS X
ProductVersion:	10.13.2
BuildVersion:	17C205
lldb --version: lldb-900.0.64
  Swift-4.0
gdb --version: GNU gdb (GDB) 8.0.1

@josharian josharian added this to the Go1.10 milestone Jan 27, 2018

@aclements

This comment has been minimized.

Copy link
Member

commented Jan 27, 2018

Interesting. Thanks for the repro. This happened recently on linux/arm on the build dashboard, too: https://build.golang.org/log/4552a24f91dc893759cce93184c415f3986fcbc6

I haven't dug into this very deeply, but it sort of looks like we're injecting a sigpanic call into C code and then attempting to do a stack walk, which of course fails when it tries to unwind beyond the sigpanic. I thought we were supposed to just forward signals that came in during C execution, but perhaps that broke.

/cc @ianlancetaylor @bcmills

@odeke-em

This comment has been minimized.

Copy link
Member

commented Jan 27, 2018

Thank you for the repro @josharian and my apologies for totally forgetting to work on diagnosing and reproducing this bug last year, but on the bright side no one could have done it better than you :)

@josharian

This comment has been minimized.

Copy link
Contributor Author

commented Jan 27, 2018

@odeke-em want to bisect? :)

@odeke-em

This comment has been minimized.

Copy link
Member

commented Jan 27, 2018

@josharian gladly, you've done the heavy lifting already :)

@odeke-em

This comment has been minimized.

Copy link
Member

commented Jan 27, 2018

Obviously commit dbd8f3d aka CL https://go-review.googlesource.com/#/c/89016/ exposes this problem, now my next step is to bisect
where we are perhaps mishandling the panic signal from C back to Go.

And also good news, here is a standalone repro

package main

/*
#include <stdio.h>

void printIt(void *ptr) {
  char *s = (char *)ptr;
  printf(":%c\n", *s);
}
*/
import "C"

func main() {
    C.printIt(nil)
}
@aclements

This comment has been minimized.

Copy link
Member

commented Jan 27, 2018

@odeke-em, thanks for looking into this! I'm not positive that your reproducer is quite the same since it unwinds to asmcgocall, while Josh's only gets to the sigpanic. If it is the same, then I think we've had the underlying issue for a long time. Go 1.8 also cuts off the traceback at asmcgocall and presumably was just silently failing to unwind past there. It would make sense to cut off the traceback at asmcgocall since that performs a system stack switch and thus there's nowhere to unwind to from it most of the time. (But it wouldn't make sense to cut it off at sigpanic, which is what makes me think your reproducer might be different from Josh's in an important way, though it still highlights a bug.)

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Jan 28, 2018

We only forward signals if there is a C signal handler to which to forward them. If there is no C signal handler (the normal case) and if the Go runtime expects to handle the signal itself (the normal case) we don't forward. So, yes, we can definitely insert a call to sigpanic in C code.

Perhaps we should undo the hex dump for 1.10, or skip it when we think we may be executing C code.

@odeke-em

This comment has been minimized.

Copy link
Member

commented Jan 28, 2018

@odeke-em, thanks for looking into this! I'm not positive that your reproducer is quite the same since it unwinds to asmcgocall, while Josh's only gets to the sigpanic.

Thank you @aclements, I'll take a deeper look sometime when I get more bandwidth, it's an interesting issue but I am finding myself having to tear up many CLs and commits :)

We only forward signals if there is a C signal handler to which to forward them. If there is no C signal handler (the normal case) and if the Go runtime expects to handle the signal itself (the normal case) we don't forward.

@ianlancetaylor nice, in deed :)

package main

/*
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

void sigHandler(const int sig) {
  printf("got this signal: %d\n", sig);
  exit(0);
}

void printIt(void *ptr, int handleSig) {
  if (handleSig) {
    struct sigaction sa;
    sa.sa_handler = sigHandler;
    sigaction(SIGSEGV, &sa, NULL);
  }
  char *s = (char *)ptr;
  printf(":%c\n", *s);
}
*/
import "C"

func main() {
    C.printIt(nil, 1)
}
@aclements

This comment has been minimized.

Copy link
Member

commented Jan 30, 2018

For the original reproducer, the problem is indeed that we're injecting a sigpanic call directly into C, so gentraceback is seeing a return PC from sigpanic that's in C code. For that, I think Ian's right that we should just suppress the hex dump if there might be C code on the stack. I'm not sure what else we could do. Maybe we should only do that if we just saw a sigpanic frame since I think that's the only time this should happen.

For the second reproducer, after switching to the system stack, asmcgocall aligns the stack before calling C code, so gentraceback is reading an uninitialized stack slot and treating it as a return PC. That's not so great and I think we should add asmcgocall to the set of "top of stack" functions since it does the system stack switch.

The difference between the reproducers is rather amusing. In general we should see things like in the original reproducer. However, two GCC optimizations combine to make it appear like asmcgocall called sigpanic in the traceback of the second reproducer: because printIt returns void, GCC optimizes the the generated cgo wrapper into a tail call to printIt, so the wrapper doesn't appear on the stack; and because printIt has a zero-sized frame, printIt's own return PC (which is asmcgocall) appears in the position of sigpanic's return PC, so we also lose printIt from the stack. At that point, all of the actual C code is gone from the stack, so we don't wind up trying to unwind into it. Even changing printIt's return type to int is enough to inhibit the tail-call optimization and make the second reproducer fail like the first one.

@aclements

This comment has been minimized.

Copy link
Member

commented Jan 30, 2018

I think we should add asmcgocall to the set of "top of stack" functions since it does the system stack switch.

It turns out this doesn't work. If the C code calls back in to Go, we switch back to the regular G stack and if we try to unwind from there, we'll see the call to asmcgocall from before it did the system stack switch. We want to be able to unwind through that to the rest of the regular Go stack, so we can't just treat asmcgocall as a top-of-stack function all the time.

@gopherbot

This comment has been minimized.

Copy link

commented Jan 30, 2018

Change https://golang.org/cl/90896 mentions this issue: runtime: fail silently if we unwind over sigpanic into C code

@gopherbot

This comment has been minimized.

Copy link

commented Jan 30, 2018

Change https://golang.org/cl/90895 mentions this issue: runtime: don't unwind past asmcgocall

@bcmills

This comment has been minimized.

Copy link
Member

commented Jan 30, 2018

asmcgocall aligns the stack before calling C code, so gentraceback is reading an uninitialized stack slot and treating it as a return PC.

That's surprising: doesn't asmcgocall leave a valid frame (with a valid saved base-pointer and return address) at the root of the system stack?

I guess the base-pointer and return address can't be saved there because they might be moved during a callback from C. But why does the callback from C need to run on the same G at all? (I think we've talked before about treating C calls as implicitly noescape and picking up a new goroutine stack instead of resuming on the old one.)

It seems like we could save the G in the system stack frame and zero it out in TLS, so that if we get a callback from C we run it on a fresh goroutine. What would that break? (LockOSThread, perhaps?)

At any rate, I think all the things I'm speculating about here are too invasive for a fix yet in 1.10...

@aclements

This comment has been minimized.

Copy link
Member

commented Jan 30, 2018

That's surprising: doesn't asmcgocall leave a valid frame (with a valid saved base-pointer and return address) at the root of the system stack?

It does not and, unfortunately, I'm not sure it even can. The function itself is declared with 0 stack size, but it opens up a 64 byte frame and then aligns it on the system stack. I think traceback is still using the 0 stack size (I would expect the pcsp table to include the 64 byte change, but empirically it may not be; I haven't dug in to that), so we'd have to put the return PC at 0(SP). But that 64 byte frame is some spill location for the Windows fastcall calling convention (though I wasn't able to find documentation on why it needed that), so 0(SP) may be clobbered (if that's indeed what's going on with the 64 byte frame).

I guess the base-pointer and return address can't be saved there because they might be moved during a callback from C.

They're on the system stack, so they'll never get moved.

But why does the callback from C need to run on the same G at all? ... t seems like we could save the G in the system stack frame and zero it out in TLS, so that if we get a callback from C we run it on a fresh goroutine.

I'm not sure if there's much point in running the callback on a new G, but it certainly could run on a new stack, as I think we've talked about. We may not want to pay the cost of allocating a new stack at every C callback, so perhaps a cleverer thing would be to keep using the original G stack, but if we need to grow it, only copy back as far as the last C callback and keep both segments around. Of course, then we're sort of back in segmented stack land and Russ really didn't like the sound of that. :)

gopherbot pushed a commit that referenced this issue Jan 31, 2018

runtime: don't unwind past asmcgocall
asmcgocall switches to the system stack and aligns the SP, so
gentraceback both can't unwind over it when it appears on the system
stack (it'll read some uninitialized stack slot as the return PC).
There's also no point in unwinding over it, so don't.

Updates #23576.

Change-Id: Idfcc9599c7636b80dec5451cb65ae892b4611981
Reviewed-on: https://go-review.googlesource.com/90895
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>

@gopherbot gopherbot closed this in ebe38b8 Jan 31, 2018

@bcmills

This comment has been minimized.

Copy link
Member

commented Jan 31, 2018

I guess the base-pointer and return address can't be saved there because they might be moved during a callback from C.

They're on the system stack, so they'll never get moved.

No, I mean that the stack to which they point might get moved, and the saved register values wouldn't be updated to match (since they would be on the system stack, instead of the goroutine stack being moved).

I think traceback is still using the 0 stack size (I would expect the pcsp table to include the 64 byte change, but empirically it may not be; I haven't dug in to that), so we'd have to put the return PC at 0(SP).

Ah, I was thinking we would construct something that looks like a C stack frame rather than a Go stack frame.

If I understand correctly, today the system stack in asmcgocall looks like this before the CALL instruction:

SP+64: (possible alignment padding)
SP+56: (unused?)
SP+48: g
SP+40: goroutine SP - g.stack.hi (offset into goroutine stack)
SP+32: (unused?)
SP+24: Windows spill slot for R9 / XMM3
SP+16: Windows spill slot for R8 / XMM2
SP+8:  Windows spill slot for RDX / XMM1
SP:    Windows spill slot for RCX / XMM0

If a frame pointer is present, it still points to the goroutine stack at that point, so the root of the stack appears to be an invalid frame. (We know how to restore the caller frame from the current stack frame and the TLS register, but that process doesn't follow the usual C or Go ABIs.)

We could instead make that look like a 64-byte standard (Windows-and-SysV-compatible) call frame:

    SP+64: (possible alignment padding)
    SP+56: return address within asmcgocall
BP, SP+48: saved asmcgocall BP
    SP+40: g
    SP+32: goroutine SP - g.stack.hi (offset into goroutine stack)
    SP+24: Windows spill slot for R9 / XMM3
    SP+16: Windows spill slot for R8 / XMM2
    SP+8:  Windows spill slot for RDX / XMM1
    SP:    Windows spill slot for RCX / XMM0
(BP)+8:  return address from asmcgocall
(BP):    saved caller BP
(BP)-8:  (unused)
…
(BP)-48: (unused)

Then we mark asmcgocall as having a 64-byte frame rather than a 0-byte frame. If you walk frame-pointer chains assuming the usual function epilogue, all of the frame pointers are valid, but you end up with one weird discontinuity where the frame pointer jumps to a different stack. (Fortunately, that discontinuity is almost exactly the same as the one that occurs at the return from a signal stack in ordinary C code: it just happens to trampoline through asmcgocall rather than the usual signal trampoline.)

As far as I can tell, the main problem with that approach is that the saved BP value at SP+48 can become incorrect if a callback from C to Go moves the underlying goroutine stack.

@golang golang locked and limited conversation to collaborators Jan 31, 2019

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.