runtime: panic on system stack during cgo callback #12238

Closed
noxiouz opened this issue Aug 20, 2015 · 28 comments

@noxiouz

noxiouz commented Aug 20, 2015

I upgraded from Go 1.4 to 1.5 to pick up the fix for #11907. Now, every few minutes, the program crashes with "panic during panic", and that is all I can see. It always occurs during a cgo callback.
Please let me know what information would be useful and I'll provide it.

panic: runtime error: index out of range
fatal error: panic on system stack

runtime stack:
runtime.throw(0x81b8f0, 0x15)
        /usr/local/go/src/runtime/panic.go:527 +0x90 fp=0x7f133d7f6f38 sp=0x7f133d7f6f20
runtime.gopanic(0x7653c0, 0xc820010030)
        /usr/local/go/src/runtime/panic.go:354 +0xb6 fp=0x7f133d7f6fb8 sp=0x7f133d7f6f38
runtime.panicindex()
        /usr/local/go/src/runtime/panic.go:12 +0x49 fp=0x7f133d7f6fe0 sp=0x7f133d7f6fb8
runtime.gentraceback(0x44801f, 0xc820034a58, 0x0, 0xc820000600, 0x0, 0xc820034a98, 0x20, 0x0, 0x0, 0x0, ...)
        /usr/local/go/src/runtime/traceback.go:255 +0x1206 fp=0x7f133d7f7110 sp=0x7f133d7f6fe0
runtime.callers.func1()
        /usr/local/go/src/runtime/traceback.go:566 +0xa2 fp=0x7f133d7f7190 sp=0x7f133d7f7110
runtime.systemstack(0x7f133d7f7198)
        /usr/local/go/src/runtime/asm_amd64.s:262 +0x79 fp=0x7f133d7f7198 sp=0x7f133d7f7190
runtime.mstart()
        /usr/local/go/src/runtime/proc1.go:674 fp=0x7f133d7f71a0 sp=0x7f133d7f7198

goroutine 17 [running, locked to thread]:
runtime.systemstack_switch()
        /usr/local/go/src/runtime/asm_amd64.s:216 fp=0xc8200349e8 sp=0xc8200349e0
runtime.callers(0x4, 0xc820034a98, 0x20, 0x20, 0x4736b0)
        /usr/local/go/src/runtime/traceback.go:567 +0xb0 fp=0xc820034a58 sp=0xc8200349e8
runtime.mProf_Malloc(0xc820e96000, 0x96000)
        /usr/local/go/src/runtime/mprof.go:235 +0x7f fp=0xc820034bd8 sp=0xc820034a58
runtime.profilealloc(0xc820026000, 0xc820e96000, 0x96000)
        /usr/local/go/src/runtime/malloc.go:811 +0x98 fp=0xc820034c00 sp=0xc820034bd8
runtime.mallocgc(0x96000, 0x0, 0x3, 0xc820652140)
        /usr/local/go/src/runtime/malloc.go:699 +0x5d3 fp=0xc820034cd0 sp=0xc820034c00
runtime.rawstring(0x94fd0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /usr/local/go/src/runtime/string.go:264 +0x70 fp=0xc820034d18 sp=0xc820034cd0
runtime.gostringn(0x7f13101325e8, 0x94fd0, 0x0, 0x0)
        /usr/local/go/src/runtime/string.go:330 +0x48 fp=0xc820034d78 sp=0xc820034d18
github.com/bioothod/elliptics-go/elliptics._Cfunc_GoStringN(0x7f13101325e8, 0xc800094fd0, 0x0, 0x0)
        ??:0 +0x37 fp=0xc820034da0 sp=0xc820034d78
github.com/bioothod/elliptics-go/elliptics.go_stat_callback(0x7f133d7f72a0, 0x380)
        /root/go/src/github.com/bioothod/elliptics-go/elliptics/stat.go:450 +0x240 fp=0xc820034ee0 sp=0xc820034da0
runtime.call32(0x0, 0x7f133d7f71c8, 0x7f133d7f7250, 0x10)
        /usr/local/go/src/runtime/asm_amd64.s:437 +0x3e fp=0xc820034f08 sp=0xc820034ee0
runtime.cgocallbackg1()
        /usr/local/go/src/runtime/cgocall.go:252 +0x10c fp=0xc820034f40 sp=0xc820034f08
runtime.cgocallbackg()
        /usr/local/go/src/runtime/cgocall.go:177 +0xd7 fp=0xc820034fa0 sp=0xc820034f40
panic: runtime error: index out of range
fatal error: panic on system stack
panic during panic

runtime stack:
runtime.startpanic_m()
        /usr/local/go/src/runtime/panic1.go:67 +0x141 fp=0x7f133d7f6b58 sp=0x7f133d7f6b30
runtime.systemstack(0x877230)
        /usr/local/go/src/runtime/asm_amd64.s:278 +0xab fp=0x7f133d7f6b60 sp=0x7f133d7f6b58
runtime.startpanic()
        /usr/local/go/src/runtime/panic.go:505 +0x14 fp=0x7f133d7f6b70 sp=0x7f133d7f6b60
runtime.throw(0x81b8f0, 0x15)
        /usr/local/go/src/runtime/panic.go:526 +0x83 fp=0x7f133d7f6b88 sp=0x7f133d7f6b70
runtime.gopanic(0x7653c0, 0xc820010030)
        /usr/local/go/src/runtime/panic.go:354 +0xb6 fp=0x7f133d7f6c08 sp=0x7f133d7f6b88
runtime.panicindex()
        /usr/local/go/src/runtime/panic.go:12 +0x49 fp=0x7f133d7f6c30 sp=0x7f133d7f6c08
runtime.gentraceback(0x47ee70, 0xc8200349e0, 0x0, 0xc820000600, 0x0, 0x0, 0x64, 0x0, 0x0, 0x0, ...)
        /usr/local/go/src/runtime/traceback.go:255 +0x1206 fp=0x7f133d7f6d60 sp=0x7f133d7f6c30
runtime.traceback1(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0xc820000600, 0x0)
        /usr/local/go/src/runtime/traceback.go:550 +0xc8 fp=0x7f133d7f6dc0 sp=0x7f133d7f6d60
runtime.traceback(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0xc820000600)
        /usr/local/go/src/runtime/traceback.go:527 +0x48 fp=0x7f133d7f6df0 sp=0x7f133d7f6dc0
runtime.tracebackothers(0xc820000480)
        /usr/local/go/src/runtime/traceback.go:664 +0xda fp=0x7f133d7f6e68 sp=0x7f133d7f6df0
runtime.dopanic_m(0xc820000480, 0x44f370, 0x7f133d7f6f20)
        /usr/local/go/src/runtime/panic1.go:104 +0x1f9 fp=0x7f133d7f6eb8 sp=0x7f133d7f6e68
runtime.dopanic.func1()
        /usr/local/go/src/runtime/panic.go:514 +0x32 fp=0x7f133d7f6ed8 sp=0x7f133d7f6eb8
runtime.systemstack(0x7f133d7f6ef8)
        /usr/local/go/src/runtime/asm_amd64.s:278 +0xab fp=0x7f133d7f6ee0 sp=0x7f133d7f6ed8
runtime.dopanic(0x0)
        /usr/local/go/src/runtime/panic.go:515 +0x61 fp=0x7f133d7f6f20 sp=0x7f133d7f6ee0
runtime.throw(0x81b8f0, 0x15)
        /usr/local/go/src/runtime/panic.go:527 +0x90 fp=0x7f133d7f6f38 sp=0x7f133d7f6f20
runtime.gopanic(0x7653c0, 0xc820010030)
        /usr/local/go/src/runtime/panic.go:354 +0xb6 fp=0x7f133d7f6fb8 sp=0x7f133d7f6f38
runtime.panicindex()
        /usr/local/go/src/runtime/panic.go:12 +0x49 fp=0x7f133d7f6fe0 sp=0x7f133d7f6fb8
runtime.gentraceback(0x44801f, 0xc820034a58, 0x0, 0xc820000600, 0x0, 0xc820034a98, 0x20, 0x0, 0x0, 0x0, ...)
        /usr/local/go/src/runtime/traceback.go:255 +0x1206 fp=0x7f133d7f7110 sp=0x7f133d7f6fe0
runtime.callers.func1()
        /usr/local/go/src/runtime/traceback.go:566 +0xa2 fp=0x7f133d7f7190 sp=0x7f133d7f7110
runtime.systemstack(0x7f133d7f7198)
        /usr/local/go/src/runtime/asm_amd64.s:262 +0x79 fp=0x7f133d7f7198 sp=0x7f133d7f7190
runtime.mstart()
        /usr/local/go/src/runtime/proc1.go:674 fp=0x7f133d7f71a0 sp=0x7f133d7f7198
@bradfitz
Contributor

/cc @rsc @aclements @dvyukov

@ianlancetaylor ianlancetaylor added this to the Go1.5.1 milestone Aug 21, 2015
@ianlancetaylor
Contributor

@aclements This is crashing in this code in traceback.go, which I believe you wrote:

        if frame.lr == stackBarrierPC {
            // Recover original PC.
            if stkbar[0].savedLRPtr != lrPtr {

As you can see from the stack trace, cgo code has called into Go code, the Go code is allocating memory, the profiler has decided to jump in and get a stack trace, and getting the stack trace crashes.

@aclements
Member

I suspect this has something to do with the funny tricks cgocallback_gofunc plays with the frame. I'm trying to put together a reproducer, but probably won't have something until tomorrow.

@noxiouz, do you happen to know if the call from C to go_stat_callback happens on a thread that was created by C or by Go? (I suspect it doesn't matter, but might help me narrow down the reproducer.)

In the meantime, there are a few things you can do to work around this. The simplest is probably to set GODEBUG=gcstackbarrieroff=1. You can also disable memory profiling.

@aclements
Member

BTW, anything you can tell us about the caller of go_stat_callback would be useful. Based on the name, I assume this was a function pointer passed in to C?

I tried writing the obvious reproducer where I set stack barriers to be installed at every frame, and made a call from Go to C and back to Go, then invoked runtime.Callers() while stack barriers were installed, but that wasn't enough to trigger this problem.

@noxiouz
Author

noxiouz commented Aug 21, 2015

@aclements yes, this callback is called from a C thread.
The first argument of this function is a uintptr to a C struct. The second argument is a uint64 that is used as a key to look up a Go function in a global map (a kind of context). Inside go_stat_callback I fetch the proper context from the map, cast it to a Go function, and call it.
There are many such callbacks in the code, but the crash occurs only in that callback. I assume that it's connected with the size of the frame. All other callbacks receive small data, but the stat callback receives megabytes of data.
Should I attach any code links?
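The dispatch scheme described above can be sketched in plain Go as follows. This is not the elliptics-go code: `registry`, `register`, `unregister`, and `dispatch` are hypothetical names, and the cgo side is left out so the example runs on its own. In a real binding, the exported callback would receive the key from C and perform the lookup.

```go
package main

import (
	"fmt"
	"sync"
)

// callback is the Go function stored behind an integer key.
type callback func(data []byte)

// registry maps keys handed to C back to Go callbacks. Passing a key
// instead of a Go pointer keeps Go function values out of C memory.
type registry struct {
	mu   sync.Mutex
	next uint64
	fns  map[uint64]callback
}

func newRegistry() *registry {
	return &registry{fns: make(map[uint64]callback)}
}

// register stores fn and returns the key to pass through C.
func (r *registry) register(fn callback) uint64 {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.next++
	r.fns[r.next] = fn
	return r.next
}

// unregister drops the callback once C will no longer call it.
func (r *registry) unregister(key uint64) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.fns, key)
}

// dispatch is what an exported callback would do with the key it gets
// from C: look up the Go function and call it. It reports whether the
// key was known.
func (r *registry) dispatch(key uint64, data []byte) bool {
	r.mu.Lock()
	fn, ok := r.fns[key]
	r.mu.Unlock()
	if !ok {
		return false
	}
	fn(data)
	return true
}

func main() {
	reg := newRegistry()
	key := reg.register(func(data []byte) {
		fmt.Printf("stat callback got %d bytes\n", len(data))
	})
	reg.dispatch(key, make([]byte, 1<<20))
	reg.unregister(key)
	fmt.Println(reg.dispatch(key, nil)) // key is gone, so this reports false
}
```

The mutex matters here because callbacks arriving on C threads can run concurrently with registration on Go threads.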

@noxiouz
Author

noxiouz commented Aug 21, 2015

@aclements BTW, GODEBUG=gcstackbarrieroff=1 makes the program work without panic.

@aclements
Member

Thanks.

There are many such callbacks in the code, but the crash occurs only in that callback. I assume that it's connected with the size of the frame. All other callbacks receive small data, but the stat callback receives megabytes of data.

It most likely is related to the size of the frame, but it may just be bad luck (stack barriers are inserted at exponentially-spaced points in the stack, so it's hard to predict where they'll fall, but they tend to fall in the same place). What do you mean by "stat callback receives megabytes of data"? It's all on the heap, not on the stack, presumably? Though, a large GoBytes allocation could be triggering the garbage collector, which could be part of why you're repeatably seeing the failure at this point.

Should I attach any code links?

No, thanks, though if this is happening in an open source application or tests, it would be great if you could paste commands to reproduce it.

@danderson
Contributor

I believe I'm able to produce a similar crash involving C->Go callbacks and stack barriers:

» go test -bench . -v
=== RUN   TestRealm
=== RUN   TestPrefix
=== RUN   TestLongestMatch
=== RUN   TestMatches
--- PASS: TestRealm (0.00s)
--- PASS: TestLongestMatch (0.01s)
--- PASS: TestMatches (0.01s)
--- PASS: TestPrefix (0.01s)
PASS
BenchmarkInsertions-4   at *0xc82004d520 expected stack barrier PC 0x4ebc70, found 0xc82004d5d8, goid=51
gp.stkbar=[*0xc82004d2e0=0x5d35c1 *0xc82004d520=0x4ed67c *0xc82004df18=0x50736a], gp.stkbarPos=1, gp.stack=[0xc82004c000,0xc82004dfc0)
fatal error: stack barrier lost

runtime stack:
runtime.throw(0x6bf330, 0x12)
    /usr/lib/go/src/runtime/panic.go:527 +0x90
runtime.gcRemoveStackBarrier(0xc820001980, 0xc82004d520, 0x4ed67c)
    /usr/lib/go/src/runtime/mgcmark.go:579 +0x245
runtime.gcUnwindBarriers(0xc820001980, 0xc82004d5d8)
    /usr/lib/go/src/runtime/mgcmark.go:610 +0xc1
runtime.heapBitsBulkBarrier.func1()
    /usr/lib/go/src/runtime/mbitmap.go:409 +0x29
runtime.systemstack(0xc82001e000)
    /usr/lib/go/src/runtime/asm_amd64.s:262 +0x79
runtime.mstart()
    /usr/lib/go/src/runtime/proc1.go:674

goroutine 51 [running]:
runtime.systemstack_switch()
    /usr/lib/go/src/runtime/asm_amd64.s:216 fp=0xc82004d430 sp=0xc82004d428
runtime.heapBitsBulkBarrier(0xc82004d5d8, 0x10)
    /usr/lib/go/src/runtime/mbitmap.go:410 +0x142 fp=0xc82004d4c8 sp=0xc82004d430
runtime.typedmemmove(0x612900, 0xc82004d5d8, 0xc820656cf0)
    /usr/lib/go/src/runtime/mbarrier.go:185 +0x59 fp=0xc82004d4e8 sp=0xc82004d4c8
runtime.assertE2T2(0x612900, 0x612900, 0xc820656cf0, 0xc82004d5d8, 0x0)
    /usr/lib/go/src/runtime/iface.go:242 +0x92 fp=0xc82004d508 sp=0xc82004d4e8
github.com/mattn/go-sqlite3.(*SQLiteStmt).bind(0xc820110050, 0xc82065a3c0, 0x4, 0x4, 0x0, 0x0)
    /home/dave/hack/go/src/github.com/mattn/go-sqlite3/sqlite3.go:793 +0x9de fp=0xc82004d790 sp=0xc82004d508
github.com/mattn/go-sqlite3.(*SQLiteStmt).Exec(0xc820110050, 0xc82065a3c0, 0x4, 0x4, 0x0, 0x0, 0x0, 0x0)
    /home/dave/hack/go/src/github.com/mattn/go-sqlite3/sqlite3.go:851 +0x7b fp=0xc82004d838 sp=0xc82004d790
github.com/mattn/go-sqlite3.(*SQLiteConn).Exec(0xc820110000, 0x6efae0, 0x83, 0xc82065a3c0, 0x4, 0x4, 0x0, 0x0, 0x0, 0x0)
    /home/dave/hack/go/src/github.com/mattn/go-sqlite3/sqlite3.go:515 +0x3b0 fp=0xc82004d938 sp=0xc82004d838
database/sql.(*Tx).Exec(0xc8200f3db0, 0x6efae0, 0x83, 0xc82004dcb8, 0x4, 0x4, 0x0, 0x0, 0x0, 0x0)
    /usr/lib/go/src/database/sql/sql.go:1267 +0x276 fp=0xc82004dac8 sp=0xc82004d938
github.com/danderson/gipam/db.(*Prefix).Create(0xc82004dee8, 0x0, 0x0)
    /home/dave/hack/go/src/github.com/danderson/gipam/db/prefix.go:34 +0x64c fp=0xc82004dd00 sp=0xc82004dac8
found next stack barrier at 0xc82004df18; expected [*0xc82004d520=0x4ed67c *0xc82004df18=0x50736a]
fatal error: missed stack barrier
panic during panic

runtime stack:
runtime.startpanic_m()
    /usr/lib/go/src/runtime/panic1.go:67 +0x141
runtime.systemstack(0x6ee2d8)
    /usr/lib/go/src/runtime/asm_amd64.s:278 +0xab
runtime.startpanic()
    /usr/lib/go/src/runtime/panic.go:505 +0x14
runtime.throw(0x6bd410, 0x14)
    /usr/lib/go/src/runtime/panic.go:526 +0x83
runtime.gentraceback(0x4ebb10, 0xc82004d428, 0x0, 0xc820001980, 0x0, 0x0, 0x64, 0x0, 0x0, 0x0, ...)
    /usr/lib/go/src/runtime/traceback.go:259 +0x1053
runtime.traceback1(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0xc820001980, 0x0)
    /usr/lib/go/src/runtime/traceback.go:550 +0xc8
runtime.traceback(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0xc820001980)
    /usr/lib/go/src/runtime/traceback.go:527 +0x48
runtime.tracebackothers(0xc820001200)
    /usr/lib/go/src/runtime/traceback.go:664 +0xda
runtime.dopanic_m(0xc820001200, 0x4bddc0, 0x7f32141b1e10)
    /usr/lib/go/src/runtime/panic1.go:104 +0x1f9
runtime.dopanic.func1()
    /usr/lib/go/src/runtime/panic.go:514 +0x32
runtime.systemstack(0x7f32141b1de8)
    /usr/lib/go/src/runtime/asm_amd64.s:278 +0xab
runtime.dopanic(0x0)
    /usr/lib/go/src/runtime/panic.go:515 +0x61
runtime.throw(0x6bf330, 0x12)
    /usr/lib/go/src/runtime/panic.go:527 +0x90
runtime.gcRemoveStackBarrier(0xc820001980, 0xc82004d520, 0x4ed67c)
    /usr/lib/go/src/runtime/mgcmark.go:579 +0x245
runtime.gcUnwindBarriers(0xc820001980, 0xc82004d5d8)
    /usr/lib/go/src/runtime/mgcmark.go:610 +0xc1
runtime.heapBitsBulkBarrier.func1()
    /usr/lib/go/src/runtime/mbitmap.go:409 +0x29
runtime.systemstack(0xc82001e000)
    /usr/lib/go/src/runtime/asm_amd64.s:262 +0x79
runtime.mstart()
    /usr/lib/go/src/runtime/proc1.go:674
exit status 2
FAIL    github.com/danderson/gipam/db   2.934s

This crash happens repeatably within 10s of starting this benchmark, on amd64. The crash does not happen in Go 1.4.2, or in Go 1.5.0 with GODEBUG=gcstackbarrieroff=1.

To reproduce, go get github.com/danderson/gipam; then install github.com/danderson/go-sqlite3 as github.com/mattn/go-sqlite3 (I'm pending a pull request, but my fork is essential because it implements the C->Go callbacks in question). Once that's done, go test -bench . -v on the db subpackage should reliably trigger the crash.

Note that the cgo code in my fork of go-sqlite3 is my first attempt at doing C->Go callbacks, so it's possible that I just messed that up and am corrupting memory. However, the fact that it doesn't crash in 1.4 and that this bug talks about bugs triggered by C->Go->C transitions makes me suspicious, as this benchmark is doing hundreds of thousands of those transitions.

@aclements
Member

Hi Dave!

I tried reproducing the failure with your gipam benchmark, but I get an immediate segfault.

$ go get -d github.com/danderson/gipam
$ go get -d github.com/danderson/go-sqlite3
$ mkdir $GOPATH/src/github.com/mattn
$ mv $GOPATH/src/github.com/danderson/go-sqlite3 $GOPATH/src/github.com/mattn/
$ cd $GOPATH/src/github.com/danderson/gipam/db
$ sed -i 's/dontBenchmarkInsertions/BenchmarkInsertions/' db_test.go
$ go test -bench Insertions -v
=== RUN   TestRealm
=== RUN   TestPrefix
=== RUN   TestLongestMatch
=== RUN   TestMatches
=== RUN   TestDomain
=== RUN   TestHost
--- PASS: TestRealm (0.00s)
--- PASS: TestLongestMatch (0.01s)
--- PASS: TestMatches (0.01s)
--- PASS: TestDomain (0.00s)
--- PASS: TestHost (0.00s)
--- PASS: TestPrefix (0.01s)
PASS
BenchmarkInsertions-4   0
signal: segmentation fault (core dumped)
FAIL    github.com/danderson/gipam/db   0.163s

It looks like it's making a cgo call to a NULL function pointer, but I can't get a real backtrace out of either Go or GDB.

@aclements
Member

@danderson, any ideas why I can't run your benchmark? If not, I can work on debugging it, but I'd rather avoid debugging something in order to debug something. :)

@noxiouz
Author

noxiouz commented Aug 25, 2015

Sorry for not answering, @aclements. My case is not easy to reproduce, as it's part of a storage system that takes a lot of work to install. I'll try to find a way to reproduce it anyway.

@danderson
Contributor

@aclements I've seen that segfault as well now. I'm not sure what's causing it. AFAICT, it is necessary to have a Go->C->Go->C transition for it to happen, but that's all I know right now.

I'm trying to narrow it down to a small repro case that doesn't involve the entire sqlite library; I'll update if I manage to get one. Unfortunately this isn't my day job, so it may be slow going :(

@danderson
Contributor

To be clear: in my current codebase, the crash only happens with the new code I added to go-sqlite, which introduces C->Go calls to go-sqlite. Without a C->Go transition, I cannot trigger the crash.

I've walked through the call chain on a reduced test case that still involves go-sqlite, and afaict the code is well-formed and never calls any NULL C functions.

I'll get back to you with the smallest repro case I can produce - hopefully one that has no sqlite in it at all, but if not, then I can at least remove the intermediate layers in my benchmark to narrow things down.

@aclements
Member

@noxiouz, @danderson, I have a fix for what I believe is causing your crashes. Can you try applying https://go-review.googlesource.com/13944 and let me know if it fixes the crashes?

@gopherbot
Contributor

CL https://golang.org/cl/13944 mentions this issue.

@gopherbot
Contributor

CL https://golang.org/cl/13947 mentions this issue.

@gopherbot
Contributor

CL https://golang.org/cl/13948 mentions this issue.

@danderson
Contributor

Checking now, I'll report back after some stress-testing.

@danderson
Contributor

LGTM++, I'm unable to cause a crash in any of the cases that trivially blow up with an unpatched 1.5. The change looks great.

@thomasf

thomasf commented Aug 27, 2015

Hmm, I have a situation which is probably a Go -> C -> Go -> C case: all or almost all calls to panic() from inside the Go callback function seemingly stop execution, without printing a stack trace or exiting. I'll start by trying the patch.

edit: This seems to have solved my problem. I added panic calls at 40 random places in my code where it did not work before, along with a simple random trigger. My app is currently being executed in a bash loop and has successfully panicked several hundred or several thousand times by now.

@noxiouz
Author

noxiouz commented Aug 27, 2015

I'm checking now.

@noxiouz
Author

noxiouz commented Aug 27, 2015

@aclements my case has been running for more than 11 hours already. Before the patch it crashed every few minutes, so the fix seems to work.

@aclements
Member

@danderson, @noxiouz, thanks for testing!

@aclements
Member

For posterity, the program I used to reproduce this is below. The failure mode is somewhat different from the other reports in this issue, and it's timing-dependent, though it tries hard not to be too sensitive. It should be run with GODEBUG=gcstackbarrierall=1.

package main

/*
#include <unistd.h> // declares sleep(), used by renest below

extern void GoCallback(int);
extern void GoCheckstack(void);

static void renest() {
  // Wait for GC to finish. We should have a stack barrier in our sched.pc.
  sleep(2);
  // Call back in to Go so we put sched.pc back as the return IP.
  GoCheckstack();
}

static void CNest(int n) {
  GoCallback(n);
  if (n == 10)
    renest();
}

static void CFunc(void) {
  while(1)
    CNest(20);
}
*/
import "C"

import (
    "fmt"
    "runtime"
    "time"
)

var garbage *tree
var x []*byte

//export GoCallback
func GoCallback(n C.int) {
    if n == 0 {
        fmt.Println("triggering GC")
        x = make([]*byte, 100<<20)
        fmt.Println("waiting for stack barriers")
        time.Sleep(100 * time.Millisecond)
        fmt.Println("unwinding")
    } else {
        C.CNest(n - 1)
    }
}

//export GoCheckstack
func GoCheckstack() {
    pcs := make([]uintptr, 100)
    runtime.Callers(0, pcs)
}

type tree struct {
    l, r *tree
}

func makeGarbage(depth int) *tree {
    if depth == 0 {
        return new(tree)
    }
    return &tree{makeGarbage(depth - 1), makeGarbage(depth - 1)}
}

func main() {
    garbage = makeGarbage(25)
    C.CFunc()
}

aclements added a commit that referenced this issue Aug 30, 2015
Currently enabling the debugging mode where stack barriers are
installed at every frame requires recompiling the runtime. However,
this is potentially useful for field debugging and for runtime tests,
so make this mode a GODEBUG.

Updates #12238.

Change-Id: I6fb128f598b19568ae723a612e099c0ed96917f5
Reviewed-on: https://go-review.googlesource.com/13947
Reviewed-by: Russ Cox <rsc@golang.org>
aclements added a commit that referenced this issue Aug 30, 2015
Currently the stack barrier stub blindly unwinds the next stack
barrier from the G's stack barrier array without checking that it's
the right stack barrier. If through some bug the stack barrier array
position gets out of sync with where we actually are on the stack,
this could return to the wrong PC, which would lead to difficult to
debug crashes. To address this, this commit adds a check to the amd64
stack barrier stub that it's unwinding the correct stack barrier.

Updates #12238.

Change-Id: If824d95191d07e2512dc5dba0d9978cfd9f54e02
Reviewed-on: https://go-review.googlesource.com/13948
Reviewed-by: Russ Cox <rsc@golang.org>
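The idea behind this check can be modeled in ordinary Go. This is a sketch of the invariant only, not the amd64 assembly stub: the `stkbar` and `stkbarPos` names mirror those visible in the crash output and traceback code quoted earlier in this issue, while the `g` stand-in, `savedLRVal`, `unwindNext`, and the addresses are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// stkbar records one installed stack barrier: where the return PC slot
// lives and which return PC was overwritten.
type stkbar struct {
	savedLRPtr uintptr // address of the overwritten return PC slot
	savedLRVal uintptr // original return PC (field name assumed)
}

// g is a stand-in for the goroutine's barrier bookkeeping.
type g struct {
	stkbar    []stkbar
	stkbarPos int // index of the next barrier expected to be hit
}

var errOutOfSync = errors.New("stack barrier array out of sync with stack")

// unwindNext models the stub being hit at return PC slot lrPtr. Instead of
// blindly returning to stkbar[stkbarPos].savedLRVal, it first checks that
// the recorded slot is the one actually being returned through.
func (gp *g) unwindNext(lrPtr uintptr) (uintptr, error) {
	if gp.stkbarPos >= len(gp.stkbar) {
		return 0, errOutOfSync
	}
	bar := gp.stkbar[gp.stkbarPos]
	if bar.savedLRPtr != lrPtr {
		return 0, errOutOfSync
	}
	gp.stkbarPos++
	return bar.savedLRVal, nil
}

func main() {
	gp := &g{stkbar: []stkbar{{0x1000, 0xaaa}, {0x2000, 0xbbb}}}

	// Matching slot: the original return PC is recovered.
	pc, err := gp.unwindNext(0x1000)
	fmt.Printf("%#x %v\n", pc, err)

	// Mismatched slot: reported instead of jumping to the wrong PC.
	pc, err = gp.unwindNext(0x3000)
	fmt.Printf("%#x %v\n", pc, err)
}
```

Failing loudly at the point of mismatch is the design choice here: it turns a later, hard-to-trace crash into an immediate and attributable one.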
@gopherbot
Contributor

CL https://golang.org/cl/14240 mentions this issue.

@gopherbot
Contributor

CL https://golang.org/cl/14241 mentions this issue.

@gopherbot
Contributor

CL https://golang.org/cl/14229 mentions this issue.

aclements added a commit that referenced this issue Sep 8, 2015
…ry frame

Currently enabling the debugging mode where stack barriers are
installed at every frame requires recompiling the runtime. However,
this is potentially useful for field debugging and for runtime tests,
so make this mode a GODEBUG.

Updates #12238.

Change-Id: I6fb128f598b19568ae723a612e099c0ed96917f5
Reviewed-on: https://go-review.googlesource.com/13947
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-on: https://go-review.googlesource.com/14240
Reviewed-by: Austin Clements <austin@google.com>
aclements added a commit that referenced this issue Sep 8, 2015
…allback_gofunc's frame

Currently the runtime can install stack barriers in any frame.
However, the frame of cgocallback_gofunc is special: it's the one
function that switches from a regular G stack to the system stack on
return. Hence, the return PC slot in its frame on the G stack is
actually used to save getg().sched.pc (so tracebacks appear to unwind
to the last Go function running on that G), and not as an actual
return PC for cgocallback_gofunc.

Because of this, if we install a stack barrier in cgocallback_gofunc's
return PC slot, when cgocallback_gofunc does return, it will move the
stack barrier stub PC in to getg().sched.pc and switch back to the
system stack. The rest of the runtime doesn't know how to deal with a
stack barrier stub in sched.pc: nothing knows how to match it up with
the G's stack barrier array and, when the runtime removes stack
barriers, it doesn't know to undo the one in sched.pc. Hence, if the C
code later returns back in to Go code, it will attempt to return
through the stack barrier saved in sched.pc, which may no longer have
correct unwinding information.

Fix this by blacklisting cgocallback_gofunc's frame so the runtime
won't install a stack barrier in its return PC slot.

Fixes #12238.

Change-Id: I46aa2155df2fd050dd50de3434b62987dc4947b8
Reviewed-on: https://go-review.googlesource.com/13944
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-on: https://go-review.googlesource.com/14229
Reviewed-by: Austin Clements <austin@google.com>
aclements added a commit that referenced this issue Sep 8, 2015
… sync

Currently the stack barrier stub blindly unwinds the next stack
barrier from the G's stack barrier array without checking that it's
the right stack barrier. If through some bug the stack barrier array
position gets out of sync with where we actually are on the stack,
this could return to the wrong PC, which would lead to difficult to
debug crashes. To address this, this commit adds a check to the amd64
stack barrier stub that it's unwinding the correct stack barrier.

Updates #12238.

Change-Id: If824d95191d07e2512dc5dba0d9978cfd9f54e02
Reviewed-on: https://go-review.googlesource.com/13948
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-on: https://go-review.googlesource.com/14241
Reviewed-by: Austin Clements <austin@google.com>
@nirbhayc

A similar issue: #12582
