runtime: libunwind is unable to unwind CGo to Go's stack #40044

Open
steeve opened this issue Jul 4, 2020 · 11 comments

@steeve
Contributor

@steeve steeve commented Jul 4, 2020

What version of Go are you using (go version)?

master as of the build

$ go version
devel +dd150176c3 Fri Jul 3 03:31:29 2020 +0000

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/steeve/Library/Caches/go-build"
GOENV="/Users/steeve/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/steeve/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/steeve/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/Users/steeve/code/github.com/znly/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/Users/steeve/code/github.com/znly/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/steeve/code/github.com/znly/go/src/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/bs/51dlb_nn5k35xq9qfsxv9wc00000gr/T/go-build842228435=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

Following @cherrymui's comment on #39524, I decided to check why lots of our backtraces on iOS stop at runtime.asmcgocall.
Since I wanted to reproduce it on my computer, and lldb manages to backtrace properly there, I figured I'd give libunwind a try, since this is what iOS uses when a program crashes.

Unfortunately libunwind didn't manage to walk the stack past CGo generated _Cfunc_ functions.

Given this program:

package main

/*
#cgo CFLAGS: -O0

#include <libunwind.h>
#include <stdio.h>

void backtrace() {
	unw_cursor_t cursor;
	unw_context_t context;

	// Initialize cursor to current frame for local unwinding.
	unw_getcontext(&context);
	unw_init_local(&cursor, &context);

	// Unwind frames one by one, going up the frame stack.
	while (unw_step(&cursor) > 0) {
		unw_word_t offset, pc;
		unw_get_reg(&cursor, UNW_REG_IP, &pc);
		if (pc == 0) {
			break;
		}
		printf("0x%llx:", pc);
		char sym[256];
		if (unw_get_proc_name(&cursor, sym, sizeof(sym), &offset) == 0) {
			printf(" (%s+0x%llx)\n", sym, offset);
		} else {
			printf(" -- error: unable to obtain symbol name for this frame\n");
		}
	}
}

void two() {
	printf("two\n");
	backtrace();
}

void one() {
	printf("one\n");
	two();
}
*/
import "C"

//go:noinline
func goone() {
	C.one()
}

func main() {
	goone()
}

It prints:

one
two
0x40617fe: (two+0x1e)
0x406182e: (one+0x1e)
0x406168b: (_cgo_7c45d1c2feef_Cfunc_one+0x1b)

I also tried a Go(1) -> C(1) -> Go(2) -> C(2) call chain with the backtrace taken in C(2), and it only unwinds C(2).

I also tried giving asmcgocall a 16-byte stack frame, hoping that the generated frame pointer would help, but it didn't.

What did you expect to see?

The complete backtrace.

What did you see instead?

A backtrace for C functions only.

@cherrymui
Contributor

@cherrymui cherrymui commented Jul 4, 2020

This is different from #39524. We switch stacks at Go/C boundaries. Go code runs on goroutine stacks (typically small), whereas C code runs on system stacks (typically large). Since they are not on the same stack, I would not expect any stack unwinding tool to work. Not sure if there is anything we could do.

Maybe we could use the frame pointer to "fake" it? Not sure this is a good idea...
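
To make the frame-pointer idea concrete, here is a toy model in Go. It is only an illustration, not how the runtime is implemented: frames are plain structs, the saved frame pointer is modelled as a pointer to the caller's frame, and the frame names and the `buildStacks` helper are hypothetical and heavily simplified. It assumes the outermost C frame has no usable link back to the goroutine stack, and shows what "faking" that link would change for a frame-pointer-based unwinder.

```go
package main

import "fmt"

// frame is a toy stack frame: a function name plus the saved frame pointer,
// modelled as a pointer to the caller's frame (nil means "end of chain").
type frame struct {
	fn      string
	savedFP *frame
}

// unwind follows saved frame pointers from the innermost frame outward,
// the way a frame-pointer-based unwinder would.
func unwind(fp *frame) []string {
	var names []string
	for f := fp; f != nil; f = f.savedFP {
		names = append(names, f.fn)
	}
	return names
}

// buildStacks models a simplified goroutine stack (main -> goone ->
// asmcgocall) and a separate system stack (_Cfunc_one -> one -> two).
// When link is true, the outermost C frame's saved frame pointer is
// "faked" to point at the asmcgocall frame on the goroutine stack.
// It returns the innermost frame, where unwinding starts.
func buildStacks(link bool) *frame {
	// Goroutine stack (hypothetical, simplified frame list).
	mainFrame := &frame{fn: "main.main"}
	goone := &frame{fn: "main.goone", savedFP: mainFrame}
	asmcgocall := &frame{fn: "runtime.asmcgocall", savedFP: goone}

	// System stack.
	cfunc := &frame{fn: "_Cfunc_one"}
	if link {
		cfunc.savedFP = asmcgocall
	}
	one := &frame{fn: "one", savedFP: cfunc}
	two := &frame{fn: "two", savedFP: one}
	return two
}

func main() {
	fmt.Println(unwind(buildStacks(false))) // chain ends at the stack switch
	fmt.Println(unwind(buildStacks(true)))  // chain continues onto the goroutine stack
}
```

Without the link the walk ends at the C wrapper, which matches the symptom reported above. An unwinder driven by unwind tables rather than frame pointers would not necessarily benefit from such a link.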

@steeve
Contributor Author

@steeve steeve commented Jul 4, 2020

Indeed. Thinking about it however, it doesn't feel that hard to do. Either by locally modifying asmcgocall, or, in a more ambitious way, via go:systemstack. Do you think that could work, at least in theory?

@steeve
Contributor Author

@steeve steeve commented Jul 4, 2020

Also, weirdly enough, lldb is able to do it, without DWARF.

@steeve
Contributor Author

@steeve steeve commented Jul 5, 2020

I am also realizing this is indeed very different from #39524. But in some way, since runtime.libcCall uses asmcgocall, fixing this could also allow unwinding of goroutines blocked in things like pthread functions.

@ianlancetaylor
Contributor

@ianlancetaylor ianlancetaylor commented Jul 5, 2020

If you want to unwind from C++ back into Go you may want to try github.com/ianlancetaylor/cgosymbolizer. Although that will only help from the Go side, not the C++ side.

In principle we could hand write unwind information for asmcgocall. The unwind information is basically DWARF, and it should be powerful enough to express what asmcgocall does.
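
For reference, the Go-side half of the picture can be collected without any cgo involvement. The sketch below uses the standard `runtime.Callers` and `runtime.CallersFrames` APIs to list the Go functions on the current goroutine stack. The helper names `goStack` and `hasFrame` are hypothetical, and the example says nothing about how a C-side unwinder such as libunwind behaves.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// goStack returns the names of the Go functions on the calling
// goroutine's stack, innermost first.
//
//go:noinline
func goStack() []string {
	pcs := make([]uintptr, 32)
	// Skip runtime.Callers itself and goStack.
	n := runtime.Callers(2, pcs)
	frames := runtime.CallersFrames(pcs[:n])

	var names []string
	for {
		f, more := frames.Next()
		if f.Function != "" {
			names = append(names, f.Function)
		}
		if !more {
			break
		}
	}
	return names
}

//go:noinline
func goone() []string {
	return goStack()
}

// hasFrame reports whether any collected function name ends with suffix.
func hasFrame(names []string, suffix string) bool {
	for _, name := range names {
		if strings.HasSuffix(name, suffix) {
			return true
		}
	}
	return false
}

func main() {
	for _, name := range goone() {
		fmt.Println(name)
	}
}
```

The walk here is done by the Go runtime using its own metadata, which is why it works on goroutine stacks where an external unwinder may not.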

@steeve
Contributor Author

@steeve steeve commented Jul 5, 2020

@ianlancetaylor thank you. The issue, on the iOS side, is that unwinding is done locally, on the device (presumably with libunwind), without DWARF. DWARF is only added later to symbolicate the crashes.

That said, it could be useful for Android (which uses breakpad with minidumps).

@cherrymui I also tried this forsaken piece of code, in order to call the backtrace function without cgo, and alas, the unwinding still somehow stops before reaching the Go frames. This is based on the rustgo article:

TEXT ·backtracetrampoline(SB),0,$2048
    MOVQ SP, BX        // Save SP in a callee-saved register
    ADDQ $2048, SP     // Rollback SP to reuse this function's frame
    ANDQ $~15, SP      // Align the stack to 16-bytes
    CALL backtrace(SB)
    MOVQ BX, SP        // Restore SP
    RET
@cherrymui
Contributor

@cherrymui cherrymui commented Jul 6, 2020

@steeve Sorry, I'm not sure exactly what you're planning to do, and why go:systemstack is relevant here. go:systemstack only enforces the marked function must run on the system stack (i.e. cannot run on a goroutine stack). It doesn't change how stack switches work.

Also, on what architecture? You mentioned iOS (presumably ARM64), but also AMD64 in your go env.

That said, does CL https://go-review.googlesource.com/c/go/+/241080 make any difference (on ARM64)? Thanks.

@steeve
Contributor Author

@steeve steeve commented Jul 6, 2020

@cherrymui Thank you for the CL, I wasn't hoping for that much. Will definitely try and let you know.

My ultimate target is indeed iOS (and Android, to an extent).
Before trying to fix it on iOS though, I figured it'd be easier to reproduce on my computer (darwin/amd64), and since iOS uses libunwind, to investigate it with libunwind myself.
My other experiment, in which I tried to call the function directly on the Go stack, was meant to narrow down whether the frame pointer was trashed because of the stack switch itself.

@gopherbot

@gopherbot gopherbot commented Jul 6, 2020

Change https://golang.org/cl/241158 mentions this issue: runtime: adjust frame pointer on stack copy on ARM64

@cagedmantis cagedmantis added this to the Backlog milestone Jul 6, 2020
@ianlancetaylor
Contributor

@ianlancetaylor ianlancetaylor commented Jul 6, 2020

@steeve libunwind unwinds the stack using the unwind information, which is not DWARF but is approximately the same format as a subset of DWARF. That's what I was referring to when I suggested that we could write unwind information for asmcgocall. (You can see the horrible details at https://www.airs.com/blog/archives/460).

@steeve
Contributor Author

@steeve steeve commented Jul 9, 2020

I just tried your CL @cherrymui on a real device, and unfortunately, when I pause inside Xcode's debugger, I only see the stack up to asmcgocall.

In my case I put in a time.Sleep(), and the backtrace in Xcode's lldb was:

    frame #0: 0x00000001889f2bc0 libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x00000001889151e4 libsystem_pthread.dylib`_pthread_cond_wait + 680
    frame #2: 0x000000010785d5e8 Zenly`runtime.pthread_cond_wait_trampoline + 24
    frame #3: 0x000000010785c4fc Zenly`runtime.asmcgocall + 204
    frame #4: 0x000000010785c4fc Zenly`runtime.asmcgocall + 204
  * frame #5: 0x000000010785c4fc Zenly`runtime.asmcgocall + 204

Note that on amd64, lldb manages to properly unwind.
