New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/mobile: stack traces are wrong on iOS and Android #22716

Open
steeve opened this Issue Nov 14, 2017 · 20 comments

Comments

Projects
None yet
5 participants
@steeve
Contributor

steeve commented Nov 14, 2017

What version of Go are you using (go version)?

go version devel +4f178d157d Fri Oct 20 13:59:37 2017 +0000 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/steeve/go"
GORACE=""
GOROOT="/usr/local/Cellar/go/1.9.1/libexec"
GOTOOLDIR="/usr/local/Cellar/go/1.9.1/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/bs/51dlb_nn5k35xq9qfsxv9wc00000gr/T/go-build785508759=/tmp/go-build -gno-record-gcc-switches -fno-common"
CXX="clang++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"

What did you do?

Bind an iOS framework using gomobile bind and create a panic.

What did you expect to see?

The real stack trace.

What did you see instead?

As commented in #20392 (comment):

# Crashlytics - plaintext stacktrace downloaded by Steeve Morin at Tue, 14 Nov 2017 00:11:12 GMT
# Platform: ios
# Application: testapp-swift
# Version: 1.0 (1)
# Issue #: 2
# Issue ID: 591c99d0be077a4dccdaf367
# Session ID: e591a6fc5b8641d8b4571ce7df6c11f4_4c0f40a1c8d011e7a1bd56847afe9799_0_v2
# Date: 2017-11-14T00:10:00Z
# OS Version: 11.1.1 (15B150)
# Device: iPhone X
# RAM Free: 7.2%
# Disk Free: 69.1%

#0. Crashed: com.apple.main-thread
0  testapp-swift                  0x1048d5798 runtime.raiseproc + 24
1  testapp-swift                  0x1048c1fdc runtime.dieFromSignal + 60
2  testapp-swift                  0x1048c1fdc runtime.dieFromSignal + 60
3  testapp-swift                  0x1048c1fdc runtime.dieFromSignal + 60
4  testapp-swift                  0x1048c1fdc runtime.dieFromSignal + 60
5  testapp-swift                  0x1048c1fdc runtime.dieFromSignal + 60
6  testapp-swift                  0x1048c1fdc runtime.dieFromSignal + 60
7  testapp-swift                  0x1048c1fdc runtime.dieFromSignal + 60
8  testapp-swift                  0x1048c1fdc runtime.dieFromSignal + 60
9  testapp-swift                  0x1048c1fdc runtime.dieFromSignal + 60
10 testapp-swift                  0x1048c1fdc runtime.dieFromSignal + 60

--

#0. Crashed: com.apple.main-thread
0  testapp-swift                  0x1048d5798 runtime.raiseproc + 24
1  testapp-swift                  0x1048c1fdc runtime.dieFromSignal + 60
2  testapp-swift                  0x1048c1fdc runtime.dieFromSignal + 60
3  testapp-swift                  0x1048c1fdc runtime.dieFromSignal + 60
4  testapp-swift                  0x1048c1fdc runtime.dieFromSignal + 60
5  testapp-swift                  0x1048c1fdc runtime.dieFromSignal + 60
6  testapp-swift                  0x1048c1fdc runtime.dieFromSignal + 60
7  testapp-swift                  0x1048c1fdc runtime.dieFromSignal + 60
8  testapp-swift                  0x1048c1fdc runtime.dieFromSignal + 60
9  testapp-swift                  0x1048c1fdc runtime.dieFromSignal + 60
10 testapp-swift                  0x1048c1fdc runtime.dieFromSignal + 60

#1. Thread
0  testapp-swift                  0x1048d5b1c runtime.mach_semaphore_wait + 12
1  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
2  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
3  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
4  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
5  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
6  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
7  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
8  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
9  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
10 testapp-swift                  0x1048ac55c runtime.semasleep1 + 60

#2. Thread
0  libsystem_kernel.dylib         0x183c65dbc __workq_kernreturn + 8
1  libsystem_pthread.dylib        0x183d76fa0 _pthread_wqthread + 884
2  libsystem_pthread.dylib        0x183d76c20 start_wqthread + 4

#3. Thread
0  libsystem_pthread.dylib        0x183d76c1c start_wqthread + 122

#4. Thread
0  libsystem_kernel.dylib         0x183c65dbc __workq_kernreturn + 8
1  libsystem_pthread.dylib        0x183d77134 _pthread_wqthread + 1288
2  libsystem_pthread.dylib        0x183d76c20 start_wqthread + 4

#5. Thread
0  libsystem_kernel.dylib         0x183c65dbc __workq_kernreturn + 8
1  libsystem_pthread.dylib        0x183d77134 _pthread_wqthread + 1288
2  libsystem_pthread.dylib        0x183d76c20 start_wqthread + 4

#6. com.apple.uikit.eventfetch-thread
0  libsystem_kernel.dylib         0x183c44bc4 mach_msg_trap + 8
1  libsystem_kernel.dylib         0x183c44a3c mach_msg + 72
2  CoreFoundation                 0x1840f5c74 __CFRunLoopServiceMachPort + 196
3  CoreFoundation                 0x1840f3840 __CFRunLoopRun + 1424
4  CoreFoundation                 0x184013fb8 CFRunLoopRunSpecific + 436
5  Foundation                     0x184a3d6e4 -[NSRunLoop(NSRunLoop) runMode:beforeDate:] + 304
6  Foundation                     0x184a5cafc -[NSRunLoop(NSRunLoop) runUntilDate:] + 96
7  UIKit                          0x18e1472f4 -[UIEventFetcher threadMain] + 136
8  Foundation                     0x184b3e860 __NSThread__start__ + 996
9  libsystem_pthread.dylib        0x183d7831c _pthread_body + 308
10 libsystem_pthread.dylib        0x183d781e8 _pthread_body + 310
11 libsystem_pthread.dylib        0x183d76c28 thread_start + 4

#7. Thread
0  libsystem_kernel.dylib         0x183c65dbc __workq_kernreturn + 8
1  libsystem_pthread.dylib        0x183d76fa0 _pthread_wqthread + 884
2  libsystem_pthread.dylib        0x183d76c20 start_wqthread + 4

#8. Thread
0  testapp-swift                  0x1048d5b44 runtime.mach_semaphore_timedwait + 20
1  testapp-swift                  0x1048ac5e0 runtime.semasleep1 + 192
2  testapp-swift                  0x1048ac5e0 runtime.semasleep1 + 192
3  testapp-swift                  0x1048ac5e0 runtime.semasleep1 + 192
4  testapp-swift                  0x1048ac5e0 runtime.semasleep1 + 192
5  testapp-swift                  0x1048ac5e0 runtime.semasleep1 + 192
6  testapp-swift                  0x1048ac5e0 runtime.semasleep1 + 192
7  testapp-swift                  0x1048ac5e0 runtime.semasleep1 + 192
8  testapp-swift                  0x1048ac5e0 runtime.semasleep1 + 192
9  testapp-swift                  0x1048ac5e0 runtime.semasleep1 + 192
10 testapp-swift                  0x1048ac5e0 runtime.semasleep1 + 192

#9. Thread
0  testapp-swift                  0x1048d5b1c runtime.mach_semaphore_wait + 12
1  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
2  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
3  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
4  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
5  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
6  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
7  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
8  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
9  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
10 testapp-swift                  0x1048ac55c runtime.semasleep1 + 60

#10. Thread
0  testapp-swift                  0x1048d5b1c runtime.mach_semaphore_wait + 12
1  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
2  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
3  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
4  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
5  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
6  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
7  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
8  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
9  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
10 testapp-swift                  0x1048ac55c runtime.semasleep1 + 60

#11. Thread
0  testapp-swift                  0x1048d5b1c runtime.mach_semaphore_wait + 12
1  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
2  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
3  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
4  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
5  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
6  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
7  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
8  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
9  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
10 testapp-swift                  0x1048ac55c runtime.semasleep1 + 60

#12. Thread
0  testapp-swift                  0x1048d5b1c runtime.mach_semaphore_wait + 12
1  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
2  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
3  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
4  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
5  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
6  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
7  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
8  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
9  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
10 testapp-swift                  0x1048ac55c runtime.semasleep1 + 60

#13. Thread
0  testapp-swift                  0x1048d5b1c runtime.mach_semaphore_wait + 12
1  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
2  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
3  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
4  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
5  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
6  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
7  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
8  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
9  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
10 testapp-swift                  0x1048ac55c runtime.semasleep1 + 60

#14. Thread
0  testapp-swift                  0x1048d5b1c runtime.mach_semaphore_wait + 12
1  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
2  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
3  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
4  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
5  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
6  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
7  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
8  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
9  testapp-swift                  0x1048ac55c runtime.semasleep1 + 60
10 testapp-swift                  0x1048ac55c runtime.semasleep1 + 60

#15. com.twitter.crashlytics.ios.MachExceptionServer
0  libsystem_kernel.dylib         0x183c44bc4 mach_msg_trap + 8
1  libsystem_kernel.dylib         0x183c44a3c mach_msg + 72
2  testapp-swift                  0x104918aa4 CLSMachExceptionServer + 100
3  libsystem_pthread.dylib        0x183d7831c _pthread_body + 308
4  libsystem_pthread.dylib        0x183d781e8 _pthread_body + 310
5  libsystem_pthread.dylib        0x183d76c28 thread_start + 4

#16. Thread
0  libsystem_kernel.dylib         0x183c65dbc __workq_kernreturn + 8
1  libsystem_pthread.dylib        0x183d77134 _pthread_wqthread + 1288
2  libsystem_pthread.dylib        0x183d76c20 start_wqthread + 4

#17. com.apple.NSURLConnectionLoader
0  libsystem_kernel.dylib         0x183c44bc4 mach_msg_trap + 8
1  libsystem_kernel.dylib         0x183c44a3c mach_msg + 72
2  CoreFoundation                 0x1840f5c74 __CFRunLoopServiceMachPort + 196
3  CoreFoundation                 0x1840f3840 __CFRunLoopRun + 1424
4  CoreFoundation                 0x184013fb8 CFRunLoopRunSpecific + 436
5  CFNetwork                      0x18477e264 +[NSURLConnection(Loader) _resourceLoadLoop:] + 404
6  Foundation                     0x184b3e860 __NSThread__start__ + 996
7  libsystem_pthread.dylib        0x183d7831c _pthread_body + 308
8  libsystem_pthread.dylib        0x183d781e8 _pthread_body + 310
9  libsystem_pthread.dylib        0x183d76c28 thread_start + 4
@steeve

This comment has been minimized.

Contributor

steeve commented Nov 14, 2017

According to @eliasnaur at #20392 (comment) it could be due to missing frame pointers.

There is a CL to add them to arm64 at https://go-review.googlesource.com/c/go/+/61511, however, it doesn't seem to solve the issue. See https://groups.google.com/d/topic/golang-dev/v1TxCJNemPY/discussion

cc @williamweixiao

@steeve steeve changed the title from x/mobile: stack traces are wrong on iOS to x/mobile: stack traces are wrong on iOS (and likely Android) Nov 14, 2017

@steeve steeve changed the title from x/mobile: stack traces are wrong on iOS (and likely Android) to x/mobile: stack traces are wrong on iOS and Android Apr 5, 2018

@steeve

This comment has been minimized.

Contributor

steeve commented Apr 5, 2018

I spent some time to take a look at this, on Android this time.
FWIW it looks like systemstack is the culprit, as the stack trace starts from it in case of SIGABRT.

As expected however, frame pointers actually make the local variables visible by the debugger.

@steeve

This comment has been minimized.

Contributor

steeve commented Apr 5, 2018

In the case of a CGo crash, however, the stack trace seems ok, although truncated to runtime.asmcgocall.

@steeve

This comment has been minimized.

Contributor

steeve commented Apr 5, 2018

Reading systemstack, I guess it makes sense:

// systemstack runs fn on a system stack.
// If systemstack is called from the per-OS-thread (g0) stack, or
// if systemstack is called from the signal handling (gsignal) stack,
// systemstack calls fn directly and returns.
// Otherwise, systemstack is being called from the limited stack <----------
// of an ordinary goroutine. In this case, systemstack switches
// to the per-OS-thread stack, calls fn, and switches back.
// It is common to use a func literal as the argument, in order
// to share inputs and outputs with the code around the call
// to system stack:
//
//	... set up y ...
//	systemstack(func() {
//		x = bigcall(y)
//	})
//	... use x ...
//
@eliasnaur

This comment has been minimized.

Contributor

eliasnaur commented Apr 5, 2018

I'm not a Go runtime developer, but one way to move this issue forward is to make it more concrete:

  1. What is confusing Crashlytics and similar Go-oblivious crash handlers and debuggers that are not confusing gdb (and lldb and delve)? I see a lot of work going into debug information on the Go side, so I assume users of gdb, lldb, delve are happy with their crash traces.
  2. In light of 1, can a self-contained case be constructed that demonstrates the problem without having to use Xcode, Android studio or similar large and opaque dependencies? If possible, a test case directly runnable from the standard library misc/ios/go_darwin_arm_exec.go or misc/android/go_android_exec.go exec wrappers.
  3. If possible, demonstrate the failure on a android/amd64 or darwin/amd64 iOS emulator, to rule out problem with missing framepointers. Or, to demonstrate that missing framepointers is the problem.
  4. If possible, demonstrate the failure on a desktop platform (e.g. linux/amd64 or regular darwin/amd64). This will rule out any mobile specific issues. Or, demonstrate that the issue is mobile specific.
@williamweixiao

This comment has been minimized.

Member

williamweixiao commented Apr 8, 2018

Yes, a self-contained case is important for post mortem of the wrong stack traces.
As I know, frame-pointer is enabled on amd64 desktop platform (e.g. linux/amd64 and darwin/amd64) by default. If we can reproduce the failure on the platform we can rule out the problem with missing framepointers.

@eliasnaur

This comment has been minimized.

Contributor

eliasnaur commented Apr 10, 2018

It occurred to me that there is a small chance that the crash logger is somehow confused by the DWARF information the Go compiler generates. Have you tried to crash log a binary compiled with -ldflags="-w" on a GOARCH with framepointers (amd64)?

@williamweixiao

This comment has been minimized.

Member

williamweixiao commented Apr 13, 2018

hi, can you try the latest version of my patch (https://go-review.googlesource.com/c/go/+/61511) on darwin/arm64?

@steeve

This comment has been minimized.

Contributor

steeve commented Apr 15, 2018

So I've spent more time and indeed, the DWARF on darwin/arm64, while there, is empty:

$ gomobile bind -v -target=ios -prefix "" github.com/steeve/adder
$ dwarfdump -a --arch=arm64 Adder.framework/Adder
----------------------------------------------------------------------
 File: Adder.framework/Adder(go.o) (arm64)
----------------------------------------------------------------------
.debug_abbrev contents:
< EMPTY >

.debug_aranges contents:
< EMPTY >

.debug_info contents:
< EMPTY >
.debug_inlined contents:
< EMPTY >
.debug_frame contents:
< EMPTY >

.debug_line contents:
< EMPTY >

.debug_macinfo contents:
< EMPTY >

.debug_pubnames contents:
< EMPTY >

.debug_pubtypes contents:
< EMPTY >

.apple_names contents:

.apple_types contents:

.apple_exttypes contents:

.apple_namespaces contents:

.apple_objc contents:


.debug_str contents:
< EMPTY >

Indeed this may be confusing the unwinder, however, as tested by @eliasnaur, removing it doesn't seem to make the situation better either with, or without frame pointers, at least on Darwin.

@eliasnaur

This comment has been minimized.

Contributor

eliasnaur commented Apr 16, 2018

DWARF generation is completely disabled on darwin/arm and darwin/arm64. There's a

*ld.FlagW = true // disable DWARF generation

in cmd/link/internal/arm*/obj.go for objabi.Hdarwin. Removing that line for arm64 doesn't improve the Xcode backtrace in my informal crash test, however.

@gwik

This comment has been minimized.

Contributor

gwik commented Apr 24, 2018

The problem of incomplete / incorrect the stack is because the stack seems to be modified between the start of the panic (runtime.gopanic), and the sigabort being raised (runtime.dieFromSignal).

On android using the lldb debugger:
Set a breakpoint in runtime.gopanic:

(lldb) b runtime.gopanic
Breakpoint 3: where = libgojni.so`runtime.gopanic + 20 at panic.go:427, address = 0x0000007bc84963a4
(lldb) c
Process 6570 resuming
(lldb) bt
* thread #1, name = 'ly.co.crashtest', stop reason = breakpoint 2.1 3.1
  * frame #0: 0x0000007bc84963a4 libgojni.so`runtime.gopanic(e=interface {} @ 0x0000004420048e78) at panic.go:427
    frame #1: 0x0000007bc84fe590 libgojni.so`github.com/znly/crashtest/crash.CrashPanic at crash.go:65
    frame #2: 0x0000007bc84ff9d8 libgojni.so`_/tmp/gomobile-work-050620269/gomobile_bind.proxycrash__CrashPanic at go_crashmain.go:41
    frame #3: 0x0000007bc84ff588 libgojni.so`_/tmp/gomobile-work-050620269/gomobile_bind._cgoexpwrap_25f17ada05be_proxycrash__CrashPanic at _cgo_gotypes.go:216
    frame #4: 0x0000007bc84bd18c libgojni.so`runtime.call32 at asm_arm64.s:411
    frame #5: 0x0000007bc8473360 libgojni.so`runtime.cgocallbackg1(ctxt=0x0000000000000000) at cgocall.go:316
    frame #6: 0x0000007bc84731a8 li
bgojni.so`runtime.cgocallbackg(ctxt=0x0000000000000000) at cgocall.go:194
    frame #7: 0x0000007bc84bef40 libgojni.so`runtime.cgocallback_gofunc at asm_arm64.s:642
    frame #8: 0x0000007bc84bf374 libgojni.so`runtime.goexit at asm_arm64.s:1037

From there the stack is correct. We can see the user code panicking in frame 1.
Continue, waiting for the debugger to catch the SIGABRT.

(lldb) c
Process 6570 resuming
(lldb) bt
* thread #1, name = 'ly.co.crashtest', stop reason = signal SIGABRT
  * frame #0: 0x0000007bc84bfca8 libgojni.so`runtime.raise at sys_linux_arm64.s:163
    frame #1: 0x0000007bc84aac1c libgojni.so`runtime.dieFromSignal(sig=6) at signal_unix.go:422
    frame #2: 0x0000007bc84aadc0 libgojni.so`runtime.crash at signal_unix.go:520
    frame #3: 0x0000007bc8497058 libgojni.so`runtime.dopanic_m(gp=0x0000004420000600, pc=0x0000007bc8496730, sp=0x0000004420048dd0) at panic.go:751
    frame #4: 0x0000007bc84bb980 libgojni.so`runtime.dopanic.func1 at panic.go:598
    frame #5: 0x0000007bc84bcdbc libgojni.so`runtime.systemstack at asm_arm64.s:229
    frame #6: 0x0000007bc849aed0 libgojni.so`runtime.startTheWorldWithSema(emitTraceEvent=false, ~r1=548874206800) at proc.go:1092

Now the stack doesn't show the user code anymore.

With a simple program that starts a goroutine and panic on linux/amd64, it looks like this and this is what one would expect:

* thread #2, name = 'crash', stop reason = signal SIGABRT
  * frame #0: crash`runtime.raise at sys_linux_amd64.s:159
    frame #1: crash`runtime.dieFromSignal(sig=6) at signal_unix.go:422
    frame #2: crash`runtime.crash at signal_unix.go:520
    frame #3: crash`runtime.dopanic_m(gp=0x000000c420132900, pc=0x0000000000426121, sp=0x000000c42012df10) at panic.go:751
    frame #4: crash`runtime.dopanic.func1 at panic.go:598
    frame #5: crash`runtime.systemstack at asm_amd64.s:409
    frame #6: crash at proc.go:1092
    frame #7: crash`runtime.crash at signal_unix.go:520
    frame #8: crash`runtime.dopanic_m(gp=0x000000c420132900, pc=0x0000000000426121, sp=0x000000c42012df10) at panic.go:751
    frame #9: crash`runtime.dopanic.func1 at panic.go:598
    frame #10: crash`runtime.systemstack at asm_amd64.s:409
    frame #11: crash at proc.go:1092
    frame #12: crash`runtime.dopanic_m(gp=0x000000c420132900, pc=0x0000000000426121, sp=0x000000c42012df10) at panic.go:751
    frame #13: crash`runtime.dopanic.func1 at panic.go:598
    frame #14: crash`runtime.systemstack at asm_amd64.s:409
    frame #15: crash at proc.go:1092
    frame #16: crash`runtime.dopanic.func1 at panic.go:598
    frame #17: crash`runtime.systemstack at asm_amd64.s:409
    frame #18: crash`runtime.systemstack at asm_amd64.s:409
    frame #19: crash at proc.go:1092
    frame #20: crash`runtime.gopanic(e=interface {} @ 0x000000c42012dfb0) at panic.go:551
    frame #21: crash`main.parentFunc1 at crash.go:18
    frame #22: crash`main.parentFunc2 at crash.go:14
    frame #23: crash`runtime.goexit at asm_amd64.s:2361
    frame #24: crash`main.parentFunc1 at crash.go:18
    frame #25: crash`main.parentFunc2 at crash.go:14
    frame #26: crash`runtime.goexit at asm_amd64.s:2361
    frame #27: crash`main.parentFunc2 at crash.go:14
    frame #28: crash`runtime.goexit at asm_amd64.s:2361
    frame #29: crash`runtime.goexit at asm_amd64.s:2361

Also the problem that shows everytime on android/arm64 can happen on linux/amd64 too:

lldb ./tmp/crash
(lldb) target create "./tmp/crash"
Current executable set to './tmp/crash' (x86_64).
(lldb) b runtime.gopanic
Breakpoint 1: where = crash`runtime.gopanic + 47 at panic.go:426, address = 0x000000000042523f
(lldb) run
Process 23097 launched: './tmp/crash' (x86_64)
Process 23097 stopped
* thread #10, stop reason = breakpoint 1.1
    frame #0: crash`runtime.gopanic(e=interface {} @ 0x000000c420045fb0) at panic.go:426
   423     
   424     // The implementation of the predeclared function panic.
   425     func gopanic(e interface{}) {
-> 426         gp := getg()
   427         if gp.m.curg != gp {
   428             print("panic: ")
   429             printany(e)
(lldb) bt
* thread #10, stop reason = breakpoint 1.1
  * frame #0: crash`runtime.gopanic(e=interface {} @ 0x000000c420045fb0) at panic.go:426
    frame #1: crash`main.parentFunc1 at crash.go:18
    frame #2: crash`main.parentFunc2 at crash.go:14
    frame #3: crash`runtime.goexit at asm_amd64.s:2361
    frame #4: crash`main.parentFunc1 at crash.go:18
    frame #5: crash`main.parentFunc2 at crash.go:14
    frame #6: crash`runtime.goexit at asm_amd64.s:2361
    frame #7: crash`main.parentFunc2 at crash.go:14
    frame #8: crash`runtime.goexit at asm_amd64.s:2361
    frame #9: crash`runtime.goexit at asm_amd64.s:2361
* thread #2, name = 'crash', stop reason = signal SIGABRT
    frame #0: crash`runtime.raise at sys_linux_amd64.s:159
   156    MOVL  sig+0(FP), SI // arg 2
   157    MOVL  $SYS_tkill, AX
   158    SYSCALL
-> 159    RET
   160  
   161  TEXT runtime·raiseproc(SB),NOSPLIT,$0
   162    MOVL  $SYS_getpid, AX
(lldb) bt
* thread #2, name = 'crash', stop reason = signal SIGABRT
  * frame #0: crash`runtime.raise at sys_linux_amd64.s:159
    frame #1: crash`runtime.dieFromSignal(sig=6) at signal_unix.go:422
    frame #2: crash`runtime.crash at signal_unix.go:520
    frame #3: crash`runtime.dopanic_m(gp=0x000000c420001080, pc=0x00000000004255d1, sp=0x000000c420045f10) at panic.go:751
    frame #4: crash`runtime.dopanic.func1 at panic.go:598
    frame #5: crash`runtime.systemstack at asm_amd64.s:409
    frame #6: crash at proc.go:1092
    frame #7: crash`runtime.crash at signal_unix.go:520
    frame #8: crash`runtime.dopanic_m(gp=0x000000c420001080, pc=0x00000000004255d1, sp=0x000000c420045f10) at panic.go:751

We can see that program even switched thread between the start of the panic and the raise of the signal.
I can't reproduce anymore but I recall the thread 2 where the goroutine panicked didn't had the panicking stack anymore and was blocked in runtime.futex.

@eliasnaur

This comment has been minimized.

Contributor

eliasnaur commented Apr 24, 2018

Thanks, @gwik. Other than the platform difference (linux vs android), the other difference that comes to mind is the buildmode. Have you reproduced the wrong stack trace on android/arm64 (or android/arm) with a regular static binary outside the JVM? Conversely, are stack traces correct on linux/amd64 with a Go c-shared library loaded into a C main program?

@eliasnaur

This comment has been minimized.

Contributor

eliasnaur commented Apr 26, 2018

@steeve @gwik with #24883 fixed, are stack traces on iOS acceptable? If so, please update this issue or start a new one with the Android specific problems.

@steeve

This comment has been minimized.

Contributor

steeve commented Apr 27, 2018

@eliasnaur I'll do a check this afternoon (Paris time). Could you test by replacing the nil pointer with a good old panic() call ?
This might have other effects.

@gwik

This comment has been minimized.

Contributor

gwik commented Apr 27, 2018

@eliasnaur thanks for making this happen, this is great news.

@steeve

This comment has been minimized.

Contributor

steeve commented Apr 27, 2018

I'm happy to report that breakpoints work (note the local var i):
image

@steeve

This comment has been minimized.

Contributor

steeve commented Apr 27, 2018

However, in the case of a panic, the stack, while technically right, is meaningless:
image

Here is the full text version:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
  * frame #0: 0x000000010042f9f8 testapp-swift`runtime.raiseproc at <autogenerated>:1
    frame #1: 0x000000010041b500 testapp-swift`runtime.dieFromSignal(sig=6) at signal_unix.go:424
    frame #2: 0x000000010041b6b0 testapp-swift`runtime.crash at signal_unix.go:525
    frame #3: 0x0000000100408998 testapp-swift`runtime.dopanic_m(gp=0x0000000130000600, pc=0x00000001004080ec, sp=0x0000000130046de0) at panic.go:758
    frame #4: 0x000000010042bab4 testapp-swift`runtime.fatalpanic.func1 at panic.go:657
    frame #5: 0x000000010042cd10 testapp-swift`runtime.systemstack at <autogenerated>:1
    frame #6: 0x000000010040c5f0 testapp-swift`runtime.startTheWorldWithSema(emitTraceEvent=<unavailable>, ~r1=<unavailable>) at proc.go:1094
    frame #7: 0x0000000100469708 testapp-swift`CrashCrashPanic + 12
    frame #8: 0x00000001003e23e8 testapp-swift`ViewController.buttonUp(sender=Any @ 0x000000016fa21860, self=0x0000000101614040) at ViewController.swift:79
    frame #9: 0x00000001003e24cc testapp-swift`@objc ViewController.buttonUp(_:) at ViewController.swift:0
    frame #10: 0x000000018f2826c8 UIKit`-[UIApplication sendAction:to:from:forEvent:] + 96
    frame #11: 0x000000018f3a38a4 UIKit`-[UIControl sendAction:to:forEvent:] + 80
    frame #12: 0x000000018f28877c UIKit`-[UIControl _sendActionsForEvents:withEvent:] + 440
    frame #13: 0x000000018f3be1dc UIKit`-[UIControl touchesEnded:withEvent:] + 572
    frame #14: 0x000000018f305a48 UIKit`-[UIWindow _sendTouchesForEvent:] + 2428
    frame #15: 0x000000018f2fa8f8 UIKit`-[UIWindow sendEvent:] + 3160
    frame #16: 0x000000018f2f9238 UIKit`-[UIApplication sendEvent:] + 340
    frame #17: 0x000000018fadac0c UIKit`__dispatchPreprocessedEventFromEventQueue + 2340
    frame #18: 0x000000018fadd1b8 UIKit`__handleEventQueueInternal + 4744
    frame #19: 0x000000018fad6258 UIKit`__handleHIDEventFetcherDrain + 152
    frame #20: 0x00000001854fb404 CoreFoundation`__CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 24
    frame #21: 0x00000001854fac2c CoreFoundation`__CFRunLoopDoSources0 + 276
    frame #22: 0x00000001854f879c CoreFoundation`__CFRunLoopRun + 1204
    frame #23: 0x0000000185418da8 CoreFoundation`CFRunLoopRunSpecific + 552
    frame #24: 0x00000001873fb020 GraphicsServices`GSEventRunModal + 100
    frame #25: 0x000000018f3f978c UIKit`UIApplicationMain + 236
    frame #26: 0x00000001003e3f2c testapp-swift`main at AppDelegate.swift:14
    frame #27: 0x0000000184ea9fc0 libdyld.dylib`start + 4
(lldb)

But I think that warrants another issue.

@eliasnaur

This comment has been minimized.

Contributor

eliasnaur commented Apr 29, 2018

@aclements you did CL 93658 where a systemstack() is wrapping the dieFromSignal you see in the stack trace above. Is there a way to avoid that so the backtrace instead shows the correct call sequence leading to the panic()?

@steeve

This comment has been minimized.

Contributor

steeve commented Apr 29, 2018

One might consider the value of actually printing the full stack trace to stderr in that scenario. Perhaps a new debug.SetTraceback value ?

From my user perspective, I'd be fine not having the stack trace printed at all in that scenario if it makes the LLDB backtraces good.

Ideally, a panic could make the debugger halt on the offending line. But I have no idea how to do that properly.

@gopherbot

This comment has been minimized.

gopherbot commented Apr 29, 2018

Change https://golang.org/cl/110065 mentions this issue: runtime: perform crashes outside systemstack

gopherbot pushed a commit that referenced this issue Apr 30, 2018

runtime: perform crashes outside systemstack
CL 93658 moved stack trace printing inside a systemstack call to
sidestep complexity in case the runtime is in a inconsistent state.

Unfortunately, debuggers generating backtraces for a Go panic
will be confused and come up with a technical correct but useless
stack. This CL moves just the crash performing - typically a SIGABRT
signal - outside the systemstack call to improve backtraces.

Unfortunately, the crash function now needs to be marked nosplit and
that triggers the no split stackoverflow check. To work around that,
split fatalpanic in two: fatalthrow for runtime.throw and fatalpanic for
runtime.gopanic. Only Go panics really needs crashes on the right stack
and there is enough stack for gopanic.

Example program:

package main

import "runtime/debug"

func main() {
	debug.SetTraceback("crash")
	crash()
}

func crash() {
	panic("panic!")
}

Before:
(lldb) bt
* thread #1, name = 'simple', stop reason = signal SIGABRT
  * frame #0: 0x000000000044ffe4 simple`runtime.raise at <autogenerated>:1
    frame #1: 0x0000000000438cfb simple`runtime.dieFromSignal(sig=<unavailable>) at signal_unix.go:424
    frame #2: 0x0000000000438ec9 simple`runtime.crash at signal_unix.go:525
    frame #3: 0x00000000004268f5 simple`runtime.dopanic_m(gp=<unavailable>, pc=<unavailable>, sp=<unavailable>) at panic.go:758
    frame #4: 0x000000000044bead simple`runtime.fatalpanic.func1 at panic.go:657
    frame #5: 0x000000000044d066 simple`runtime.systemstack at <autogenerated>:1
    frame #6: 0x000000000042a980 simple at proc.go:1094
    frame #7: 0x0000000000438ec9 simple`runtime.crash at signal_unix.go:525
    frame #8: 0x00000000004268f5 simple`runtime.dopanic_m(gp=<unavailable>, pc=<unavailable>, sp=<unavailable>) at panic.go:758
    frame #9: 0x000000000044bead simple`runtime.fatalpanic.func1 at panic.go:657
    frame #10: 0x000000000044d066 simple`runtime.systemstack at <autogenerated>:1
    frame #11: 0x000000000042a980 simple at proc.go:1094
    frame #12: 0x00000000004268f5 simple`runtime.dopanic_m(gp=<unavailable>, pc=<unavailable>, sp=<unavailable>) at panic.go:758
    frame #13: 0x000000000044bead simple`runtime.fatalpanic.func1 at panic.go:657
    frame #14: 0x000000000044d066 simple`runtime.systemstack at <autogenerated>:1
    frame #15: 0x000000000042a980 simple at proc.go:1094
    frame #16: 0x000000000044bead simple`runtime.fatalpanic.func1 at panic.go:657
    frame #17: 0x000000000044d066 simple`runtime.systemstack at <autogenerated>:1

After:
(lldb) bt
* thread #7, stop reason = signal SIGABRT
  * frame #0: 0x0000000000450024 simple`runtime.raise at <autogenerated>:1
    frame #1: 0x0000000000438d1b simple`runtime.dieFromSignal(sig=<unavailable>) at signal_unix.go:424
    frame #2: 0x0000000000438ee9 simple`runtime.crash at signal_unix.go:525
    frame #3: 0x00000000004264e3 simple`runtime.fatalpanic(msgs=<unavailable>) at panic.go:664
    frame #4: 0x0000000000425f1b simple`runtime.gopanic(e=<unavailable>) at panic.go:537
    frame #5: 0x0000000000470c62 simple`main.crash at simple.go:11
    frame #6: 0x0000000000470c00 simple`main.main at simple.go:6
    frame #7: 0x0000000000427be7 simple`runtime.main at proc.go:198
    frame #8: 0x000000000044ef91 simple`runtime.goexit at <autogenerated>:1

Updates #22716

Change-Id: Ib5fa35c13662c1dac2f1eac8b59c4a5824b98d92
Reviewed-on: https://go-review.googlesource.com/110065
Run-TryBot: Elias Naur <elias.naur@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>

gwik added a commit to znly/go that referenced this issue May 7, 2018

[release-branch.go1.10] runtime: perform crashes outside systemstack
CL 93658 moved stack trace printing inside a systemstack call to
sidestep complexity in case the runtime is in a inconsistent state.

Unfortunately, debuggers generating backtraces for a Go panic
will be confused and come up with a technical correct but useless
stack. This CL moves just the crash performing - typically a SIGABRT
signal - outside the systemstack call to improve backtraces.

Unfortunately, the crash function now needs to be marked nosplit and
that triggers the no split stackoverflow check. To work around that,
split fatalpanic in two: fatalthrow for runtime.throw and fatalpanic for
runtime.gopanic. Only Go panics really needs crashes on the right stack
and there is enough stack for gopanic.

Example program:

package main

import "runtime/debug"

func main() {
	debug.SetTraceback("crash")
	crash()
}

func crash() {
	panic("panic!")
}

Before:
(lldb) bt
* thread golang#1, name = 'simple', stop reason = signal SIGABRT
  * frame #0: 0x000000000044ffe4 simple`runtime.raise at <autogenerated>:1
    frame golang#1: 0x0000000000438cfb simple`runtime.dieFromSignal(sig=<unavailable>) at signal_unix.go:424
    frame golang#2: 0x0000000000438ec9 simple`runtime.crash at signal_unix.go:525
    frame golang#3: 0x00000000004268f5 simple`runtime.dopanic_m(gp=<unavailable>, pc=<unavailable>, sp=<unavailable>) at panic.go:758
    frame golang#4: 0x000000000044bead simple`runtime.fatalpanic.func1 at panic.go:657
    frame golang#5: 0x000000000044d066 simple`runtime.systemstack at <autogenerated>:1
    frame golang#6: 0x000000000042a980 simple at proc.go:1094
    frame golang#7: 0x0000000000438ec9 simple`runtime.crash at signal_unix.go:525
    frame golang#8: 0x00000000004268f5 simple`runtime.dopanic_m(gp=<unavailable>, pc=<unavailable>, sp=<unavailable>) at panic.go:758
    frame golang#9: 0x000000000044bead simple`runtime.fatalpanic.func1 at panic.go:657
    frame golang#10: 0x000000000044d066 simple`runtime.systemstack at <autogenerated>:1
    frame golang#11: 0x000000000042a980 simple at proc.go:1094
    frame golang#12: 0x00000000004268f5 simple`runtime.dopanic_m(gp=<unavailable>, pc=<unavailable>, sp=<unavailable>) at panic.go:758
    frame golang#13: 0x000000000044bead simple`runtime.fatalpanic.func1 at panic.go:657
    frame golang#14: 0x000000000044d066 simple`runtime.systemstack at <autogenerated>:1
    frame golang#15: 0x000000000042a980 simple at proc.go:1094
    frame golang#16: 0x000000000044bead simple`runtime.fatalpanic.func1 at panic.go:657
    frame golang#17: 0x000000000044d066 simple`runtime.systemstack at <autogenerated>:1

After:
(lldb) bt
* thread golang#7, stop reason = signal SIGABRT
  * frame #0: 0x0000000000450024 simple`runtime.raise at <autogenerated>:1
    frame golang#1: 0x0000000000438d1b simple`runtime.dieFromSignal(sig=<unavailable>) at signal_unix.go:424
    frame golang#2: 0x0000000000438ee9 simple`runtime.crash at signal_unix.go:525
    frame golang#3: 0x00000000004264e3 simple`runtime.fatalpanic(msgs=<unavailable>) at panic.go:664
    frame golang#4: 0x0000000000425f1b simple`runtime.gopanic(e=<unavailable>) at panic.go:537
    frame golang#5: 0x0000000000470c62 simple`main.crash at simple.go:11
    frame golang#6: 0x0000000000470c00 simple`main.main at simple.go:6
    frame golang#7: 0x0000000000427be7 simple`runtime.main at proc.go:198
    frame golang#8: 0x000000000044ef91 simple`runtime.goexit at <autogenerated>:1

Updates golang#22716

Change-Id: Ib5fa35c13662c1dac2f1eac8b59c4a5824b98d92
Reviewed-on: https://go-review.googlesource.com/110065
Run-TryBot: Elias Naur <elias.naur@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>

Conflict resolved :

getcallersp needs an argument.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment