runtime: arm64 bt in Delve get endless main.main after nil pointer dereference #63862

Closed
gnlsw opened this issue Nov 1, 2023 · 8 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. Debugging NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

@gnlsw

gnlsw commented Nov 1, 2023

What version of Go are you using (go version)?

$ go version
go version go1.20.10 linux/arm64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

CentOS Linux release 7.9.2009 (AltArch)

go env Output
$ go env
[root@localhost demo]# go env
GO111MODULE="off"
GOARCH="arm64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="arm64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/wsl/gopath/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/wsl/gopath"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/wsl/go/go1.20.10"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/wsl/go/go1.20.10/pkg/tool/linux_arm64"
GOVCS=""
GOVERSION="go1.20.10"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
GOWORK=""
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build1165718654=/tmp/go-build -gno-record-gcc-switches"

What did you do?

package main

func main() {
        var p_value *int = nil
        *p_value = 1
}
Delve Debugger
Version: 1.21.1
Build: $Id: a358c02f24aa7047ecc562b0587dc2d08330b2cf $ 

[root@localhost demo]# /home/wsl/gopath/bin/dlv exec demo
Type 'help' for list of commands.
(dlv) break main.main
Breakpoint 1 set at 0x65230 for main.main() ./demo.go:5
(dlv) c
> main.main() ./demo.go:5 (hits goroutine(1):1 total:1) (PC: 0x65230)
Warning: debugging optimized function
     1: package main
     2:
     3: func main() {
     4:         var p_value *int = nil
=>   5:         *p_value = 1
     6: }
(dlv) regs
 PC = 0x0000000000065230
 SP = 0x0000004000160770
 X0 = 0x0000000000065230
...
X29 = 0x0000004000160768
X30 = 0x000000000003d9a0

(dlv) n
> [unrecovered-panic] runtime.fatalpanic() /home/wsl/go/go1.20.10/src/runtime/panic.go:1145 (hits goroutine(1):1 total:1) (PC: 0x3b680)
Warning: debugging optimized function
        runtime.curg._panic.arg: interface {}(string) "runtime error: invalid memory address or nil pointer dereference"
  1140: // fatalpanic implements an unrecoverable panic. It is like fatalthrow, except
  1141: // that if msgs != nil, fatalpanic also prints panic messages and decrements
  1142: // runningPanicDefers once main is blocked from exiting.
  1143: //
  1144: //go:nosplit
=>1145: func fatalpanic(msgs *_panic) {
  1146:         pc := getcallerpc()
  1147:         sp := getcallersp()
  1148:         gp := getg()
  1149:         var docrash bool
  1150:         // Switch to the system stack to avoid any stack growth, which
(dlv) bt
 0  0x000000000003b680 in runtime.fatalpanic
    at /home/wsl/go/go1.20.10/src/runtime/panic.go:1145
 1  0x000000000003afac in runtime.gopanic
    at /home/wsl/go/go1.20.10/src/runtime/panic.go:987
 2  0x0000000000039a78 in runtime.panicmem
    at /home/wsl/go/go1.20.10/src/runtime/panic.go:260
 3  0x000000000004fea4 in runtime.sigpanic
    at /home/wsl/go/go1.20.10/src/runtime/signal_unix.go:841
 4  0x0000000000065238 in main.main
    at ./demo.go:5
 5  0x0000000000065238 in main.main
    at ./demo.go:5
 6  0x0000000000065238 in main.main
    at ./demo.go:5
 7  0x0000000000065238 in main.main
    at ./demo.go:5
 8  0x0000000000065238 in main.main
    at ./demo.go:5
(dlv) frame 4
> [unrecovered-panic] runtime.fatalpanic() /home/wsl/go/go1.20.10/src/runtime/panic.go:1145 (hits goroutine(1):1 total:1) (PC: 0x3b680)
Warning: debugging optimized function
Frame 4: ./demo.go:5 (PC: 65238)
     1: package main
     2:
     3: func main() {
     4:         var p_value *int = nil
=>   5:         *p_value = 1
     6: }
(dlv) regs
 PC = 0x0000000000065238
 SP = 0x0000004000160760
X29 = 0x0000004000160768
X30 = 0x0000000000065238

(dlv) frame 5
> [unrecovered-panic] runtime.fatalpanic() /home/wsl/go/go1.20.10/src/runtime/panic.go:1145 (hits goroutine(1):1 total:1) (PC: 0x3b680)
Warning: debugging optimized function
Frame 5: ./demo.go:5 (PC: 65238)
     1: package main
     2:
     3: func main() {
     4:         var p_value *int = nil
=>   5:         *p_value = 1
     6: }
(dlv) regs
 PC = 0x0000000000065238
 SP = 0x0000004000160760
X29 = 0x0000004000160768
X30 = 0x0000000000065238

(dlv)

What did you expect to see?

After the nil pointer dereference, bt should show the correct stack.

What did you see instead?

After the nil pointer dereference, the stack shows main.main repeated endlessly.

Before the nil pointer dereference, SP in main.main is 0x0000004000160770 (call it SP_value_1).
After the nil pointer dereference, SP for the same frame is 0x0000004000160760 (call it SP_value_2).
The two SP values for the same function differ. This may be the cause of the endless main.main frames.

This problem does not reproduce on x86.

@mauri870 mauri870 added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. compiler/runtime Issues related to the Go compiler and/or runtime. labels Nov 1, 2023
@mauri870
Member

mauri870 commented Nov 1, 2023

/cc @golang/compiler

@mauri870 mauri870 changed the title arm64 bt get endless main.main after nil pointer dereference runtime: arm64 bt get endless main.main after nil pointer dereference Nov 1, 2023
@cherrymui cherrymui changed the title runtime: arm64 bt get endless main.main after nil pointer dereference runtime: arm64 bt in Delve get endless main.main after nil pointer dereference Nov 1, 2023
@cherrymui cherrymui added this to the Backlog milestone Nov 1, 2023
@mknyszek
Contributor

mknyszek commented Nov 8, 2023

CC @aarzilli @derekparker

@derekparker
Contributor

There is already a Delve issue opened regarding this: go-delve/delve#3545.

Haven't determined quite yet where the issue lies, so feel free to close this until we can definitively point to an upstream root cause.

@thanm
Contributor

thanm commented Nov 8, 2023

@derekparker thanks for that -- will close this out for now.

@thanm thanm closed this as completed Nov 8, 2023
@derekparker
Contributor

derekparker commented Nov 8, 2023

Sorry for the whiplash, but I believe this may actually be a compiler bug.

The following is the .debug_frame information generated for the following architectures:

AMD64 (which is working):

0000c96c 000000000000001c 00000000 FDE cie=00000000 pc=00000000004575e0..00000000004575ea 
     DW_CFA_def_cfa_offset_sf: 8 
     DW_CFA_advance_loc: 9 to 00000000004575e9 
     DW_CFA_nop 
     DW_CFA_nop 
     DW_CFA_nop 
     DW_CFA_nop 
     DW_CFA_nop

ARM64 (broken):

0000e05c 000000000000001c 00000000 FDE cie=00000000 pc=0000000000067720..0000000000067730
  DW_CFA_same_value: r30 (x30)
  DW_CFA_def_cfa_offset_sf: 0
  DW_CFA_advance_loc: 15 to 000000000006772f
  DW_CFA_nop
  DW_CFA_nop
  DW_CFA_nop

Notice that the ARM64 version has DW_CFA_same_value: r30 (x30). It is reasonable for the compiler to emit that: main.main is a leaf function here, so the return address is expected to simply stay in X30. However, runtime.sigpanic is set up so that it appears to have been called directly by the function that generated the signal. When Delve unwinds and follows the DWARF information, it therefore loops endlessly, treating the value of X30 as the correct return address for main.main.
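
To make the loop concrete, here is a toy model in Go of a single unwind step under the ARM64 FDE above. It is only a sketch, not Delve's implementation: the `frame`, `fde`, `step`, and `unwind` names are hypothetical, the PC range used for main.main is an assumption, and the starting register values are the ones reported for frame 4 earlier in this issue. With a CFA offset of 0 and x30 marked "same value", the computed caller frame is identical to the current frame, so the unwinder never makes progress.

```go
package main

import "fmt"

// frame holds the only registers this toy unwinder cares about.
type frame struct {
	PC, SP, LR uint64
}

// fde is a simplified, hypothetical stand-in for a DWARF frame description.
type fde struct {
	cfaOffset   uint64 // CFA = SP + cfaOffset
	lrSameValue bool   // true models "DW_CFA_same_value: r30 (x30)"
}

// step computes the caller's frame. Only the "same value" rule is modeled.
func step(cur frame, d fde) (frame, bool) {
	if !d.lrSameValue {
		return frame{}, false
	}
	cfa := cur.SP + d.cfaOffset
	// The return address is whatever x30 holds; x30 itself is unchanged.
	return frame{PC: cur.LR, SP: cfa, LR: cur.LR}, true
}

// unwind follows step until lookup fails or max frames have been collected.
func unwind(start frame, lookup func(pc uint64) (fde, bool), max int) []frame {
	frames := []frame{start}
	for cur := start; len(frames) < max; {
		d, ok := lookup(cur.PC)
		if !ok {
			break
		}
		next, ok := step(cur, d)
		if !ok {
			break
		}
		frames = append(frames, next)
		cur = next
	}
	return frames
}

func main() {
	// Register values Delve reported for frame 4 in the original report.
	start := frame{PC: 0x65238, SP: 0x4000160760, LR: 0x65238}

	// Assumed PC range for the leaf main.main; only this function has an FDE here.
	lookup := func(pc uint64) (fde, bool) {
		if pc >= 0x65230 && pc < 0x65240 {
			return fde{cfaOffset: 0, lrSameValue: true}, true
		}
		return fde{}, false
	}

	for i, f := range unwind(start, lookup, 5) {
		fmt.Printf("%d  0x%016x in main.main (SP 0x%x)\n", i+4, f.PC, f.SP)
	}
}
```

The cap of 5 frames stands in for whatever depth limit the debugger applies; without it the loop in `unwind` would not terminate for this input.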

If the program is modified such that main.main calls fmt.Println a few times we get:

00013ac4 000000000000002c 00000000 FDE cie=00000000 pc=000000000008b1f0..000000000008b410
  DW_CFA_same_value: r30 (x30)
  DW_CFA_def_cfa_offset_sf: 0
  DW_CFA_advance_loc: 20 to 000000000008b204
  DW_CFA_offset_extended_sf: r30 (x30) at cfa-192
  DW_CFA_def_cfa_offset_sf: 192
  DW_CFA_advance_loc2: 508 to 000000000008b400
  DW_CFA_same_value: r30 (x30)
  DW_CFA_def_cfa_offset_sf: 0
  DW_CFA_advance_loc: 15 to 000000000008b40f
  DW_CFA_nop
  DW_CFA_nop
  DW_CFA_nop
  DW_CFA_nop
  DW_CFA_nop
  DW_CFA_nop

This now shows where to find the return address, at cfa-192: DW_CFA_offset_extended_sf: r30 (x30) at cfa-192.
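
The modified program itself was not posted in this thread. A minimal sketch of what it might look like, assuming nothing more than "the original repro plus a few fmt.Println calls", is below; the printed strings are placeholders. The calls make main.main a non-leaf function, so the compiler has to allocate a frame and save the link register, which is what the extra DW_CFA_offset_extended_sf entry describes.

```go
package main

import "fmt"

func main() {
	// A few calls so that main.main is no longer a leaf function
	// and must save the link register in its own frame.
	fmt.Println("one")
	fmt.Println("two")
	fmt.Println("three")

	// Same nil pointer dereference as in the original report.
	var p_value *int = nil
	*p_value = 1
}
```

The exact frame size (192 bytes in the dump above) depends on the actual program and toolchain, so this sketch is not expected to reproduce that number.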

cc @thanm @mknyszek

@derekparker
Contributor

To add onto the above, I'm not really sure what the compiler should generate in such a case. We could work around this in Delve, most likely, by special casing based on seeing runtime.sigpanic in the stack or something.

@cherrymui
Member

@derekparker Thanks for the investigation. Interesting. I agree that this is a tricky case, because the frame is not usually there unless the function panics and sigpanic injects a frame. In the runtime's unwinder we special-case sigpanic: https://cs.opensource.google/go/go/+/master:src/runtime/traceback.go;l=498-510 . If you have any suggestion for the DWARF unwind info we should generate, that would be great. Otherwise I feel the best option is probably to have Delve add a similar special case, if that is not too complex. Thanks.
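
As a rough illustration of what such a special case might look like, the sketch below models one unwind step in Go. It assumes, based on a reading of the linked runtime code, that on link-register architectures the signal handler stores the original LR at the new SP before faking the call to sigpanic, and that SP was lowered by 16 bytes to do so (which would be consistent with the two SP values in the original report). The `frame` type, the `step` and `unwindPastInjectedCall` functions, and the memory contents are all hypothetical; this is not the runtime's or Delve's actual code.

```go
package main

import "fmt"

// frame is a toy view of an unwound stack frame.
type frame struct {
	PC, SP uint64
}

// stackAlign is the assumed amount by which the signal handler lowered SP
// when it saved the link register (an assumption, not taken from the logs).
const stackAlign = 16

// unwindPastInjectedCall recovers the caller of a frameless function that
// faulted, assuming the original LR was stored at the current SP.
func unwindPastInjectedCall(cur frame, mem map[uint64]uint64) (frame, bool) {
	savedLR, ok := mem[cur.SP]
	if !ok {
		return frame{}, false
	}
	return frame{PC: savedLR, SP: cur.SP + stackAlign}, true
}

// step returns the caller of cur. calledBySigpanic reports whether the frame
// below cur is runtime.sigpanic; frameless reports whether cur's function has
// a zero-size frame at cur.PC. lr is the current link register value.
func step(cur frame, lr uint64, calledBySigpanic, frameless bool, mem map[uint64]uint64) (frame, bool) {
	if calledBySigpanic && frameless {
		return unwindPastInjectedCall(cur, mem)
	}
	// Plain "same value" rule for a leaf: return address is LR, CFA is SP.
	return frame{PC: lr, SP: cur.SP}, true
}

func main() {
	// Assumed stack contents: the LR seen at the breakpoint (0x3d9a0),
	// saved at the SP observed after the fault.
	mem := map[uint64]uint64{0x4000160760: 0x3d9a0}
	faulting := frame{PC: 0x65238, SP: 0x4000160760}

	naive, _ := step(faulting, 0x65238, false, true, mem)
	fixed, _ := step(faulting, 0x65238, true, true, mem)

	fmt.Printf("without special case: PC=0x%x SP=0x%x\n", naive.PC, naive.SP)
	fmt.Printf("with special case:    PC=0x%x SP=0x%x\n", fixed.PC, fixed.SP)
}
```

In this model the special case is keyed on two facts a debugger could plausibly know: that the callee frame is runtime.sigpanic, and that the faulting function has no frame of its own at that PC. Whether those are the right conditions for Delve is left open here.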

@cherrymui
Member

cherrymui commented Nov 8, 2023

@thanm (or anybody) do you know what C/C++ compilers do for frameless leaf functions when -fnon-call-exceptions is enabled? What unwind information do they generate?
