Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: incorrect debug information for some function and method parameters #46845

Closed
PeterFeicht opened this issue Jun 21, 2021 · 18 comments
Closed
Labels
NeedsFix release-blocker
Milestone

Comments

@PeterFeicht
Copy link

@PeterFeicht PeterFeicht commented Jun 21, 2021

What version of Go are you using (go version)?

go version devel go1.17-97cee43c93 Thu Jun 17 04:32:50 2021 +0000 windows/amd64

What operating system and processor architecture are you using (go env)?

windows/amd64

go env Output
$ go env
set GO111MODULE=
set GOARCH=amd64
set GOBIN=
set GOCACHE=C:\Users\peter.feichtinger\AppData\Local\go-build
set GOENV=C:\Users\peter.feichtinger\AppData\Roaming\go\env
set GOEXE=.exe
set GOFLAGS=
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOINSECURE=
set GOMODCACHE=C:\msys64\home\peter.feichtinger\go\pkg\mod
set GONOPROXY=
set GONOSUMDB=
set GOOS=windows
set GOPATH=C:/msys64/home/peter.feichtinger/go
set GOPRIVATE=
set GOPROXY=https://proxy.golang.org,direct
set GOROOT=C:\Workspaces\git\go
set GOSUMDB=sum.golang.org
set GOTMPDIR=
set GOTOOLDIR=C:\Workspaces\git\go\pkg\tool\windows_amd64
set GOVCS=
set GOVERSION=devel go1.17-97cee43c93 Thu Jun 17 04:32:50 2021 +0000
set GCCGO=gccgo
set AR=ar
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set GOMOD=C:\Workspaces\go\src\regabiDwarf\go.mod
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0 -fdebug-prefix-map=C:\msys64\tmp\go-build1085975185=/tmp/go-build -gno-record-gcc-switches

What did you do?

There seems to be something wrong with the debug information for methods (e.g. database/sql.(*DB).execDC) and functions (e.g. runtime.selunlock), but only with optimizations enabled. According to the ABI spec, each argument is either passed entirely in registers or entirely on the stack. That is also what I'm seeing in the generated code for both the optimized an unoptimized cases.

However, debug information for the optimized case shows the second argument being split up between stack and registers in both cases. See below for debug information of database/sql.(*DB).execDC for both an unoptimized and an optimized build.

The sample program I compiled (didn't run it) is below. I only included a transaction so I could look at what the call site of database/sql.(*DB).execDC looks like in database/sql.(*Tx).ExecContext, to confirm the actual ABI.

package main

import (
   "context"
   "database/sql"
)

func main() {
   db, _ := sql.Open("foo", "")
   _, _ = db.Exec("SELECT 1;")

   tx, _ := db.BeginTx(context.Background(), nil)
   _, _ = tx.Exec("SELECT 1;")
}

The function signatures look like this:

func (db *DB) execDC(ctx context.Context, dc *driverConn, release func(error), query string, args []interface{}) (res Result, err error)

func selunlock(scases []scase, lockorder []uint16)

In both cases the second parameter is affected, but I don't know if that's always the case.

Optimized binary (no build options)

 <2><3307>: Abbrev Number: 16 (DW_TAG_formal_parameter)
    <3308>   DW_AT_name        : ctx
    <330c>   DW_AT_variable_parameter: 0
    <330d>   DW_AT_decl_line   : 1629
    <330f>   DW_AT_type        : <0x619d3>
    <3313>   DW_AT_location    : 0x7250 (location list)
00007250 ffffffffffffffff 00000000004a2340 (base address)
00007260 00000000004a2340 00000000004a2414 (DW_OP_fbreg: 32; DW_OP_piece: 8; DW_OP_reg2 (rcx); DW_OP_piece: 8)
00007279 00000000004a2414 00000000004a29c5 (DW_OP_fbreg: 32; DW_OP_piece: 8; DW_OP_fbreg: 40; DW_OP_piece: 8)
00007293 <End of list>
All parameters
 <1><32c6>: Abbrev Number: 3 (DW_TAG_subprogram)
    <32c7>   DW_AT_name        : database/sql.(*DB).execDC
    <32e1>   DW_AT_low_pc      : 0x4a2340
    <32e9>   DW_AT_high_pc     : 0x4a29c5
    <32f1>   DW_AT_frame_base  : 1 byte block: 9c  (DW_OP_call_frame_cfa)
    <32f3>   DW_AT_decl_file   : 0x4
    <32f7>   DW_AT_external    : 1
 <2><32f8>: Abbrev Number: 16 (DW_TAG_formal_parameter)
    <32f9>   DW_AT_name        : db
    <32fc>   DW_AT_variable_parameter: 0
    <32fd>   DW_AT_decl_line   : 1629
    <32ff>   DW_AT_type        : <0x659f1>
    <3303>   DW_AT_location    : 0x721d (location list)
0000721d ffffffffffffffff 00000000004a2340 (base address)
0000722d 00000000004a2340 00000000004a240e (DW_OP_reg0 (rax))
00007240 
 <2><3307>: Abbrev Number: 16 (DW_TAG_formal_parameter)
    <3308>   DW_AT_name        : ctx
    <330c>   DW_AT_variable_parameter: 0
    <330d>   DW_AT_decl_line   : 1629
    <330f>   DW_AT_type        : <0x619d3>
    <3313>   DW_AT_location    : 0x7250 (location list)
00007250 ffffffffffffffff 00000000004a2340 (base address)
00007260 00000000004a2340 00000000004a2414 (DW_OP_fbreg: 32; DW_OP_piece: 8; DW_OP_reg2 (rcx); DW_OP_piece: 8)
00007279 00000000004a2414 00000000004a29c5 (DW_OP_fbreg: 32; DW_OP_piece: 8; DW_OP_fbreg: 40; DW_OP_piece: 8)
00007293 
 <2><3317>: Abbrev Number: 16 (DW_TAG_formal_parameter)
    <3318>   DW_AT_name        : dc
    <331b>   DW_AT_variable_parameter: 0
    <331c>   DW_AT_decl_line   : 1629
    <331e>   DW_AT_type        : <0x658d5>
    <3322>   DW_AT_location    : 0x72a3 (location list)
000072a3 ffffffffffffffff 00000000004a2340 (base address)
000072b3 00000000004a2340 00000000004a2419 (DW_OP_reg5 (rdi))
000072c6 00000000004a2419 00000000004a29c5 (DW_OP_fbreg: 48)
000072da 
 <2><3326>: Abbrev Number: 16 (DW_TAG_formal_parameter)
    <3327>   DW_AT_name        : release
    <332f>   DW_AT_variable_parameter: 0
    <3330>   DW_AT_decl_line   : 1629
    <3332>   DW_AT_type        : <0x67ad4>
    <3336>   DW_AT_location    : 0x72ea (location list)
000072ea ffffffffffffffff 00000000004a2340 (base address)
000072fa 00000000004a2340 00000000004a23f2 (DW_OP_reg4 (rsi))
0000730d 
 <2><333a>: Abbrev Number: 16 (DW_TAG_formal_parameter)
    <333b>   DW_AT_name        : query
    <3341>   DW_AT_variable_parameter: 0
    <3342>   DW_AT_decl_line   : 1629
    <3344>   DW_AT_type        : <0x51e43>
    <3348>   DW_AT_location    : 0x731d (location list)
0000731d ffffffffffffffff 00000000004a2340 (base address)
0000732d 00000000004a2340 00000000004a2419 (DW_OP_reg8 (r8); DW_OP_piece: 8; DW_OP_reg9 (r9); DW_OP_piece: 8)
00007345 00000000004a2419 00000000004a29c5 (DW_OP_fbreg: 64; DW_OP_piece: 8; DW_OP_fbreg: 72; DW_OP_piece: 8)
00007361 
 <2><334c>: Abbrev Number: 16 (DW_TAG_formal_parameter)
    <334d>   DW_AT_name        : args
    <3352>   DW_AT_variable_parameter: 0
    <3353>   DW_AT_decl_line   : 1629
    <3355>   DW_AT_type        : <0x6453f>
    <3359>   DW_AT_location    : 0x7371 (location list)
00007371 ffffffffffffffff 00000000004a2340 (base address)
00007381 00000000004a2340 00000000004a29c5 (DW_OP_call_frame_cfa; DW_OP_piece: 8; DW_OP_fbreg: 8; DW_OP_piece: 8; DW_OP_fbreg: 16; DW_OP_piece: 8)
0000739e   
 <2><335d>: Abbrev Number: 16 (DW_TAG_formal_parameter)
    <335e>   DW_AT_name        : res
    <3362>   DW_AT_variable_parameter: 1
    <3363>   DW_AT_decl_line   : 1629
    <3365>   DW_AT_type        : <0x671ae>
    <3369>   DW_AT_location    : 0x73ae (location list)
000073ae ffffffffffffffff 00000000004a2340 (base address)
000073be 00000000004a23a5 00000000004a29c5 (DW_OP_fbreg: -408)
000073d3 
 <2><336d>: Abbrev Number: 16 (DW_TAG_formal_parameter)
    <336e>   DW_AT_name        : err
    <3372>   DW_AT_variable_parameter: 1
    <3373>   DW_AT_decl_line   : 1629
    <3375>   DW_AT_type        : <0x5bbc3>
    <3379>   DW_AT_location    : 0x73e3 (location list)
000073e3 ffffffffffffffff 00000000004a2340 (base address)
000073f3 00000000004a23ae 00000000004a29c5 (DW_OP_fbreg: -392)
00007408 

Debug binary (gcflags -N)

 <2><3b06>: Abbrev Number: 16 (DW_TAG_formal_parameter)
    <3b07>   DW_AT_name        : ctx
    <3b0b>   DW_AT_variable_parameter: 0
    <3b0c>   DW_AT_decl_line   : 1629
    <3b0e>   DW_AT_type        : <0x65733>
    <3b12>   DW_AT_location    : 0x233d (location list)
0000233d ffffffffffffffff 00000000004e12c0 (base address)
0000234d 00000000004e12c0 00000000004e1321 (DW_OP_reg3 (rbx); DW_OP_piece: 8; DW_OP_reg2 (rcx); DW_OP_piece: 8)
00002365 00000000004e1321 00000000004e1dbb (DW_OP_fbreg: 32)
00002379 <End of list>
All parameters
 <1><3ac5>: Abbrev Number: 3 (DW_TAG_subprogram)
    <3ac6>   DW_AT_name        : database/sql.(*DB).execDC
    <3ae0>   DW_AT_low_pc      : 0x4e12c0
    <3ae8>   DW_AT_high_pc     : 0x4e1dbb
    <3af0>   DW_AT_frame_base  : 1 byte block: 9c   (DW_OP_call_frame_cfa)
    <3af2>   DW_AT_decl_file   : 0x4
    <3af6>   DW_AT_external    : 1
 <2><3af7>: Abbrev Number: 16 (DW_TAG_formal_parameter)
    <3af8>   DW_AT_name        : db
    <3afb>   DW_AT_variable_parameter: 0
    <3afc>   DW_AT_decl_line   : 1629
    <3afe>   DW_AT_type        : <0x6a3d5>
    <3b02>   DW_AT_location    : 0x22f6 (location list)
000022f6 ffffffffffffffff 00000000004e12c0 (base address)
00002306 00000000004e12c0 00000000004e1321 (DW_OP_reg0 (rax))
00002319 00000000004e1321 00000000004e1dbb (DW_OP_fbreg: 24)
0000232d 
 <2><3b06>: Abbrev Number: 16 (DW_TAG_formal_parameter)
    <3b07>   DW_AT_name        : ctx
    <3b0b>   DW_AT_variable_parameter: 0
    <3b0c>   DW_AT_decl_line   : 1629
    <3b0e>   DW_AT_type        : <0x65733>
    <3b12>   DW_AT_location    : 0x233d (location list)
0000233d ffffffffffffffff 00000000004e12c0 (base address)
0000234d 00000000004e12c0 00000000004e1321 (DW_OP_reg3 (rbx); DW_OP_piece: 8; DW_OP_reg2 (rcx); DW_OP_piece: 8)
00002365 00000000004e1321 00000000004e1dbb (DW_OP_fbreg: 32)
00002379 
 <2><3b16>: Abbrev Number: 16 (DW_TAG_formal_parameter)
    <3b17>   DW_AT_name        : dc
    <3b1a>   DW_AT_variable_parameter: 0
    <3b1b>   DW_AT_decl_line   : 1629
    <3b1d>   DW_AT_type        : <0x6a2b9>
    <3b21>   DW_AT_location    : 0x2389 (location list)
00002389 ffffffffffffffff 00000000004e12c0 (base address)
00002399 00000000004e12c0 00000000004e1321 (DW_OP_reg5 (rdi))
000023ac 00000000004e1321 00000000004e1dbb (DW_OP_fbreg: 48)
000023c0 
 <2><3b25>: Abbrev Number: 16 (DW_TAG_formal_parameter)
    <3b26>   DW_AT_name        : release
    <3b2e>   DW_AT_variable_parameter: 0
    <3b2f>   DW_AT_decl_line   : 1629
    <3b31>   DW_AT_type        : <0x6cd45>
    <3b35>   DW_AT_location    : 0x23d0 (location list)
000023d0 ffffffffffffffff 00000000004e12c0 (base address)
000023e0 00000000004e12c0 00000000004e1321 (DW_OP_reg4 (rsi))
000023f3 00000000004e1321 00000000004e1dbb (DW_OP_fbreg: 56)
00002407 
 <2><3b39>: Abbrev Number: 16 (DW_TAG_formal_parameter)
    <3b3a>   DW_AT_name        : query
    <3b40>   DW_AT_variable_parameter: 0
    <3b41>   DW_AT_decl_line   : 1629
    <3b43>   DW_AT_type        : <0x54d0d>
    <3b47>   DW_AT_location    : 0x2417 (location list)
00002417 ffffffffffffffff 00000000004e12c0 (base address)
00002427 00000000004e12c0 00000000004e1321 (DW_OP_reg8 (r8); DW_OP_piece: 8; DW_OP_reg9 (r9); DW_OP_piece: 8)
0000243f 00000000004e1321 00000000004e1dbb (DW_OP_fbreg: 64)
00002454 
 <2><3b4b>: Abbrev Number: 15 (DW_TAG_formal_parameter)
    <3b4c>   DW_AT_name        : args
    <3b51>   DW_AT_variable_parameter: 0
    <3b52>   DW_AT_decl_line   : 1629
    <3b54>   DW_AT_type        : <0x689a9>
    <3b58>   DW_AT_location    : 1 byte block: 9c   (DW_OP_call_frame_cfa)
 <2><3b5a>: Abbrev Number: 15 (DW_TAG_formal_parameter)
    <3b5b>   DW_AT_name        : res
    <3b5f>   DW_AT_variable_parameter: 1
    <3b60>   DW_AT_decl_line   : 1629
    <3b62>   DW_AT_type        : <0x6c10d>
    <3b66>   DW_AT_location    : 3 byte block: 91 c8 7b     (DW_OP_fbreg: -568)
 <2><3b6a>: Abbrev Number: 15 (DW_TAG_formal_parameter)
    <3b6b>   DW_AT_name        : err
    <3b6f>   DW_AT_variable_parameter: 1
    <3b70>   DW_AT_decl_line   : 1629
    <3b72>   DW_AT_type        : <0x5ea8d>
    <3b76>   DW_AT_location    : 3 byte block: 91 f8 7b     (DW_OP_fbreg: -520)
@mdempsky
Copy link
Member

@mdempsky mdempsky commented Jun 21, 2021

/cc @thanm

@mdempsky mdempsky added the NeedsInvestigation label Jun 21, 2021
@thanm thanm self-assigned this Jun 21, 2021
@thanm
Copy link
Contributor

@thanm thanm commented Jun 21, 2021

Thanks for reporting. I did a quick build with my tip toolchain and I agree, from what I can see the variable location does indeed appear broken for "db".

I'll make a more detailed examination later this week. As you probably are aware, variable locations are not always 100% accurate when optimization is enabled, it is still something of a "best effort" proposition.

@PeterFeicht
Copy link
Author

@PeterFeicht PeterFeicht commented Jun 22, 2021

@thanm did you mix them up or is it really db that's broken for you? In both cases that I looked at, it was the second parameter's location that was messed up.

So regarding optimization, in case of optimized binaries debug information might not be 100% reliable. Can I expect debug information to be reliable for non-optimized cases though?
And on a related note, should parameter assignment (to registers/stack) be the exact same between optimized and non-optimized builds?

@PeterFeicht
Copy link
Author

@PeterFeicht PeterFeicht commented Jun 22, 2021

Alright so I'm not sure if that's the same issue or a different one, but I found another case of incorrect debug information. This one is the same for optimized and non-optimized builds though: the third parameter sp of runtime.dopanic_m is passed in rcx on amd64, but the location list is missing the register entry (it only has the entry where it's already spilled to the stack, oddly not into its dedicated spill space).

func dopanic_m(gp *g, pc, sp uintptr) bool

This time it's not the second parameter that's affected, but the third one.

 <2><3480e>: Abbrev Number: 16 (DW_TAG_formal_parameter)
    <3480f>   DW_AT_name        : sp
    <34812>   DW_AT_variable_parameter: 0
    <34813>   DW_AT_decl_line   : 1371
    <34815>   DW_AT_type        : <0x544fa>
    <34819>   DW_AT_location    : 0x4b8b0 (location list)
0004b8b0 ffffffffffffffff 0000000000434960 (base address)
0004b8c0 0000000000434960 0000000000434bfe (DW_OP_fbreg: -24)
0004b8d4 <End of list>

It's the exact same between debug and release builds, and it's very similar between Windows and Linux. Since it's the same in both debug and release builds I guess it's a different issue, though. Should I open another ticket for this?

All parameters
 <1><347c6>: Abbrev Number: 3 (DW_TAG_subprogram)
    <347c7>   DW_AT_name        : runtime.dopanic_m
    <347d9>   DW_AT_low_pc      : 0x434960
    <347e1>   DW_AT_high_pc     : 0x434bfe
    <347e9>   DW_AT_frame_base  : 1 byte block: 9c  (DW_OP_call_frame_cfa)
    <347eb>   DW_AT_decl_file   : 0x4b
    <347ef>   DW_AT_external    : 1
 <2><347f0>: Abbrev Number: 16 (DW_TAG_formal_parameter)
    <347f1>   DW_AT_name        : gp
    <347f4>   DW_AT_variable_parameter: 0
    <347f5>   DW_AT_decl_line   : 1371
    <347f7>   DW_AT_type        : <0x55799>
    <347fb>   DW_AT_location    : 0x4b823 (location list)
0004b823 ffffffffffffffff 0000000000434960 (base address)
0004b833 0000000000434960 000000000043499e (DW_OP_reg0 (rax))
0004b846 000000000043499e 0000000000434bfe (DW_OP_call_frame_cfa)
0004b859 
 <2><347ff>: Abbrev Number: 16 (DW_TAG_formal_parameter)
    <34800>   DW_AT_name        : pc
    <34803>   DW_AT_variable_parameter: 0
    <34804>   DW_AT_decl_line   : 1371
    <34806>   DW_AT_type        : <0x544fa>
    <3480a>   DW_AT_location    : 0x4b869 (location list)
0004b869 ffffffffffffffff 0000000000434960 (base address)
0004b879 0000000000434960 000000000043499e (DW_OP_reg3 (rbx))
0004b88c 000000000043499e 0000000000434bfe (DW_OP_fbreg: -32)
0004b8a0 
 <2><3480e>: Abbrev Number: 16 (DW_TAG_formal_parameter)
    <3480f>   DW_AT_name        : sp
    <34812>   DW_AT_variable_parameter: 0
    <34813>   DW_AT_decl_line   : 1371
    <34815>   DW_AT_type        : <0x544fa>
    <34819>   DW_AT_location    : 0x4b8b0 (location list)
0004b8b0 ffffffffffffffff 0000000000434960 (base address)
0004b8c0 0000000000434960 0000000000434bfe (DW_OP_fbreg: -24)
0004b8d4 

@thanm
Copy link
Contributor

@thanm thanm commented Jun 22, 2021

@thanm did you mix them up or is it really db that's broken for you? In both cases that I looked at, it was the second parameter's location that was messed up.

Yes, sorry, that was a typo / sloppiness on my part. "ctx" is the problematic parameter.

So regarding optimization, in case of optimized binaries debug information might not be 100% reliable. Can I expect debug information to be reliable for non-optimized cases though?

Yes.

And on a related note, should parameter assignment (to registers/stack) be the exact same between optimized and non-optimized builds?

Correct, the register assignment will be the same in both cases.

Alright so I'm not sure if that's the same issue or a different one, but I found another case of incorrect debug information. This one is the same for optimized and non-optimized builds though: the third parameter sp of runtime.dopanic_m is passed in rcx on amd64, but the location list is missing the register entry (it only has the entry where it's already spilled to the stack, oddly not into its dedicated spill space).

func dopanic_m(gp *g, pc, sp uintptr) bool

This time it's not the second parameter that's affected, but the third one.

 <2><3480e>: Abbrev Number: 16 (DW_TAG_formal_parameter)
    <3480f>   DW_AT_name        : sp
    <34812>   DW_AT_variable_parameter: 0
    <34813>   DW_AT_decl_line   : 1371
    <34815>   DW_AT_type        : <0x544fa>
    <34819>   DW_AT_location    : 0x4b8b0 (location list)
0004b8b0 ffffffffffffffff 0000000000434960 (base address)
0004b8c0 0000000000434960 0000000000434bfe (DW_OP_fbreg: -24)
0004b8d4 <End of list>

It's the exact same between debug and release builds, and it's very similar between Windows and Linux. Since it's the same in both debug and release builds I guess it's a different issue, though. Should I open another ticket for this?

No ticket needed. The Go runtime package (in order to function properly) is always compiled with optimization, regardless of the command line flags. So when you issue a command like

  go build -x -gcflags=all="-l -N" himom.go 

If you look at the trace output (from -x) you'll see that the Go command doesn't actually apply "-l -N" to the runtime package. There are a variety of reasons for this, but the bottom line is that it's a good deal easier to maintain the runtime and keep it working properly if optimization is always enabled.

@PeterFeicht
Copy link
Author

@PeterFeicht PeterFeicht commented Jul 12, 2021

@thanm did you have time to look into this?

Would it be feasible for a compiler-newbie like myself to attack this?

@thanm
Copy link
Contributor

@thanm thanm commented Jul 12, 2021

I've looked at it, but I'm not sure what the right fix is yet.

Probably not a great issue for a compiler newbie, the code in question is pretty complicated.

@PeterFeicht
Copy link
Author

@PeterFeicht PeterFeicht commented Jul 12, 2021

Yeah, I was afraid of that... Thanks for taking a look anyway.

@heschi
Copy link
Contributor

@heschi heschi commented Oct 6, 2021

Ping from the release team: any updates on this release blocker?

@michael-obermueller
Copy link

@michael-obermueller michael-obermueller commented Oct 8, 2021

@thanm as a workaround for this issue, we have implemented an algorithm that is able to compute DWARF location lists according to the register ABI. It uses DWARF types of function parameters and return values as input. We could run this algorithm on unoptimized binaries, maybe we can find an error pattern. Would such an analysis help you in your investigations?

@thanm
Copy link
Contributor

@thanm thanm commented Oct 8, 2021

@michael-obermueller , not sure what you are suggesting. This bug deals with DWARF for optimized code. How would it help to run a tool on unoptimized binaries?

@michael-obermueller
Copy link

@michael-obermueller michael-obermueller commented Oct 12, 2021

@thanm yes, you are right. This tool should be used to check DWARF location lists of optimized code in order to find possible error patterns. Would such an analysis help you in your investigations?

I'm asking, because maybe you already know what the problem is and further analysis is not needed:

I've looked at it, but I'm not sure what the right fix is yet.

Thanks

@thanm
Copy link
Contributor

@thanm thanm commented Oct 12, 2021

Would such an analysis help you in your investigations?

Thanks, but at this stage, no. Maybe later once some of the more low-hanging problems are fixed.

@aclements
Copy link
Member

@aclements aclements commented Oct 27, 2021

Ping @thanm, any progress on this? We're about to enter the freeze.

@thanm
Copy link
Contributor

@thanm thanm commented Oct 27, 2021

No progress to report, I have been working on other development tasks.

@thanm
Copy link
Contributor

@thanm thanm commented Nov 8, 2021

Here is some background info on what's happening in the compiler for this bug, for posterity. It is a fairly tricky bug, so I think some explanation is in order.

The code of interest is the "locationlists" phase, which runs very late in SSA processing: after all changes to the code for a function have been finalized, but before obj-related processing (such as adding in the prolog code that does the "morestack" check). Source file to look at is src/cmd/compile/internal/ssa/debug.go.

The locationlists pass has three main phases: an initialization/setup phase, a liveness phase, and a final phase that actually builds location lists for variables of interest. The liveness phase is concerned strictly with computing sets of live locations on basic block boundaries, e.g. for each block B, what variables will be in which locations (stack or registers) on entry to B.

The problematic behavior is not in liveness but in the final phase buildLocationLists. This phase makes a sweep through all blocks in the function in layout order. It maintains two main items of state during this walk: state.currentState.slots, which records the locations that all our variables are in at any given point, and state.pendingEntries, which records for each variable the ranges being built up to form a DWARF location list.

The main loop in buildLocationLists looks roughly like this (this is heavily simplified):

  for _, block := range block {
    <setup incoming locations for 'block'>
    for v := range block.Values {
      processValue(state, v, &changedVars)
      if ... {
        for v in changedVars {
          updateVar(state, v)
        }
	...
      }
    }
  }

The job of "processValue" is to look at the current SSA instruction ("v") and determine if it is doing something that will change the location of a variable, and if it is, store that information in "state.currentState.slots".

The job of updateVar() is to decide whether the things we've recorded in our state data structures about a given variable qualify as changes that would trigger a new DWARF location list segment for the variable, whether we can continue to extend the existing segment. It does this by comparing what's in state.currentState.slots for a variable with the info in "state.pendingEntries".

For example, let's suppose that our current state records variable V (a string) as being located in a pair of 8-byte slots at stack[32] and stack[40]. Now let's say that we process a load instruction that loads up stack[32] into register BX. This means that the first chunk of "V" is now in two places: memory and a register. updateVar() will look at this event and say, "No change needed to the pending DWARF location list. It's fine that the first half of V is now in a register, but we can still get it from memory, so leave things as they are".

Now let's try a different example: let's say that at the start of some basic block, we know that an integer variable "V2" is located in register R8, then processValue() hits an instruction in that block that moves R8 into CX. When updateVar() is called for V2, it will realize that this change represents an incompatible pair of locations, so it will trigger an update to the DWARF location expression, writing a new entry to the DWARF range list we're building up, and then updating state.pendingEntries to reflect the new reality.

Note that if 'updateVar' decides to add a new entry to the location list for a variable, it has to have a concrete SSA value (e.g. instruction) as a marker for that event (since the final location expression has to refer to some specific code offset). This is key.

Things get more complicated in buildLocationLists() due to the need to handle so-called zero-width ops: these are basically ephemeral SSA pseudo-ops that describe where a given variable is located at a given point of time. Examples include SSA phi function ops, and SSA argument ops (OpArg, OpArgIntReg, OpArgFloatReg). These ops are called "zero width" since they don't correspond to a real instruction in the final program. Thus we can't pass a zero-width op to "updateVar", since "updateVar" has to be given a specific concrete offset in the function so as to refer to that offset in the DWARF location list.

To make matters more complicated, there are essentially two flavors of zero-width ops: those holding lifetimes that begin at the start of the block, and those whose lifetime start correspond to some specific instruction. Operations that fall into the first category are things like Phi, Arg, ArgIntReg, etc. Operations that fall into the second category are Select0, Select1, etc.

The function buildLocationLists handles zero-width ops by basically buffering them up and waiting until it hits the next non-zero-width op that does something interesting, the code that does this (again, simplified for readability) looks like:

    changedVars = nil
    for val := range block.Values {
      var = processValue(state, val)
      if isZeroWidth(v) {
        changedVars = append(changedVars, var)
	continue
      }
      if ... {
        for v in changedVars {
          updateVar(state, v)
        }
	...
      }
    }
  }

The problem with this design is that when we do finally call "updateVar" for a parameter variable, we may have seen other changes for it. Here is a concrete example, this time from the function of interest in this bug:

    v68 = ArgIntReg <unsafe.Pointer> {ctx+0} [1] : BX (ctx[unsafe.Pointer])
    v67 = ArgIntReg <unsafe.Pointer> {ctx+8} [2] : CX (ctx+8[unsafe.Pointer])
    ...
    v281 = StoreReg <unsafe.Pointer> v67 : ctx+8[unsafe.Pointer]
    v467 = StoreReg <unsafe.Pointer> v68 : ctx[unsafe.Pointer]

Here we have a 16-byte context variable "ctx" arriving in two registers, "BX" and "CX". The first two values are zero-width, so we don't call updateVar yet, but rather we wait until the next concrete instruction. The problem here is that the first concrete instruction we see is the StoreReg that writes CX to memory, so by the time updateVar is invoked for the variable, it now looks as though the var is in two locations (stack and register). This is the source of the bad location lists for this specific issue.

There is a second bug also related to handling of zero width ops, which is that when we "buffer" a zero width op that is not a "block start" op (example, Select0) the eventual call to updateVar() passes the wrong value. The code is here (https://go.googlesource.com/go/+/759eaa22adb0ab883959e4a36c19f2dfe77b5895/src/cmd/compile/internal/ssa/debug.go#1151):

	// Not zero-width; i.e., a "real" instruction.
	zeroWidthPending = false
	blockPrologComplete = true
	for i, varID := range state.changedVars.contents() {
		if i < apcChangedSize { // buffered true start-of-block changes
/* HERE */		state.updateVar(VarID(varID), v.Block, BlockStart)
		} else {
			state.updateVar(VarID(varID), v.Block, v)
		}
	}
	state.changedVars.clear()
	apcChangedSize = 0

Note the first call to updateVar, which passes BlockStart. This value is incorrect for Select ops.

The fix that I am working on basically separates out the two categories of zero-width ops. The first category (OpArg, OpPhi, etc) whose lifetimes begin at block start will be handled by an initial pre-pass for each block. The second category (OpSelect*) will be handled as part of processing the remainder of the ops in the block.

@gopherbot
Copy link

@gopherbot gopherbot commented Nov 8, 2021

Change https://golang.org/cl/362244 mentions this issue: cmd/compile/internal/ssa: fix debug location gen issue with zero width ops

@gopherbot
Copy link

@gopherbot gopherbot commented Nov 18, 2021

Change https://golang.org/cl/364674 mentions this issue: dwtest: add new pseudo-package with DWARF location tests

@dmitshur dmitshur added NeedsFix and removed NeedsInvestigation labels Nov 18, 2021
gopherbot pushed a commit to golang/debug that referenced this issue Nov 22, 2021
This patch rolls out a new set of regression tests for verifying the
integrity of DWARF variable location expressions. The tests use a
"scaffold" consisting of harness program 'dwdumploc' that can be run
on a Go binary to dump out the DWARF location expressions for a
function's formal parameters, and then a new set of tests that invoke
the harness on specific canned testcases. The scaffold itself depends
on packages from Delve.

DWARF location expressions can refer to machine registers, making the
output of the dump architecture-dependent. The dumper program
currently supports { Amd64, Arm64 } x { Linux, Macho, Windows }, which
matches up mostly with current Delve support.

Updates golang/go#47354.
Updates golang/go#46845.

Change-Id: I7377c03b19e3cf43aa0f7698fcc636be9c852765
Reviewed-on: https://go-review.googlesource.com/c/debug/+/364674
Trust: Than McIntosh <thanm@google.com>
Trust: Dmitri Shuralyov <dmitshur@golang.org>
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
doujiang24 pushed a commit to doujiang24/debug that referenced this issue Jan 17, 2022
This patch rolls out a new set of regression tests for verifying the
integrity of DWARF variable location expressions. The tests use a
"scaffold" consisting of harness program 'dwdumploc' that can be run
on a Go binary to dump out the DWARF location expressions for a
function's formal parameters, and then a new set of tests that invoke
the harness on specific canned testcases. The scaffold itself depends
on packages from Delve.

DWARF location expressions can refer to machine registers, making the
output of the dump architecture-dependent. The dumper program
currently supports { Amd64, Arm64 } x { Linux, Macho, Windows }, which
matches up mostly with current Delve support.

Updates golang/go#47354.
Updates golang/go#46845.

Change-Id: I7377c03b19e3cf43aa0f7698fcc636be9c852765
Reviewed-on: https://go-review.googlesource.com/c/debug/+/364674
Trust: Than McIntosh <thanm@google.com>
Trust: Dmitri Shuralyov <dmitshur@golang.org>
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
@rsc rsc unassigned thanm Jun 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsFix release-blocker
Projects
None yet
Development

No branches or pull requests

9 participants