cmd/compile: improve fn arg recovery in tracebacks via call site records


With the current Go compiler + runtime, when a program crashes/panics and we generate a traceback, the stack walking code in addition to printing out the names of functions present in the trace also makes an attempt to print function argument information.

Arg printing is done opportunistically, and is which is to say that the runtime will print the contents of the stack locations associated with parameters for a function, which are sometimes accurate but more often than not, inaccurate.

This issue to track the idea of enhancing the pclntab info for Go binaries to try to do a little better at recovering correct arg values. First a background section talking about the current implementation characteristics, then a second section talking about ideas on how we could do better.


**Background**

Here is a toy program that when run, causes a crash due to an out-of-bounds array access:

```
package main

var G1, G2 int
var C [10]int

func main() {
	println(foo(G1+11, G2+11, "blah", 5.0))
}

func foo(x, y int, s string, f float32) int {
	var big [1010]int
	q := x + y
	il := x + 1
	jl := y + 1
	for i := 0; i < il; i++ {
		for j := 0; j < jl; j++ {
			q += bar(i, j)
		}
		big[i]++
	}
	return q + big[0]
}

func bar(q, r int) int {
	v := q + 1 + C[0]
	defer func() { G1 += G2 }()
	for i := 0; i < 10; i++ {
		C[i] &= 0x3
	}
	C[v] += 3
	return v
}
```

If you run this program optimization disabled, here's the crash you get:

```
$ go run -trimpath -gcflags="-l -N" ./example.go
panic: runtime error: index out of range [10] with length 10

goroutine 1 [running]:
main.bar(0x9, 0x0)
	./example.go:31 +0x12f
main.foo(0xb, 0xb, {0x468238, 0x4}, 0x40a00000)
	./example.go:18 +0xf8
main.main()
	./example.go:8 +0x3d
exit status 2
```

Note that for the "main.bar" frame, the runtime is able to correctly print the values of the incoming parameters "q" and "r", and similarly the args for "main.foo" show up correctly as well.

Now here's what happens when you run a default build (optimization enabled, without "-l -N"):

```
$  go run -trimpath ./example.go
panic: runtime error: index out of range [10] with length 10

goroutine 1 [running]:
main.bar(0x0?, 0xc0000a0688?)
	./example.go:30 +0x9d
main.foo(0x60?, 0x0?, {0xc00010e058?, 0x4ca370?}, 0x45dcc0?)
	./example.go:17 +0x93
main.main()
	./example.go:7 +0x3d
exit status 2
```

The args for "main.bar" are shown incorrectly as "0x0?, 0xc0000a0688?"; the question mark symbols show that the compiler is aware of the fact that params are no longer live at the point where the crash takes place, but is making a "best effort" by reading the spill spill locations for the params (stack slot that the param is written to if we have to call morestack). In this case the contents of the spill slot are pretty much garbage. Similarly for the "main.foo" parameter.

Suppose we rewrite "var big [10]int" at line 12 to "var big [1010]int" and try again:

```
$ go run -trimpath ./example.go
panic: runtime error: index out of range [10] with length 10

goroutine 1 [running]:
main.bar(0x0?, 0x0?)
	./example.go:30 +0x9d
main.foo(0xb?, 0xb?, {0x468238?, 0x4?}, 0x40a00000?)
	./example.go:17 +0x94
main.main()
	./example.go:7 +0x3d
exit status 2
```

We still have incorrect values for the "main.bar" args, but the "main.foo" args are looking pretty decent -- this is due to the fact that the large stack frame for the routine triggered a call to "morestack", so the spill locations were actually written to, so the values we read back are correct.

Overall however the story is not great on reporting arg values accurately, and since we moved to the register ABI, the fraction of inaccurate values printed has definitely gone up.

**One way to do better**

In the DWARF world, compiler developers also wrestle with a similar problem, e.g. you want to print the values of incoming params, but the params are no longer live at the point where something bad happens.

A while back DWARF was enhanced with some extra mechanisms to help here, specifically the `DW_TAG_callsite` and `DW_TAG_callsite_value` tags. How these work are described in the DWARF standard, also in https://dwarfstd.org/issues/100909.2.html, but here rough idea is that the compiler generates supplemental location expressions at each callsite that describe how to compute the value of each argument passed at the call.  When the unwinder is trying to recover function args for a given frame and runs into "no longer live" params, it can go back up the stack and use the callsite DIEs to recover them. Example:

```
    func foo(x, y int) {
       var x [10]int
       ...
       bar(&x[0])
       ...
    }
    func bar(p *int) int {
      a := *p // last use of "p"
      ...
      if ... {
        panic("bad")
      }
    }
```

Within "foo", the compiler emits a DWARF `DW_TAG_call_site` DIE for the the call to "bar", and then examines each expression feeding into the call. For the first param, since the frame offset of "x" is know, it write out a `DW_AT_call_site_value` attribute indicating that the value of the arg can be computed as FP plus a constant.

When the unwinder is emitting stack trace output for "bar", since parameter "p" is no longer live at the panic, it instead finds the subprogram DIE for "foo", then locates the correct `DW_TAG_call_site` for the call to "bar", and then uses the location expression in the first `DW_AT_call_site_value` attribute to arrive at an accurate value for the argument.

This is not a bulletproof solution, of course, since the arg value may be something that we can't describe with a single location expression, and we're again vulnerable to values being clobbered, but in many cases it can help recover the value.

We could in theory implement the same sort of thing in the Go compiler with the pclntab-- emit records for each call site (if that site has one or more recoverable arg values) and then some sort of simple/restricted location expression that tells how to compute the value of the arg.



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

cmd/compile: improve fn arg recovery in tracebacks via call site records #65021

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

cmd/compile: improve fn arg recovery in tracebacks via call site records #65021

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions