cmd/compile: improve fn arg recovery in tracebacks via call site records #65021
With the current Go compiler + runtime, when a program crashes/panics and we generate a traceback, the stack walking code, in addition to printing out the names of the functions present in the trace, also makes an attempt to print function argument information.
Arg printing is done opportunistically, which is to say that the runtime will print the contents of the stack locations associated with the parameters of a function; these contents are sometimes accurate but more often than not inaccurate.
This issue is to track the idea of enhancing the pclntab info for Go binaries to try to do a little better at recovering correct arg values. First comes a background section on the characteristics of the current implementation, then a second section with ideas on how we could do better.
Background
Here is a toy program that when run, causes a crash due to an out-of-bounds array access:
If you run this program with optimization disabled (built with "-l -N"), here's the crash you get:
Note that for the "main.bar" frame, the runtime is able to correctly print the values of the incoming parameters "q" and "r", and similarly the args for "main.foo" show up correctly as well.
Now here's what happens when you run a default build (optimization enabled, without "-l -N"):
The args for "main.bar" are shown incorrectly as "0x0?, 0xc0000a0688?"; the question mark symbols show that the compiler is aware of the fact that the params are no longer live at the point where the crash takes place, but is making a "best effort" by reading the spill locations for the params (the stack slots that the params are written to if we have to call morestack). In this case the contents of the spill slots are pretty much garbage. The same goes for the "main.foo" parameter.
Suppose we rewrite "var big [10]int" at line 12 to "var big [1010]int" and try again:
We still have incorrect values for the "main.bar" args, but the "main.foo" args are looking pretty decent -- this is due to the fact that the large stack frame for the routine triggered a call to "morestack", so the spill locations were actually written to, so the values we read back are correct.
Overall however the story is not great on reporting arg values accurately, and since we moved to the register ABI, the fraction of inaccurate values printed has definitely gone up.
One way to do better
In the DWARF world, compiler developers also wrestle with a similar problem, e.g. you want to print the values of incoming params, but the params are no longer live at the point where something bad happens.
A while back DWARF was enhanced with some extra mechanisms to help here, specifically the DW_TAG_call_site tag and the DW_AT_call_site_value attribute. How these work is described in the DWARF standard, and also in https://dwarfstd.org/issues/100909.2.html, but the rough idea is that the compiler generates supplemental location expressions at each call site that describe how to compute the value of each argument passed at the call. When the unwinder is trying to recover function args for a given frame and runs into "no longer live" params, it can go back up the stack and use the call site DIEs to recover them. Example:
Within "foo", the compiler emits a DWARF DW_TAG_call_site DIE for the call to "bar", and then examines each expression feeding into the call. For the first param, since the frame offset of "x" is known, it writes out a DW_AT_call_site_value attribute indicating that the value of the arg can be computed as FP plus a constant.
When the unwinder is emitting stack trace output for "bar", since parameter "p" is no longer live at the panic, it instead finds the subprogram DIE for "foo", then locates the correct DW_TAG_call_site for the call to "bar", and then uses the location expression in the first DW_AT_call_site_value attribute to arrive at an accurate value for the argument.
This is not a bulletproof solution, of course, since the arg value may be something that we can't describe with a single location expression, and we're again vulnerable to values being clobbered, but in many cases it can help recover the value.
We could in theory implement the same sort of thing in the Go compiler with the pclntab: emit a record for each call site (if that site has one or more recoverable arg values), along with some sort of simple/restricted location expression that tells how to compute the value of each arg.
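To make the idea a bit more concrete, here is a rough sketch of what such a record and a restricted location expression might look like. Everything in it is an assumption for illustration: the type names ("argLoc", "callSiteRecord", "frame"), the three expression kinds, and the simulated frame memory are all hypothetical, and none of this reflects an actual pclntab encoding or existing runtime API.

```go
package main

import "fmt"

// argKind is a hypothetical, deliberately tiny location-expression language.
type argKind uint8

const (
	argConst     argKind = iota // value is the constant stored in off
	argFrameAddr                // value is FP + off (e.g. &x for a stack local x)
	argFrameLoad                // value is the word stored at FP + off
)

// argLoc describes how to recompute one argument at a call site.
type argLoc struct {
	kind argKind
	off  int64
}

// callSiteRecord is a hypothetical per-call-site entry, keyed by the
// return PC of the call within the caller.
type callSiteRecord struct {
	retPC uintptr
	args  []argLoc
}

// frame simulates a caller frame: fp is its frame pointer and mem maps
// stack addresses to the words stored there.
type frame struct {
	fp  uintptr
	mem map[uintptr]uint64
}

// findCallSite looks up the record for a given return PC.
func findCallSite(recs []callSiteRecord, retPC uintptr) (callSiteRecord, bool) {
	for _, r := range recs {
		if r.retPC == retPC {
			return r, true
		}
	}
	return callSiteRecord{}, false
}

// recoverArg evaluates one location expression against the caller frame.
// It reports false if the value cannot be computed.
func recoverArg(f frame, loc argLoc) (uint64, bool) {
	addr := uintptr(int64(f.fp) + loc.off)
	switch loc.kind {
	case argConst:
		return uint64(loc.off), true
	case argFrameAddr:
		return uint64(addr), true
	case argFrameLoad:
		v, ok := f.mem[addr]
		return v, ok
	}
	return 0, false
}

func main() {
	// Simulated caller "foo": local x lives at FP-16 and holds 42.
	foo := frame{fp: 0x1000, mem: map[uintptr]uint64{0xff0: 42}}
	// Hypothetical record for a call bar(&x, x) returning to PC 0x4010.
	recs := []callSiteRecord{{
		retPC: 0x4010,
		args:  []argLoc{{argFrameAddr, -16}, {argFrameLoad, -16}},
	}}
	if rec, ok := findCallSite(recs, 0x4010); ok {
		for i, loc := range rec.args {
			if v, ok := recoverArg(foo, loc); ok {
				fmt.Printf("arg %d = %#x\n", i, v)
			}
		}
	}
}
```

As with the DWARF mechanism, a frame-load expression is only as good as the caller's stack slot: if the slot has been overwritten since the call, the recovered value will be wrong, so a real implementation would presumably still need to mark such values as uncertain.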