STACK WIN unwinding #240

Gankra · 2021-09-23T03:14:01Z

This seems to get the implementation working well on some examples in socorro I tested.

Notable problems I found and fixed:

I was too aggressively rejecting overlapping STACK WIN address ranges. It seems very common that the length of a STACK WIN record just covers the entirety of the function(?) and subsequent entries mask over it. So we have to truncate the lengths as we parse them to clean this up.
It seems that looking up STACK WIN entries based on the callee's raw $eip is incorrect, and that you need to subtract 1? (Which is what adjust_instruction does, so really we should just be using that value instead of eip?)
I was deriving grand_callee_parameter_size from the value provided in FUNC records, but it appears that STACK WIN records have the more accurate value.

Also I added in a quick and dirty more fancy version of the address_seems_valid that actually looks up the function record for the module, but it's not quite right because it doesn't have a case for "it's in the module but we couldn't load any function records". I should probably delete the code, I only wrote it while I was desperately trying to figure out what was breaking.

Surprisingly, I now don't have any examples which seem to need the accursed scanning or other hacks that breakpad is stuffed with for STACK WIN cfi!

TODO:

test more socorro cases
write tests
don't forward callee's registers as aggressively (doesn't seem to cause problems, but breakpad doesn't say those registers are valid anymore)

Gankra · 2021-09-23T03:16:02Z

reminder if anyone wants to test this out on some examples that I have instructions on how to use this with tecken here: https://github.com/luser/rust-minidump/tree/master/minidump-stackwalk#analyzing-firefox-minidumps

Gankra · 2021-09-23T04:36:20Z

Hmm the extra -1 offset I applied makes a ton of the breakpad x86 unittests break (both STACK WIN and STACK CFI). Seems there's something more subtle going on that I need to work out.

The reason I convinced myself I needed the -1 was that I was seeing a few cases where we were one-past-the-end of the record we were supposed to be using. Perhaps I'm still not computing/handling the length right?

luser · 2021-09-23T12:23:37Z

It seems that looking up STACK WIN entries based on the callee's raw $eip is incorrect, and that you need to subtract 1? (Which is what adjust_instruction does, so really we should just be using that value instead of eip?)

I think this is actually not unreasonable, because the return address on the stack is the instruction after the call, right?

luser · 2021-09-23T12:25:43Z

Surprisingly, I now don't have any examples which seem to need the accursed scanning or other hacks that breakpad is stuffed with for STACK WIN cfi!

Hah, amazing! I'm sure you could grovel through blame and bugzilla for examples, but all the actual crash reports are assuredly lost to the mists of time. Given that this is all 32-bit Windows-specific, maybe it's not worth the effort?

Gankra · 2021-09-23T15:00:09Z

Ah right, having slept on it I realized of course that we don't adjust the context frame's instruction, and that's the logic I need to make the -1 work properly in all cases!

As it turns out, most of my implementation was actually correct! The main issues seemed to stem from the breakpad symbol file being a nasty thing full of Lies. Most notably: * The byte length of a STACK WIN entry may actually cover the whole function(?) and just expect later STACK WIN entries that overlap with it to mask it out. To handle this, we now truncate the previous STACK WIN entry to the starting address of the next one during parsing. * The parameter_size field in FUNC entries is actually a trap, and the STACK WIN values are the ones that should be used. It's not 100% clear to me if the FUNC entries are *ever* reasonable, or if we should refuse to evaluate CFI if there is no STACK WIN entry to provide that value. For now I'm maximally permissive and will both use the FUNC value as a fallback, and further use 0 as a fallback if even that isn't present. (Defaulting to 0 is necessary when unwinding the context frame since there is no grand_callee yet, but it's not clear to me if that default should apply otherwise.) I also realized that I should be using the callee's adjusted "instruction" value rather than the raw value of `$eip` to lookup modules/symbols. This is more universally applied in a refactor in the next patch. This also adds a few "basic" tests from the breakpad testsuite. Notably I have omitted the tests for all the weird corner-case heuristics that breakpad uses like scanning from .raSearchStart. So far I haven't found an actual firefox minidump (in my limited testing) that needs these, so I'll punt on them until proven otherwise.

cfi unwining should be using callee.instruction instead of the pc/ip in the context. Rather than duplicate the logic to specially handle the context frame everywhere, just use the fact that we keep the right value in the callee and pass it down. To avoid having a million things passed down I also remove several other arguments that are just in the callee frame already. This adds some boilerplate for pulling out the callee validity, but I think it's fine.

also a little tweak to prevent register forwarding when using STACK WIN that I just absentmindedly added while reviewing the documentation

Gankra · 2021-09-30T01:39:22Z

PR complete, I think?

gabrielesvelto · 2021-09-30T13:19:21Z

I'd say yes! I'm going to throw some particularly nasty minidumps at it to look at the results but the few I've already tried all looked very good already.

Gankra · 2021-09-30T16:50:36Z

Ok I got one (1) socorro crash report confidently verified to have the same unwind for all threads using rust-minindump, so I'm happy landing this now. Can iterate on this more later.

(Verified using my new little script: https://github.com/Gankra/socc-pair/)

Gankra force-pushed the stack-win branch from 6801a3f to a8d94ac Compare September 23, 2021 15:03

This was referenced Sep 23, 2021

FUNC and WIN STACK entries disagree on parameter size mozilla/dump_syms#284

Open

STACK WIN entries have slightly incorrect lengths mozilla/dump_syms#285

Open

Gankra added 4 commits September 29, 2021 18:44

cleanup unwinding tracing

84f1be9

cleanup unwinding documentation and comments

1ddbdc1

also a little tweak to prevent register forwarding when using STACK WIN that I just absentmindedly added while reviewing the documentation

Gankra force-pushed the stack-win branch from a8d94ac to 1ddbdc1 Compare September 29, 2021 22:45

Gankra changed the title ~~WIP STACK WIN unwinding~~ STACK WIN unwinding Sep 30, 2021

add hack for alignment operator

9e60c90

Gankra merged commit ec75e45 into rust-minidump:master Sep 30, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

STACK WIN unwinding #240

STACK WIN unwinding #240

Uh oh!

Gankra commented Sep 23, 2021 •

edited

Loading

Uh oh!

Gankra commented Sep 23, 2021

Uh oh!

Gankra commented Sep 23, 2021

Uh oh!

luser commented Sep 23, 2021

Uh oh!

luser commented Sep 23, 2021

Uh oh!

Gankra commented Sep 23, 2021

Uh oh!

Gankra commented Sep 30, 2021

Uh oh!

gabrielesvelto commented Sep 30, 2021

Uh oh!

Gankra commented Sep 30, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

STACK WIN unwinding #240

STACK WIN unwinding #240

Uh oh!

Conversation

Gankra commented Sep 23, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Gankra commented Sep 23, 2021

Uh oh!

Gankra commented Sep 23, 2021

Uh oh!

luser commented Sep 23, 2021

Uh oh!

luser commented Sep 23, 2021

Uh oh!

Gankra commented Sep 23, 2021

Uh oh!

Gankra commented Sep 30, 2021

Uh oh!

gabrielesvelto commented Sep 30, 2021

Uh oh!

Gankra commented Sep 30, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Gankra commented Sep 23, 2021 •

edited

Loading