Skip to content

Conversation

@jan-auer
Copy link
Member

@jan-auer jan-auer commented Aug 13, 2018

This PR introduces CFI into the native processing pipeline. It builds on previous PRs that have refactored debug symbol handling (#10052) and introduced CFI caching (#10161).

Frame Interface

Stack frames now have an optional trust field. The value of this field is unstructured and depends on the event platform. For minidump events, this field holds the values of the FrameTrust enum, as exported by symbolic: cfi, cfi-scan, context, fp, none, prewalked, scan.

The trust level can be used to infer whether stack traces need CFI (none and scan) or whether stack traces have been walked with CFI (cfi and cfi-scan).

Plugin Interface

The minidump needs to be stackwalked with CFI before the stacktrace processors run during event processing. The current order is:

  • plugin.get_stacktrace_processors()
  • plugin.get_event_preprocessors() (which are, really, event processors)

This required to introduce a new hook -- for now internal -- with the same interface as event processors. It is called: plugin.get_event_enhancers. The purpose of an enhancer is to add useful information before processing actually starts, that is subsequently available to all processors and also the stacktrace processing step. Their interface is identical to event processors:

def get_event_enhancers(self, data, **kwargs): 
    return [lambda x: x]

Conditions

CFI is only applied if all of the following conditions are met (checked in that order):

  1. The event has mechanism.type: "minidump" set
  2. There is at least one thread with scanned frames (or no frame trust)
  3. One or more of those threads cannot be retrieved from the cache
  4. At least one CFI cache for a loaded module can be loaded from uploaded debug files

Caching

We cache entire stack traces. The cache key is computed by hashing the following value for each frame:

  • If a code module is present: (module.debug_id, frame.relative_address)
  • If no code module is present: None. (Only the absolute instruction address is known, and that changes potentially with every execution)

Once a stacktrace has been reprocessed with CFI and the stack trace contains at least one frame with CFI trust, the following values are cached for each frame:

  • If a code module is present: (module.debug_id, frame.relative_address, frame.trust)
  • If no code module is present: (None, frame.absolute_address, frame.trust)

When loading these traces from the cache, the relative addresses are again rebased to the new image addresses from the event to form a new stack trace. If a module is not found in the new event, or only an absolute address was stored, the address is used directly without rebasing.

If the reprocessed stack trace does not contain CFI frames, indicating that no CFI could be applied during reprocessing, a marker "__no_cfi__" is stored in the cache instead of the above frames. In that case, the original raw stack trace is not touched. This speeds up restoration when loading from the cache and avoids the hack with absolute addresses.

Matching Threads

If a minidump needs to be reprocessed, it is loaded along with all CFI caches and processed again with symbolic. Instead of transforming the entire process state into a payload, only the threads are transformed. They are matched with the original event payload via their thread_id. It is assumed that in minidumps every thread has a unique ID.

Requires getsentry/symbolic#98
Fixes #8193

@jan-auer jan-auer self-assigned this Aug 13, 2018
@jan-auer jan-auer force-pushed the feat/minidump-cfi branch 2 times, most recently from 79e1afb to fced113 Compare October 16, 2018 13:36
@jan-auer jan-auer force-pushed the feat/minidump-cfi branch 3 times, most recently from d865a16 to b550e0b Compare October 25, 2018 17:55
@jan-auer jan-auer requested a review from mitsuhiko October 25, 2018 17:59
@jan-auer
Copy link
Member Author

@mitsuhiko This is ready for review now.

@mitsuhiko
Copy link
Contributor

lgtm but it comes with a disclaimer: this will likely generate new groups for some customers. Since the current issues are not grouping well anyways I do not expect this to be an issue but I would propose we watch the graphs as the deploy goes out to see what happens.

@jan-auer jan-auer merged commit 529c2d3 into master Oct 30, 2018
@jan-auer jan-auer deleted the feat/minidump-cfi branch October 30, 2018 19:58
@github-actions github-actions bot locked and limited conversation to collaborators Dec 21, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Minidump stack walking does not use CFI data from debug symbols

3 participants