Allow stack symbolization of fully-optimized binaries with DWARF #25239

@dschuff

Forked out from the discussion in #20462

This is for the same use case, of symbolizing fully-optimized production binaries, but using DWARF instead of source maps. The usual workflow would be to generate a binary with some amount of DWARF info, with full optimization, and then strip the DWARF out of the binary, ship the stripped binary, and archive the DWARF binary. Then later use the DWARF binary to symbolize stack traces generated from the stripped binary.
Compared to using source maps to symbolize stack traces, this has the advantage that DWARF contains inlining information, meaning that the stack traces can be more complete and correct (because one call frame in a stack trace can actually represent more than one frame in the source).
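
To make the inlining point concrete, here is a minimal sketch of how one physical frame can expand into several source-level frames. Everything in it (the `Frame` and `RangeEntry` types, the table contents, the offsets) is hypothetical and made up for illustration; a real symbolizer would build such a table by parsing the archived DWARF rather than hard-coding it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    function: str
    file: str
    line: int


@dataclass(frozen=True)
class RangeEntry:
    low: int       # first code offset covered (inclusive)
    high: int      # end of the covered range (exclusive)
    frames: tuple  # innermost frame first, physical function last


# Hypothetical lookup table. The second range is code from `helper`
# that was inlined into `main`, so it carries two source-level frames.
TABLE = [
    RangeEntry(0x100, 0x120, (Frame("main", "main.c", 10),)),
    RangeEntry(0x120, 0x130, (Frame("helper", "util.c", 42),
                              Frame("main", "main.c", 12))),
]


def symbolize(offset, table=TABLE):
    """Return every source-level frame for one code offset."""
    for entry in table:
        if entry.low <= offset < entry.high:
            return list(entry.frames)
    return []


def symbolize_trace(offsets, table=TABLE):
    """Expand a raw trace (one offset per physical frame)."""
    result = []
    for offset in offsets:
        frames = symbolize(offset, table)
        result.extend(frames or [Frame("<unknown>", "?", 0)])
    return result
```

A source map style lookup would return only one location for offset `0x124`; with inlining information the same offset yields both `helper` and its caller `main`.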

This can be done today, but the problem is that using DWARF currently suppresses some Binaryen optimizations. This is because Binaryen's understanding of DWARF is limited. While it understands source location information and keeps track of it in its IR, the same is not true for the other semantic information you'd need to keep the DWARF really correct when Binaryen makes significant transformations (e.g. Binaryen's own inlining, local variable coalescing, etc). Binaryen does a sort of "best effort" where it can track the original and final binary locations of each instruction and then go back and try to update the pointers to those binary regions in the DWARF with the new addresses, but it isn't able to actually rewrite or restructure the debug info, or create new records. So I don't think the resulting DWARF is all that great, even with the limited optimizations (although I don't think we really have great data on just how good it is). I think doing this properly would be a really significant project, because we'd basically need a full semantic model of the debug info in the IR (e.g. with source variables etc).
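
The "best effort" update described above can be pictured with a small sketch. This is an illustration of the idea only, not Binaryen's actual code: the function name `remap_line_rows`, the row layout, and the `old_to_new` map are all assumptions made for the example.

```python
def remap_line_rows(rows, old_to_new):
    """Rewrite the offsets of existing debug rows after optimization.

    rows:        list of (old_offset, file, line) tuples (hypothetical layout)
    old_to_new:  dict mapping an instruction's original binary offset to
                 its offset in the optimized binary; instructions that
                 were removed have no entry
    """
    updated = []
    for old_offset, file, line in rows:
        new_offset = old_to_new.get(old_offset)
        if new_offset is None:
            # The instruction no longer exists, so its row is dropped.
            continue
        updated.append((new_offset, file, line))
    # Optimizations can reorder code, so restore offset order.
    updated.sort()
    return updated
```

Note what this kind of pass cannot do: it only moves or drops rows that already exist. If an optimization inlines one function into another, nothing here can create a record saying so, which is the limitation the paragraph above describes.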

So if we can't do that in a reasonable amount of time, could we do anything else? Maybe Binaryen could generate more-correct DWARF line tables and inlining information, so that the output could have a fairly complete and correct, but minimal set of information? Binaryen would need to model inlining information, and parse the DWARF inlining information.
I think Binaryen actually generates the DWARF line table from scratch already, and probably we could just re-use the existing type (and maybe function/subprogram?) information without regenerating it. So maybe Binaryen would just need to generate new inlining records (DW_TAG_inlined_subroutine) and that would be enough.
Even better would be if we could also limit clang's debug information (to omit things like type and variable information completely), which would speed up linking and optimization (or alternatively omit them during rewriting).
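
As a rough picture of what generating new inlining records would involve, the sketch below models the data a DW_TAG_inlined_subroutine entry carries. The attribute names in the comments are standard DWARF attributes; the `InlinedSubroutine` class, the use of a plain function name in place of a DIE reference, and the `inline_chain` helper are hypothetical simplifications, not anything Binaryen has today.

```python
from dataclasses import dataclass, field


@dataclass
class InlinedSubroutine:
    abstract_origin: str  # DW_AT_abstract_origin (a name stands in for the DIE reference)
    low_pc: int           # DW_AT_low_pc: start of the inlined code
    high_pc: int          # DW_AT_high_pc: modeled here as an exclusive end offset
    call_file: str        # DW_AT_call_file: file containing the call site
    call_line: int        # DW_AT_call_line: line of the call site
    children: list = field(default_factory=list)  # nested inlined calls


def inline_chain(records, offset):
    """Names of the inlined functions covering offset, outermost first."""
    for record in records:
        if record.low_pc <= offset < record.high_pc:
            return [record.abstract_origin] + inline_chain(record.children, offset)
    return []
```

Under this model, each time the optimizer inlines a call it would need to emit one such record with the callee, the call site location, and the final code range, nested inside any record for the function it was inlined into.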
