-
Notifications
You must be signed in to change notification settings - Fork 3.5k
Support names field in source maps #25870
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This adds support for `names` field in source maps, which contains function names. Source map mappings are correspondingly updated and emsymbolizer now can provide function name information only with source maps. While source maps don't provide the full inlined hierarchies, this provides the name of the original (= pre-inlining) function, which may not exist in the final binary because they were inlined. This is because source maps are primarily intended for user debugging. This also demangles C++ function names using `llvm-cxxfilt`, so the printed names can be human-readable. I tested with `wasm-opt.wasm` from Binaryen by `if (EMSCRIPTEN)` setup here: https://github.com/WebAssembly/binaryen/blob/95b2cf0a4ab2386f099568c5c61a02163770af32/CMakeLists.txt#L311-L372 with `-g -gsource-map`. With this PR and WebAssembly/binaryen#8068, the source map file size increases by 3.5x (8632423 -> 30070042) primarily due to the function name strings. From `llvm-dwarfdump` output, this also requires additional parsing of `DW_TAG_subprogram` and `DW_TAG_inlined_subroutine` tags which can be at any depths (because functions can be within nested namespaces or classes), so we cannot use `--recurse-depth=0` (emscripten-core#9580) anymore. In case of `wasm-opt.wasm` built with DWARF info, without `--recurse-depth=0` in the command line, the size of its text output increased by 27.5x, but with the `--filter-child-tag` / `-t` option (llvm/llvm-project#165720), the text output increased only (?) by 3.2x, which I think is tolerable. This disables `names` field generation when `-t` option is not available in `llvm-dwarfdump` because it was added recently. To avoid this text size problem, we can consider using DWARF-parsing Python libraries like https://github.com/eliben/pyelftools, but this will make another third party dependency, so I'm not sure if it's worth it at this point. This also increased running time of `wasm-sourcemap.py`, in case of the `wasm-opt.wasm`, by 2.3x (6.6s -> 15.4s), but compared to the linking time this was not very noticeable. Fixes emscripten-core#20715 and closes emscripten-core#25116.
| col += data[3] | ||
| info.append(col) | ||
| # TODO: see if we need the name, which is the next field (data[4]) | ||
| if len(data) == 5: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be >= 5 like above on line 173?
| try: | ||
| input_str = '\n'.join(mangled_names) | ||
| process = Popen([LLVM_CXXFILT], stdin=PIPE, stdout=PIPE, stderr=PIPE, text=True) | ||
| stdout, stderr = process.communicate(input=input_str) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use subprcoess.run here instead of Popen + communicate.
In fact I would just use shared.check_call which is our wrapper around subprocess.run.
| return dict(zip(mangled_names, demangled_list)) | ||
| except OSError: | ||
| logger.warning('Failed to run llvm-cxxfilt') | ||
| return {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can skip the whole try/except here if you use shared.check_call
| _, err = process.communicate() | ||
| return 'requires a value' in err | ||
| except OSError: | ||
| return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, use check_call to avoid direct usage of subprocess.
|
Hmm.. looks like the Popen usage is pre-existing. I can upload a PR to fix that. |
kripken
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the large change in source map size and link time, perhaps this is worth mentioning in the changelog?
| from tools import shared, utils | ||
| from tools.system_libs import DETERMINISTIC_PREFIX | ||
|
|
||
| LLVM_CXXFILT = shared.llvm_tool_path('llvm-cxxfilt') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know if we ship this with the emsdk?
I'm also surprised we need this now, but didn't before. Is it that wasm-ld demangles the Names section for us, but now we are parsing DWARF, which is not demangled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know if we ship this with the emsdk?
Yes it looks we do. I installed / activated latest in emsdk and llvm-cxxfilt was installed.
I'm also surprised we need this now, but didn't before. Is it that wasm-ld demangles the Names section for us,
Yes wasm-ld demangles it for us: https://github.com/llvm/llvm-project/blob/b3428bb966f1de8aa48375ffee0eba04ede133b7/lld/wasm/SyntheticSections.cpp#L825
but now we are parsing DWARF, which is not demangled?
Yes.
| # Recently --filter-child-tag / -t option was added to llvm-dwarfdump prune | ||
| # tags. Because it is a recent addition, check if it exists in the user's | ||
| # llvm-dwarfdump. If not, print only the top-level DW_TAG_compile_units for | ||
| # source location info and don't generate 'names' field. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we assume this is present, since we control the LLVM version we use? That is, once this rolled in from LLVM, won't it be in all future emsdk builds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I made this fallback routine in case someone uses their local LLVM.. But maybe this is too much hassle for that niche case (which we don't need to support to begin with). I'll remove the fallbacks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does llvm-dwarfdump not normally ship with LLVM packages? I think maybe we assume it does and we can always add a fallback later if folks say otherwise.
On my system it seems to be part of the llvm package:
$ dpkg -S llvm-dwarfdump-19
llvm-19: /usr/bin/llvm-dwarfdump-19
llvm-19: /usr/share/man/man1/llvm-dwarfdump-19.1.gz
This adds support for
namesfield in source maps, which contains function names. Source map mappings are correspondingly updated and emsymbolizer now can provide function name information only with source maps.While source maps don't provide the full inlined hierarchies, this provides the name of the original (= pre-inlining) function, which may not exist in the final binary because they were inlined. This is because source maps are primarily intended for user debugging.
This also demangles C++ function names using
llvm-cxxfilt, so the printed names can be human-readable.I tested with
wasm-opt.wasmfrom Binaryen byif (EMSCRIPTEN)setup here:https://github.com/WebAssembly/binaryen/blob/95b2cf0a4ab2386f099568c5c61a02163770af32/CMakeLists.txt#L311-L372 with
-g -gsource-map. With this PR and WebAssembly/binaryen#8068, the source map file size increases by 3.5x (8632423 -> 30070042) primarily due to the function name strings.From
llvm-dwarfdumpoutput, this also requires additional parsing ofDW_TAG_subprogramandDW_TAG_inlined_subroutinetags which can be at any depths (because functions can be within nested namespaces or classes), so we cannot use--recurse-depth=0(#9580) anymore. In case ofwasm-opt.wasmbuilt with DWARF info, without--recurse-depth=0in the command line, the size of its text output increased by 27.5x, but with the--filter-child-tag/-toption (llvm/llvm-project#165720), the text output increased only (?) by 3.2x, which I think is tolerable. This disablesnamesfield generation when-toption is not available inllvm-dwarfdumpbecause it was added recently. To avoid this text size problem, we can consider using DWARF-parsing Python libraries like https://github.com/eliben/pyelftools, but this will make another third party dependency, so I'm not sure if it's worth it at this point.This also increased running time of
wasm-sourcemap.py, in case of thewasm-opt.wasm, by 2.3x (6.6s -> 15.4s), but compared to the linking time this was not very noticeable.Fixes #20715 and closes #25116.