Skip to content

Conversation

@aheejin
Copy link
Member

@aheejin aheejin commented Nov 26, 2025

This adds support for names field in source maps, which contains function names. Source map mappings are correspondingly updated and emsymbolizer now can provide function name information only with source maps.

While source maps don't provide the full inlined hierarchies, this provides the name of the original (= pre-inlining) function, which may not exist in the final binary because they were inlined. This is because source maps are primarily intended for user debugging.

This also demangles C++ function names using llvm-cxxfilt, so the printed names can be human-readable.

I tested with wasm-opt.wasm from Binaryen by if (EMSCRIPTEN) setup here:
https://github.com/WebAssembly/binaryen/blob/95b2cf0a4ab2386f099568c5c61a02163770af32/CMakeLists.txt#L311-L372 with -g -gsource-map. With this PR and WebAssembly/binaryen#8068, the source map file size increases by 3.5x (8632423 -> 30070042) primarily due to the function name strings.

From llvm-dwarfdump output, this also requires additional parsing of DW_TAG_subprogram and DW_TAG_inlined_subroutine tags which can be at any depths (because functions can be within nested namespaces or classes), so we cannot use --recurse-depth=0 (#9580) anymore. In case of wasm-opt.wasm built with DWARF info, without --recurse-depth=0 in the command line, the size of its text output increased by 27.5x, but with the --filter-child-tag / -t option (llvm/llvm-project#165720), the text output increased only (?) by 3.2x, which I think is tolerable. This disables names field generation when -t option is not available in llvm-dwarfdump because it was added recently. To avoid this text size problem, we can consider using DWARF-parsing Python libraries like https://github.com/eliben/pyelftools, but this will make another third party dependency, so I'm not sure if it's worth it at this point.

This also increased running time of wasm-sourcemap.py, in case of the wasm-opt.wasm, by 2.3x (6.6s -> 15.4s), but compared to the linking time this was not very noticeable.

Fixes #20715 and closes #25116.

This adds support for `names` field in source maps, which contains
function names. Source map mappings are correspondingly updated and
emsymbolizer now can provide function name information only with source
maps.

While source maps don't provide the full inlined hierarchies, this
provides the name of the original (= pre-inlining) function, which may
not exist in the final binary because they were inlined. This is because
source maps are primarily intended for user debugging.

This also demangles C++ function names using `llvm-cxxfilt`, so the
printed names can be human-readable.

I tested with `wasm-opt.wasm` from Binaryen by `if (EMSCRIPTEN)` setup
here:
https://github.com/WebAssembly/binaryen/blob/95b2cf0a4ab2386f099568c5c61a02163770af32/CMakeLists.txt#L311-L372
with `-g -gsource-map`. With this PR and
WebAssembly/binaryen#8068, the source map file
size increases by 3.5x (8632423 -> 30070042) primarily due to the
function name strings.

From `llvm-dwarfdump` output, this also requires additional parsing of
`DW_TAG_subprogram` and `DW_TAG_inlined_subroutine` tags which can be at
any depths (because functions can be within nested namespaces or
classes), so we cannot use `--recurse-depth=0` (emscripten-core#9580) anymore. In case
of `wasm-opt.wasm` built with DWARF info, without `--recurse-depth=0` in
the command line, the size of its text output increased by 27.5x, but
with the `--filter-child-tag` / `-t` option
(llvm/llvm-project#165720), the text output
increased only (?) by 3.2x, which I think is tolerable. This disables
`names` field generation when `-t` option is not available in
`llvm-dwarfdump` because it was added recently. To avoid this text size
problem, we can consider using DWARF-parsing Python libraries like
https://github.com/eliben/pyelftools, but this will make another third
party dependency, so I'm not sure if it's worth it at this point.

This also increased running time of `wasm-sourcemap.py`, in case of the
`wasm-opt.wasm`, by 2.3x (6.6s -> 15.4s), but compared to the linking
time this was not very noticeable.

Fixes emscripten-core#20715 and closes emscripten-core#25116.
col += data[3]
info.append(col)
# TODO: see if we need the name, which is the next field (data[4])
if len(data) == 5:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be >= 5 like above on line 173?

try:
input_str = '\n'.join(mangled_names)
process = Popen([LLVM_CXXFILT], stdin=PIPE, stdout=PIPE, stderr=PIPE, text=True)
stdout, stderr = process.communicate(input=input_str)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use subprcoess.run here instead of Popen + communicate.

In fact I would just use shared.check_call which is our wrapper around subprocess.run.

return dict(zip(mangled_names, demangled_list))
except OSError:
logger.warning('Failed to run llvm-cxxfilt')
return {}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can skip the whole try/except here if you use shared.check_call

_, err = process.communicate()
return 'requires a value' in err
except OSError:
return False
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, use check_call to avoid direct usage of subprocess.

@sbc100
Copy link
Collaborator

sbc100 commented Nov 26, 2025

Hmm.. looks like the Popen usage is pre-existing. I can upload a PR to fix that.

Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the large change in source map size and link time, perhaps this is worth mentioning in the changelog?

from tools import shared, utils
from tools.system_libs import DETERMINISTIC_PREFIX

LLVM_CXXFILT = shared.llvm_tool_path('llvm-cxxfilt')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know if we ship this with the emsdk?

I'm also surprised we need this now, but didn't before. Is it that wasm-ld demangles the Names section for us, but now we are parsing DWARF, which is not demangled?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know if we ship this with the emsdk?

Yes it looks we do. I installed / activated latest in emsdk and llvm-cxxfilt was installed.

I'm also surprised we need this now, but didn't before. Is it that wasm-ld demangles the Names section for us,

Yes wasm-ld demangles it for us: https://github.com/llvm/llvm-project/blob/b3428bb966f1de8aa48375ffee0eba04ede133b7/lld/wasm/SyntheticSections.cpp#L825

but now we are parsing DWARF, which is not demangled?

Yes.

# Recently --filter-child-tag / -t option was added to llvm-dwarfdump prune
# tags. Because it is a recent addition, check if it exists in the user's
# llvm-dwarfdump. If not, print only the top-level DW_TAG_compile_units for
# source location info and don't generate 'names' field.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we assume this is present, since we control the LLVM version we use? That is, once this rolled in from LLVM, won't it be in all future emsdk builds?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I made this fallback routine in case someone uses their local LLVM.. But maybe this is too much hassle for that niche case (which we don't need to support to begin with). I'll remove the fallbacks.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does llvm-dwarfdump not normally ship with LLVM packages? I think maybe we assume it does and we can always add a fallback later if folks say otherwise.

On my system it seems to be part of the llvm package:

$ dpkg -S llvm-dwarfdump-19 
llvm-19: /usr/bin/llvm-dwarfdump-19
llvm-19: /usr/share/man/man1/llvm-dwarfdump-19.1.gz

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Do source maps represent original location or current location? Should we add function names to source maps generated by emscripten?

3 participants