Tracking issue for RFC 2603, "Rust Symbol Mangling (v0)" #60705

Open
14 of 16 tasks
Centril opened this issue May 10, 2019 · 99 comments · May be fixed by #89917
Labels
B-RFC-implemented Blocker: Approved by a merged RFC and implemented. B-unstable Blocker: Implemented in the nightly compiler and unstable. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. S-tracking-needs-to-bake Status: The implementation is "complete" but it needs time to bake. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@Centril
Contributor

Centril commented May 10, 2019

This is a tracking issue for the RFC "Rust Symbol Mangling (v0)" (rust-lang/rfcs#2603).

Current status:

Since #90128, you can control the mangling scheme with -C symbol-mangling-version, which can be:

  • legacy: the older mangling version, still the default currently
    • explicitly specifying this is unstable-only and also requires -Z unstable-options
      (to allow for eventual removal after v0 becomes the default)
  • v0: the new RFC mangling version, as implemented by Introduce Rust symbol mangling scheme. #57967

(Before #90128, this flag was the nightly-only -Z symbol-mangling-version)

To test the new mangling, set RUSTFLAGS=-Csymbol-mangling-version=v0 (or change rustflags in .cargo/config.toml). Please note that only symbols from crates built with that flag will use the new mangling, and that tool support (e.g. debuggers) will be limited initially, until everything is upstreamed. However, RUST_BACKTRACE and rustfilt should work out of the box with either mangling version.
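One quick way to confirm that a build actually picked up the flag is to look at symbol prefixes: v0 symbols begin with _R, while legacy symbols use an Itanium-style _ZN prefix and end in a hash component of the form 17h + 16 hex digits + E. The following is a minimal Rust sketch of that check. The Mangling enum, the helper functions, and the sample symbol strings (including the hash) are made up for illustration and are not part of any rustc API.

```rust
/// Which mangling scheme a symbol name appears to use (hypothetical helper type).
#[derive(Debug, PartialEq)]
enum Mangling {
    V0,
    Legacy,
    Other,
}

/// True if `symbol` ends with a legacy-style hash component:
/// `17h`, then 16 hex digits, then the closing `E`.
fn has_legacy_hash(symbol: &str) -> bool {
    let bytes = symbol.as_bytes();
    // 3 bytes for "17h", 16 for the hash, 1 for the closing "E".
    if bytes.len() < 20 || bytes[bytes.len() - 1] != b'E' {
        return false;
    }
    let tail = &bytes[bytes.len() - 20..bytes.len() - 1];
    tail.starts_with(b"17h") && tail[3..].iter().all(|b| b.is_ascii_hexdigit())
}

/// Classify a symbol by prefix (and, for legacy, by its trailing hash).
fn classify(symbol: &str) -> Mangling {
    if symbol.starts_with("_R") {
        Mangling::V0
    } else if symbol.starts_with("_ZN") && has_legacy_hash(symbol) {
        Mangling::Legacy
    } else {
        Mangling::Other
    }
}

fn main() {
    // Illustrative inputs only; the legacy hash below is invented.
    for sym in ["_RNvC7example3foo", "_ZN7example3foo17h0123456789abcdefE", "main"] {
        println!("{sym}: {:?}", classify(sym));
    }
}
```

The sketch only inspects the name itself, so it can be fed symbol names taken from any symbol-listing tool. Platform-specific decorations, such as extra leading underscores, are not handled here.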

Steps:

Unresolved questions:

Desired availability of tooling:

Linux:

  • Tools: binutils, gdb, lldb, perf, valgrind
  Distro                    Has versions of all tools with support?
  ------                    ---------------------------------------
  Debian (latest stable)    ?
  Arch                      ?
  Ubuntu (latest release)   ?
  Ubuntu (latest LTS)       ?
  Fedora (latest release)   ?
  Alpine (latest release)   ?

Windows:

Windows does not have support for demangling either legacy or v0 Rust symbols and requires debuginfo to load the appropriate function name. As such, no special support is required.

macOS:

More investigation is needed to determine to what extent macOS system tools already support Rust v0 mangling.

@Centril Centril added B-RFC-approved Blocker: Approved by a merged RFC but not yet implemented. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. labels May 10, 2019
@eddyb eddyb mentioned this issue Jun 3, 2019
6 tasks
@jonas-schievink jonas-schievink added B-RFC-implemented Blocker: Approved by a merged RFC and implemented. and removed B-RFC-approved Blocker: Approved by a merged RFC but not yet implemented. labels Jun 9, 2019
@jonas-schievink jonas-schievink added the B-unstable Blocker: Implemented in the nightly compiler and unstable. label Nov 26, 2019
@robinmoussu

I just opened a bug in compiler-explorer because I noticed collisions in the demangled names of different monomorphisations of the same function. As far as I understand, the current mangling scheme of Rust still uses v1, which doesn't contain the required information, while the proposed v2 would solve it? Did I correctly understand what the current situation is?

@eddyb
Member

eddyb commented Jan 16, 2020

@robinmoussu As per #57967, you can set RUSTFLAGS=-Zsymbol-mangling-version=v0 on nightly.
EDIT: maybe we should update this tracking issue with that information?

@crlf0710
Member

crlf0710 commented Jan 19, 2020

@eddyb @michaelwoerister
I wonder if it's possible/feasible to put some "mark" into the mangled symbol name to indicate that this stack frame is preferred not to display in backtraces, #68336.

@eddyb
Member

eddyb commented Jan 19, 2020

Could just rely on a _ prefix or something similar, but either way I would prefer not to add something that marginally related, to the symbol mangling syntax.

@eddyb
Member

eddyb commented Mar 13, 2020

Updating the RFC's "v2" to "v0" as per #57967 (comment) (should've done it back then).

@eddyb eddyb changed the title Tracking issue for RFC 2603, "Symbol Mangling v2" Tracking issue for RFC 2603, "Rust Symbol Mangling (v0)" Mar 13, 2020
@eddyb
Member

eddyb commented Mar 13, 2020

I've just submitted "[PATCH] Support the new ("v0") mangling scheme in rust-demangle.", to the gcc-patches mailing list (yes, the demangler used by binutils and gdb is in libiberty, the main copy of which is in GCC - would've been easier to contribute to it if it were separate, oh well).

If everything goes well, once that's merged we'll be able to also upstream the same implementation to lldb, Linux perf, valgrind, etc. - that's been the main blocker for rolling out the v0 mangling.

@eddyb
Member

eddyb commented Mar 14, 2020

Potential change we might want to make to const value mangling: #61486 (comment).

The tricky aspect is whether we change the way _: usize is encoded today, from jp to p, or we leave it as-is so that demanglers can keep working unchanged.

EDIT: AFAIK, that situation can only be reached with #![feature(const_generics)], so we should be able to defer it to whenever we get around to #61486.

EDIT2: just checked and my 1 million symbols dataset doesn't contain a single instance of jp for an array length, so I think we can remove support for it.

@eddyb
Member

eddyb commented May 28, 2020

Quick update: I haven't received a response on the final patch in 2.5 months.

This means I can probably resubmit with the placeholder constant syntax removed (see my previous comment, i.e. #60705 (comment)), and/or maybe we can even implement some form of ADT mangling in the interim, since it doesn't seem to be that difficult (see #71232 for something similar).

But this is disappointing and another delay in getting the new mangling scheme to be supported in native tooling, which would've ideally been all dealt with last year.

@eddyb
Member

eddyb commented Jul 28, 2022

@eddyb I have written a demangler that can demangle the latest v0 syntax symbol into a structured AST: https://github.com/EFanZh/ast-demangle/.

Oh, my bad @EFanZh, now that I'm looking at it, I'm pretty sure I've seen it before and just forgot :(

@pnkfelix
Member

Discussed in T-compiler backlog bonanza

The v0 symbol mangling has been implemented. In #89917 we considered making v0 the default, but we have held off on doing so in order to give external tools time to add support. In PR #90054 we did make v0 the default for builds of rustc itself (but not for object code that rustc generates for other programs).

We need to figure out what criteria we will use in this and other cases to decide that "it is time" to switch the defaults.

(We also considered opening a separate tracking issue for the question of "when to switch the default", but at this point I think we would only open such a tracking issue if we were ready to close this one, #60705, itself.)

@rustbot label: S-tracking-needs-to-bake

@bstrie
Contributor

bstrie commented Dec 20, 2022

we have held off on doing so in order to give external tools time to add support

We need to figure out what criteria we will use in this and other cases to decide that "it is time" to switch the defaults.

The first thing to do would be to produce a list of tools that people want to support. For each tool, we should determine whether it supports v0, and, if so, the date of the first public release that features v0 support. Once each tool supports v0, and once each has supported v0 for long enough (precise criteria TBD), then stabilization should be unblocked.

Obviously this list cannot guarantee that it will exhaustively mention every tool ever made, but the only alternative would be to never stabilize v0 for fear of overlooking some tool. In the meantime, we can use a blog post to put out a general call to tool developers to ask them to ensure that v0 works with their tools.

@jyn514
Member

jyn514 commented Mar 25, 2023

The first thing to do would be to produce a list of tools that people want to support. For each tool, we should determine whether it supports v0, and, if so, the date of the first public release that features v0 support. Once each tool supports v0, and once each has supported v0 for long enough (precise criteria TBD), then stabilization should be unblocked.

Obviously this list cannot guarantee that it will exhaustively mention every tool ever made, but the only alternative would be to never stabilize v0 for fear of overlooking some tool. In the meantime, we can use a blog post to put out a general call to tool developers to ask them to ensure that v0 works with their tools.

Nominating to hopefully act as a forcing function to create this list.

@jyn514 jyn514 added the I-compiler-nominated Nominated for discussion during a compiler team meeting. label Mar 25, 2023
@nnethercote
Contributor

One problem with v0 mangling that hasn't been identified: it completely breaks the cargo llvm-lines tool. Here is example output with legacy mangling:

  Lines                 Copies              Function name
  -----                 ------              -------------
  134295                3225                (TOTAL)
    6102 (4.5%,  4.5%)    18 (0.6%,  0.6%)  alloc::raw_vec::RawVec<T,A>::grow_amortized
    2641 (2.0%,  6.5%)    64 (2.0%,  2.5%)  core::option::Option<T>::map
    2329 (1.7%,  8.2%)    17 (0.5%,  3.1%)  <core::slice::iter::Iter<T> as core::iter::traits::iterator::Iterator>::next
    1716 (1.3%,  9.5%)    11 (0.3%,  3.4%)  alloc::raw_vec::RawVec<T,A>::allocate_in
    1694 (1.3%, 10.8%)    15 (0.5%,  3.9%)  alloc::alloc::box_free
    1476 (1.1%, 11.9%)    18 (0.6%,  4.4%)  alloc::raw_vec::RawVec<T,A>::current_memory
    1461 (1.1%, 13.0%)     3 (0.1%,  4.5%)  hashbrown::raw::RawTable<T,A>::reserve_rehash
    1456 (1.1%, 14.1%)    16 (0.5%,  5.0%)  core::slice::iter::Iter<T>::new
    1249 (0.9%, 15.0%)     8 (0.2%,  5.3%)  <T as alloc::slice::hack::ConvertVec>::to_vec
    1065 (0.8%, 15.8%)     5 (0.2%,  5.4%)  aho_corasick::automaton::Automaton::leftmost_find_at_no_state_imp

And with v0 mangling:

  Lines                 Copies              Function name
  -----                 ------              -------------
  134295                3225                (TOTAL)
     960 (0.7%,  0.7%)     1 (0.0%,  0.0%)  <regex[455e3194582446bb]::prog::Program as core[d1a89b04220dd38d]::fmt::Debug>::fmt
     722 (0.5%,  1.3%)     1 (0.0%,  0.1%)  <regex[455e3194582446bb]::exec::ExecBuilder>::build
     544 (0.4%,  1.7%)     1 (0.0%,  0.1%)  <regex[455e3194582446bb]::dfa::Fsm>::exec_at
     497 (0.4%,  2.0%)     1 (0.0%,  0.1%)  <regex[455e3194582446bb]::compile::Compiler>::compile_many
     494 (0.4%,  2.4%)     1 (0.0%,  0.2%)  <aho_corasick[afd2d59d996825a5]::nfa::NFA<u32> as core[d1a89b04220dd38d]::fmt::Debug>::fmt
     487 (0.4%,  2.8%)     1 (0.0%,  0.2%)  <hashbrown[18cdbe82094945b3]::raw::RawTable<(&usize, &alloc[c687d6376d1d0c58]::string::String)>>::reserve_rehash::<hashbrown[18cdbe82094945b3]::map::make_hasher<&usize, &usize, &alloc[c687d6376d1d0c58]::string::String, std[e45faeee946555a1]::collections::hash::map::RandomState>::{closure#0}>
     487 (0.4%,  3.1%)     1 (0.0%,  0.2%)  <hashbrown[18cdbe82094945b3]::raw::RawTable<(alloc[c687d6376d1d0c58]::string::String, usize)>>::reserve_rehash::<hashbrown[18cdbe82094945b3]::map::make_hasher<alloc[c687d6376d1d0c58]::string::String, alloc[c687d6376d1d0c58]::string::String, usize, std[e45faeee946555a1]::collections::hash::map::RandomState>::{closure#0}>
     487 (0.4%,  3.5%)     1 (0.0%,  0.2%)  <hashbrown[18cdbe82094945b3]::raw::RawTable<(regex[455e3194582446bb]::dfa::State, u32)>>::reserve_rehash::<hashbrown[18cdbe82094945b3]::map::make_hasher<regex[455e3194582446bb]::dfa::State, regex[455e3194582446bb]::dfa::State, u32, std[e45faeee946555a1]::collections::hash::map::RandomState>::{closure#0}>
     456 (0.3%,  3.8%)     1 (0.0%,  0.3%)  <regex[455e3194582446bb]::compile::Compiler>::c_alternate
     433 (0.3%,  4.1%)     1 (0.0%,  0.3%)  <alloc[c687d6376d1d0c58]::alloc::Global as core[d1a89b04220dd38d]::alloc::Allocator>::shrink

Note the difference in the copies column. cargo llvm-lines entirely depends on the type-imprecision of legacy mangling. We go from having N different functions with the same name being combined, to every function being separate. E.g. with legacy mangling all the grow_amortized instances end up in the same bucket, while with v0 mangling they look like this:

     339 (0.3%,  6.8%)     1 (0.0%,  0.6%)  <alloc[c687d6376d1d0c58]::raw_vec::RawVec<(char, char)>>::grow_amortized
     339 (0.3%,  7.0%)     1 (0.0%,  0.6%)  <alloc[c687d6376d1d0c58]::raw_vec::RawVec<(u8, u32)>>::grow_amortized
     339 (0.3%,  7.3%)     1 (0.0%,  0.7%)  <alloc[c687d6376d1d0c58]::raw_vec::RawVec<(usize, usize)>>::grow_amortized

This is probably a case where cargo llvm-lines needs to change, rather than v0 mangling, but I thought it worth mentioning.
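One way a tool could recover the old bucketing from v0 output is to normalise each demangled name before grouping: drop the [hash] crate disambiguators and collapse every generic-argument list to a placeholder. The Rust sketch below illustrates that idea on the grow_amortized rows shown above. The function names are hypothetical, the angle-bracket handling is a heuristic, and this is not how cargo llvm-lines is implemented.

```rust
use std::collections::BTreeMap;

/// Drop `[<hex>]` crate disambiguators: `alloc[c687d6376d1d0c58]` becomes `alloc`.
fn strip_disambiguators(name: &str) -> String {
    let bytes = name.as_bytes();
    let mut out = String::with_capacity(name.len());
    let mut i = 0;
    while i < bytes.len() {
        // Only treat `[` as a disambiguator when it directly follows an identifier,
        // so slice and array types such as `[u8]` are left alone.
        let after_ident = i > 0 && (bytes[i - 1].is_ascii_alphanumeric() || bytes[i - 1] == b'_');
        if bytes[i] == b'[' && after_ident {
            if let Some(len) = name[i + 1..].find(']') {
                let inner = &name[i + 1..i + 1 + len];
                if !inner.is_empty() && inner.bytes().all(|b| b.is_ascii_hexdigit()) {
                    i += len + 2; // skip "[", the hex digits, and "]"
                    continue;
                }
            }
        }
        let ch = name[i..].chars().next().unwrap();
        out.push(ch);
        i += ch.len_utf8();
    }
    out
}

/// Replace each generic-argument list (`Foo<...>` or `foo::<...>`) with `<_>`.
/// A `<` that does not follow an identifier or `::` opens a qualified path
/// (`<T as Trait>`) and is kept as-is. Heuristic, not a full parser.
fn erase_generic_args(name: &str) -> String {
    let mut out = String::with_capacity(name.len());
    let mut depth = 0usize; // nesting depth inside the list currently being skipped
    let mut prev = '\0';
    for ch in name.chars() {
        if depth > 0 {
            match ch {
                '<' => depth += 1,
                // Ignore the `>` of a `->` in function pointer types.
                '>' if prev != '-' => {
                    depth -= 1;
                    if depth == 0 {
                        out.push_str("_>");
                    }
                }
                _ => {}
            }
        } else if ch == '<' && (prev.is_alphanumeric() || prev == '_' || prev == ':') {
            depth = 1;
            out.push('<');
        } else {
            out.push(ch);
        }
        prev = ch;
    }
    out
}

/// Grouping key for one demangled v0 name.
fn bucket_key(demangled: &str) -> String {
    erase_generic_args(&strip_disambiguators(demangled))
}

/// Sum (lines, copies) per grouping key.
fn bucket(rows: &[(u64, &str)]) -> BTreeMap<String, (u64, u64)> {
    let mut totals = BTreeMap::new();
    for &(lines, name) in rows {
        let entry = totals.entry(bucket_key(name)).or_insert((0, 0));
        entry.0 += lines;
        entry.1 += 1;
    }
    totals
}

fn main() {
    let rows = [
        (339, "<alloc[c687d6376d1d0c58]::raw_vec::RawVec<(char, char)>>::grow_amortized"),
        (339, "<alloc[c687d6376d1d0c58]::raw_vec::RawVec<(u8, u32)>>::grow_amortized"),
        (339, "<alloc[c687d6376d1d0c58]::raw_vec::RawVec<(usize, usize)>>::grow_amortized"),
    ];
    for (key, (lines, copies)) in bucket(&rows) {
        println!("{lines:>6} {copies:>3}  {key}");
    }
}
```

The resulting key is only a grouping key. It does not reproduce the exact legacy spelling (for example RawVec<T,A>), because the names of the generic parameters are not present in the v0 output.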

cc @dtolnay

@sanmai-NL

Is or will there be an official Rust name mangling library (functionality), rather than demangling? Sometimes, one needs to mangle Rust item paths to look into binaries, e.g. like perf does. I hope there will be a reference implementation of specification.

@bjorn3
Member

bjorn3 commented Mar 31, 2023

Why would perf need to mangle names? There is no way to exactly reproduce symbol names outside of rustc itself given that they contain a crate disambiguator whose value depends on the -Cmetadata arguments passed when compiling the crate that defined the mentioned function/type (which for the standard library is unknown) as well as the exact rustc version used. Even two consecutive nightly releases will produce different symbol names.

@sanmai-NL

Please re-read my sentence @bjorn3. I'm not claiming perf mangles names.

@sanmai-NL

@bjorn3 Thanks for your explanation. I hope you didn't assume every commenter should know these details. I think the question is legitimate. It was asked before in the context of GCC C++. The information required I have available, but that's not important now. Even if just the algorithm were to be specified like in the GCC case, perhaps enough of the translation symbol to mangled symbol can be reconstructed to find the specific symbol in a binary for a given item path. That's my use case but I don't assume this would be the only solution or the only use case for a mangling spec or reference implementation.

@bjorn3
Member

bjorn3 commented Mar 31, 2023

perhaps enough of the translation symbol to mangled symbol can be reconstructed to find the specific symbol in a binary for a given item path. That's my use case but I don't assume this would be the only solution or the only use case for a mangling spec or reference implementation.

It should be possible to have something like an api where you specify in the input a wildcard for the crate disambiguator and then the name mangling library would output a wildcard where it would otherwise print the crate disambiguator. Would this work for your use case?

@CAD97
Contributor

CAD97 commented Mar 31, 2023

I hope there will be a reference implementation of specification.

A reference implementation does already exist, essentially, as part of rustc. That it's not a reusable library just reflects that the goal of a known mangling scheme is the ability for 3rd party non-rustc tooling to be able to turn mangled symbols back into the demangled human-meaningful form. Being able to mangle symbols is explicitly a non-goal.

Sometimes, one needs to mangle Rust item paths to look into binaries, e.g. like perf does.

For binary introspection, demangling is sufficient. Given an unmangled name, to find the corresponding mangled names [1], you don't mangle the unmangled name to compare to the mangled symbols; instead, you demangle the symbols from the binary to compare to the unmangled symbol. Most of the time you'll want the full list of demangled symbols anyway, e.g. for display or otherwise.

If you want fully predictable names (e.g. for linking manually ABI-stable interfaces), you should be specifying them explicitly. It would be interesting to be able to request v0 mangling (without the use of disambiguators) rather than having to manually apply a mangling scheme, but that's a completely separate feature request than the use for Rust-only names tracked here.

Footnotes

  1. Names, plural; multiple crates with the same name will have symbols which collide when unmangled and are disambiguated with the crate disambiguator.
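The "demangle everything, then compare" lookup described above can be sketched in a few lines of Rust. The demangler is passed in as a function so that the sketch stays self-contained. find_mangled and fake_demangle are hypothetical names, and the sym_a / sym_b / sym_c strings are placeholders, not valid mangled symbols. In practice the function argument would be a wrapper around a real demangler.

```rust
/// Return every mangled symbol whose demangled form equals `wanted`.
/// More than one symbol can match, e.g. when two crates share a name.
fn find_mangled<'a, F>(symbols: &[&'a str], demangle: F, wanted: &str) -> Vec<&'a str>
where
    F: Fn(&str) -> Option<String>,
{
    symbols
        .iter()
        .copied()
        .filter(|sym| demangle(*sym).as_deref() == Some(wanted))
        .collect()
}

/// Stand-in demangler backed by a fixed table (an assumption for this sketch).
/// The inputs are placeholder strings, NOT real v0 symbols.
fn fake_demangle(sym: &str) -> Option<String> {
    match sym {
        "sym_a" | "sym_b" => Some("mycrate::tests::test_1".to_string()),
        "sym_c" => Some("mycrate::main".to_string()),
        _ => None,
    }
}

fn main() {
    let symbols = ["sym_a", "sym_b", "sym_c"];
    let hits = find_mangled(&symbols, fake_demangle, "mycrate::tests::test_1");
    println!("{hits:?}");
}
```

Returning a list rather than a single symbol reflects the footnote: the same demangled path can correspond to several mangled symbols, and the caller has to decide how to disambiguate.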

@sanmai-NL

sanmai-NL commented Mar 31, 2023

@CAD97

When you need to step back to the same binary you demangled symbols of, and determine to what mangled symbol a demangled name refers, then you may want this functionality. Please also consider that having the binaries, and even a build pipeline including source code, does not mean you are free to modify the source code to achieve predictable symbol names or whatever.

By the way, it's a bit of a semantic discussion what demangling entails, in response to my functional requirement at least. Third party tooling like perf may only demangle in a strict sense, but could be considered to mangle a given name that exists as a mangled symbol in the binary:

perf \
  probe \
    --exec $(realpath "mycrate/target/debug/deps/binary-cfcd9bd03ac152c2") \
    --add="uprobe123=mycrate\:\:tests\:\:test_1"

perf demangles the symbols and then matches them against the unmangled name specified as the --add argument. So if perf or similar tools were to keep a mapping between the two and report that back, that would work for my particular use case as well. This procedure may not amount to demangling in general, but it would cover some use cases without Rust people having to work on it.

@sanmai-NL

It should be possible to have something like an api where you specify in the input a wildcard for the crate disambiguator and then the name mangling library would output a wildcard where it would otherwise print the crate disambiguator. Would this work for your use case?

Yes, sure. And perhaps there are other, forensic cases and such. Please note, I'm not an expert or involved in this mangling work here or elsewhere, just chiming in as a user with a practical use case that I think will be relevant to a subgroup of real-world developers (not detailing it since it's part of a paper to be published).

@benpye

benpye commented Jul 23, 2023

Now that RFC rust-lang/rfcs#3224 has been merged to resolve the suffix question, are there any implementations that still need updating in order to handle . and $ suffixes? If not, shall we revive #89917 for making v0 the default?

As far as I can tell, gdb supports suffixes that use . but not suffixes that use $. It's also somewhat unfortunate that GDB strips the suffix, rather than including it in the demangled string - but that's not unbearable.
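For tools that want to keep the suffix rather than drop it, the split itself is simple, assuming (per RFC 3224 as discussed above) that a vendor-specific suffix starts at the first . or $, and relying on the fact that the v0 grammar itself only uses ASCII letters, digits, and underscores. The Rust sketch below shows that split. split_vendor_suffix is a hypothetical helper and the sample symbols and suffixes are illustrative only.

```rust
/// Split a v0 symbol into its mangled core and an optional vendor-specific
/// suffix starting at the first `.` or `$`. Returns `None` for names that do
/// not start with `_R`, because legacy symbols use `$` and `.` inside the
/// encoding itself.
fn split_vendor_suffix(symbol: &str) -> Option<(&str, Option<&str>)> {
    if !symbol.starts_with("_R") {
        return None;
    }
    Some(match symbol.find(|c: char| c == '.' || c == '$') {
        Some(idx) => (&symbol[..idx], Some(&symbol[idx..])),
        None => (symbol, None),
    })
}

fn main() {
    // Illustrative inputs only.
    for sym in ["_RNvC7example3foo.llvm.123", "_RNvC7example3foo$tlv$init", "_RNvC7example3foo"] {
        println!("{sym} -> {:?}", split_vendor_suffix(sym));
    }
}
```

A demangler built on this could demangle the first component and then append the second unchanged, instead of discarding it.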

osiewicz added a commit to zed-industries/zed that referenced this issue Sep 18, 2023
rust-lang/rust#60705
Due to modification of .cargo/config.toml your `cargo build` should pick
this change up automatically. Use `legacy` instead of `v0` if you find
yourself in need of old mangling scheme for whatever reason
Release Notes:

- Improved precision of backtraces in application crashes
@bstrie
Contributor

bstrie commented Nov 23, 2023

@benpye Given that the only documented use of $ suffixes in the wild is for thread-local data on Mach-O, I don't think it's a showstopper for shipping this as the default.

I wonder if the compiler team would like to use the upcoming 2024 edition as an excuse to finally ship v0 mangling? This would let us roll it out gradually and in a way that can be easily rolled back by users, and since it's an implementation detail we could make it the default for all editions someday in the future if we really wanted to.

bors added a commit to rust-lang-ci/rust that referenced this issue May 24, 2024
… r=<try>

[perf] Test performance with v0 symbol mangling scheme being the default

With the wider ecosystem getting close to supporting the v0 mangling scheme and LLD as the default linker addressing v0's main performance issue, it is getting more likely that we can soon make v0 the default.

This PR's purpose is to collect current performance numbers and also to serve as a baseline for gauging the performance impact of [MCP 737](rust-lang/compiler-team#737).

v0 symbol mangling tracking issue: rust-lang#60705

r? `@ghost`