
Demangle Rust symbols #371

Open
rui314 opened this issue Feb 28, 2022 · 28 comments
Labels
enhancement New feature or request

Comments

@rui314
Owner

rui314 commented Feb 28, 2022

Currently, we demangle only C++ symbols, but we should also be able to demangle (i.e. pretty-print) mangled Rust symbols.

Unfortunately, it looks like there's no C++ library that can demangle Rust symbols. Maybe we need to rewrite https://github.com/rust-lang/rustc-demangle in C++.

Note that this is not high priority.

@rui314 rui314 added the enhancement New feature or request label Feb 28, 2022
@Alcaro

Alcaro commented Feb 28, 2022

LLVM has something named RustDemangle.cpp, in case that helps.

Don't know how complete it is, though. Nor how stable its API is. AFAIK LLVM offers no stability guarantees other than their C API (which only exports a subset of functionality), but char *llvm::rustDemangle(const char *MangledName, char *Buf, size_t *N, int *Status) doesn't look very likely to change.

Another option would be tossing a C-compatible API onto rustc-demangle and calling that. This has the drawback that mold needs rustc nearby to demangle Rust symbols, but the advantage that there's no need to wait for the alternate demangler to update if Rust starts doing something new (C++ lambdas took a while to be demangled correctly).

@rui314
Owner Author

rui314 commented Feb 28, 2022

I don't want to depend on LLVM nor rustc, so I was thinking of implementing it ourselves. It looks like it's not too hard to write our own implementation.

@bjorn3

bjorn3 commented May 7, 2022

GNU libiberty (as used by binutils and gdb) has support for demangling both the current legacy symbol mangling scheme and the new v0 symbol mangling scheme. In any case if you want to implement it yourself the v0 symbol mangling scheme is documented at https://rust-lang.github.io/rfcs/2603-rust-symbol-name-mangling-v0.html

@a-lafrance

If there's still interest in implementing this ourselves, I'd be interested in giving it a try. I looked over the RFC linked above and it seems doable.

If that sounds alright, I'm just wondering what approach to take for writing the new demangler. I was thinking of either writing a new Rust demangling function in demangle.cc, or extending the existing one to cover Rust symbols (or doing something else entirely if another approach is better). In any case, just looking for tips on the overall approach here, since I have very limited knowledge of how the demangler interacts with the rest of the code.

Thanks in advance!

@rui314
Owner Author

rui314 commented May 9, 2022

The current situation isn't as bad as I originally thought when I filed this bug. I mistakenly thought that the new mangling scheme was already in use, but in fact, Rust still seems to use the old modified Itanium mangling scheme, which the existing C++ demangler can demangle fairly well.

We will need to support the new Rust mangling scheme if Rust eventually switches to it, though. Does anyone have any idea when that will happen?

@a-lafrance A demangler for Rust written in C++ would be useful not only for mold but also for other C++ programs, so here is what I was planning:

  1. create a separate repository for a new demangler,
  2. write a demangler in C++ that demangles argv[1] and prints the demangled name to stdout,
  3. release it under a liberal license such as the MIT license, and
  4. let other projects copy the source code from that repo into theirs.

@a-lafrance

a-lafrance commented May 9, 2022

Interesting, makes sense. In that case, if you're looking for help with that new demangler project, I'd be interested.

@bjorn3

bjorn3 commented May 9, 2022

Rust seems to be still using the old modified Itanium mangling scheme, and that can be demangled fairly well with the existing C++ demangler.

Indeed.

We need to support the new Rust mangling scheme eventually if Rust will eventually switch to that scheme, though. Does anyone have any idea when that will happen?

You can already opt in on stable. All blockers towards switching to it by default seem to have been resolved so we may switch the default soonish: rust-lang/rust#60705 (comment) After that it takes between 6 and 12 weeks before the change lands on stable depending on where in the release cycle we switch.

@rui314
Owner Author

rui314 commented May 9, 2022

@a-lafrance Feel free to start a project. If you start it, it'll be your project, and not only mold but other projects will find it useful.

@a-lafrance

@rui314 Sounds good, I'll give it a try then. I'll revisit this thread when I have something to show for it I guess.

@eddyb

eddyb commented May 19, 2022

This would be the fourth non-Rust demangler for Rust (libiberty, IllumOS, LLVM off the top of my head).

I would suggest reusing the libiberty one (that I wrote - let me know what you need from me to re-license it, surely I am not limited to GPL just because I contributed it to the GCC repo?), or the LLVM one if its license is agreeable. AFAIK both are ports of the Rust code, though in different styles.

The former has been used in several other projects, and it was supposed to be a self-contained single-file demangler implemented in C from the start, that everyone used, but perhaps we failed on the messaging front.

@rui314
Owner Author

rui314 commented May 19, 2022

I wasn't aware that there are already multiple implementations for demangling Rust symbols. Let me take a look.

As for the one in libiberty, I think you can relicense it unless you assigned its copyright to GNU. However, until last year the GNU project required all contributors to assign copyright to them, so the code you wrote may no longer be owned by you.

If the Rust demangler in libiberty had been in its own repo and could be compiled into a simple command that just demangles a mangled symbol, it would have been much easier for me to find.

@bjorn3

bjorn3 commented May 19, 2022

https://www.fsf.org/bulletin/2014/spring/copyright-assignment-at-the-fsf

Thus, we grant back to contributors a license to use their work as they see fit. This means they are free to modify, share, and sublicense their own work under terms of their choice. This enables contributors to redistribute their work under another free software license. While this technically also permits distributing their work under a proprietary license, we hope they won't.

If I understand correctly, @eddyb could still relicense it under any license; the FSF would just be the copyright holder.

@rui314
Owner Author

rui314 commented May 19, 2022

I'm not a lawyer, but I don't think you can (re-)license software as you want unless you have the copyright of that software. You can still ask the current owner of the software to relicense, though.

@bjorn3

bjorn3 commented May 19, 2022

As I understand it they give you a license back that allows you to sublicense it under any terms you want. I can't easily find the exact legal text though.

@rui314
Owner Author

rui314 commented May 19, 2022

@bjorn3 Ah, I missed that point. I think you are right.

@eddyb

eddyb commented May 19, 2022

If the Rust demangler in libiberty were in its own repo and can be compiled to a simple command that just demangles a mangled symbol, it would have been much easier for me to find.

I agree. Upstreaming (to allow e.g. gdb and valgrind to work) was the main priority (and delayed stabilization by a lot, because it turned out to be messier than we hoped), but we should've made it more reusable from the start.

I'm not sure where it should go, the problem with putting it in rust-lang/rustc-demangle is that that also has its own C API, IIRC, but in that case it's wrapping the Rust code, not a plain C implementation.

One version of it has been in a gist (w/ a small test driver, which compares output against rustc-demangle), but that's mostly been for my own convenience of linking it privately, and isn't really a suitable replacement for a repo (and I would also want to fork its history at the point where I integrated into it the legacy demangling and finally added the libiberty copyright header).

but until the last year, GNU project had required all contributors to assign copyright to them

I'll have to check, I don't remember being put through that, but it's possible they only didn't require it for the first few (smaller) refactor patches - either way, pretty big blunder on my part if I went with it just "to get it done".

@kassane

kassane commented May 22, 2022

Unfortunately, it looks like there's no C++ library that can demangle Rust symbols. Maybe we need to rewrite https://github.com/rust-lang/rustc-demangle in C++.

alternative: https://github.com/getsentry/symbolic/blob/master/symbolic-demangle

@bjorn3

bjorn3 commented May 22, 2022

That is a rust crate which uses the rustc_demangle crate for demangling rust symbols: https://github.com/getsentry/symbolic/blob/56349e9686a4dc655d0b9e9c0ccfff82ca4e95d7/symbolic-demangle/src/lib.rs#L278

@kassane

kassane commented May 22, 2022

That is a rust crate which uses the rustc_demangle crate for demangling rust symbols: https://github.com/getsentry/symbolic/blob/56349e9686a4dc655d0b9e9c0ccfff82ca4e95d7/symbolic-demangle/src/lib.rs#L278

Damn it! I sometimes forget about these interdependencies. 😅

@eddyb

eddyb commented Jun 18, 2022

(Sorry for not getting back to this in the past month, other things kept coming up etc.)

So I just took a look over the emails for my libiberty contributions, and at no point did anyone even mention copyright assignment, and I never signed anything, so unless it's implicit without a signature, I'm in the clear.
(I suppose this not being GCC proper made it easier to accidentally bypass their policy? not sure heh)

The plan right now is to take the gist history and turn it into a repo, but cut it off short (just before I integrated the existing libiberty code), and reimplement the relevant missing features (legacy symbols, some of the constants, etc.) based on the Rust code in rustc-demangle (and use the same license as that project).

@rui314
Owner Author

rui314 commented Jun 19, 2022

@eddyb Sounds great! Once the repo is ready, I'll try to integrate it into mold.

@eddyb

eddyb commented Jul 4, 2022

Finally got around to setting up that repo I kept promising: rust-demangle.c.

However, I ran out of time this weekend to actually implement anything new, and only got around to writing the README, cleaning up the code style (it was GNU/C89-ish), setting up a test harness etc.

(FWIW, for v0 symbols it does fully pass the 01-06-2019 ~1M symbol dataset I still have, with identical output to rustc_demangle, so it's not useless, just incapable of demangling legacy symbols and some const generics)


The public API is declared entirely in the tiny rust-demangle.h header:

```c
#define RUST_DEMANGLE_FLAG_VERBOSE 1

bool rust_demangle_with_callback(
    const char *mangled, int flags,
    void (*callback)(const char *data, size_t len, void *opaque), void *opaque
);
char *rust_demangle(const char *mangled, int flags);
```

I don't expect this to change (other than adding flags for e.g. controlling recursion/output limits), unless there's demand for additional features that can't be expressed through it.

So feel free to leave any feedback (ideally as issues on that repo) on that side of the design.
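The callback form lets a demangler stream its output in pieces without allocating. Below is a minimal sketch of the caller's side, assuming only the callback signature shown above: `strbuf_append` collects pieces into a growable buffer passed through `opaque`, and `fake_demangle` is a hypothetical stand-in that emits a fixed path for a single symbol (it is not part of rust-demangle.c). The `demangle_to_string` wrapper shows one way an allocating entry point like `rust_demangle` could be layered on top of the callback one; whether rust-demangle.c does exactly this is not confirmed here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Growable, NUL-terminated buffer handed to the callback via `opaque`. */
struct strbuf {
    char *data;
    size_t len, cap;
    bool oom; /* set if an allocation failed */
};

/* Same shape as the callback parameter of rust_demangle_with_callback. */
static void strbuf_append(const char *data, size_t len, void *opaque) {
    assert(opaque != NULL);
    struct strbuf *sb = opaque;
    if (sb->oom) return;
    if (sb->len + len + 1 > sb->cap) {
        size_t cap = sb->cap ? sb->cap : 16;
        while (cap < sb->len + len + 1) cap *= 2;
        char *p = realloc(sb->data, cap);
        if (!p) { sb->oom = true; return; }
        sb->data = p;
        sb->cap = cap;
    }
    memcpy(sb->data + sb->len, data, len);
    sb->len += len;
    sb->data[sb->len] = '\0';
}

/* Hypothetical stand-in for a callback-based demangler: it only knows one
   legacy symbol and emits its path piece by piece. */
static bool fake_demangle(const char *mangled, int flags,
                          void (*callback)(const char *data, size_t len, void *opaque),
                          void *opaque) {
    (void)flags;
    if (strcmp(mangled, "_ZN3std2io5error5Error3new17hbb46546882ad6265E") != 0)
        return false;
    const char *pieces[] = {"std", "::", "io", "::", "error", "::", "Error", "::", "new"};
    for (size_t i = 0; i < sizeof pieces / sizeof *pieces; i++)
        callback(pieces[i], strlen(pieces[i]), opaque);
    return true;
}

/* Collects the callback output into a heap string; the caller frees it.
   Returns NULL if demangling fails or memory runs out. */
char *demangle_to_string(const char *mangled, int flags) {
    struct strbuf sb = {0};
    if (!fake_demangle(mangled, flags, strbuf_append, &sb) || sb.oom) {
        free(sb.data);
        return NULL;
    }
    return sb.data;
}
```

Here `demangle_to_string("_ZN3std2io5error5Error3new17hbb46546882ad6265E", 0)` returns a heap-allocated `std::io::error::Error::new` that the caller must `free`, and any other input returns NULL. Threading state through `opaque` rather than a global keeps the callback reentrant.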


As for all the missing features, I'll get back when I get more time to port them.

rui314 added a commit that referenced this issue Jul 4, 2022
@rui314
Owner Author

rui314 commented Jul 4, 2022

Thank you very much, @eddyb, for doing this!

I integrated your Rust demangler in the above commit, though I haven't tested it yet.

@eddyb

eddyb commented Jul 4, 2022

Oof, C++ compat (0e88fd0) definitely is something I should've handled (thought of it but then forgot).

But I don't think it's going to work within the if (name.starts_with("_Z")) { check (as v0 symbols don't begin with that), and legacy ones also have to be tried before attempting C++ demangling (as legacy Rust mangling is C++ mangling with a couple of weird details, but there's no way to test that until legacy demangling is ported).

Looking at the commit history, seems like git subtree is used, so updating it further on should be easy, right?
(I didn't expect immediate adoption, but it's being informative, so I don't mind)

@rui314
Owner Author

rui314 commented Jul 4, 2022

Is there any regression from the C++ users' point of view if we attempt to demangle a string as a Rust mangled name before trying it as a C++ mangled name? If a valid C++ mangled name could be demangled as a Rust mangled name in a weird way, we can't call rust_demangle before cxx_demangle.

Updating a subtree is easy; it's just a git subtree command with the appropriate options.

@eddyb

eddyb commented Jul 4, 2022

Is there any regression from the C++ users's point of view if we attempt to demangle a string as a Rust mangled name before as a C++ mangled name?

No regression AFAIK, and this is what libiberty does.

If a valid C++ mangled name is demangled as a Rust mangled name in an weird way

So the way legacy symbols work (ignoring the weird $-based substitutions) is they're flat _ZN...E paths (with only length-prefixed identifiers in the ... part), and the last component is a hexadecimal hash prefixed with h.

As an example, _ZN3std2io5error5Error3new17hbb46546882ad6265E demangles to std::io::error::Error::new::hbb46546882ad6265 (when passed through a C++ demangler), and you can see the hbb46546882ad6265 at the end that's the hash component.
In that case, all a Rust legacy demangler would do differently is strip the ::hbb46546882ad6265.
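To make the legacy format concrete, here is a minimal sketch of a legacy-only demangler. It assumes a plain `_ZN...E` symbol without a platform-specific extra leading underscore, copies `$`-escapes such as `$LT$` through verbatim instead of decoding them, and treats a final component of `h` plus 16 hex digits as the hash to drop. `demangle_legacy` is a hypothetical helper, not part of rust-demangle.c or mold.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* True if s[0..len) is 'h' followed by 16 hex digits (assumed hash shape). */
static bool is_legacy_hash(const char *s, size_t len) {
    if (len != 17 || s[0] != 'h') return false;
    for (size_t i = 1; i < len; i++)
        if (!isxdigit((unsigned char)s[i])) return false;
    return true;
}

/* Writes the '::'-joined path of a _ZN<len><ident>...E symbol into `out`,
   dropping a trailing hash component. Returns false on malformed input or
   if `out` is too small; `out` contents are unspecified on failure. */
bool demangle_legacy(const char *sym, char *out, size_t out_size) {
    assert(sym != NULL && out != NULL && out_size > 0);
    size_t n = strlen(sym);
    out[0] = '\0';
    if (n < 4 || strncmp(sym, "_ZN", 3) != 0 || sym[n - 1] != 'E') return false;

    const char *p = sym + 3;
    const char *end = sym + n - 1; /* points at the closing 'E' */
    size_t used = 0;
    while (p < end) {
        /* Each component is a decimal length followed by that many bytes. */
        if (!isdigit((unsigned char)*p)) return false;
        size_t len = 0;
        while (p < end && isdigit((unsigned char)*p)) {
            len = len * 10 + (size_t)(*p++ - '0');
            if (len > n) return false; /* also guards against overflow */
        }
        if (len == 0 || len > (size_t)(end - p)) return false;

        /* Drop the final "h<16 hex digits>" component. */
        if (p + len == end && is_legacy_hash(p, len)) break;

        size_t sep = used ? 2 : 0;
        if (used + sep + len + 1 > out_size) return false;
        if (sep) { memcpy(out + used, "::", 2); used += 2; }
        memcpy(out + used, p, len);
        used += len;
        out[used] = '\0';
        p += len;
    }
    return used > 0;
}
```

Passing `_ZN3std2io5error5Error3new17hbb46546882ad6265E` writes `std::io::error::Error::new` into the buffer, which is the C++ demangler's output minus the `::hbb46546882ad6265` suffix.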

It's unfortunate that they're not more distinct, but I haven't heard of a C++ symbol being mistaken for one yet.
(Then again, it's up to you how you want to integrate it, and long-term we hope to forget legacy mangling ever existed, but for the time being it's still the default)

@rui314
Owner Author

rui314 commented Jul 4, 2022

Thanks. I made a change so that symbols are demangled as Rust symbols before as C++ symbols. I'll test this change tomorrow.
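The ordering settled on here (try Rust first, then C++) can be sketched as a classifier that decides which demangler to call. `_R` is the v0 prefix from RFC 2603 (tools such as rustc-demangle also accept variants with a different number of leading underscores, which this sketch ignores). The legacy test is only a heuristic based on the trailing `17h<16 hex digits>E` hash component, so it shares the C++ overlap discussed above. All names are hypothetical; this is not mold's actual code.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

enum symbol_kind { SYM_RUST_V0, SYM_RUST_LEGACY, SYM_CXX, SYM_OTHER };

/* Heuristic: a legacy Rust symbol is a _ZN...E path whose last component
   is "17h" followed by 16 hex digits. */
static bool looks_like_rust_legacy(const char *s) {
    size_t n = strlen(s);
    /* "_ZN" + "17h" + 16 hex digits + "E" = 23 characters at minimum. */
    if (n < 23 || strncmp(s, "_ZN", 3) != 0 || s[n - 1] != 'E') return false;
    const char *hash = s + n - 1 - 16; /* first of the 16 hex digits */
    if (strncmp(hash - 3, "17h", 3) != 0) return false;
    for (int i = 0; i < 16; i++)
        if (!isxdigit((unsigned char)hash[i])) return false;
    return true;
}

/* Checks the Rust forms before falling back to C++, mirroring the order
   described above. */
enum symbol_kind classify_symbol(const char *s) {
    assert(s != NULL);
    if (strncmp(s, "_R", 2) == 0) return SYM_RUST_V0;
    if (looks_like_rust_legacy(s)) return SYM_RUST_LEGACY;
    if (strncmp(s, "_Z", 2) == 0) return SYM_CXX;
    return SYM_OTHER;
}
```

Because the legacy check runs before the generic `_Z` test, a legacy Rust symbol is routed to the Rust demangler even though it is also valid C++ mangling, while an ordinary C++ symbol such as `_ZN3foo3barEv` (no trailing hash, not ending in `E`) still reaches the C++ path.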

@XVilka

XVilka commented Jun 30, 2023

Rizin has also implemented Rust demangling in pure C in its universal demangling library, rz-libdemangle, which is under the LGPLv3 license (except for C++ demangling, for now: rizinorg/rz-libdemangle#3).


7 participants