Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Removal of C++ name mangling is not correctly applied #14205

Open
juj opened this issue May 17, 2021 · 12 comments
Open

Removal of C++ name mangling is not correctly applied #14205

juj opened this issue May 17, 2021 · 12 comments

Comments

@juj
Copy link
Collaborator

juj commented May 17, 2021

In wasm backend, someone decided that the traditional C++ name mangling would not be applied. This has caused numerous issues into the toolchain and to the users of the toolchain in the past. After #13477 I was hopeful that would have been the last of it, but here we go again:

#include <emscripten.h>

class Foo
{
public:
	Foo(int i, int j)
	{
		EM_ASM(console.log($0 + ' ' + $1), i, j);
	}
};

int main()
{
	Foo f(1,2);
}

Run with

$ em++ a.cpp -c -o a.o
$ em++ a.o -o a.html -g2
$ llvm-nm a.o
0000001b W _ZN20__em_asm_sig_builderI19__em_asm_type_tupleIJiiEEE6bufferE
0000005f W _ZN3FooC2Eii
00000000 W _ZZN3FooC1EiiE1x
00000001 T __original_main
         U __stack_pointer
         U emscripten_asm_const_int
000000f9 T main

$ wasm-opt --nm a.wasm
    __wasm_call_ctors : 1
    __original_main : 40
    Foo::Foo\28int\2c\20int\29 : 60
    main : 5
    stackSave : 1
    stackRestore : 2
    stackAlloc : 9
    emscripten_stack_init : 9
    emscripten_stack_get_free : 3
    emscripten_stack_get_end : 1
    __lockfile : 1
    __unlockfile : 1
    __lock : 1
    __unlock : 1
    __ofl_lock : 4
    __ofl_unlock : 2
    fflush : 82
    __fflush_unlocked : 49
    __errno_location : 1
warning: no output file specified, not emitting output

We see that when the code is still at the object file stage, the C++ name mangling is still applied, and e.g. the constructor Foo::Foo has a name _ZN3FooC2Eii. After the linking has occurred, the constructor instead has changed its name to Foo::Foo\28int\2c\20int\29.

This is preventing our tooling from analyzing which object files contribute which symbols (for size analysis) into the final output.

Is there a way that we could restore the proper C++ name mangling like the way it used to be? Maybe with a custom flag? This has been a major source of headache. Alternatively, is there a way that the symbols in .o files would have unmangled names also? (or maybe both solutions?)

@sbc100
Copy link
Collaborator

sbc100 commented May 17, 2021

I remember when this change was made in upstream llvm. Here is the discussion we had at the time: https://reviews.llvm.org/D44316.

I think the idea was based on the fact that the name section is supposed to contain "user visible names" for functions and that mangled names are not very helpful to users when they see them in backtrace. Also, since the mangling conventions is language specific, its not easy for devtools to do any kind of universal demangling when it shows backtraces.

I agree that adding an option to control this could make sense, especially if we want to be able to use the names section for analysis. We do have a --emit-symbol-map which is nice for mapping function indexes to names, and this will work even if there is no name section at all and the import/export names are minified.. its kind of like an external name section for tools to use. However, I can see that it could be tricky to teach all the different analysis tools to deal with a separate symbols file (at least we would want twiggy and bloaty to work with these).

Maybe easier and more universal to just add a -Wl,--no-demangled-name-section linker flag?

@sbc100
Copy link
Collaborator

sbc100 commented May 17, 2021

@dschuff @sunfishcode @kripken @RReverser what do you think? Should we add an option for keeping the name section mangled? Or even switch the default back to mangled?

@juj
Copy link
Collaborator Author

juj commented May 18, 2021

Maybe easier and more universal to just add a -Wl,--no-demangled-name-section linker flag?

Would this mean that the name section would then have the compile time (.o time) names like _ZN3FooC2Eii again? If so, then that sounds like a good resolution to me.

Should we add an option for keeping the name section mangled? Or even switch the default back to mangled?

I don't particularly mind what the default is, so needing a -Wl,--no-demangled-name-section option would work for me here.

We do have a --emit-symbol-map which is nice for mapping function indexes to names, and this will work even if there is no name section at all and the import/export names are minified.. its kind of like an external name section for tools to use. However, I can see that it could be tricky to teach all the different analysis tools to deal with a separate symbols file (at least we would want twiggy and bloaty to work with these).

We did already use --emit-symbol-map, and it does look like it is affected by the same bug:

$ em++ a.cpp -c -o a.o
$ em++ a.o -o a.html -g2 --emit-symbol-map
$ llvm-nm a.o
0000001b W _ZN20__em_asm_sig_builderI19__em_asm_type_tupleIJiiEEE6bufferE
0000005f W _ZN3FooC2Eii
00000000 W _ZZN3FooC1EiiE1x
00000001 T __original_main
         U __stack_pointer
         U emscripten_asm_const_int
000000f9 T main

$ type a.html.symbols
0:emscripten_asm_const_int
1:__wasm_call_ctors
2:__original_main
3:Foo::Foo\28int\2c\20int\29
4:main
5:stackSave
6:stackRestore
7:stackAlloc
8:emscripten_stack_init
9:emscripten_stack_get_free
10:emscripten_stack_get_end
11:__lockfile
12:__unlockfile
13:__lock
14:__unlock
15:__ofl_lock
16:__ofl_unlock
17:fflush
18:__fflush_unlocked
19:__errno_location

I.e. one is unable to correlate the object file contents together with the symbol map.

@kripken
Copy link
Member

kripken commented May 18, 2021

+1 for -Wl,--no-demangled-name-section. We could also use that internally to simplify how asyncify and other things use the name section, which has required several annoying hacks.

@sbc100
Copy link
Collaborator

sbc100 commented May 18, 2021

Oh.. I forgot we already have --no-demangle and it is already honored in the name section: https://reviews.llvm.org/D54279

@sbc100
Copy link
Collaborator

sbc100 commented May 18, 2021

Should we make it the default for emscripten? i.e. How useful is it having human readable stack traced generated by the VM?

@dschuff
Copy link
Member

dschuff commented May 18, 2021

I like the idea of the linker flag too. So for the question of defaults...
Clearly (to me) any name that we expect to be used by tools should have the mangled name. I'm not too familiar with the uses of the symbol map, but it seems likely to me that it would be used by some kind of tool rather than read by humans, so it seems that it should be mangled. The name section is more interesting. The whole reason it's demangled is because its most common use is that the names show up in stack traces seen by developers, and it's clearly much easier to read for that use case. (But of course some developers want to collect those stack traces on the client side for tool analysis later, so that's another case for having them mangled).

Maybe the "right" thing to do is have the browser devtools do the demangling when it shows stack traces. There is some precedent for that kind of thing (e.g. Chrome has some smarts for async functions). Then if you captured the trace programmatically you'd get mangled names, which is probably what you'd want.
I wonder if @bmeurer has any thoughts about that.

@sbc100
Copy link
Collaborator

sbc100 commented May 18, 2021

I think there are a couple of issues with expecting the engine to be able to do demangling:

  1. Each language can/may have its one scheme and it doesn't seem reasonable to ask engines to understand all of them
  2. IIRC demangling is surprisingly complex for C++ (or is it mangling.. certainly one of them is hard).

@bmeurer
Copy link
Contributor

bmeurer commented May 19, 2021

cc @pfaffe

My initial thought here would be that the names section should contain mangled names, and that it's the debugger's responsibility to demangle. But thinking about it more, the DWARF data already contains the mangled names and the primary use of the names section is when DWARF data isn't readily available, and in that case, it might be some generic tool looking at the names that doesn't understanding C++ name demangling.

@RReverser
Copy link
Collaborator

RReverser commented May 19, 2021

I think keeping demangled names in the name section makes more sense too, given that it can be mangled by pretty much any scheme from any language and, unlike in DWARF, we don't have information on which language it was.

@sbc100
Copy link
Collaborator

sbc100 commented May 19, 2021

@juj, can you try the existing -Wl,-no-demangle option to see that does what you want? In my tests it seems to. If so, perhaps we can just document that and be done for now.

Also, not that the strange Foo::Foo\28int\2c\20int\29 is specific to binaryen I think. The actual name in the name section looks more human readable than that:

$ wasm-objdump -x a.wasm
...
Code[19]:
 - func[1] size=4 <__wasm_call_ctors>
 - func[2] size=77 <__original_main>
 - func[3] size=130 <Foo::Foo(int, int)>
...

Perhaps wasm-opt --nm could use fixing here too?

@juj
Copy link
Collaborator Author

juj commented May 27, 2021

@juj, can you try the existing -Wl,-no-demangle option to see that does what you want? In my tests it seems to. If so, perhaps we can just document that and be done for now.

I am a bit hard pressed on time to test that out - when I have a chance to jump back on this, I'll give this a try (we disabled object file analysis support for now)

Perhaps wasm-opt --nm could use fixing here too?

Ah yea, that would be useful to UTF-8 decode the prints.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants