Symbol mangling happens despite #[no_mangle] or #[link_name] #35052

Closed
jjpe opened this Issue Jul 26, 2016 · 11 comments

jjpe commented Jul 26, 2016

I see symbols being mangled despite using #[no_mangle] or #[link_name = "foo"]. Specifically, the symbols are prefixed with _, i.e. foo becomes _foo rather than being accessible as foo.

I tried this code:

#[no_mangle]
pub static mut foo: std::os::raw::c_int = 0;

and

#[link_name = "foo"]
pub static mut foo: std::os::raw::c_int = 0;

Meta

OS: OS X 10.10

rustc --version --verbose: I tried both

rustc 1.10.0 (cfcb716cf 2016-07-03)
binary: rustc
commit-hash: cfcb716cf0961a7e3a4eceac828d94805cf8140b
commit-date: 2016-07-03
host: x86_64-apple-darwin
release: 1.10.0

and

rustc 1.12.0-nightly (9316ae515 2016-07-24)
binary: rustc
commit-hash: 9316ae515e2f8f3f497fb4f1559910c1eef2433d
commit-date: 2016-07-24
host: x86_64-apple-darwin
release: 1.12.0-nightly

brson commented Jul 26, 2016

The _ prefix isn't considered to be part of the Rust name mangling - it's part of the platform ABI for OS X. I don't know of any way to get a symbol in Rust on OS X without the _, so seemingly the language would need a new mechanism to mean "use this exact string for the symbol name".

jjpe commented Jul 26, 2016

Having no knowledge of rustc's source code, it seems to me that that is the purpose of #[link_name = "..."] (which would suggest that the real bug is in the implementation of the link_name attribute). But I could be mistaken.
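For contrast, here is a minimal sketch of where #[link_name] normally applies. It renames a foreign *import* declared inside an extern block, whereas #[no_mangle] and #[export_name] name a *definition*. Neither attribute suppresses the platform prefix, so on OS X the declaration below still resolves to the C library symbol _abs. The wrapper name c_abs_via_link_name is hypothetical. The sketch assumes a target whose C library provides abs and a toolchain (Rust 1.82 or later) that accepts `unsafe extern` blocks.

```rust
use std::os::raw::c_int;

// `#[link_name]` goes on items inside an `extern` block. It sets the symbol
// the linker looks for, so the Rust-side name (`c_abs`) can differ from it.
// The platform prefix is still applied: on OS X "abs" resolves to `_abs`.
unsafe extern "C" {
    #[link_name = "abs"]
    fn c_abs(x: c_int) -> c_int;
}

/// Hypothetical safe wrapper around the C library's `abs`.
pub fn c_abs_via_link_name(x: c_int) -> c_int {
    // abs(INT_MIN) is undefined behavior in C, so reject it up front.
    assert!(x != c_int::MIN, "abs(INT_MIN) is undefined in C");
    // SAFETY: apart from INT_MIN, `abs` has no preconditions.
    unsafe { c_abs(x) }
}
```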

alexcrichton commented Jul 26, 2016

Ah, #[no_mangle] only covers Rust's own mangling (as @brson mentioned). To disable the "mangling" LLVM does (e.g. adding the _ in front), you can use a likely undocumented escape hatch in LLVM:

#[export_name = "\x01foo"]
pub extern fn foo() {
}

That is, explicit symbol names that start with the \x01 byte get no extra mangling at the LLVM layer, and that leading byte is stripped.
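The rule described above can be modeled as a small pure function. This is a hedged sketch of the behavior, not LLVM's actual code. It assumes only two cases: a Mach-O target (OS X), where C-level symbols get a leading _, and an ELF target (e.g. Linux), where they are emitted as written. Windows decorations are ignored here. The enum and function names are hypothetical.

```rust
/// Hypothetical object-format selector for this sketch.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ObjectFormat {
    /// OS X / Mach-O: C-level symbols get a leading underscore.
    MachO,
    /// Linux / ELF: C-level symbols are emitted unchanged.
    Elf,
}

/// Models the final symbol for an explicit `export_name` / `no_mangle` name.
pub fn final_symbol(name: &str, format: ObjectFormat) -> String {
    // A leading \x01 byte means "emit the rest verbatim". The byte is
    // stripped and no platform prefix is added.
    if let Some(rest) = name.strip_prefix('\x01') {
        return rest.to_string();
    }
    match format {
        ObjectFormat::MachO => format!("_{}", name),
        ObjectFormat::Elf => name.to_string(),
    }
}
```

Under this model, #[no_mangle] on plugin_is_GPL_compatible yields _plugin_is_GPL_compatible on OS X. That matches the nm output of the C module later in the thread.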

comex commented Jul 27, 2016

For reference, in C, using the asm-on-declarations GNU extension on OS X omits the underscore:

% cat test.c
int has_underscore;
int x asm("no_underscore");
% clang -c test.c
% nm test.o
0000000000000004 C _has_underscore
0000000000000004 C no_underscore

Arguably it would be more consistent for Rust to do the same with export_name, though that would be a breaking change.

retep998 commented Jul 27, 2016

Really I think we just need to properly document the "\x01foo" trick somewhere.

comex commented Jul 27, 2016

Yuck... it's confusing enough that mangling adds characters you didn't write, without added magic that removes characters you did write. I would much prefer either changing export_name or adding another attribute (export_name_asm?).

retep998 commented Jul 27, 2016

Normally the ABI decoration is desired, though. If some code links to an extern C function named foo, it is usually looking for the decorated form of that name, such as _foo, _foo@4 (stdcall) or even @foo@4 (fastcall). Likewise, if some code defines an extern C function named foo, it is usually defined with the decorated form. This decoration is the standard behavior and is usually what you want, so I see no problem with it being the default. However, there does need to be a way to opt out when a situation calls for it: either document and stabilize the behavior of \x01, or add new attributes that guarantee no decoration. Don't change the behavior of the existing attributes, as a lot of code relies on the decorations.
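The decorations named above follow a fixed pattern on 32-bit x86 Windows. cdecl adds a leading _. stdcall adds a leading _ plus @N. fastcall adds a leading @ plus @N. In each case N is the size of the argument list in bytes, in decimal. This is a hedged sketch under those assumptions: the enum and function are hypothetical, 64-bit Windows applies none of these three decorations, and vectorcall is left out.

```rust
/// 32-bit x86 Windows calling conventions covered by this sketch.
#[derive(Clone, Copy, Debug)]
pub enum CallConv {
    Cdecl,
    Stdcall,
    Fastcall,
}

/// Decorated symbol for a C function `name` on 32-bit x86 Windows.
/// `arg_bytes` is the total size of the argument list in bytes.
pub fn decorate_x86_windows(name: &str, conv: CallConv, arg_bytes: u32) -> String {
    match conv {
        // cdecl: leading underscore only.
        CallConv::Cdecl => format!("_{}", name),
        // stdcall: leading underscore plus @<bytes>.
        CallConv::Stdcall => format!("_{}@{}", name, arg_bytes),
        // fastcall: leading @ plus @<bytes>.
        CallConv::Fastcall => format!("@{}@{}", name, arg_bytes),
    }
}
```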

jjpe commented Jul 27, 2016

I omitted this initially in an attempt to keep the thread focused.
But now I think it's a good idea to explain my exact use case:

I'm developing an Emacs module. Modules are a new Emacs 25 feature and basically provide an FFI to the C world. Emacs requires each module to define a symbol named plugin_is_GPL_compatible; otherwise it refuses to load the module.

I have a working C module (working as in it can be successfully loaded by Emacs) in which the symbol is defined as int plugin_is_GPL_compatible;. When I inspect that with nm -gU my_c_prototype_module.so, the output is:

0000000000002ad0 T _emacs_module_init
0000000000007090 S _plugin_is_GPL_compatible
00000000000016a0 T _socket_option_to_c
0000000000000fc0 T _socket_option_to_elisp
0000000000000f80 T _socket_option_valid
0000000000000a50 T _socket_type_to_c
0000000000000810 T _socket_type_to_elisp
00000000000007e0 T _socket_type_valid

(The -g flag shows only global (external) symbols. The -U flag omits undefined symbols, i.e. symbols my code references but which are defined in other libraries, so the output only contains what my C source file itself defines.)
So apparently even in the C version the symbol does end up as _plugin_is_GPL_compatible.

When I build the Rust code with this definition:

#[no_mangle]
pub static plugin_is_GPL_compatible: c_int = 0;

and inspect the generated .dylib with nm -gU target/debug/libemm.dylib|grep -v "__", I get

0000000000000ec0 T _Fzmq_context
00000000000026b0 T _bind_function
0000000000000f00 T _emacs_module_init
0000000000002180 T _find_function
00000000000024e0 T _get_environment
0000000000002510 T _intern_symbol
0000000000002320 T _make_function
0000000000086330 S _plugin_is_GPL_compatible
0000000000002880 T _provide
0000000000041e20 T _rust_begin_unwind
00000000000435e0 T _rust_eh_personality
00000000000e98d0 S _rust_metadata_emm_42c41e3ad011af9b

(The grep is necessary since there are almost 5800 symbols with __ in them in the .dylib, all of them seemingly unrelated to this discussion.)

I notice 2 things:

  1. The symbols _plugin_is_GPL_compatible and _emacs_module_init indeed aren't mangled, at least not any worse than their C counterparts. So #[no_mangle] works as it's supposed to.
  2. The order in which they appear here is oddly unrelated to the order in which they are defined in the Rust source. If the two were related, _plugin_is_GPL_compatible would appear at the top of the list.

I'm not sure what to make of this, to be honest, since Emacs still refuses to load it. But this is a .dylib, and the C example produces a .so file.
Are there any meaningful differences between those 2?
And does the symbol order in a dynamic library matter at all?

While it would be great to know the answers to those questions, it also seems to me that this specific bug can be closed.
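The by-eye check above asks whether the library exports the symbol Emacs requires. That check can be scripted. Here is a minimal sketch, assuming nm's three-column output (address, type letter, name) as shown above and a caller-supplied platform prefix (_ on OS X, empty on Linux). The function name is hypothetical. Pass it the captured output of nm -gU.

```rust
/// Returns true if `nm_output` lists `c_name` (plus the platform `prefix`)
/// as a defined global symbol.
pub fn exports_symbol(nm_output: &str, c_name: &str, prefix: &str) -> bool {
    let wanted = format!("{}{}", prefix, c_name);
    nm_output.lines().any(|line| {
        // Expected shape: "<address> <type> <name>". Undefined entries have
        // no address, so they do not match this three-field pattern.
        let mut fields = line.split_whitespace();
        match (fields.next(), fields.next(), fields.next()) {
            (Some(_addr), Some(kind), Some(name)) => {
                // Uppercase type letters (T, S, D, ...) mark global symbols;
                // lowercase letters mark local ones; U means undefined.
                name == wanted
                    && kind != "U"
                    && kind.chars().all(|c| c.is_ascii_uppercase())
            }
            _ => false,
        }
    })
}
```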

retep998 commented Jul 27, 2016

But this is a .dylib and the C example produces a .so file.

Perhaps you could try using the new cdylib crate type. It would also take care of your almost 5800 unrelated symbols.

And does the symbol order in a dynamic library matter at all?

It shouldn't.

jjpe commented Jul 27, 2016

@retep998 I just tried changing

# ...
[lib]
crate-type = ["dylib"]
# ...

to

# ...
[lib]
crate-type = ["cdylib"]
# ...

But those 2 produce identical results for me when using rustc 1.12.0-nightly (9316ae515 2016-07-24), both in lib output and in how Emacs treats them.

jjpe closed this Jul 27, 2016

jjpe commented Jul 27, 2016

I think I finally figured out what happened. Emacs wants the dynamic library to have a .so extension: building my library as a regular .dylib and simply renaming it to .so actually works. A weird quirk on Emacs' part.
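The workaround can be automated as a post-build step. Here is a minimal sketch, assuming the Cargo output path target/debug/libemm.dylib from the nm command earlier in the thread. Both function names are hypothetical. The sketch copies the library rather than renaming it, so Cargo's own .dylib stays in place.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Same directory and file stem as the built library, with a `.so` extension.
pub fn so_path_for(dylib: &Path) -> PathBuf {
    dylib.with_extension("so")
}

/// Copies the built `.dylib` to a `.so` path that Emacs will load,
/// returning the new path.
pub fn copy_as_so(dylib: &Path) -> io::Result<PathBuf> {
    let target = so_path_for(dylib);
    fs::copy(dylib, &target)?;
    Ok(target)
}
```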
