This repository has been archived by the owner on Aug 24, 2022. It is now read-only.

Mangle names #7

Open
sunjay opened this issue Sep 18, 2019 · 2 comments

Comments

@sunjay
Owner

sunjay commented Sep 18, 2019

Function names (except for main) and even variable names should all be mangled to prevent accidental name clashes. For example, if we don't do this, someone could define a function named __disco__DInt__operator_add (or whatever) and mess everything up.

https://internals.rust-lang.org/t/pre-rfc-a-new-symbol-mangling-scheme/8501

@sunjay
Owner Author

sunjay commented Feb 29, 2020

Name Mangling Scheme

The hash values below are made up, but should eventually come from some decent algorithm for short hashes.

  1. package/module
    • Fully qualified name of module is used, with each part mangled individually
    • Includes the mangled package name (e.g. std)
    • Hash for each part is created using "mod" + module name
    • :: becomes __
    • Example: std::num becomes std_493ef8a1__num_839a1b34
  2. type
    • Mangled name of containing module + __ + mangled name of type
    • Hash for type is created using "type" + type name
    • Example: std::num::NonZeroU32 becomes std_493ef8a1__num_839a1b34__NonZeroU32_6743fa12
  3. method
    • Mangled name of type + __ + unmangled name of method
    • Method names are unique for a given type so this works
    • Example: std::num::NonZeroU32::new becomes std_493ef8a1__num_839a1b34__NonZeroU32_6743fa12__new
  4. function
    • Mangled name of containing module/function + __ + mangled name of function
    • Hash for function name is created using "fn" + function name
    • Don't want type names and function names to be able to conflict, so both are mangled with a different prefix for the hash
    • Example: std::io::copy becomes std_493ef8a1__io_15ad8c91__copy_ab77d31a
  5. variable/function parameter
    • Hash of variable name
    • Needed to prevent collisions with temporaries

The extra string ("mod", "type", "fn") is included in the hash input because different namespaces can contain the same names. That is, you can technically have both a type and a function named Foo, and their mangled names must not collide.
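The rules above could be sketched roughly as follows. This is only an illustration under a few assumptions: the helper names (`mangle_part`, `mangle_module`, and so on) are hypothetical, the hash is 32-bit FNV-1a used purely as a stand-in for whatever short hash ends up being chosen, and the hash is zero-padded to 8 hex digits so every part has a fixed-width suffix.

```rust
/// 32-bit FNV-1a. A stand-in short hash (assumption), not a decision
/// about which algorithm the scheme should use.
fn fnv1a_32(bytes: &[u8]) -> u32 {
    let mut hash: u32 = 0x811c_9dc5; // FNV offset basis
    for &b in bytes {
        hash ^= u32::from(b);
        hash = hash.wrapping_mul(0x0100_0193); // FNV prime
    }
    hash
}

/// Mangles one path segment as `name_hash`, where the hash input is the
/// namespace tag ("mod", "type", "fn") followed by the name.
fn mangle_part(tag: &str, name: &str) -> String {
    let mut input = Vec::with_capacity(tag.len() + name.len());
    input.extend_from_slice(tag.as_bytes());
    input.extend_from_slice(name.as_bytes());
    format!("{}_{:08x}", name, fnv1a_32(&input))
}

/// Rule 1: each module part is mangled individually and `::` becomes `__`.
fn mangle_module(path: &[&str]) -> String {
    path.iter()
        .map(|part| mangle_part("mod", part))
        .collect::<Vec<_>>()
        .join("__")
}

/// Rule 2: mangled module + `__` + mangled type name.
fn mangle_type(module: &[&str], name: &str) -> String {
    format!("{}__{}", mangle_module(module), mangle_part("type", name))
}

/// Rule 3: mangled type + `__` + unmangled method name.
fn mangle_method(module: &[&str], ty: &str, method: &str) -> String {
    format!("{}__{}", mangle_type(module, ty), method)
}

/// Rule 4: mangled module + `__` + mangled function name.
fn mangle_function(module: &[&str], name: &str) -> String {
    format!("{}__{}", mangle_module(module), mangle_part("fn", name))
}
```

With this sketch, `mangle_type(&["std"], "Foo")` and `mangle_function(&["std"], "Foo")` share the module prefix but hash "typeFoo" and "fnFoo" respectively, so the two names differ unless the short hash happens to collide.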

Short Hashes

use std::io::Read;

use adler32::adler32;

fn mangle_function_name(mod_mangled_name: &str, name: &str) -> String {
    let prefix = "fn".as_bytes();
    // https://doc.rust-lang.org/nightly/std/io/trait.Read.html#method.chain
    let mangle_input = prefix.chain(name.as_bytes());
    // unwrap() is safe because Read for &[u8] can never result in an error.
    let hash = adler32(mangle_input).unwrap();
    format!("{}__{}_{:x}", mod_mangled_name, name, hash)
}

Update: Apparently adler32 isn't great for short strings. It might be better to just truncate md5 to 32 bits, though that may have issues too.

@sunjay
Owner Author

sunjay commented Feb 29, 2020

Even simpler scheme that just uses the length and not a hash: https://rust-lang.github.io/rfcs/2603-symbol-name-mangling-v2.html#the-mangling-scheme-by-example
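
The core idea of a length-based scheme can be sketched as below. This is a heavily simplified illustration, not the RFC's actual grammar: the function name `mangle_path_len_prefixed` is hypothetical, and the sketch assumes identifiers never begin with a digit, so the decimal length can always be told apart from the name that follows it.

```rust
/// Encodes each path segment as its byte length followed by the segment,
/// so segment boundaries are recoverable without separators or hashes.
/// Assumes no segment starts with an ASCII digit.
fn mangle_path_len_prefixed(path: &[&str]) -> String {
    let mut out = String::new();
    for part in path {
        out.push_str(&part.len().to_string());
        out.push_str(part);
    }
    out
}
```

For example, the path std::io::copy would become 3std2io4copy. Unlike the hash-based scheme, this does not by itself keep a type and a function with the same name apart, so a namespace marker would still be needed somewhere in the encoding.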
