Should we have a way to give functions a stable address? #515

RalfJung · 2024-07-05T06:40:37Z

Nominally we currently do not guarantee anything about a function's address -- the same function can have different addresses when it is turned into a function pointer multiple times, and different functions can have the same address. (Mechanically, this happens because we set the unnamed_addr flag in LLVM: the duplicates of a function that are generated in different CGUs can have visibly different addresses, and LLVM can deduplicate different functions if they have identical assembly code.)
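As a minimal sketch of what "no guarantee" means in practice, the program below prints whether two pointers to the same function, and pointers to two different functions with identical bodies, compare equal. The names `first`, `second`, and `addr` are made up for illustration. Neither printed result is something a program may rely on; both can change with optimization level, codegen-unit partitioning, or interpreter (e.g. Miri).

```rust
// Two distinct functions with identical bodies: candidates for merging.
fn first() -> u32 {
    1
}

fn second() -> u32 {
    1
}

/// Address of a function pointer as an integer (illustrative helper).
fn addr(f: fn() -> u32) -> usize {
    f as *const () as usize
}

fn main() {
    // Each call coerces the function item to a function pointer again.
    let a = addr(first);
    let b = addr(first);
    let c = addr(second);

    // Neither comparison has a guaranteed outcome.
    println!("first vs first:  equal = {}", a == b);
    println!("first vs second: equal = {}", a == c);
}
```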

Following that, Miri used to generate a fresh address each time a function item is turned into a function pointer. However, that turned out to be problematic: the std formatting machinery used to depend on one function having a fixed address (but this got fixed recently), and the backtrace machinery uses function pointer comparison to shorten the backtrace and hide frames that are part of the std pre-main life. So we now have a sort of hack in Miri where we do give specific functions a stable address, namely if they are monomorphic and inline(never). But of course this is not actually a semantic guarantee.
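To illustrate why address comparison matters for this kind of frame trimming, here is a rough sketch, not how std actually implements it. `runtime_entry_marker` and `trim_frames` are hypothetical names, and the frame list is plain integers standing in for real return addresses. The marker is non-generic and `#[inline(never)]`, which is the shape of function that Miri's current heuristic gives a stable address.

```rust
// Hypothetical marker function: monomorphic and never inlined.
#[inline(never)]
fn runtime_entry_marker() {}

/// Keep only the frames that appear before the marker frame.
/// If the marker is not found, all frames are kept.
fn trim_frames(frames: &[usize], marker: usize) -> &[usize] {
    match frames.iter().position(|&addr| addr == marker) {
        Some(idx) => &frames[..idx],
        None => frames,
    }
}

fn main() {
    let marker = runtime_entry_marker as *const () as usize;
    // Fake "backtrace": two user frames, the marker, then runtime setup.
    let frames = [10, 20, marker, 30];
    println!("kept {} frame(s)", trim_frames(&frames, marker).len());
}
```

If the marker produced a different address each time it was turned into a pointer, the lookup would never match and no frames would be hidden.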

So maybe it would make sense to have an attribute we can attach to a function that disables unnamed_addr? Then we should not see any de-duplication any more, I hope. I am not sure about duplication -- is there a way that all the copies of a function that exist for inlining and generics can be given a consistent address, or is that not possible and we have to enforce such "functions with stable address" to be monomorphic and inline(never)?


chorman0773 · 2024-07-05T11:47:26Z

> is there a way that all the copies of a function that exist for inlining and generics can be given a consistent address

It's possible on ELF and Mach-O platforms, but not on PE (i.e. Windows) when dynamic linking is in the picture - PE doesn't have a way to collapse every dynamic export of a symbol into a single canonical definition.

CAD97 · 2024-07-07T19:06:14Z

If it's critically important, we could (probably) fake it in PE by adding additional hidden indirection, i.e. exporting a symbol that is a static with the address of the actual function instead of exporting the function symbol directly, then transparently projecting through that indirection in codegen, kind of similar to how extern static lowering ends up working.

In fact, it's already possible to write stable code that manually does this:

pub fn f() { /* ... */ }
pub const F: fn() = f; // or static

I would hope that F is actually guaranteed to be a structurally equal value in every codegen unit -- operationally it certainly feels like once you make a function pointer it should be a deterministic value -- but I'm not fully convinced that it is currently.
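A slightly expanded sketch of the `static` variant, under the assumption that every user obtains the pointer by reading `F` rather than by naming `f` directly. The helper `canonical_addr` is hypothetical. Within a single program, a `static` is one memory location, so repeated reads yield the same pointer value; this says nothing about whether that value equals what a fresh coercion of `f` would give, and it does not cover the `const` variant, where each use may be evaluated separately.

```rust
pub fn f() { /* ... */ }

// The single stored pointer that all users are expected to go through.
pub static F: fn() = f;

/// Hypothetical helper: the address obtained through the indirection.
pub fn canonical_addr() -> usize {
    F as *const () as usize
}

fn main() {
    // Both reads load the same stored value from the static.
    let p = canonical_addr();
    let q = canonical_addr();
    println!("reads through F agree: {}", p == q);

    // Calling through the static works like any function pointer call.
    F();
}
```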

As far as how to directly request the non-usage of LLVM-unnamed_addr, it feels like it fits in as a subdirective for #[used]. Bringing "don't cull this" semantics along as well doesn't seem too unreasonable, since the only reason to require a stable address requires taking and using that address. (Although in fairness it's always possible for the relevant modules to be completely unused.)

RalfJung · 2024-07-07T20:20:30Z

> As far as how to directly request the non-usage of LLVM-unnamed_addr, it feels like it fits in as a subdirective for #[used]. Bringing "don't cull this" semantics along as well doesn't seem too unreasonable, since the only reason to require a stable address requires taking and using that address. (Although in fairness it's always possible for the relevant modules to be completely unused.)

Already for #[used] we have the issue that people don't use it, because what they really mean is "keep this in the binary if certain parts of the code are not dead" -- so they add volatile reads in these codepaths to ensure the presence of a static, rather than adding #[used]. That seems like something that may also come up here, so I don't think this should be tied up with anything #[used]-like.

comex · 2024-07-07T23:56:47Z

> If it's critically important, we could (probably) fake it in PE by adding additional hidden indirection

That solves the issue of the stub you get if you're missing dllimport on the caller side (i.e. rust-lang/rust#27438). But that's not the hardest part. The hardest part is a situation like:

crate A: a dylib that defines fn foo<T>() {}
crate B: a dylib that depends on A and takes the address of foo::<u32>
crate C: a dylib that also depends on A and takes the address of foo::<u32>
crate D: an executable that depends on B and C

B and C both need to instantiate the generic, because neither depends on the other or is aware of the other. But for the address to be unique, you need only one of those instantiations to be actually used.

ELF platforms and Darwin support this by having a way to do process-wide symbol deduplication by name, and rely on this to uphold C++'s guarantee of a stable address for template instantiations. Windows does not support it, and just violates the C++ spec when it comes to instantiations across DLL boundaries.

This is still possible to fake with hidden indirection, but it's more complex. Either D would need to have a static initializer that patches up references in B and C, or libstd (or even libcore) would need to have some kind of hash table and basically pretend to be a dynamic linker.
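The hash-table option could look roughly like the sketch below. Everything here is hypothetical: `CanonicalAddrs`, `canonicalize`, and the two stand-in functions are invented for illustration, the key is a plain string rather than a real mangled symbol name, and a real version would need to be process-wide and thread-safe. The idea is only that the first copy registered under a name wins, and every later copy resolves to it.

```rust
use std::collections::HashMap;

/// Hypothetical table mapping an instantiation's name to the first
/// address registered for it.
#[derive(Default)]
struct CanonicalAddrs {
    by_name: HashMap<&'static str, usize>,
}

impl CanonicalAddrs {
    /// Return the canonical address for `name`, registering `local_copy`
    /// if no copy has been registered yet.
    fn canonicalize(&mut self, name: &'static str, local_copy: usize) -> usize {
        *self.by_name.entry(name).or_insert(local_copy)
    }
}

// Stand-ins for the separate copies of foo::<u32> emitted by B and by C.
#[inline(never)]
fn foo_u32_copy_in_b() {}

#[inline(never)]
fn foo_u32_copy_in_c() {}

fn main() {
    let mut table = CanonicalAddrs::default();
    let from_b = table.canonicalize("foo::<u32>", foo_u32_copy_in_b as *const () as usize);
    let from_c = table.canonicalize("foo::<u32>", foo_u32_copy_in_c as *const () as usize);

    // Both crates now observe the address of whichever copy registered first.
    println!("B and C agree: {}", from_b == from_c);
}
```

The cost is a lookup (or a patched-up indirection) wherever the address is taken, which is the "pretend to be a dynamic linker" part.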

saethlin · 2024-07-08T02:23:55Z

> or is that not possible and we have to enforce such "functions with stable address" to be monomorphic and inline(never)?

I don't think this is too severe a limitation. The monomorphic requirement seems entirely unsurprising, and we might be able to lift the inline(never) requirement later. The problem is LocalCopy codegen, not the actual inlining optimization, right? If we were to add a new attribute like #[stable_address] we could issue a diagnostic when that's used with conflicting attributes, and cross_crate_inlinable would just know to return false by default for such functions.
