Should we have a way to give functions a stable address? #515

Open
RalfJung opened this issue Jul 5, 2024 · 5 comments

@RalfJung
Member

RalfJung commented Jul 5, 2024

Nominally, we currently do not guarantee anything about a function's address: the same function can have different addresses when it is turned into a function pointer multiple times, and different functions can have the same address. (The mechanism is that we set the unnamed_addr flag in LLVM, so the duplicates of a function that are generated in different CGUs can have visibly different addresses, and LLVM can deduplicate different functions if they have identical assembly code.)
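
To make the lack of a guarantee concrete, here is a minimal sketch (the function `answer` and the helper `same_address_twice` are made up for illustration). It turns one function item into a function pointer twice; calling through either pointer is well defined, but whether the two addresses compare equal is exactly what is not promised, so the code only reports the result and never relies on it.

```rust
// Hypothetical example function; any monomorphic function would do.
fn answer() -> u32 {
    42
}

/// Turns the same function item into a function pointer twice and reports
/// whether the two addresses happened to be equal in this particular build.
fn same_address_twice() -> bool {
    let p: fn() -> u32 = answer;
    let q: fn() -> u32 = answer;
    // Calling through either pointer behaves identically ...
    assert_eq!(p(), q());
    // ... but nothing guarantees that the addresses compare equal.
    p as usize == q as usize
}

fn main() {
    // The printed value is build-dependent; do not branch on it in real code.
    println!("same address: {}", same_address_twice());
}
```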

Following that, Miri used to generate a fresh address each time a function item is turned into a function pointer. However, that turned out to be problematic: the std formatting machinery used to depend on one function having a fixed address (but this got fixed recently), and the backtrace machinery uses function pointer comparison to shorten the backtrace and hide frames that are part of the std pre-main life. So we now have a sort of hack in Miri where we do give specific functions a stable address, namely if they are monomorphic and inline(never). But of course this is not actually a semantic guarantee.
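
As a rough illustration of why address comparison matters for backtrace shortening, the following sketch trims a list of frame addresses at a marker function. This is not how std implements it: `short_backtrace_marker` and `trim_frames` are hypothetical names, and frames are modeled as plain `usize` addresses. The marker is monomorphic and `inline(never)`, matching the condition under which Miri currently gives a function a stable address.

```rust
// Hypothetical marker function: monomorphic and never inlined.
#[inline(never)]
fn short_backtrace_marker() {}

/// Keeps only the frames that come before the marker's address.
/// `frames` is assumed to be ordered innermost first.
fn trim_frames(frames: &[usize], marker: usize) -> &[usize] {
    match frames.iter().position(|&addr| addr == marker) {
        Some(idx) => &frames[..idx],
        None => frames,
    }
}

fn main() {
    // Take the address once and reuse it, so the comparison below does not
    // depend on the address being the same across multiple conversions.
    let marker = short_backtrace_marker as usize;
    // Made-up frame addresses, chosen relative to the marker so they differ from it.
    let frames = [marker.wrapping_add(1), marker.wrapping_add(2), marker, marker.wrapping_add(3)];
    let trimmed = trim_frames(&frames, marker);
    println!("kept {} of {} frames", trimmed.len(), frames.len());
}
```

If the marker's address were different each time it was taken, the lookup in `trim_frames` could silently fail to find it, which is the problem described above.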

So maybe it would make sense to have an attribute we can attach to a function that disables unnamed_addr? Then we should not see any de-duplication any more, I hope. I am not sure about duplication -- is there a way that all the copies of a function that exist for inlining and generics can be given a consistent address, or is that not possible and we have to enforce such "functions with stable address" to be monomorphic and inline(never)?

@chorman0773
Contributor

> is there a way that all the copies of a function that exist for inlining and generics can be given a consistent address

It's possible on ELF and Mach-O platforms, but not on PE (as used on Windows) when dynamic linking is in the picture: PE doesn't have a way to collapse every dynamic export of a symbol into a single canonical definition.

@CAD97

CAD97 commented Jul 7, 2024

If it's critically important, we could (probably) fake it in PE by adding additional hidden indirection, i.e. exporting a symbol that is a static with the address of the actual function instead of exporting the function symbol directly, then transparently projecting through that indirection in codegen, kind of similar to how extern static lowering ends up working.

In fact, it's already possible to write stable code that manually does this:

pub fn f() { /* ... */ }
pub const F: fn() = f; // or static

I would hope that F is actually guaranteed to be a structurally equal value in every codegen unit -- operationally it certainly feels like once you make a function pointer it should be a deterministic value -- but I'm not fully convinced that it is currently.
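
A slightly expanded sketch of the static variant, under the assumption that every user reads the pointer through the static rather than converting `f` again (`canonical_f` is a hypothetical helper). Because a `static` is a single memory location, repeated reads within one binary yield the same value; whether that also holds across dylib boundaries is the open question in this thread.

```rust
pub fn f() { /* ... */ }

// A single memory location holding the pointer; readers load from here
// instead of turning `f` into a function pointer again.
pub static F: fn() = f;

/// Hypothetical accessor: always goes through the static.
pub fn canonical_f() -> fn() {
    F
}

fn main() {
    let a = canonical_f() as usize;
    let b = canonical_f() as usize;
    // Both values were loaded from the same static, so they are equal.
    assert_eq!(a, b);
    // The pointer is still callable as usual.
    canonical_f()();
}
```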

As far as how to directly request the non-usage of LLVM-unnamed_addr, it feels like it fits in as a subdirective for #[used]. Bringing "don't cull this" semantics along as well doesn't seem too unreasonable, since the only reason to require a stable address requires taking and using that address. (Although in fairness it's always possible for the relevant modules to be completely unused.)

@RalfJung
Member Author

RalfJung commented Jul 7, 2024

> As far as how to directly request the non-usage of LLVM-unnamed_addr, it feels like it fits in as a subdirective for #[used]. Bringing "don't cull this" semantics along as well doesn't seem too unreasonable, since the only reason to require a stable address requires taking and using that address. (Although in fairness it's always possible for the relevant modules to be completely unused.)

Already for used we have the issue that people don't use this because what they really mean is "keep this in the binary if certain parts of the code are not dead" -- so they add volatile reads in these codepaths to ensure the presence of a static, rather than adding #[used]. That seems like something that may also come up here, so I don't think this should be tied up with anything used-like.

@comex

comex commented Jul 7, 2024

> If it's critically important, we could (probably) fake it in PE by adding additional hidden indirection

That solves the issue of the stub you get if you're missing dllimport on the caller side (i.e. rust-lang/rust#27438). But that's not the hardest part. The hardest part is a situation like:

  • crate A: a dylib that defines fn foo<T>() {}
  • crate B: a dylib that depends on A and takes the address of foo::<u32>
  • crate C: a dylib that also depends on A and takes the address of foo::<u32>
  • crate D: an executable that depends on B and C

B and C both need to instantiate the generic, because neither depends on the other or is aware of the other. But for the address to be unique, you need only one of those instantiations to be actually used.

ELF platforms and Darwin support this by having a way to do process-wide symbol deduplication by name, and rely on this to uphold C++'s guarantee of a stable address for template instantiations. Windows does not support it, and just violates the C++ spec when it comes to instantiations across DLL boundaries.

This is still possible to fake with hidden indirection, but it's more complex. Either D would need to have a static initializer that patches up references in B and C, or libstd (or even libcore) would need to have some kind of hash table and basically pretend to be a dynamic linker.
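
The hash-table option could look roughly like the sketch below. Everything here is hypothetical: `REGISTRY`, `canonicalize`, the string key, and the two stand-in functions that play the role of the separate `foo::<u32>` instantiations in B and C. The first registration for a key wins, and every later caller gets that same pointer back. A real version would need a stable, unique key per instantiation and would have to live in a library shared by all dylibs.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

// Hypothetical process-wide table from instantiation name to canonical pointer.
static REGISTRY: OnceLock<Mutex<HashMap<&'static str, fn()>>> = OnceLock::new();

/// Returns the canonical pointer for `key`; the first candidate registered wins.
fn canonicalize(key: &'static str, candidate: fn()) -> fn() {
    let map = REGISTRY.get_or_init(|| Mutex::new(HashMap::new()));
    let mut map = map.lock().unwrap();
    *map.entry(key).or_insert(candidate)
}

// Stand-ins for the copies of foo::<u32> instantiated separately in B and C.
fn foo_u32_from_b() {}
fn foo_u32_from_c() {}

fn main() {
    let b = canonicalize("foo::<u32>", foo_u32_from_b);
    let c = canonicalize("foo::<u32>", foo_u32_from_c);
    // Both crates now observe the same address, whichever copy was registered first.
    assert_eq!(b as usize, c as usize);
}
```

The cost is a lookup (or a cached indirection) wherever the address is taken, which is part of why this is described above as the more complex route.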

@saethlin
Member

saethlin commented Jul 8, 2024

> or is that not possible and we have to enforce such "functions with stable address" to be monomorphic and inline(never)?

I don't think this is too severe a limitation. The monomorphic requirement seems entirely unsurprising, and we might be able to lift the inline(never) requirement later. The problem is LocalCopy codegen, not the actual inlining optimization, right? If we were to add a new attribute like #[stable_address] we could issue a diagnostic when that's used with conflicting attributes, and cross_crate_inlinable would just know to return false by default for such functions.
