Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Miscompilation with clashing symbol fn names of different types #69390

Closed
jumbatm opened this issue Feb 23, 2020 · 14 comments · Fixed by #70946
Closed

Miscompilation with clashing symbol fn names of different types #69390

jumbatm opened this issue Feb 23, 2020 · 14 comments · Fixed by #70946
Assignees
Labels
A-linkage Area: linking into static, shared libraries and binaries C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-high High priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@jumbatm
Copy link
Contributor

jumbatm commented Feb 23, 2020

Related to #67946.

Currently, it's possible to perform a read to uninitialised memory by shadowing a function with an extern declaration with the same name. For example:

#[derive(Debug)]
struct Point {
    x: i32,
    y: i32
}

#[no_mangle]
fn bar(n: Point) {
    println!("{:?}", n);
}

fn main() {
    extern {
        fn bar();
    }
    unsafe {
        bar();
    }
}

(Playground)

Output:

Point { x: 1014863296, y: 1421860952 }

Errors:

   Compiling playground v0.0.1 (/playground)
    Finished dev [unoptimized + debuginfo] target(s) in 0.39s
     Running `target/debug/playground`

You can see we're able to happily print garbage this way. This is because on symbol name clash (but with different types), instead of erroring or otherwise doing work to produce distinct symbols, the function is bitcast instead. See #67946 (comment). In our example above, it means we produce a call to bar as if it takes no arguments, meaning we read garbage if we access the function arguments. You can also produce garbage return values, by shadowing a function which has no return value with a declaration that does have a return value.

This issue has been assigned to @jumbatm via this comment.

@jumbatm jumbatm added the C-bug Category: This is a bug. label Feb 23, 2020
@jumbatm
Copy link
Contributor Author

jumbatm commented Feb 23, 2020

While working on #67946, I started to prepare a fix for this issue. However, my "fix" is just to error out on name clash in codegen. This has the potential to break user code (but for the greater good, as their code probably wasn't behaving the way they thought it would anyway). What should be done in this situation? Should we be issuing a warning for now that will become an error? Or issuing a hard error immediately okay?

@nagisa
Copy link
Member

nagisa commented Feb 23, 2020

We used to have such a check already in the past, but the remnants of that functionality has been since removed in db477af#diff-1dee305d01351d452373973846b43671 cc @eddyb

@eddyb
Copy link
Member

eddyb commented Feb 23, 2020

Oh whoops I could've inlined the check/fatal error.
How do we not have a test for this though?!

@nagisa
Copy link
Member

nagisa commented Feb 23, 2020

#23011 added 7 of them and they still exist, but not sure why they don’t get triggered.

EDIT: I guess because CU partitioning emits them. That makes sense.

@Centril Centril added I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness A-linkage Area: linking into static, shared libraries and binaries T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Feb 23, 2020
@jumbatm
Copy link
Contributor Author

jumbatm commented Feb 23, 2020

Odd thing is, we currently do have a similar test. The addition of my check actually causes it to fail, so it probably shouldn't have compiled to begin with.

In short form, the test is essentially

mod foo {
    extern {
        pub fn func(x: u32);
    }
}
mod bar {
    extern {
        pub fn func();
    }
}
fn main() {
    unsafe {
        foo::func(100);
        bar::func();
    }
}

(Playground - show LLVM IR)

The thing which is potentially confusing about this case is one might expect the two declarations to resolve to two different functions, because they're in separate mods. However, this case is actually miscompiled as well:

; ...
; playground::main
; Function Attrs: nonlazybind uwtable
define internal void @_ZN10playground4main17h37f684608cd1371bE() unnamed_addr #0 !dbg !120 {
start:
  call void @func(i32 100), !dbg !122
  br label %bb1, !dbg !122

bb1:                                              ; preds = %start
  call void bitcast (void (i32)* @func to void ()*)(), !dbg !124
  br label %bb2, !dbg !124

bb2:                                              ; preds = %bb1
  ret void, !dbg !125
}
; ...

This is because even though the extern functions are declared in different modules, they end up clashing on symbol name anyway. I think this case deserves a specific diagnostic, too -- it's a desirable feature to be able to organise extern functions into modules, and this feels like it should work, but it doesn't (and can't). Like I said -- it's probably confusing.

Anyway, it's clear to me at this point that we were meant to be erroring on this case anyway, so I'll go ahead and just emit an error right away instead introducing a transitional warning.

@jumbatm
Copy link
Contributor Author

jumbatm commented Feb 23, 2020

@rustbot claim

@rustbot rustbot self-assigned this Feb 23, 2020
@jonas-schievink
Copy link
Contributor

cc #28179

@eddyb
Copy link
Member

eddyb commented Feb 23, 2020

I think this case deserves a specific diagnostic, too -- it's a desirable feature to be able to organise extern functions into modules, and this feels like it should work, but it doesn't (and can't).

But... it does work? The organization part anyway.

They are both func on the C side, and one of them has a wrong signature on the Rust side (or maybe even both would work? but this probably requires variadics).
The feature that causes the cast exists in case they were defined with compatible but different according to LLVM, signatures (blame this on the LLVM pointee type debacle).

You're welcome to use FnAbi to write a heuristic of compatibility before we even get to LLVM (or even try to merge signatures?), but given that C variadics are (only on some platforms?) compatible with non-variadic signatures, there might not be a way to do anything meaningful there.

@jumbatm
Copy link
Contributor Author

jumbatm commented Feb 23, 2020

But... it does work? The organization part anyway.

Sorry, I wasn't clear. Yes -- organising extern functions into modules does work, and is a valuable feature at that. I was referring to the case of having two extern symbols with the same name not resolving to two different functions. The diagnostic I was thinking of adding was to explain that extern declarations under different modules with different types don't resolve to different functions.

... given that C variadics are (only on some platforms?) compatible with non-variadic signatures, there might not be a way to do anything meaningful there.

I'm not convinced we should allow the bitcast just because C can do it or because it's a valid conversion in LLVM bitcode -- it's a too-delicate thing to support, and will vary between C standards and codegen backends. Indeed, it looks like lang team already had input on this (see #28179 (comment)) and are in favour of just erroring out on unmangled symbol clash instead of trying to "solve similar problems that might occur with C code and other stuff out of our purview".

If a user writes two different signatures for an extern function with the same name in two separate modules with, will they be expecting that it'll bitcast the function pointer if the types are different? Or rather, flip the question to be from a language design point of view -- should shadowing an extern fn / putting it under a different module with a different type be the way to cast extern function pointers? Even if we switched our codegen backend? It's a no, in my opinion -- it's too easy to invoke this behaviour "false-positively"; ie, it's more likely the user didn't intend for a bitcast to happen when they write code that introduces an extern declaration of the same name with different types, and more likely to be a mistake.

The feature that causes the cast exists in case they were defined with compatible but different according to LLVM, signatures (blame this on the LLVM pointee type debacle).

I'm not familiar with this - what was the pointee type debacle? I searched around but couldn't find anything about it.

@eddyb
Copy link
Member

eddyb commented Feb 24, 2020

I was referring to the case of having two extern symbols with the same name not resolving to two different functions. The diagnostic I was thinking of adding was to explain that extern declarations under different modules with different types don't resolve to different functions.

I should've been clearer on this: this seems like a concern that only makes sense in a vacuum, like these test cases.

Usually you use FFI to call into C code, which must already have functions. I don't see which two different functions anyone could have on the C side and expect to bind using just modules on the Rust side.

If you still have a relevant example in mind, it should start from C - even if not C definitions, at least declaration in a header.

Or rather, flip the question to be from a language design point of view -- should shadowing an extern fn / putting it under a different module with a different type be the way to cast extern function pointers?

First off, the bitcast is just an implementation detail (also, the bitcast is a noop at the LLVM level, it just changes high-level type information that shouldn't exist - more on that later).
The import should ideally be untyped until it's used (since that's what actually happens, at the linker level and whatnot), but LLVM has typed imports so all but one of the observed signatures require a bitcast.

It's nothing like shadowing, or any intention on the user to "cast", we could just as well implement this on top of LLVM by importing functions with some arbitrary type then casting all uses.

Also, the practical situation this arises in is two different crates binding the same C function, or two versions of the same crate (libc for example). All of the uses are ABI-compatible, but they don't happen to have the exact same LLVM signature.

Even if we switched our codegen backend?

A non-LLVM backend would ideally not make both of these mistakes at the same time:

  1. LLVM has typed imports (despite it not being meaningful to any linking model)
  2. LLVM ties the imported symbol name to the identity of import itself (meaning one type per symbol name due to 1.)

I'm not familiar with this - what was the pointee type debacle? I searched around but couldn't find anything about it.

Many years ago, LLVM realized pointer types were a mistake - more specifically, the T in T*, i.e. the type of the pointee, is unnecessary information that only complicates and slows down everything working with pointers, compared to only having T at use sites.
The proposed fix would be to have a single ptr type, IIRC, but progress has been slow.
Cranelift, for example, doesn't have this design mistake.

This is relevant here because if you have a C function:

struct Foo;
void foo_bar(struct Foo *foo);

Then the Rust signature of foo_bar will differ between every crate binding to it, because you would have a different Foo definition.
LLVM would also get different types, even if Foo is behind a pointer, but Cranelift or WASM wouldn't.

This is why I brought up FnAbi - it would let you check that the clashing imports actually have the same call ABI once you ignore pointee types and other irrelevant details.

@jumbatm
Copy link
Contributor Author

jumbatm commented Feb 24, 2020

Usually you use FFI to call into C code, which must already have functions. I don't see which two different functions anyone could have on the C side and expect to bind using just modules on the Rust side.

If you still have a relevant example in mind, it should start from C - even if not C definitions, at least declaration in a header.

Oh yeah, for sure - I'm not disagreeing with you that there isn't a way this could actually call two different functions (without mangling the names in some way so that they ended becoming two, distinct symbols anyway). However, I only proposed adding a diagnostic here because doing it incorrectly is too easy -- I only ran into this issue while I was working on this issue which came from real-world code.

Also, the practical situation this arises in is two different crates binding the same C function, or two versions of the same crate (libc for example). All of the uses are ABI-compatible, but they don't happen to have the exact same LLVM signature.
...
... the Rust signature of foo_bar will differ between every crate binding to it, because you would have a different Foo definition.

You're very right, and absolutely spot on when you observed I was only thinking of cases in a vacuum, like those test cases.

I agree -- across linked crates, you wouldn't want to break some project just because two different crates were developed to link against two slightly different but compatible extern library versions.

What about issuing a diagnostic when we see extern fn declarations with the same name but different signatures within the same crate? If a user really wanted to call a function pointer with some other signature, shouldn't it should be something they very plainly opt into at the callsite via an explicit transmute or something (no matter how cursed)? We'd have to eventually trust that at least one of the signatures the user declared was the correct one, but it's what we're already doing with extern function declarations anyway.

Many years ago, LLVM realized pointer types were a mistake - more specifically, the T in T*, i.e. the type of the pointee, is unnecessary information that only complicates and slows down everything working with pointers, compared to only having T at use sites.

Ohhh! You've just made it click for me why llvm::Module::getOrInsertFunction automatically inserts a compile-time bitcast if you pass a different type in! I always thought it was a weird design given LLVM IR's otherwise type-safety, but that makes a lot more sense now 😝 .

@eddyb
Copy link
Member

eddyb commented Feb 24, 2020

What about issuing a diagnostic when we see extern fn declarations with the same name but different signatures within the same crate?

You mean having the same Rust signature, specifically? Ignoring the FnAbi compatibility check I mentioned earlier?

It sounds doable, we could at the very least do a Crater check-only run with it.
I doubt it's going to be very useful outside of preventing some footguns but I don't have anything against it (if it's a lint, that is).

@jumbatm
Copy link
Contributor Author

jumbatm commented Feb 24, 2020

You mean having the same Rust signature, specifically? Ignoring the FnAbi compatibility check I mentioned earlier?

Yes, sorry -- I should have commented on this in my reply above. We can come back to it if need be, but I felt silently allowing conversions between compatible function pointer types based on ABI was too quiet and implicit a conversion for Rust, where we are usually loud and explicit. It's also dependent on the language's implementation (ABI being what it is) instead of the language's design: arguably, the design should inform the implementation, and not the only way around (well, ideally, anyway).

It sounds doable, we could at the very least do a Crater check-only run with it.
I doubt it's going to be very useful outside of preventing some footguns but I don't have anything against it (if it's a lint, that is).

Great - I'll implement this as a lint, and we'll see what we get from the Crater check run.

RalfJung added a commit to RalfJung/rust that referenced this issue May 21, 2020
Add a lint to catch clashing `extern` fn declarations.

Closes rust-lang#69390.

Adds lint `clashing_extern_decl` to detect when, within a single crate, an extern function of the same name is declared with different types. Because two symbols of the same name cannot be resolved to two different functions at link time, and one function cannot possibly have two types, a clashing extern declaration is almost certainly a mistake.

This lint does not run between crates because a project may have dependencies which both rely on the same extern function, but declare it in a different (but valid) way. For example, they may both declare an opaque type for one or more of the arguments (which would end up distinct types), or use types that are valid conversions in the language the extern fn is defined in. In these cases, we can't say that the clashing declaration is incorrect.

r? @eddyb
RalfJung added a commit to RalfJung/rust that referenced this issue May 29, 2020
Add a lint to catch clashing `extern` fn declarations.

Closes rust-lang#69390.

Adds lint `clashing_extern_decl` to detect when, within a single crate, an extern function of the same name is declared with different types. Because two symbols of the same name cannot be resolved to two different functions at link time, and one function cannot possibly have two types, a clashing extern declaration is almost certainly a mistake.

This lint does not run between crates because a project may have dependencies which both rely on the same extern function, but declare it in a different (but valid) way. For example, they may both declare an opaque type for one or more of the arguments (which would end up distinct types), or use types that are valid conversions in the language the extern fn is defined in. In these cases, we can't say that the clashing declaration is incorrect.

r? @eddyb
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this issue Jun 3, 2020
Add a lint to catch clashing `extern` fn declarations.

Closes rust-lang#69390.

Adds lint `clashing_extern_decl` to detect when, within a single crate, an extern function of the same name is declared with different types. Because two symbols of the same name cannot be resolved to two different functions at link time, and one function cannot possibly have two types, a clashing extern declaration is almost certainly a mistake.

This lint does not run between crates because a project may have dependencies which both rely on the same extern function, but declare it in a different (but valid) way. For example, they may both declare an opaque type for one or more of the arguments (which would end up distinct types), or use types that are valid conversions in the language the extern fn is defined in. In these cases, we can't say that the clashing declaration is incorrect.

r? @eddyb
@Elinvynia Elinvynia added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Jun 9, 2020
@LeSeulArtichaut LeSeulArtichaut added P-high High priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Jun 12, 2020
@LeSeulArtichaut
Copy link
Contributor

@bors bors closed this as completed in 228a0ed Jun 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-linkage Area: linking into static, shared libraries and binaries C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-high High priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants