Miscompilation with clashing symbol fn names of different types #69390

jumbatm · 2020-02-23T01:45:30Z

Related to #67946.

Currently, it's possible to perform a read to uninitialised memory by shadowing a function with an extern declaration with the same name. For example:

#[derive(Debug)]
struct Point {
    x: i32,
    y: i32
}

#[no_mangle]
fn bar(n: Point) {
    println!("{:?}", n);
}

fn main() {
    extern {
        fn bar();
    }
    unsafe {
        bar();
    }
}

(Playground)

Output:

Point { x: 1014863296, y: 1421860952 }

Errors:

   Compiling playground v0.0.1 (/playground)
    Finished dev [unoptimized + debuginfo] target(s) in 0.39s
     Running `target/debug/playground`

You can see we're able to happily print garbage this way. This is because on symbol name clash (but with different types), instead of erroring or otherwise doing work to produce distinct symbols, the function is bitcast instead. See #67946 (comment). In our example above, it means we produce a call to bar as if it takes no arguments, meaning we read garbage if we access the function arguments. You can also produce garbage return values, by shadowing a function which has no return value with a declaration that does have a return value.

This issue has been assigned to @jumbatm via this comment.

The text was updated successfully, but these errors were encountered:

jumbatm · 2020-02-23T01:45:38Z

While working on #67946, I started to prepare a fix for this issue. However, my "fix" is just to error out on name clash in codegen. This has the potential to break user code (but for the greater good, as their code probably wasn't behaving the way they thought it would anyway). What should be done in this situation? Should we be issuing a warning for now that will become an error? Or issuing a hard error immediately okay?

nagisa · 2020-02-23T01:55:15Z

We used to have such a check already in the past, but the remnants of that functionality has been since removed in db477af#diff-1dee305d01351d452373973846b43671 cc @eddyb

eddyb · 2020-02-23T02:00:31Z

Oh whoops I could've inlined the check/fatal error.
How do we not have a test for this though?!

nagisa · 2020-02-23T02:09:18Z

#23011 added 7 of them and they still exist, but not sure why they don’t get triggered.

EDIT: I guess because CU partitioning emits them. That makes sense.

jumbatm · 2020-02-23T03:59:15Z

Odd thing is, we currently do have a similar test. The addition of my check actually causes it to fail, so it probably shouldn't have compiled to begin with.

In short form, the test is essentially

mod foo {
    extern {
        pub fn func(x: u32);
    }
}
mod bar {
    extern {
        pub fn func();
    }
}
fn main() {
    unsafe {
        foo::func(100);
        bar::func();
    }
}

(Playground - show LLVM IR)

The thing which is potentially confusing about this case is one might expect the two declarations to resolve to two different functions, because they're in separate mods. However, this case is actually miscompiled as well:

; ...
; playground::main
; Function Attrs: nonlazybind uwtable
define internal void @_ZN10playground4main17h37f684608cd1371bE() unnamed_addr #0 !dbg !120 {
start:
  call void @func(i32 100), !dbg !122
  br label %bb1, !dbg !122

bb1:                                              ; preds = %start
  call void bitcast (void (i32)* @func to void ()*)(), !dbg !124
  br label %bb2, !dbg !124

bb2:                                              ; preds = %bb1
  ret void, !dbg !125
}
; ...

This is because even though the extern functions are declared in different modules, they end up clashing on symbol name anyway. I think this case deserves a specific diagnostic, too -- it's a desirable feature to be able to organise extern functions into modules, and this feels like it should work, but it doesn't (and can't). Like I said -- it's probably confusing.

Anyway, it's clear to me at this point that we were meant to be erroring on this case anyway, so I'll go ahead and just emit an error right away instead introducing a transitional warning.

jumbatm · 2020-02-23T04:07:31Z

@rustbot claim

jonas-schievink · 2020-02-23T12:45:52Z

cc #28179

eddyb · 2020-02-23T16:00:23Z

I think this case deserves a specific diagnostic, too -- it's a desirable feature to be able to organise extern functions into modules, and this feels like it should work, but it doesn't (and can't).

But... it does work? The organization part anyway.

They are both func on the C side, and one of them has a wrong signature on the Rust side (or maybe even both would work? but this probably requires variadics).
The feature that causes the cast exists in case they were defined with compatible but different according to LLVM, signatures (blame this on the LLVM pointee type debacle).

You're welcome to use FnAbi to write a heuristic of compatibility before we even get to LLVM (or even try to merge signatures?), but given that C variadics are (only on some platforms?) compatible with non-variadic signatures, there might not be a way to do anything meaningful there.

jumbatm · 2020-02-23T22:34:54Z

But... it does work? The organization part anyway.

Sorry, I wasn't clear. Yes -- organising extern functions into modules does work, and is a valuable feature at that. I was referring to the case of having two extern symbols with the same name not resolving to two different functions. The diagnostic I was thinking of adding was to explain that extern declarations under different modules with different types don't resolve to different functions.

... given that C variadics are (only on some platforms?) compatible with non-variadic signatures, there might not be a way to do anything meaningful there.

I'm not convinced we should allow the bitcast just because C can do it or because it's a valid conversion in LLVM bitcode -- it's a too-delicate thing to support, and will vary between C standards and codegen backends. Indeed, it looks like lang team already had input on this (see #28179 (comment)) and are in favour of just erroring out on unmangled symbol clash instead of trying to "solve similar problems that might occur with C code and other stuff out of our purview".

If a user writes two different signatures for an extern function with the same name in two separate modules with, will they be expecting that it'll bitcast the function pointer if the types are different? Or rather, flip the question to be from a language design point of view -- should shadowing an extern fn / putting it under a different module with a different type be the way to cast extern function pointers? Even if we switched our codegen backend? It's a no, in my opinion -- it's too easy to invoke this behaviour "false-positively"; ie, it's more likely the user didn't intend for a bitcast to happen when they write code that introduces an extern declaration of the same name with different types, and more likely to be a mistake.

The feature that causes the cast exists in case they were defined with compatible but different according to LLVM, signatures (blame this on the LLVM pointee type debacle).

I'm not familiar with this - what was the pointee type debacle? I searched around but couldn't find anything about it.

eddyb · 2020-02-24T01:34:55Z

I was referring to the case of having two extern symbols with the same name not resolving to two different functions. The diagnostic I was thinking of adding was to explain that extern declarations under different modules with different types don't resolve to different functions.

I should've been clearer on this: this seems like a concern that only makes sense in a vacuum, like these test cases.

Usually you use FFI to call into C code, which must already have functions. I don't see which two different functions anyone could have on the C side and expect to bind using just modules on the Rust side.

If you still have a relevant example in mind, it should start from C - even if not C definitions, at least declaration in a header.

Or rather, flip the question to be from a language design point of view -- should shadowing an extern fn / putting it under a different module with a different type be the way to cast extern function pointers?

First off, the bitcast is just an implementation detail (also, the bitcast is a noop at the LLVM level, it just changes high-level type information that shouldn't exist - more on that later).
The import should ideally be untyped until it's used (since that's what actually happens, at the linker level and whatnot), but LLVM has typed imports so all but one of the observed signatures require a bitcast.

It's nothing like shadowing, or any intention on the user to "cast", we could just as well implement this on top of LLVM by importing functions with some arbitrary type then casting all uses.

Also, the practical situation this arises in is two different crates binding the same C function, or two versions of the same crate (libc for example). All of the uses are ABI-compatible, but they don't happen to have the exact same LLVM signature.

Even if we switched our codegen backend?

A non-LLVM backend would ideally not make both of these mistakes at the same time:

LLVM has typed imports (despite it not being meaningful to any linking model)
LLVM ties the imported symbol name to the identity of import itself (meaning one type per symbol name due to 1.)

I'm not familiar with this - what was the pointee type debacle? I searched around but couldn't find anything about it.

Many years ago, LLVM realized pointer types were a mistake - more specifically, the T in T*, i.e. the type of the pointee, is unnecessary information that only complicates and slows down everything working with pointers, compared to only having T at use sites.
The proposed fix would be to have a single ptr type, IIRC, but progress has been slow.
Cranelift, for example, doesn't have this design mistake.

This is relevant here because if you have a C function:

struct Foo;
void foo_bar(struct Foo *foo);

Then the Rust signature of foo_bar will differ between every crate binding to it, because you would have a different Foo definition.
LLVM would also get different types, even if Foo is behind a pointer, but Cranelift or WASM wouldn't.

This is why I brought up FnAbi - it would let you check that the clashing imports actually have the same call ABI once you ignore pointee types and other irrelevant details.

jumbatm · 2020-02-24T11:24:46Z

Usually you use FFI to call into C code, which must already have functions. I don't see which two different functions anyone could have on the C side and expect to bind using just modules on the Rust side.

If you still have a relevant example in mind, it should start from C - even if not C definitions, at least declaration in a header.

Oh yeah, for sure - I'm not disagreeing with you that there isn't a way this could actually call two different functions (without mangling the names in some way so that they ended becoming two, distinct symbols anyway). However, I only proposed adding a diagnostic here because doing it incorrectly is too easy -- I only ran into this issue while I was working on this issue which came from real-world code.

Also, the practical situation this arises in is two different crates binding the same C function, or two versions of the same crate (libc for example). All of the uses are ABI-compatible, but they don't happen to have the exact same LLVM signature.
...
... the Rust signature of foo_bar will differ between every crate binding to it, because you would have a different Foo definition.

You're very right, and absolutely spot on when you observed I was only thinking of cases in a vacuum, like those test cases.

I agree -- across linked crates, you wouldn't want to break some project just because two different crates were developed to link against two slightly different but compatible extern library versions.

What about issuing a diagnostic when we see extern fn declarations with the same name but different signatures within the same crate? If a user really wanted to call a function pointer with some other signature, shouldn't it should be something they very plainly opt into at the callsite via an explicit transmute or something (no matter how cursed)? We'd have to eventually trust that at least one of the signatures the user declared was the correct one, but it's what we're already doing with extern function declarations anyway.

Many years ago, LLVM realized pointer types were a mistake - more specifically, the T in T*, i.e. the type of the pointee, is unnecessary information that only complicates and slows down everything working with pointers, compared to only having T at use sites.

Ohhh! You've just made it click for me why llvm::Module::getOrInsertFunction automatically inserts a compile-time bitcast if you pass a different type in! I always thought it was a weird design given LLVM IR's otherwise type-safety, but that makes a lot more sense now 😝 .

eddyb · 2020-02-24T11:34:00Z

What about issuing a diagnostic when we see extern fn declarations with the same name but different signatures within the same crate?

You mean having the same Rust signature, specifically? Ignoring the FnAbi compatibility check I mentioned earlier?

It sounds doable, we could at the very least do a Crater check-only run with it.
I doubt it's going to be very useful outside of preventing some footguns but I don't have anything against it (if it's a lint, that is).

jumbatm · 2020-02-24T12:00:24Z

You mean having the same Rust signature, specifically? Ignoring the FnAbi compatibility check I mentioned earlier?

Yes, sorry -- I should have commented on this in my reply above. We can come back to it if need be, but I felt silently allowing conversions between compatible function pointer types based on ABI was too quiet and implicit a conversion for Rust, where we are usually loud and explicit. It's also dependent on the language's implementation (ABI being what it is) instead of the language's design: arguably, the design should inform the implementation, and not the only way around (well, ideally, anyway).

It sounds doable, we could at the very least do a Crater check-only run with it.
I doubt it's going to be very useful outside of preventing some footguns but I don't have anything against it (if it's a lint, that is).

Great - I'll implement this as a lint, and we'll see what we get from the Crater check run.

@eddyb

Add a lint to catch clashing `extern` fn declarations. Closes rust-lang#69390. Adds lint `clashing_extern_decl` to detect when, within a single crate, an extern function of the same name is declared with different types. Because two symbols of the same name cannot be resolved to two different functions at link time, and one function cannot possibly have two types, a clashing extern declaration is almost certainly a mistake. This lint does not run between crates because a project may have dependencies which both rely on the same extern function, but declare it in a different (but valid) way. For example, they may both declare an opaque type for one or more of the arguments (which would end up distinct types), or use types that are valid conversions in the language the extern fn is defined in. In these cases, we can't say that the clashing declaration is incorrect. r? @eddyb

@eddyb

Add a lint to catch clashing `extern` fn declarations. Closes rust-lang#69390. Adds lint `clashing_extern_decl` to detect when, within a single crate, an extern function of the same name is declared with different types. Because two symbols of the same name cannot be resolved to two different functions at link time, and one function cannot possibly have two types, a clashing extern declaration is almost certainly a mistake. This lint does not run between crates because a project may have dependencies which both rely on the same extern function, but declare it in a different (but valid) way. For example, they may both declare an opaque type for one or more of the arguments (which would end up distinct types), or use types that are valid conversions in the language the extern fn is defined in. In these cases, we can't say that the clashing declaration is incorrect. r? @eddyb

@eddyb

Add a lint to catch clashing `extern` fn declarations. Closes rust-lang#69390. Adds lint `clashing_extern_decl` to detect when, within a single crate, an extern function of the same name is declared with different types. Because two symbols of the same name cannot be resolved to two different functions at link time, and one function cannot possibly have two types, a clashing extern declaration is almost certainly a mistake. This lint does not run between crates because a project may have dependencies which both rely on the same extern function, but declare it in a different (but valid) way. For example, they may both declare an opaque type for one or more of the arguments (which would end up distinct types), or use types that are valid conversions in the language the extern fn is defined in. In these cases, we can't say that the clashing declaration is incorrect. r? @eddyb

LeSeulArtichaut · 2020-06-12T17:53:49Z

Assigning P-high as discussed as part of the Prioritization WG procedure.

jumbatm added the C-bug Category: This is a bug. label Feb 23, 2020

Centril added I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness A-linkage Area: linking into static, shared libraries and binaries T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Feb 23, 2020

rustbot self-assigned this Feb 23, 2020

jumbatm mentioned this issue Apr 9, 2020

Add a lint to catch clashing extern fn declarations. #70946

Merged

Elinvynia added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Jun 9, 2020

LeSeulArtichaut added P-high High priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Jun 12, 2020

bors closed this as completed in 228a0ed Jun 21, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Miscompilation with clashing symbol fn names of different types #69390

Miscompilation with clashing symbol fn names of different types #69390

jumbatm commented Feb 23, 2020 •

edited by rustbot

Loading

jumbatm commented Feb 23, 2020

nagisa commented Feb 23, 2020 •

edited

Loading

eddyb commented Feb 23, 2020 •

edited

Loading

nagisa commented Feb 23, 2020 •

edited

Loading

jumbatm commented Feb 23, 2020 •

edited

Loading

jumbatm commented Feb 23, 2020

jonas-schievink commented Feb 23, 2020

eddyb commented Feb 23, 2020 •

edited

Loading

jumbatm commented Feb 23, 2020

eddyb commented Feb 24, 2020

jumbatm commented Feb 24, 2020

eddyb commented Feb 24, 2020

jumbatm commented Feb 24, 2020

LeSeulArtichaut commented Jun 12, 2020

Miscompilation with clashing symbol fn names of different types #69390

Miscompilation with clashing symbol fn names of different types #69390

Comments

jumbatm commented Feb 23, 2020 • edited by rustbot Loading

jumbatm commented Feb 23, 2020

nagisa commented Feb 23, 2020 • edited Loading

eddyb commented Feb 23, 2020 • edited Loading

nagisa commented Feb 23, 2020 • edited Loading

jumbatm commented Feb 23, 2020 • edited Loading

jumbatm commented Feb 23, 2020

jonas-schievink commented Feb 23, 2020

eddyb commented Feb 23, 2020 • edited Loading

jumbatm commented Feb 23, 2020

eddyb commented Feb 24, 2020

jumbatm commented Feb 24, 2020

eddyb commented Feb 24, 2020

jumbatm commented Feb 24, 2020

LeSeulArtichaut commented Jun 12, 2020

jumbatm commented Feb 23, 2020 •

edited by rustbot

Loading

nagisa commented Feb 23, 2020 •

edited

Loading

eddyb commented Feb 23, 2020 •

edited

Loading

nagisa commented Feb 23, 2020 •

edited

Loading

jumbatm commented Feb 23, 2020 •

edited

Loading

eddyb commented Feb 23, 2020 •

edited

Loading