Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

extern "C" functions don't generate the same IR definitions as clang on x86, causing problems with cross-language LTO #102174

Open
glandium opened this issue Sep 23, 2022 · 16 comments
Labels
A-abi Area: Concerning the application binary interface (ABI). A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness O-x86_32 Target: x86 processors, 32 bit (like i686-*) P-high High priority regression-untriaged Untriaged performance or correctness regression. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@glandium
Copy link
Contributor

After llvm/llvm-project@6c8adc5, inlining in cross-language LTO happens in cases where it didn't happen before, including cases where things go very bad (more details in https://bugzilla.mozilla.org/show_bug.cgi?id=1789779#c7)

It seems to boil down to LLVM not liking that rust defines its extern "C" functions in significantly different ways than clang does for the C/C++ code that calls it. For example:

// rustc --emit=llvm-ir foo.rs --crate-type=lib --target=i686-unknown-linux-gnu
#[repr(C)]
pub struct Foo(pub u32);

#[no_mangle]
pub unsafe extern "C" fn foo(foo: Foo) -> u32 {
    foo.0
}

The caller:

// clang -o - -S foo.c -emit-llvm --target=i686-unknown-linux-gnu
struct Foo { int a; };

extern int foo(struct Foo f);

int var(struct Foo f) {
    return foo(f);
}

Rust defines the function as:

Foo = type { i32 }

; Function Attrs: nonlazybind uwtable
define i32 @foo(%Foo* byval(%Foo) %foo) unnamed_addr #0 {

while the caller C code does this:

%struct.Foo = type { i32 }
(...)
declare i32 @foo(i32) #1

The equivalent C code:

// clang -o - -S foo.c -emit-llvm --target=i686-unknown-linux-gnu
struct Foo { int a; };

int foo(Foo foo) {
    return foo.a;
}

defines the function as:

%struct.Foo = type { i32 }

; Function Attrs: noinline nounwind optnone uwtable
define dso_local i32 @foo(i32 %0) #0 {

(Ironically, rustc transforms a non-extern "C" version of the function to the same declaration as clang's)

Arguably, there's an underlying LLVM bug not being able to handle this case, which /could/ be considered fine, but I'm not sure it's supposed to be.

It's worth noting that rustc does not use a byval for e.g. x86_64-unknown-linux-gnu.

Cc: @nikic

@glandium glandium added the C-bug Category: This is a bug. label Sep 23, 2022
@nikic
Copy link
Contributor

nikic commented Sep 23, 2022

Rust seems to just mark all struct arguments as byval:

arg.make_indirect_byval();

While clang will pass certain aggregates directly: https://github.com/llvm/llvm-project/blob/1aaba40dcbe8fdc93d825d1f4e22edaa3e9aa5b1/clang/lib/CodeGen/TargetInfo.cpp#L1875-L1883

Another instance of @eddyb's favorite bug, #65111.

From your description, I don't really see how this used to work before the referenced LLVM change, as the ABIs are completely different -- I guess you just ended up interpreting the pointer address as the value contained in the struct and that happened to "work".

@nikic nikic added the I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness label Sep 23, 2022
@rustbot rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Sep 23, 2022
@nikic nikic added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Sep 23, 2022
@glandium
Copy link
Contributor Author

Before the change, the inlining wasn't happening so we ended up with a normal call, and the rust callee is just fine too because it compiled down to the proper ABI for the function. What screws everything is that inlining now happens.

@glandium
Copy link
Contributor Author

@Gankra did you run the abi checker for some i686 targets?

@inquisitivecrystal inquisitivecrystal added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-abi Area: Concerning the application binary interface (ABI). labels Sep 24, 2022
@eddyb
Copy link
Member

eddyb commented Sep 26, 2022

From your description, I don't really see how this used to work before the referenced LLVM change, as the ABIs are completely different -- I guess you just ended up interpreting the pointer address as the value contained in the struct and that happened to "work".

I believe %X* byval(%X) and %X are equivalent when stack-passing happens for the latter (IOW, my understanding is that byval forces stack-passing and then behaves like ref on Rust pattern bindings), though the pointer form may avoid some inefficiencies (and will always cause stack-passing).

IIRC x86 32-bit ABIs don't have any registers available for arguments "by default" (and the proliferation of e.g. thiscall, regcall, fastcall etc. is about adding registers at all).


Any ABI that doesn't use byval will effectively do quasi-SSA stack-passing once it runs out of registers, and so far I don't even know if we copied this from Clang or if rustc is uniquely inefficient for e.g. large arrays passed by value - the worst case would likely be one where the first few scalar components of the type still use registers, so the value is fragmented (with e.g. 8 registers reserved for argument-passing, I believe [usize; 9] will put the first 8 elements into one register each and then the 9th one on the stack).

Though this discussion is further complicated by:

  • non-byval make_indirect, which replaces values with pointers before register-vs-stack decisions
    • i.e after exhausting registers, the stack-passing is still only for a pointer to the value, without copying the value in memory (like byval would)
    • this is used more by newer ABIs (including extern "Rust" at the moment)
    • in that case, also using byval would result in worse IR, e.g. %X** byval(%X*) when passing X
      • this is arguably additional points for never using byval for stack-passing individual scalars (which appears to be what Clang is doing here to get the simpler i32)
  • cast_to, used to make the scalar components register-sized
    • e.g. turning [u8; 9] into [u32; 3] on a platform with 32-bit integers
    • this is why the example above is [usize; 9] and [u8; 9] (even though if LLVM ever saw [9 x i8], it would end up using one byte per register)

From a quick check, it seems (older) 32-bit ABIs are most susceptible to this inefficiency:

let align = arg.layout.align.abi.bytes();
let total = arg.layout.size;
arg.cast_to(Uniform { unit: if align <= 4 { Reg::i32() } else { Reg::i64() }, total });

if arg.layout.is_aggregate() {
let pad_i32 = !offset.is_aligned(align);
arg.cast_to_and_pad_i32(Uniform { unit: Reg::i32(), total: size }, pad_i32);

if arg.layout.is_aggregate() {
let pad_i32 = !offset.is_aligned(align);
arg.cast_to_and_pad_i32(Uniform { unit: Reg::i32(), total: size }, pad_i32);

Even a 64-bit one (I assume because it's still an old ABI):

arg.cast_to(Uniform { unit, total });

MIPS64 is also affected, but only for structs, not arrays (which makes me wonder if we're actually currently wrong for "structs that contain large arrays").


Given sufficient ABI information, LLVM itself should be able to "relax" byval but I doubt "explicit -> implicit" strength reductions like that would be uncontroversial.

This is one of my gripes with LLVM and ABI: it forces the frontend to think about ABI distinctions like:

  • "registers vs stack" (byval)
  • "register bank" (int/pointer scalars, float scalars, vectors)
  • "register assignment variations" (only some ABIs: inreg, swift{error,context}, etc.)

... but only some of the time, not all of the time, and that leads to both multiple ways to represent certain behaviors (like the one discussed here, with byval vs immediate-passing), and also being forced to use convoluted conversions (like what cast_to causes) in other cases (so LLVM optimizations like inlining have to work extra hard to remove the RustType -> [usize; N] -> RustType roundtrip).

I recently wrote something related to this (Ctrl+F "ABI mapping" in #100698 (comment)), and I still believe it would be one of the nicer solutions overall: decouple "stack slots and SSA scalars/vectors" call dataflow (friendlier to IPO in general - e.g. inlining, but not only) from exporting a function with specific register/stack assignments.

@nikic
Copy link
Contributor

nikic commented Sep 26, 2022

I believe %X* byval(%X) and %X are equivalent when stack-passing happens for the latter (IOW, my understanding is that byval forces stack-passing and then behaves like ref on Rust pattern bindings), though the pointer form may avoid some inefficiencies (and will always cause stack-passing).

Thanks, this is the bit I was missing! The ABIs are incompatible at the LLVM IR level, but become compatible post-legalization.

That does limit the issue to the cross-language LTO case only.

Given sufficient ABI information, LLVM itself should be able to "relax" byval but I doubt "explicit -> implicit" strength reductions like that would be uncontroversial.

LLVM will generally try to relax byval(%T) into individual SROAd %T arguments as part of argument promotion -- but this is only for functions with internal linkage, not for the case which you're talking about (where I doubt such a transform would be legal, because it makes the call UB at the IR level due to ABI mismatch).

The rest

Yes, LLVM's ABI handling is a big pile of poo ... your suggestion does sound reasonable on the surface, but I'm not volunteering to implement it :P

Regardless of the specific IR representation, something that would help a lot is to extract Clang's ABI calculation into something reusable by other frontends (discussed a bit at https://discourse.llvm.org/t/rfc-targetinfo-library/64342, though I doubt something will come of that). While rustc couldn't use that directly (due to other codegen backends), at least it could be used to our computed ABI is correct.

@eddyb
Copy link
Member

eddyb commented Sep 29, 2022

something that would help a lot is to extract Clang's ABI calculation into something reusable by other frontends

Heh, something like that is why rustc_target avoids dependencies on the rest of the compiler, and is generic over the concept of a semantic type and contexts that can provide more information about those semantic types (tho recent changes motivated by performance have been compromising a bit of it, you still only need an arena to use the facilities AFAICT).

My long-term hopes with rustc_target::abi::call was to split "classifying an input/output type" from "assign registers/stack to an input/output" (the latter could be done using declarative information in "classifications"), but I never got around to spend more time on it.

(the rough idea is that once you reduce a type and its layout to the set of possible ABI encodings that apply, it's a "resource calculus" from there on, like the "stateful" part of these algorithms ~always looks like "number of remaining registers per bank" or "stack offset/misalignment", and all the decisions are like "do I fit", i.e. "are there available resources for me to consume")

But for rustc_target to be really good at this, we'd have to go all the way to exact register/stack mappings (which e.g. the Cranelift backend could maybe use directly) and then "reverse engineer" that back into LLVM IR (and if LLVM is willing to expose its own decisions back to us, we could check our own understanding - right now AFAIK this would only be possible by abusing assembly or DWARF, and that's not great).

@apiraino apiraino added the I-compiler-nominated The issue / PR has been nominated for discussion during a compiler team meeting. label Oct 6, 2022
@apiraino
Copy link
Contributor

apiraino commented Oct 6, 2022

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-high regression-untriaged

@rustbot rustbot added P-high High priority regression-untriaged Untriaged performance or correctness regression. and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Oct 6, 2022
@Gankra
Copy link
Contributor

Gankra commented Oct 6, 2022

@Gankra did you run the abi checker for some i686 targets?

I have not, it's on my TODO list to figure out how platforms-that-aren't-host-but-can-run-on-host should be handled by abi-checker, since that's a lot easier than the full cross-compile case but also I'm not sure what problems come up if you build an x86_32 dylib and load it into an x64 process on the various platforms (esp when passing opaque pointers across the dylib boundary).

edit: actually maybe this would be fine if I just make you rebuild the harness itself too...

@Gankra
Copy link
Contributor

Gankra commented Oct 6, 2022

More broadly, I'm not sure abi-cafe would be able to catch this since the ultimate ABI is correct, it's just that llvm freaks out after cross-lang LTO. Maybe this is something abi-cafe should support checking? I'm not sure how hard it is to setup.

llvm has had other bugs like this before too, right? istr something wonky about inlining functions compiled for different "modes" like simd or thumb?

@bjorn3
Copy link
Member

bjorn3 commented Oct 6, 2022

But for rustc_target to be really good at this, we'd have to go all the way to exact register/stack mappings (which e.g. the Cranelift backend could maybe use directly)

Cranelift doesn't allow assigning exact registers. Cranelift is pretty much exactly on the same level as what rustc_target exposes. You tell it the calling convention (for assigning registers and stack locations) and a list of input and output AbiParam where each AbiParam has a primitive type (i8, i16, i32, i64, i128, f32, f64 or vector types), argument extension (none, zero extension, sign extension) and argument purpose (normal, struct argument of given size, struct return, and a couple of ones only meant for use by wasm runtimes). Rustc_target tells exactly how to perform the decomposition from rust values to Cranelift arguments and return values. Unlike LLVM it doesn't have struct types, so struct arguments just specify size (i believe it will also need to specify alignment in the future) and struct returns are performed by returning multiple values rather than a single struct value.

@bjorn3
Copy link
Member

bjorn3 commented Oct 6, 2022

@Gankra cargo run --target i686-unknown-linux-gnu works just fine on an x86_64 host and compiles the tests for i686. I get the following results:

Final Results:
[...]
ptr::c::rustc_calls_cc                   failed
ptr::c::cc_calls_rustc                   failed
[...]
ui128::c::rustc_calls_cc                 fixed (test was busted, congrats!)
ui128::c::cc_calls_rustc                 fixed (test was busted, congrats!)
ui128::c::cc_calls_cc                    failed
[...]

231 tests run - 44 passed, 0 busted, 5 failed, 182 skipped
Error: TestsFailed

The errors I get are:

compiling  ptr::c::rustc_calls_cc
running: "rustc" "--crate-type" "staticlib" "--out-dir" "target/temp/" "--target" "i686-unknown-linux-gnu" "-Cmetadata=ptr_c_rustc_caller" "generated_impls/rustc/ptr_c_rustc_caller.rs"
Failed to build test: rust compile error 
 
warning: associated function `new` is never used
  --> generated_impls/rustc/ptr_c_rustc_caller.rs:33:8
   |
33 |     fn new(val: i128) -> Self {
   |        ^^^
   |
   = note: `#[warn(dead_code)]` on by default

warning: associated function `new` is never used
  --> generated_impls/rustc/ptr_c_rustc_caller.rs:42:8
   |
42 |     fn new(val: u128) -> Self {
   |        ^^^

error: literal out of range for `usize`
   --> generated_impls/rustc/ptr_c_rustc_caller.rs:718:29
    |
718 |         let arg0: *mut () = 0x706050403020100 as *mut ();
    |                             ^^^^^^^^^^^^^^^^^
    |
    = note: the literal `0x706050403020100` (decimal `506097522914230528`) does not fit into the type `usize` and will become `50462976usize`
    = note: `#[deny(overflowing_literals)]` on by default

[...]

and

compiling  sysv_i128_emulation::handwritten::rustc_calls_cc
running: "rustc" "--crate-type" "staticlib" "--out-dir" "target/temp/" "--target" "i686-unknown-linux-gnu" "-Cmetadata=sysv_i128_emulation_handwritten_rustc_caller" "handwritten_impls/rustc
/sysv_i128_emulation_handwritten_rustc_caller.rs"
running: "x86_64-linux-gnu-gcc" "-O0" "-ffunction-sections" "-fdata-sections" "-fPIC" "-m32" "-march=i686" "-Wall" "-Wextra" "-o" "target/temp/handwritten_impls/cc/sysv_i128_emulation_handw
ritten_cc_callee.o" "-c" "handwritten_impls/cc/sysv_i128_emulation_handwritten_cc_callee.c"
cargo:warning=
cargo:warning=
cargo:warning=
cargo:warning=
cargo:warning=
cargo:warning=
cargo:warning=
cargo:warning=
cargo:warning=
cargo:warning=
cargo:warning=handwritten_impls/cc/sysv_i128_emulation_handwritten_cc_callee.c:36:25: error: expected declaration specifiers or ‘...’ before ‘__int128’   36 | typedef void (*functy1)(__int128, __int128, float, __int128, uint8_t, __int128);      |                         ^~~~~~~~handwritten_impls/cc/sysv_i128_emulation_handwritten_cc_callee.c:36:35: error: expected declaration specifiers or ‘...’ before ‘__int128’   36 | typedef void (*functy1)(__int128, __int128, float, __int128, uint8_t, __int128);      |                                   ^~~~~~~~handwritten_impls/cc/sysv_i128_emulation_handwritten_cc_callee.c:36:52: error: expected declaration specifiers or ‘...’ before ‘__int128’   36 | typedef void (*functy1)(__int128, __int128, float, __int128, uint8_t, __int128);      |                                                    ^~~~~~~~handwritten_impls/cc/sysv_i128_emulation_handwritten_cc_callee.c:36:71: error: expected declaration specifiers or ‘...’ before ‘__int128’
cargo:warning=
cargo:warning=
cargo:warning=
cargo:warning=
cargo:warning=
cargo:warning=
cargo:warning=
cargo:warning=
cargo:warning=
cargo:warning=   36 | typedef void (*functy1)(__int128, __int128, float, __int128, uint8_t, __int128);      |                                                                       ^~~~~~~~handwritten_impls/cc/sysv_i128_emulation_handwritten_cc_callee.c:39:27: error: expected declaration specifiers or ‘...’ before ‘__int128’   39 | void callee_native_layout(__int128* arg0, __int128* arg1, __int128* arg2) {      |                           ^~~~~~~~handwritten_impls/cc/sysv_i128_emulation_handwritten_cc_callee.c:39:43: error: expected declaration specifiers or ‘...’ before ‘__int128’   39 | void callee_native_layout(__int128* arg0, __int128* arg1, __int128* arg2) {      |                                           ^~~~~~~~handwritten_impls/cc/sysv_i128_emulation_handwritten_cc_callee.c:39:59: error: expected declaration specifiers or ‘...’ before ‘__int128’   39 | void callee_native_layout(__int128* arg0, __int128* arg1, __int128* arg2) {
[...]

@Gankra
Copy link
Contributor

Gankra commented Oct 7, 2022

Ok so I should just figure out a way to optionally flip on LTO for clang_calls_rustc, nice!

@apiraino
Copy link
Contributor

apiraino commented Oct 13, 2022

Discussed during T-compiler meeting (notes), namely this part:

we have an ABI that is incompatible with Clang's C ABI but LLVM is inlining our code together which violates the ABI constraints leading to the unsoundness. So one fix here is to get LLVM to not do that another is to change our ABI to match Clang's in some or all cases. There's also a possibility of teaching LLVM to do the inlining but in some way that respects ABI requirements?

@rustbot label -i-compiler-nominated

@apiraino apiraino removed the I-compiler-nominated The issue / PR has been nominated for discussion during a compiler team meeting. label Oct 13, 2022
@nikic
Copy link
Contributor

nikic commented Oct 20, 2022

I looked a bit closer into what clang is actually doing here. I referenced the wrong code above, the relevant bit is actually this: https://github.com/llvm/llvm-project/blob/1aaba40dcbe8fdc93d825d1f4e22edaa3e9aa5b1/clang/lib/CodeGen/TargetInfo.cpp#L1891-L1900

In words, what clang is going is to unpack structs <= 128 bits where all parts are 32-bit or 64-bit primitives (pointer, integer, float, enum). Some samples: https://clang.godbolt.org/z/bMjYfxnjj

This is being done on the premise that it does not change the final ABI, while making it easier to optimize the resulting IR. There's also this special gem hidden in there: https://github.com/llvm/llvm-project/blob/1aaba40dcbe8fdc93d825d1f4e22edaa3e9aa5b1/clang/lib/CodeGen/TargetInfo.cpp#L1487-L1492

That is, if it's C++ code it actually makes a difference whether you use struct or class to declare the structure -- the former will be unpacked, while the latter will be passed byval. Unless you're on Windows.

@nikic
Copy link
Contributor

nikic commented Oct 20, 2022

And another extra peculiar case: If we combine this with the fastcall calling convention, we get this: https://clang.godbolt.org/z/MW797ehfj Now we have an additional inreg i32 undef argument on the call, before the i32 that actually passes the unpacked struct.

@pnkfelix
Copy link
Member

Discussed during T-compiler P-high review.. There are a number of short-comings identified here. We should figure out how to best invest our effort to address them, even partially.

I think the next step is to write an MCP proposing some kind of ABI validation (see #65111), perhaps via abi-cafe

@wesleywiser wesleywiser added the O-x86_32 Target: x86 processors, 32 bit (like i686-*) label Nov 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-abi Area: Concerning the application binary interface (ABI). A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness O-x86_32 Target: x86 processors, 32 bit (like i686-*) P-high High priority regression-untriaged Untriaged performance or correctness regression. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

10 participants