Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Improve C types for cross-language LLVM CFI support #3296

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

rcvalle
Copy link
Member

@rcvalle rcvalle commented Jul 26, 2022

This RFC proposes improving C types to be able to identify C char and integer type uses at the time types are encoded for cross-language LLVM CFI support.

As the industry continues to explore Rust adoption, the absence of support for forward-edge control flow protection in the Rust compiler is a major security concern when migrating to Rust by gradually replacing C or C++ with Rust, and C or C++ and Rust -compiled code share the same virtual address space. Thus, support for forward-edge control flow protection needs to be added to the Rust compiler and is a requirement for large-scale secure Rust adoption. For more information about LLVM CFI and cross-language LLVM CFI support, see the design document in the tracking issue rust-lang/rust#89653.

Rendered

@Lokathor
Copy link
Contributor

I don't understand this RFC at all. There seems to be several options of how to do whatever you think this does. An RFC should usually be more concrete about what it's proposing.

If you want to have a discussion about developing a design you could try the rust Zulip, perhaps the T-lang stream would be the appropriate place to start:
https://rust-lang.zulipchat.com/

@rcvalle
Copy link
Member Author

rcvalle commented Jul 27, 2022

Sorry if it's unclear. Let me try to summarize:

LLVM uses type metadata to allow IR modules to aggregate pointers by their types. This type metadata is used by LLVM Control Flow Integrity to test whether a given pointer is associated with a type identifier (i.e., test type membership). Clang and the Rust compiler (for cross-language LLVM CFI support) use the Itanium C++ ABI mangling as type metadata identifiers for function pointers.

As a requirement for cross-language LLVM CFI support, the Rust compiler must be able to identify C char and integer type uses (which are type aliases) at the time types are encoded to be able to encode these correctly using the Itanium C++ ABI mangling. However, at the time types are encoded, all type aliases are already resolved to their respective ty::Ty type representations (i.e., their respective Rust aliased types), making it currently not possible to identify C char and integer type uses from their resolved types.

This RFC proposes improving C types to be able to identify C char and integer type uses at the time types are encoded for cross-language LLVM CFI support by either:

  1. creating a new set of transitional C types in core::ffi as user-defined types using repr(transparent) to be used at the FFI boundary (i.e., for extern function types with the "C" calling convention) when cross-language CFI support is needed (and taking the opportunity to consolidate all C types in core::ffi).
  2. changing the currently existing C types in std::os::raw to user-defined types using repr(transparent).
  3. changing C types to ty::Foreign and changing ty::Foreign to be able to represent them.
  4. creating a new ty::C for representing C types.

Option (1) is opt in for when cross-language CFI support is needed, and requires the user to use the new set of transitional C types for extern function types with the "C" calling convention.

Option (2), (3), and (4) are backward-compatibility breaking changes and will require changes to existing code that use C types.

There is also work in progress in rust-lang/rust#97974 for rust-lang/compiler-team#504 that may also be a potential alternative solution to this issue.

@Kixiron
Copy link
Member

Kixiron commented Jul 27, 2022

I'm not sure how viable that is since it sounds like TBAA, and Rust explicitly doesn't have TBAA

@Lokathor
Copy link
Contributor

Uh, step 2 is... probably not possible. even with an edition change, and added lang support for new types of literals, it's probably simply too much breakage and mismatch that would occur between older code using aliases and newer code using the newtypes.

@comex
Copy link

comex commented Jul 27, 2022

Why can't Rust types just be mapped to the corresponding C types for CFI purposes, e.g. i32 would be mapped to C int?

Is the issue with pairs such as {char, signed char} and {long, long long}, where they're distinct C types but may map to the same Rust type?

If so… is it really necessary for LLVM CFI to distinguish those types in the first place? They need to be distinguished in standard ABI mangling since you can have, e.g., void foo(char) and void foo(signed char) as two distinct function overloads. But there's no fundamental reason they need to be distinguished for CFI, and the security benefit of doing so seems extremely marginal.

@joshtriplett joshtriplett added the T-lang Relevant to the language team, which will review and decide on the RFC. label Jul 28, 2022
@rcvalle
Copy link
Member Author

rcvalle commented Jul 28, 2022

This is a requirement for cross-language LLVM CFI support, which is forward-edge control flow protection for C or C++ and Rust -compiled code "mixed binaries" (i.e., for when C or C++ and Rust -compiled code share the same virtual address space), because Clang distinguishes between those types by using the Itanium C++ ABI mangling as type metadata identifiers for function pointers.

I think we should provide as much comprehensible protection as possible as long as it doesn't introduce any constraints that the language itself doesn't have (besides the intended purpose). In the future, I'd like to go even further and implement something like:

Given three functions with the same return and parameter types A, B, and C, and an indirect call at location L to a function with the same return and parameter types as A, B, and C. If it's determined that it isn't possible for C to be a called at location L, to group C in a different group than A and B, allowing only the members of the group A and B belong to be called at location L.

But this is outside the scope of this RFC.

@comex
Copy link

comex commented Jul 29, 2022

Clang distinguishes between those types by using the Itanium C++ ABI mangling as type metadata identifiers for function pointers.

It does today, but that could change. Though perhaps Android is relying on the existing CFI ABI being stable? I have no experience with it.

I think we should provide as much comprehensible protection as possible as long as it doesn't introduce any constraints that the language itself doesn't have (besides the intended purpose).

Well, that is what you are proposing: introducing a constraint into Rust that it distinguish pairs of types that it currently does not distinguish (and has no need to distinguish). Supporting CFI is highly desirable, but Rust having no input into what the CFI ABI looks like, despite this ABI only being developed recently, would be an unfortunate state of affairs.

Given three functions with the same return and parameter types A, B, and C, and an indirect call at location L to a function with the same return and parameter types as A, B, and C. If it's determined that it isn't possible for C to be a called at location L, to group C in a different group than A and B, allowing only the members of the group A and B belong to be called at location L.

Are you talking about something like C++ vtables, where you have more information than just the function signature, or are you talking about some sort of IR-level static analysis? Regardless, this sort of thing would likely be fine for Rust. Regarding C++ vtables, Rust doesn't provide built-in compatibility with them, nor is there much Rust code that generates or accesses them by hand (AFAIK), so any measures taken specifically for them wouldn't break much Rust code. Meanwhile, Rust's own trait object vtables have an intentionally unstable layout and could perhaps opt into a similar mechanism in the future. Regarding static analysis, that presumably wouldn't break anything as long as it didn't make any unwarranted C-specific assumptions.

@rcvalle
Copy link
Member Author

rcvalle commented Aug 1, 2022

Neither the current implementation nor what this RFC proposes introduces any constraints to the language (besides forward-edge control flow protection). The language already distinguishes between both its explicitly-sized integer types (i.e., i8, i16, i32, …) and C abstract integer types (i.e., c_char, c_short, c_long, …), which actual sizes are implementation defined and may vary across different systems, and defines a mapping between these. The issue is that this mapping is done using type aliases, and type aliases don't live long enough throughout the compilation process to make it possible to identify C char and integer type uses from their resolved types--this is a known issue, see rust-lang/compiler-team#504.

@talchas
Copy link

talchas commented Aug 2, 2022

It would definitely be a breaking change to the language, though not one that is likely to come up fully inside of rust:

#[no_mangle]
extern "C" fn foo(x: c_int) { ... }

#[cfg(int_is_i32)]
extern {
    fn foo(x: i32);
}

is entirely correct today. More realistic versions would be being lazy in handwriting FFI and using i32 instead of c_int and so on (maybe combined with the technical-UB of using *mut u8 for char*).

Also I'm not sure about the existing C/LLVM CFI info - what gets generated for int32_t? That definitely needs to correspond to rust i32, not just c_int.

@rcvalle
Copy link
Member Author

rcvalle commented Aug 2, 2022

To enable CFI and cross-language CFI to work properly, the correct/corresponding C types would need to (and arguably should anyway) be used for extern "C" function declarations, as is explicitly documented in the design document in the tracking issue rust-lang/rust#89653 and this RFC:

  • be able to identify C char and integer type uses at the time types are encoded.
  • not assume that C char and integer types and their respective Rust aliased types can be used interchangeably when forward-edge control flow protection is enabled, at least not at the FFI boundary.

But their resolved types could still be used as arguments at call sites without any problems.

@talchas
Copy link

talchas commented Aug 2, 2022

Yes, my point is that this is absolutely a breaking change and what it buys is "able to implement an existing API that is way more C++-centric than it should be"

@Lokathor
Copy link
Contributor

Lokathor commented Aug 2, 2022

Note: in the presence of cross-language LTO you already have to have the Rust side use the precisely correct definitions as the C side or it can trigger UB.
rust-lang/unsafe-code-guidelines#198
Maybe this is arguably a bug, but there it is.

@talchas
Copy link

talchas commented Aug 2, 2022

While related, that still isn't i32-vs-c_int, just issues with types that are a little more special in rust in general.

@comex
Copy link

comex commented Aug 2, 2022

It would definitely be a breaking change to the language, though not one that is likely to come up fully inside of rust:

Or just

extern "C" { fn foo(x: c_int); }foo(42i32);

Also I'm not sure about the existing C/LLVM CFI info - what gets generated for int32_t? That definitely needs to correspond to rust i32, not just c_int.

In C++, int32_t is on most platforms a type alias for int, so it is presumably treated the same as int.

@comex
Copy link

comex commented Aug 2, 2022

Note: in the presence of cross-language LTO you already have to have the Rust side use the precisely correct definitions as the C side or it can trigger UB.
rust-lang/unsafe-code-guidelines#198
Maybe this is arguably a bug, but there it is.

However, this doesn't cause a problem for pairs like long and long long or char and signed char because they are the same type at the LLVM IR level. The LLVM-IR-level type system is separate from the type annotations used for CFI support, which are applied by the frontend.

(As for i32 versus c_int, I don't think that is a problem in the first place. There is no ambiguity so there is no need to differentiate them, or benefit in doing so.)

@comex
Copy link

comex commented Aug 2, 2022

Neither the current implementation nor what this RFC proposes introduces any constraints to the language (besides forward-edge control flow protection). The language already distinguishes between both its explicitly-sized integer types (i.e., i8, i16, i32, …) and C abstract integer types (i.e., c_char, c_short, c_long, …), which actual sizes are implementation defined and may vary across different systems, and defines a mapping between these.

Rust code that assumes that e.g. c_long equals i64 is not portable, but it's valid platform-specific code.

The issue is that this mapping is done using type aliases, and type aliases don't live long enough throughout the compilation process to make it possible to identify C char and integer type uses from their resolved types--this is a known issue, see rust-lang/compiler-team#504.

The known issue is the compiler not maintaining information about type alias use in order to produce better diagnostics and documentation. In that case, the added complexity of distinguishing a type from its alias is only internal to the compiler and not present in the normative model of the type system. This is very different from changing the type system so that two types that used to be the same are now different. The added complexity is itself a small cost to the language, but more importantly, the backwards compatibility break is a huge cost.

Call it a 'constraint' or not, but it's highly undesirable.

But their resolved types could still be used as arguments at call sites without any problems.

If you're suggesting that arguments of type, e.g. i8, if passed as arguments to these functions, would automatically be coerced to a new type, e.g. a new repr(transparent) c_char type, this kind of coercion would be a new behavior that exists nowhere else in the language, further increasing the added language complexity.

@workingjubilee
Copy link
Contributor

In Rust we want to be able to have actual provenance tracking of various pointer values. In C, it is considered acceptable to cast these back and forth from integers, and so code will pass pointers as integers. In Rust, we increasingly think it is not. So Rust code that wants to have things like "accurately tracked pointers", which is very proximal to "control flow integrity", may decide to override the definition and link against a mildly fictional version of external C code, redeclaring a C uintptr_t as *mut T instead. In doing so, we can get much better sanitization up to the FFI boundary using the MIR interpreter.

This tends to work fine, surprisingly, because most platforms have C calling conventions that don't actually care about this detail. On ones that do, like CHERI / Morello, they also disagree with C at large and implicitly treat uintptr_t as a pointer, or in the case of Motorola 68000, the C implementation for that platform is itself prone to getting this wrong.

So "Rust code should use exactly the same C definitions" is not a very strong "should" to me. The C code should be correct. If the C code is wrong, then Rust programmers who know what they're doing (calling any extern "C" function is unsafe, so anyone who does the preceding is shouldering personal responsibility for any UB it induces, I am not a lawyer, consult your local doctor before writing C FFI, side effects may include headaches, dizziness, and nausea) should be at liberty to correct it.

We could, of course, simply write a C code shim to accomplish the same thing, but it seems a rather laborious effort to produce what is essentially no-op code.

@Gankra
Copy link
Contributor

Gankra commented Aug 2, 2022

override the definition and link against a mildly fictional version of external C code, redeclaring a C uintptr_t as *mut T instead

See rust-lang/rust#95496 for more details -- some system C APIs pun integers and pointers, essentially expecting you to invoke them as so: libc::prctl(PR_SET_NAME, name.as_ptr() as libc::c_ulong, 0, 0, 0). This completely scrambles the ability for tools to reason about pointer escapes so we've been recommending overruling the random nonsense that these kinds of APIs insist on and making them more type accurate.

Similarly, when it comes to tools like bindgens, I've long argued that it's preferable to declare extern APIs in the most native rust form allowed, because rust is simply better at defining FFI interfaces than C or C++, and it doesn't make sense to throw out that information on both sides.

extern fn(val1: Option<&u32>,  val2: &mut u64)

is much more useful signature than

extern fn(val1: *mut u32, val2: *mut u64)

@rcvalle
Copy link
Member Author

rcvalle commented Aug 2, 2022

Or just

extern "C" { fn foo(x: c_int); }foo(42i32);

As long as the extern "C" function declaration continues having the correct/corresponding C types, this example would continue to work without any problems.

The known issue is the compiler not maintaining information about type alias use in order to produce better diagnostics and documentation.

These were cases were the issue manifested, not the issue itself. The issue is indeed the type alias information not living long enough throughout the compilation process (i.e., lack of a ty::Ty kind for type aliases and a DefId).

In that case, the added complexity of distinguishing a type from its alias is only internal to the compiler and not present in the normative model of the type system. This is very different from changing the type system so that two types that used to be the same are now different. The added complexity is itself a small cost to the language, but more importantly, the backwards compatibility break is a huge cost.

And they'll continue to do so. They will continue to be able to be used interchangeably anywhere else in Rust code but at the FFI boundary (i.e., at the extern "C" function declarations). I.e., an extern "C" { fn foo(bar: i32); } will be different than a void foo(int bar) at the FFI boundary (or when called back from C code if passed as a callback, and vice versa), when CFI is enabled.

If you're suggesting that arguments of type, e.g. i8, if passed as arguments to these functions, would automatically be coerced to a new type, e.g. a new repr(transparent) c_char type, this kind of coercion would be a new behavior that exists nowhere else in the language, further increasing the added language complexity.

Since they would need to be used at the FFI boundary only (i.e., at the extern "C" function declarations) there is no new behavior other than the already existing one for repr(transparent). https://doc.rust-lang.org/nomicon/other-reprs.html#reprtransparent says:

Also, passing the struct through FFI where the inner field type is expected on the other side is guaranteed to work. In particular, this is necessary for struct Foo(f32) to always have the same ABI as f32.

This is for option (1), which would add new repr(transparent) types to be used at the FFI boundary (i.e., at the extern "C" function declarations) when cross-language CFI support is needed, and option (2), which would change the currently existing C types to repr(transparent) types. If the work in progress in rust-lang/rust#97974 for rust-lang/compiler-team#504 becomes the solution to this issue, these wouldn't even be necessary, and the currently existing C types will be handled similarly to how c_void and the () type are in the current implementation.

@workingjubilee
Copy link
Contributor

If the work in progress in rust-lang/rust#97974 for rust-lang/compiler-team#504 becomes the solution to this issue, these wouldn't even be necessary, and the currently existing C types will be handled similarly to how c_void and the () type are in the current implementation.

That compiler change doesn't imply any user-facing changes, as the changes are for rustdoc support and getting more information, and rustdoc can run without doing codegen. So I don't expect it will, really. What you are proposing remains a user-facing change, even if it only triggers with CFI enabled: setting aside the deliberate disrespect for C's type definitions that we already express, having to define an interface using its type alias instead of the original type is neither necessary nor correct. And to Rust, c_char is an alias of either i8 or u8. Even if it only breaks with CFI enabled, this causes a language split, where a previously irrelevant type alias is now load-bearing.

That's not acceptable.

@eddyb
Copy link
Member

eddyb commented Aug 2, 2022

I suspect that assigning different semantics based on whether a function signatures used a type alias or the original type¹ (that the type alias is aliasing), is a complete non-starter and some of the hypothetical discussion here might be a waste of everyone's time if that's the case. cc @rust-lang/types

¹ (two types, which even if the compiler may eventually be able to represent differently, exhibit complete type equality even in the absence of subtyping, i.e. in invariant positions)


Oh and it's possible to use generics with FFI:

unsafe extern "C" fn callback<T>(input: T, user_data: *mut c_void) {...}

let user_data = &mut my_data as *mut _ as *mut c_void;
foo_config_query_int("timeout", callback::<c_int>, user_data);
foo_config_query_int32("counter32", callback::<i32>, user_data);
foo_config_query_float("ratio", callback::<f32>, user_data);

callback::<c_int> and callback::<i32> are either the exact same type (today) or completely equal types (in the future, see ¹ above) - either way, it's impossible to tell them apart:

let callback_c_int_or_i32 = [callback::<c_int>, callback::<i32>];
assert_eq!(0, size_of_val(&callback_c_int_or_i32)); // No function pointers, ZST function types.
let fn_ptr: fn(_, _) = callback_c_int_or_i32[i]; // The result cannot soundly depend on `i`.

And because they're not individual declarations (where you could argue that there's some sneaky non-local-quasi-syntactic-sugar going on), but part of the typesystem, you simply cannot make two symbols out of them.

(i.e. this case I am certain of, unlike my "cc @rust-lang/types" above for the weaker one)


Personally I still don't understand how any of this could possibly have ever worked for anything other than C++ vtables and similarly restricted things. Is the answer "we patch our dependencies" and/or "we turn it off for large parts of the build"? Or maybe "there's a dozen indirect calls and/or they barely use any types"?

@rcvalle
Copy link
Member Author

rcvalle commented Aug 2, 2022

That compiler change doesn't imply any user-facing changes, as the changes are for rustdoc support and getting more information, and rustdoc can run without doing codegen. So I don't expect it will, really. What you are proposing remains a user-facing change, even if it only triggers with CFI enabled: setting aside the deliberate disrespect for C's type definitions that we already express, having to define an interface using its type alias instead of the original type is neither necessary nor correct. And to Rust, c_char is an alias of either i8 or u8. Even if it only breaks with CFI enabled, this causes a language split, where a previously irrelevant type alias is now load-bearing.

The changes are not for rustdoc support only--it adds a proper ty::Ty kind (i.e., TyAlias) and a DefId for type aliases, and it'll very likely become a solution to this issue (see rust-lang/rust@11a9bbf#diff-0cc45ef317b8526bab4aad2b698ce2c63741ed8fe9c3d96497fc73a4fd17d419), where at this point C char and integer type uses can be identified from their resolved types, and the currently existing C types could be handled similarly to how c_void and the () type are in the current implementation.

I'm afraid that if using the correct/corresponding C types for extern "C" function declarations1 is a blocker, when CFI is enabled and for cross-language CFI to work properly, cross-language CFI might not be possible (at least not without making changes to how Clang encodes C char and integer types for CFI).

Footnotes

  1. Which arguably should be used anyway, since their actual sizes are implementation defined and may vary across different systems, and the types are aliased correctly accordingly.

@rcvalle
Copy link
Member Author

rcvalle commented Aug 2, 2022

Oh and it's possible to use generics with FFI:

unsafe extern "C" fn callback<T>(input: T, user_data: *mut c_void) {...}

let user_data = &mut my_data as *mut _ as *mut c_void;
foo_config_query_int("timeout", callback::<c_int>, user_data);
foo_config_query_int32("counter32", callback::<i32>, user_data);
foo_config_query_float("ratio", callback::<f32>, user_data);

callback::<c_int> and callback::<i32> are either the exact same type (today) or completely equal types (in the future, see ¹ above) - either way, it's impossible to tell them apart:

let callback_c_int_or_i32 = [callback::<c_int>, callback::<i32>];
assert_eq!(0, size_of_val(&callback_c_int_or_i32)); // No function pointers, ZST function types.
let fn_ptr: fn(_, _) = callback_c_int_or_i32[i]; // The result cannot soundly depend on `i`.

The encoding is done after types are monomorphized (and these examples are supported). See https://github.com/rust-lang/rust/blob/master/src/test/codegen/sanitizer-cfi-emit-type-metadata-id-itanium-cxx-abi.rs for some examples of how function items are encoded.

@workingjubilee
Copy link
Contributor

Even if we could make everything work after resolving type generics and the like, that

#[cfg(target_arch = "x86_64")]
extern "C" {
   fn strncpy(dest: *mut i8, src: *const i8, count: usize) -> *mut i8;
}

#[cfg(target_arch = "aarch64")]
extern "C" {
   fn strncpy(dest: *mut u8, src: *const u8, count: usize) -> *mut u8;
}

works in user code is not a trivial feature of Rust. It is part of the language's definition. Because extern "C" fn is not merely a syntax for FFI bindings against C code. It is Rust code. It declares its validity to Rust and is typechecked in Rust. To compile Rust code says it has a sensical execution flow. To then interrupt its execution via failing to deliver Rust control flow to the target symbol is to break working Rust code.

Our only job, mind, is to deliver the flow of execution to the extern symbol and to have a sufficiently matching calling convention such that the C function can do its job. If the implementation is internally malicious, we don't care. You called C code. What were you hoping for? A pat on the back? But it must not induce linker errors, and CFI schema that Rust supports must not preempt us calling the symbol based on that definition if the call is valid in Rust. That types will remain valid is in fact part of the stability promise moreso than any other behavioral promise.

Opting in to type system changes would cause deep splits between minor versions; it would also create a high maintenance burden in the compiler, since both older and newer versions would have to be supported.

We already have several mitigation measures, such as opt-out or temporary deprecation, that can be used to ease the transition around a soundness fix. Moreover, separating out new type rules so that they can be "opted into" can be very difficult and would complicate the compiler internally; it would also make it harder to reason about the type system as a whole.

We can argue whether or not this should work:

extern "C" fn some_fn(val1: Option<&u32>, val2: &mut u64);

Though it currently does, so it would require more than a small discussion to change that.

But to change the language such that

#[cfg(target_arch = "x86_64")]
type c_char = i8;
#[cfg(target_arch = "aarch64")]
type c_char = u8;

#[cfg(any(target_arch = "aarch64", target_arch = "x86_64"))]
type size_t = usize;

extern "C" {
   fn strncpy(dest: *mut c_char, src: *const c_char, count: size_t) -> *mut c_char;
}

is "more correct" in some way in order to supply a particular kind of integration with the LLVM toolchain's preferred C compilers, and thus everyone will be told they must write that, and not the other thing, is to induce a breaking change.

Which arguably should be used anyway, since their actual sizes are implementation defined and may vary across different systems, and the types are aliased correctly accordingly.

And according to Rust, this is incorrect, because we do not actually support arbitrary C implementations. We support the target platform's C dialect. That's what extern "C" and repr(C) actually mean. That's why we use MSVC and GCC to do linkage instead of doing it ourselves: because they know where their system libraries are and how to link Rust code against them.

@rcvalle
Copy link
Member Author

rcvalle commented Aug 3, 2022

Let me provide some background information from the design document, so the costs (and the risks) vs benefits of the options that are being proposed may be fully understood:

With the increasing popularity of Rust both as a general purpose programming language and as a replacement for C and C++ because of its memory and thread safety guarantees, an increasing number of companies and projects are adopting or migrating to Rust. One of the most common paths to migrate to Rust is to gradually replace C or C++ with Rust in a program written in C or C++.

Rust provides interoperability with foreign code written in C via Foreign Function Interface (FFI). However, foreign code does not provide the same memory and thread safety guarantees that Rust provides, and is susceptible to memory corruption and concurrency issues. Therefore, it is generally accepted that linking foreign C and C++ -compiled code into a program written in Rust may degrade the security of the program.123

While it is also generally believed that replacing sensitive C or C++ with Rust in a program written in C or C++ improves the security of the program, Papaevripides and Athanasopoulos4 demonstrated that it is not always the case, and linking foreign Rust-compiled code into a program written in C or C++ with modern exploit mitigations, such as control flow protection, may actually degrade the security of the program, mostly due to the absence of support for these exploit mitigations in the Rust compiler, mainly forward-edge control flow protection.

Since then, it has been discussed567, and just recently it was formalized8 as a new class of attack (i.e., cross-language attacks). And as companies continue to explore Rust adoption, it has been a major security concern (and a blocker) when migrating to Rust by gradually replacing C or C++ with Rust, and C or C++ and Rust -compiled code share the same virtual address space. It was also one of the major reasons that initiatives such as Rust GCC--which I also fully support--were funded5.

It is my view that, if the Rust compiler is providing interoperability with a language, and wants to provide a path towards Rust adoption by the industry, it must not degrade the security of the program it is being integrated into. Therefore, the cost of using the correct/corresponding C types (either as the way it currently is as aliases, as a new opt-in set of repr(transparent) types, or changing the current aliases to repr(transparent) types) for extern "C" function declarations (and let me add: only when they're used for FFI), only when CFI is enabled, and only when cross-language CFI is needed, seems pretty reasonable to me.

Footnotes

  1. https://stanford-cs242.github.io/f17/assets/projects/2017/songyang.pdf

  2. https://github.com/rust-lang/rust/files/4723836/Control.Flow.Guard.for.Rust.pdf

  3. https://github.com/rust-lang/rust/files/4723840/Control.Flow.Guard.for.LLVM.pdf

  4. https://dl.acm.org/doi/pdf/10.1145/3418898

  5. https://opensrcsec.com/open_source_security_announces_rust_gcc_funding 2

  6. https://twitter.com/spendergrsec/status/1441795010156843019

  7. https://twitter.com/spendergrsec/status/1501143126408368133

  8. https://www.ndss-symposium.org/wp-content/uploads/2022-78-paper.pdf

@Lokathor
Copy link
Contributor

Lokathor commented Aug 3, 2022

This is all very compelling, and we do love security, but breaking old code is simply not acceptable. Not at all.

So, your proposals have to be some path forward that do not break any previous code. Unfortunately, we've got a lot of FFI stuff already designed into the language, and even a small change could end up breaking stuff. I won't speak for anyone else, but I'm not really seeing how to proceed given the absolute rule that we can't break old code.

For example, would just an entirely new set of types be acceptable? Under a core::ffi::cfi sub-module perhaps? Then code that wants to opt in to CFI can clearly and specifically do so by using alternative types.

@rcvalle
Copy link
Member Author

rcvalle commented Aug 3, 2022

For example, would just an entirely new set of types be acceptable? Under a core::ffi::cfi sub-module perhaps? Then code that wants to opt in to CFI can clearly and specifically do so by using alternative types.

This is what is proposed in option (1).

@comex
Copy link

comex commented Dec 7, 2022

Neat; that would make things much simpler if adopted. Do you have a plan for the other half of the problem, struct tags? Off-hand, that seems much more straightforward to solve. In most cases, it would be good enough to just use the Rust struct name, since most FFI bindings to C code use the original struct names... but I have a vague sense that it's semi-common to not use the original names, so perhaps there should be an attribute to override the name for CFI purposes. An override would also presumably be needed for structs whose CFI names are inside a C++ namespace. In any case, I'm curious what you have in mind.

@rcvalle
Copy link
Member Author

rcvalle commented Dec 7, 2022

Neat; that would make things much simpler if adopted. Do you have a plan for the other half of the problem, struct tags? Off-hand, that seems much more straightforward to solve. In most cases, it would be good enough to just use the Rust struct name, since most FFI bindings to C code use the original struct names... but I have a vague sense that it's semi-common to not use the original names, so perhaps there should be an attribute to override the name for CFI purposes. An override would also presumably be needed for structs whose CFI names are inside a C++ namespace. In any case, I'm curious what you have in mind.

Yes, the Rust compiler is already able to handle it, and the way it works today is exactly as you described but it encodes it the same as Clang (i.e., <length><name>, where <name> is <unscoped-name>, which means no namespace information is included). I'm definitely planning to provide attributes to override type and function CFI encodings so the user can have more control over type and function encodings and also function grouping (e.g., by using custom CFI type ids for functions).

@clarfonthey
Copy link
Contributor

So, I've mostly avoided adding feedback on this RFC because my immediate response is that Rust does the right thing and C does the wrong thing here. However, looking at the latest iteration after lots of feedback, I think that this is a lot closer to something we'd like to actually do: give a proper FFI guarantee on the c_* types that LLVM can actually detect.

However, a more recent Rust PR makes me question whether we'll even need this, and if we can just make this work without having to change the actual type aliases.

rust-lang/rust#97974 updates the IR types in the compiler so that aliases are tracked throughout the entire compilation process, with the specific goal of fixing some of the issues with rustdoc (which currently has a lot of trouble due to the fact that type aliases are erased very early on). If this or something like it were merged, we could theoretically know at the very end of compilation that the c_* types were genuinely used in the signature for a function in a way that LLVM can track.

There would still need to be lang items for each of the C types to ensure that the compiler can track them correctly, but I think that this would be much more palatable than having to wrap each of the types in structs. I think that being forced to wrap types before sending them across an FFI boundary is a bit more friction than we want, since right now we can just coerce integer literals into C types.

Note, I haven't read through the whole thread, just read the latest version of the RFC. But I think that we really should be aiming for a situation where:

  1. Rust can distinctly tell if an FFI function is defined with a C type alias or a primitive integer type.
  2. Callers of these FFI functions don't have to change their code to make it work correctly.
  3. C types are still aliases, meaning that we don't have to make separate impls for them and can still pass them to functions that expect ordinary integers.

@rcvalle
Copy link
Member Author

rcvalle commented Dec 7, 2022

@clarfonthey This is the alternative from the RFC I was originally going for, but the feedback in this PR that aliases shouldn't be semantically different from the aliased types made me reconsider--despite CFI causing this between identical types. I also think that the control, explicitness, and enforcement of using explicit CFI types align closer with Rust's philosophy than using the aliases, and now with these types hopefully becoming completely optional, they could be used when these characteristics and absolutely no loss in granularity are preferred.

samitolvanen pushed a commit to llvm/llvm-project that referenced this pull request Feb 1, 2023
This commit adds a new option (i.e.,
`-fsanitize-cfi-icall-normalize-integers`) for normalizing integer types
as vendor extended types for cross-language LLVM CFI/KCFI support with
other languages that can't represent and encode C/C++ integer types.

Specifically, integer types are encoded as their defined representations
(e.g., 8-bit signed integer, 16-bit signed integer, 32-bit signed
integer, ...) for compatibility with languages that define
explicitly-sized integer types (e.g., i8, i16, i32, ..., in Rust).

``-fsanitize-cfi-icall-normalize-integers`` is compatible with
``-fsanitize-cfi-icall-generalize-pointers``.

This helps with providing cross-language CFI support with the Rust
compiler and is an alternative solution for the issue described and
alternatives proposed in the RFC
rust-lang/rfcs#3296.

For more information about LLVM CFI/KCFI and cross-language LLVM
CFI/KCFI support for the Rust compiler, see the design document in the
tracking issue rust-lang/rust#89653.

Reviewed By: pcc, samitolvanen

Differential Revision: https://reviews.llvm.org/D139395
CarlosAlbertoEnciso pushed a commit to SNSystems/llvm-debuginfo-analyzer that referenced this pull request Feb 2, 2023
This commit adds a new option (i.e.,
`-fsanitize-cfi-icall-normalize-integers`) for normalizing integer types
as vendor extended types for cross-language LLVM CFI/KCFI support with
other languages that can't represent and encode C/C++ integer types.

Specifically, integer types are encoded as their defined representations
(e.g., 8-bit signed integer, 16-bit signed integer, 32-bit signed
integer, ...) for compatibility with languages that define
explicitly-sized integer types (e.g., i8, i16, i32, ..., in Rust).

``-fsanitize-cfi-icall-normalize-integers`` is compatible with
``-fsanitize-cfi-icall-generalize-pointers``.

This helps with providing cross-language CFI support with the Rust
compiler and is an alternative solution for the issue described and
alternatives proposed in the RFC
rust-lang/rfcs#3296.

For more information about LLVM CFI/KCFI and cross-language LLVM
CFI/KCFI support for the Rust compiler, see the design document in the
tracking issue rust-lang/rust#89653.

Reviewed By: pcc, samitolvanen

Differential Revision: https://reviews.llvm.org/D139395
samitolvanen pushed a commit to llvm/llvm-project that referenced this pull request Feb 8, 2023
This commit adds a new option (i.e.,
`-fsanitize-cfi-icall-normalize-integers`) for normalizing integer types
as vendor extended types for cross-language LLVM CFI/KCFI support with
other languages that can't represent and encode C/C++ integer types.

Specifically, integer types are encoded as their defined representations
(e.g., 8-bit signed integer, 16-bit signed integer, 32-bit signed
integer, ...) for compatibility with languages that define
explicitly-sized integer types (e.g., i8, i16, i32, ..., in Rust).

``-fsanitize-cfi-icall-normalize-integers`` is compatible with
``-fsanitize-cfi-icall-generalize-pointers``.

This helps with providing cross-language CFI support with the Rust
compiler and is an alternative solution for the issue described and
alternatives proposed in the RFC
rust-lang/rfcs#3296.

For more information about LLVM CFI/KCFI and cross-language LLVM
CFI/KCFI support for the Rust compiler, see the design document in the
tracking issue rust-lang/rust#89653.

Relands b1e9ab7 with fixes.

Reviewed By: pcc, samitolvanen

Differential Revision: https://reviews.llvm.org/D139395
CarlosAlbertoEnciso pushed a commit to SNSystems/llvm-debuginfo-analyzer that referenced this pull request Feb 9, 2023
This commit adds a new option (i.e.,
`-fsanitize-cfi-icall-normalize-integers`) for normalizing integer types
as vendor extended types for cross-language LLVM CFI/KCFI support with
other languages that can't represent and encode C/C++ integer types.

Specifically, integer types are encoded as their defined representations
(e.g., 8-bit signed integer, 16-bit signed integer, 32-bit signed
integer, ...) for compatibility with languages that define
explicitly-sized integer types (e.g., i8, i16, i32, ..., in Rust).

``-fsanitize-cfi-icall-normalize-integers`` is compatible with
``-fsanitize-cfi-icall-generalize-pointers``.

This helps with providing cross-language CFI support with the Rust
compiler and is an alternative solution for the issue described and
alternatives proposed in the RFC
rust-lang/rfcs#3296.

For more information about LLVM CFI/KCFI and cross-language LLVM
CFI/KCFI support for the Rust compiler, see the design document in the
tracking issue rust-lang/rust#89653.

Relands b1e9ab7438a098a18fecda88fc87ef4ccadfcf1e with fixes.

Reviewed By: pcc, samitolvanen

Differential Revision: https://reviews.llvm.org/D139395
skatrak pushed a commit to skatrak/llvm-project-rocm that referenced this pull request Feb 10, 2023
This commit adds a new option (i.e.,
`-fsanitize-cfi-icall-normalize-integers`) for normalizing integer types
as vendor extended types for cross-language LLVM CFI/KCFI support with
other languages that can't represent and encode C/C++ integer types.

Specifically, integer types are encoded as their defined representations
(e.g., 8-bit signed integer, 16-bit signed integer, 32-bit signed
integer, ...) for compatibility with languages that define
explicitly-sized integer types (e.g., i8, i16, i32, ..., in Rust).

``-fsanitize-cfi-icall-normalize-integers`` is compatible with
``-fsanitize-cfi-icall-generalize-pointers``.

This helps with providing cross-language CFI support with the Rust
compiler and is an alternative solution for the issue described and
alternatives proposed in the RFC
rust-lang/rfcs#3296.

For more information about LLVM CFI/KCFI and cross-language LLVM
CFI/KCFI support for the Rust compiler, see the design document in the
tracking issue rust-lang/rust#89653.

Reviewed By: pcc, samitolvanen

Differential Revision: https://reviews.llvm.org/D139395
@rcvalle
Copy link
Member Author

rcvalle commented Feb 13, 2023

The integer normalization option (see #3296 (comment)) has been accepted by Clang/LLVM team. With that, I think we have two options moving forward, and I wanted to hear all your opinions on these:

  1. Withdraw the RFC and extract the rustc-side of integer normalization option from Add cross-language LLVM CFI support to the Rust compiler rust#105452 to submit it separately.

  2. Continue with the RFC and have both the integer normalization option and the new or improved proposed types, the latter now completely optional for projects/teams that need or want the added explicitness and granularity. These options would be compatible and could be defined per compilation unit/crate. The integer normalization option could also be used to override the use of the new or improved types (by normalizing all integers, including the new or improved proposed types) whenever necessary.

    • The advantages of this is that there would be an option available that provides no loss of granularity and degradation of security when a C or C++ project that relies on CFI and has it enabled integrates a Rust library or rewrite parts of it in Rust.
    • The disadvantages of this is that there would be two ways to achieve cross-language CFI and that would be the integer normalization option and the new or improved proposed types, and it would require some consideration before deciding which option to use.

Let me know your thoughts.

@samitolvanen
Copy link

After re-reading the concerns above, I would be inclined to proceed with option 1. Enabling integer normalization in the Linux kernel results in ~1% reduction in unique type hashes, which seems tolerable to me.

@workingjubilee
Copy link
Contributor

workingjubilee commented Feb 18, 2023

Option 1 probably makes more sense, and also has the benefit of being a meaningful step towards a model more plausible for other compiled languages that target LLVM to opt into as well. They do already have to lower to LLVM IR, but they don't necessarily understand C++, so the less C++ such implementations have to speak, the better. Then we could have LLVM CFI in our Ada-C++-Fortran-Haskell-Kotlin-Rust-Swift-Zig polyglot binaries.

It's not impossible the security hit could be meaningful in practice, but if a sample from a real world, security-sensitive codebase tells us we would be pursuing effectively a last 1%, it doesn't seem worth it. Rust may have a bent towards explicitness but it also has a bent towards parsimony and there's been a lot of work on making sure that the "happy path" for Rust code is relatively smooth and ergonomic. That shouldn't necessarily stop just because FFI was involved: people who want to call a -sys crate shouldn't have to learn about a bunch of new CFI-only types, and neither should we assume they know any C, either.

@rcvalle
Copy link
Member Author

rcvalle commented Mar 2, 2023

Thank you for your thoughts on this @samitolvanen and @workingjubilee! Much appreciated. It seems option 1 is the most reasonable next step here now. A 1% granularity loss is a huge win compared to what would've been with a generalized encoding or not having CFI and cross-language CFI support at all.

While the Linux kernel is definitely a great data point, if you don't mind, I'd like to leave this PR open for a bit longer so we could better understand the impact of the integer normalization option1 in the Linux kernel and in other projects and possibly think of and experiment with other options.

@bjorn3 For whenever you have time, I've removed this RFC implementation (i.e., the new proposed types) and all references to it from rust-lang/rust#105452, leaving just the integer normalization option, and marked it ready for review. Others feel free to take a look as well.

Footnotes

  1. And what 1% actually represents (specially in a large code base) from an attacker perspective and exploitation attempt/scenario.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet