Join GitHub today
GitHub is home to over 31 million developers working together to host and review code, manage projects, and build software together.
Sign upMatch on repr(C) enum returned from C library with unknown value leads to UB #36927
Comments
This comment has been minimized.
This comment has been minimized.
|
TL;DR:
It is not a ABI breaking change, as adding a variant to a C enum (as long as the size of the actual integer which represents the enum in memory does not change) has no influence on the registers/memory locations where this enum gets placed when returned or passed as an argument. It is, however, an API-breaking change.
Rustc (its
Matching on |
This comment has been minimized.
This comment has been minimized.
|
Ok, so some research suggests that 1) C and C++ have similar, but slightly different rules about the valid values of an enumeration type. However, it seems that neither are limited to the values defined by the declaration. As far as I can tell, an enumeration type in C can take on any value of the underlying type. This means that adding an enum constant may-or-may-not be a breaking ABI change depending on the compiler and target (because of course it is). Fortunately, most platforms will default to With that in mind, we should probably do two things:
|
This comment has been minimized.
This comment has been minimized.
|
The fact that Rust enums cause UB when they hold a value that isn't a valid variant is the primary reason I don't use Rust enums in winapi and instead use simple integer constants (The other reason being that integer constants don't require any trait impls or methods and significantly reduce compile times). C/C++ will often even use enums as bitflags which definitely don't align with Rust enums.
Windows API uses enums as bitflags in both C and C++ which indicates that it is not undefined behavior in MSVC C/C++. |
This comment has been minimized.
This comment has been minimized.
|
Sure. We purged all Rust enums from rustllvm unless we control the other side. Maybe prohibit |
This comment has been minimized.
This comment has been minimized.
That's wrong. Creating a Rust enum with an out-of-range value is instant UB - we generate |
steveklabnik
added
the
A-lang
label
Oct 7, 2016
This comment has been minimized.
This comment has been minimized.
|
I think that there should be a lint against using an enum type in FFI, except for the case where the enum cannot have been created or modified by foreign code. It is way too easy to mess this up. A newtype of an integer is a much better solution. |
This comment has been minimized.
This comment has been minimized.
|
Perhaps we should allow |
This comment has been minimized.
This comment has been minimized.
|
/subscribe If a C API takes an enum by value, there's a decent chance it's actually bitflags, i.e. you're expected to be able to do |
steveklabnik
added
T-lang
and removed
A-lang
labels
Mar 24, 2017
This comment has been minimized.
This comment has been minimized.
|
The usage of C enums as bitflags can be handled in Rust with things like the bitflags crate (or something similar). So that shouldn't be a big deal. But this issue of undefined behavior for "out of range" enums is a real concern for me. bindgen even generates Rust enums from C enums. Describing this current state of affairs as a "foot gun" is a perfect description. Rust really needs an ergonomic solution to using C enums in a Rust FFI setting. Otherwise Rust's safety promises are moot. |
mjbshaw
referenced this issue
Apr 26, 2017
Open
Using Rust enums to represent C enums can result in undefined behavior #667
This comment has been minimized.
This comment has been minimized.
tonychain
commented
Apr 26, 2017
|
Wouldn't the simplest option be allowing the user to specify the default value to be selected in the event the enum is out-of-range? Something like: #[repr(C, default = Invalid)]
enum Thing {
Invalid,
...At least for the enums I happen to be working with in this FFI, they all support a good choice of a default option. |
This comment has been minimized.
This comment has been minimized.
jeandudey
commented
Apr 27, 2017
Yes, but
I would first approach this with a
I do not agree with this, I do not think that C enumerations are part of Rust's safety measures, what happens is that this problem is unsolvable (in my humble opinion) within the Rust language itself, |
This comment has been minimized.
This comment has been minimized.
I don't think this provides the guarantees that you want! C enums can still have values outside of the explicitly stated variants. AIUI, basically anything within the same bit range is still legal. |
This comment has been minimized.
This comment has been minimized.
|
Note, also, that bindgen allows customizing the enums as needed (generating plain integers, for example). The possibilities would be much larger with rust-lang/rfcs#1758, though. |
This comment has been minimized.
This comment has been minimized.
Or you could use a regular declarative macro to define enums using normal syntax and have them turned into constants like winapi does. |
This comment has been minimized.
This comment has been minimized.
How does generating bindings at compile time help? As I mentioned in rust-lang/rust-bindgen#667 if the Rust program dynamically links to the C library then the C library could easily return an "out of range" enum (e.g. if the user's computer has a slightly different version of the C library; C programmers generally don't consider adding a value to an enum to be an API or ABI breaking change). Whether Rust's bindings are generated at compile time or not is irrelevant here. |
Cldfire
referenced this issue
May 21, 2017
Closed
Passing enum pointers to be set by C may not be a good idea #4
This comment has been minimized.
This comment has been minimized.
|
Before I take the time to switch over to using numerical constants instead of enums in my own project, may I ask if this is still a concern? I just tried running the playground example that was linked in the original post on every rust version from 1.12.0 up to 1.17.0:
On 1.17.0, however, everything seems to be fine:
It exhibits the same behavior when run on play.rust-lang.org; not one of stable, beta, or nightly segfaults. EDIT: It also doesn't segfault even if you don't have a |
This comment has been minimized.
This comment has been minimized.
|
The worst part of UB is that it might just do the "reasonable" thing you'd want... |
This comment has been minimized.
This comment has been minimized.
|
Have you tried it with optimization? In my case whether the undefined behavior manifested depended on surrounding code and if it was optimized. |
This comment has been minimized.
This comment has been minimized.
|
If by 'optimization' you mean release mode, then yeah. Other than that, no. Regardless, what cuviper said made sense; I see now that it would be foolhardy to rely on this behavior :) |
kennytm
referenced this issue
May 26, 2017
Merged
Future-proofing enums/structs with #[non_exhaustive] attribute #2008
Mark-Simulacrum
added
the
C-feature-request
label
Jul 26, 2017
FauxFaux
added a commit
to FauxFaux/zstd-rs
that referenced
this issue
Jan 15, 2018
FauxFaux
added a commit
to FauxFaux/zstd-rs
that referenced
this issue
Jul 4, 2018
per-gron
referenced this issue
Jul 5, 2018
Open
Add #[non_exhaustive] option to enum codegen options #43
This comment has been minimized.
This comment has been minimized.
|
So, it's been two years. re-reading this issue, it seems that this is working as intended, generally. As such, I'm going to give it a close. Anything that changes here seems big enough to require an RFC, and probably should be pursued through those channels. Thanks! |
tcosprojects commentedOct 3, 2016
•
edited
Summary
It is not uncommon for C libraries to add more values to an enum in their API while considering it a non-breaking change. However, with a rust library linked to the C library it causes undefined behavior when the change is made if the rust enum definition is not updated.
Examples
I've reduced the issue down to this sample code where get_bad_kind() represents a call into a C library that returns a new enum value unknown to rust which leads to a crash at runtime (signal: 11, SIGSEGV: invalid memory reference or stack overflow): https://play.rust-lang.org/?gist=f066d8e489a6e220866958065891a812&version=stable&backtrace=0
The C version of this code does not result in this issue, instead it hits the default case and returns -1: https://gist.github.com/tcosprojects/49d0c809d40b8f008e25dacf49c508bf
Tested on
rustc 1.12.0 (3191fba 2016-09-23)
binary: rustc
commit-hash: 3191fba
commit-date: 2016-09-23
host: x86_64-unknown-linux-gnu
release: 1.12.0
and
rustc 1.12.0 (3191fba 2016-09-23)
binary: rustc
commit-hash: 3191fba
commit-date: 2016-09-23
host: i686-pc-windows-msvc
release: 1.12.0
Discussion
Discussing the issue on the #rust channel led to the following:
A binding library would need to match the rust definition of the enum with the C definition exactly at all times when linked to a C library to prevent either this undefined behavior or passing unsupported flags to older versions of the C library. This could be done in the cargo build script that checks the C library version and enables feature flags for these enum values. It would still not not guarantee safety though when using dynamically linked libraries. Or it could be resolved by not using repr(C) enums and instead translating them at runtime with a function that handles invalid values. If the latter should be done, it would be helpful to have it mentioned in the docs somewhere. It also makes me uncertain what the purpose of repr(C) enum is, if not to be compatible with and used with C FFI calls.
In any case, as a C programmer this bug was surprising and took some significant debugging. It required going into the assembly output of the match statement to see it jump based on the constant of the highest value from the rust enum definition and then corrupt the stack to pin down the cause of the crash.
A warning about this compatibility issue regarding repr(C) enum in the FFI docs would be helpful. I suspect it is not uncommon for rust bindings libraries to use repr(C) enum this way while expecting it to work.
If the match expression could jump to unreachable!() in debug builds when encountering an unknown value on a repr(C) enum type it would help catch this issue without so much debugging.
Questions