Skip to content
This repository has been archived by the owner on Apr 5, 2024. It is now read-only.

#[repr(C)] C-like enums and out of range values #41

Closed
nikomatsakis opened this issue Jun 15, 2017 · 25 comments
Closed

#[repr(C)] C-like enums and out of range values #41

nikomatsakis opened this issue Jun 15, 2017 · 25 comments

Comments

@nikomatsakis
Copy link
Contributor

nikomatsakis commented Jun 15, 2017

@sdroege asked me the following on IRC, and I .. wasn't sure of the answer:

nmatsakis: if i have a repr(C) enum with values A=1, B=2. and a C function with that enum as return type returns "3", is that leading us to undefined behaviour already? is it guaranteed that converting the enum to the corresponding integer type will give "3" and nothing bad happens, and the only possible bad thing that could happen is that if there is a exhaustive match (for the compiler) on the enum... that it's actually not exhaustive (because 3) and from there on things go wrong?

I wasn't sure what I thought the answer should be.

@arielb1
Copy link
Contributor

arielb1 commented Jun 15, 2017

With the current implementation this is UB - we emit !range metadata on loads.

I think that if we want to allow "invalid" values in enums, we need to think about it more - we e.g. probably want to allow things like comparisons.

@le-jzr
Copy link

le-jzr commented Jun 15, 2017

From the programmer's (and FFI bindings) perspective, it would be ideal if C-like enums with explicit #[repr(...)] were considered valid for the entire range of the underlying int* type. It would be basically saying "the enum is allowed to have values that don't have name assigned". However, exhaustive matching would not be possible for such enums.

@RalfJung
Copy link
Member

RalfJung commented Jun 15, 2017

C++ seems to treat out-of-rage casts to enum as "unspecified values", which means after the cast it can have any valid value of the target range. Casting back to int does not have to yield the same result, it could return anything. I couldn't find the relevant part in the C standard.

@le-jzr
Copy link

le-jzr commented Jun 15, 2017

It says in Annex I of N1570 (C11) that an implementation may generate warnings when

A value is given to an object of an enumerated type other than by assignment of an
enumeration constant that is a member of that type, or an enumeration object that has
the same type, or the value of a function that returns the same enumerated type
(6.7.2.2).

I couldn't find any other mention anywhere, and I checked the lists of implementation-defined and undefined behaviors.

@strega-nil
Copy link

strega-nil commented Jun 16, 2017

@RalfJung incorrect - http://eel.is/c++draft/expr.static.cast#10

A value of integral or enumeration type can be explicitly converted to a complete enumeration type. The value is unchanged if the original value is within the range of the enumeration values ([dcl.enum]). Otherwise, the behavior is undefined.

http://eel.is/c++draft/dcl.enum#8

For an enumeration whose underlying type is fixed, the values of the enumeration are the values of the underlying type. Otherwise, for an enumeration where e[min] is the smallest enumerator and e[max] is the largest, the values of the enumeration are the values in the range b[min] to b[max], defined as follows: Let K be 1 for a two's complement representation and 0 for a ones' complement or sign-magnitude representation. b[max] is the smallest value greater than or equal to max(|e[min]| − K, |e[max]|) and equal to 2M − 1, where M is a non-negative integer. b[min] is zero if e[min] is non-negative and −(b[max] + K) otherwise. The size of the smallest bit-field large enough to hold all the values of the enumeration type is max(M, 1) if b[min] is zero and M + 1 otherwise. It is possible to define an enumeration that has values not defined by any of its enumerators. If the enumerator-list is empty, the values of the enumeration are as if the enumeration had a single enumerator with value 0.

(basically, this all says that the range of an enum is enough to hold all of the variants or'd together, for bitflags)

@RalfJung
Copy link
Member

RalfJung commented Jun 16, 2017

I stand corrected. And it seems the rules changed, because the one I quoted above still says out-of-range is unspecified value, not UB. That was C++14, your link seems to be a draft of C++17. Thanks!

@strega-nil
Copy link

strega-nil commented Jun 16, 2017

@RalfJung it looks like that was changed with CWG1766. Seems it was to make it non-constexpr?

@sdroege
Copy link

sdroege commented Jun 16, 2017

AFAIU in C it's not undefined behaviour (or implementation-defined behaviour) to use values that are in range of the underlying integer type but not part of the enum. It basically is just syntactic sugar around integers and giving names to specific ones. Compiler warnings for using values outside the range of the enum were only added relatively recently (few years ago) too.
I guess it would be useful if repr(C) enums could have the same behaviour, it makes interfacing with C libraries a bit more convenient (if you have to otherwise use integers and explicitly convert between those and the enum), or safer (if you still use repr(C) enums but the C library could give you values outside the range of the enum defined in Rust, e.g. because a newer version added a new value in a backwards compatible way).

For C++ the story seems to be a bit different as @ubsan is quoting above.

@le-jzr
Copy link

le-jzr commented Jun 16, 2017

It's also worth noting that enums are often not even used in C, due to their abysmal benefit. Instead a lot of enumerations are just a bunch of #define constants.

@arielb1
Copy link
Contributor

arielb1 commented Jun 16, 2017

@sdroege

I consider the Rust FFI equivalent of a C enum to be a newtype integer with a set of constants. All sorts of "semi-invalid" values are basically armed bombs waiting to UB up your program, to be touched very carefully, but with C-style enums, out-of-range values are not "invalid" in any way or form - using an "old" header with a "new" library is a to-be-explicitly-supported usecase.

@RalfJung
Copy link
Member

I guess one question is whether repr should really have such a fundamental effect on the type (giving up on exhaustiveness) -- that goes beyond just deciding how values are represented in memory. Maybe there should instead just be a macro to define a newtype + constants like @arielb1 mentioned.

@le-jzr
Copy link

le-jzr commented Jun 16, 2017

Or it could piggyback on the non-exhaustive enum RFC that's currently (hopefully) in the process of being accepted. We could define that #[repr(C/iXX/uXX)] + non-exhaustive => well defined for all values of the underlying​ type.

@sdroege
Copy link

sdroege commented Jun 17, 2017

I consider the Rust FFI equivalent of a C enum to be a newtype integer with a set of constants.

What's the exact meaning of repr(C) for an enum then, if not that it is "like an enum in C and compatible with that over FFI boundaries"?

@nagisa
Copy link
Member

nagisa commented Jun 17, 2017

@sdroege as repr in the attribute implies, it only affects the representation in memory, not semantics. This is true for all items which admit a repr(C) attribute.

@sdroege
Copy link

sdroege commented Jun 17, 2017

@sdroege as repr in the attribute implies, it only affects the representation in memory, not semantics. This is true for all items which admit a repr(C) attribute.

While that makes sense, it seems like a potential footgun, at least I expected it to also behave like in C :) Using repr(C) enums for bindings to C libraries seems rather dangerous and not a good idea in that case, and like @arielb1 said some newtype around an integer of the correct size would seem like the better option. But what would be the point of repr(C) enums then?

@le-jzr
Copy link

le-jzr commented Jun 17, 2017

As long as #[repr(C/u*/i*)] on enums exist without a convenient alternative, they will be used incorrectly. Newtype integer with constants is something that looks seriously out of place in Rust, and safely converting from int to enum is currently very inconvenient. It might seem that I am overly pessimistic, but I have seen way too many transmutes of external integers to be anything but.

@RalfJung
Copy link
Member

@sdroege as repr in the attribute implies, it only affects the representation in memory, not semantics. This is true for all items which admit a repr(C) attribute.

The question is -- how does it affect representation in memory? C doesn't have tagged unions, so "this is laid out like C would" doesn't make any sense for enum.

@strega-nil
Copy link

strega-nil commented Jun 18, 2017

@sdroege you seem to misunderstand the section I quoted - this is only for enums with no fixed underlying type. Any enums with fixed underlying type are treated as if they are newtyped integers.

@RalfJung I agree. I don't think people should be using enums for this - they should be using unions wrapped in structs. It would be nice to be able to overload matching for types like this.

@nikomatsakis
Copy link
Contributor Author

I believe that the original question stated:

if i have a repr(C) enum with values A=1, B=2.

which we only accept if none of the variants in the enum have data associated with them. So @RalfJung this question:

C doesn't have tagged unions, so "this is laid out like C would" doesn't make any sense for enum.

is sort of besides the point, I think.

That said, I think I would agree with @arielb1's original comment:

With the current implementation this is UB - we emit !range metadata on loads.

In other words, this is an error today, which presumably implies some speed win in some scenarios, and I see no reason not to keep this as UB. Enums in Rust are more meaningful than in C, where they are often used (righly or wrongly) as glorified integers. I think that's ok.

@nikomatsakis
Copy link
Contributor Author

(But I do think we need to arrive at some better set of guiding principles by which to make such decisions.)

@le-jzr
Copy link

le-jzr commented Jul 17, 2017

I see no reason not to keep this as UB.

I stand by my earlier comment. It's very hazardous to use C-like enums in FFI, and the danger is immensely counterintuitive to boot. Saying "eh, well, it shouldn't be used that way" is just hiding head in sand. The issue exists and it needs to be fixed somehow, even if just by finding a way to teach everyone the proper way to do enums in FFI. Personally, I'd consider pretty much any other alternative easier to achieve.

@sdroege
Copy link

sdroege commented Jul 17, 2017

I also disagree: if you see a repr(C) enum you would intuitively assume that it behaves like an enum in C. If it doesn't (i.e. is not a glorified integer), that is surprising and can result in hard to track down bugs and maybe worse, makes repr(C) enums basically useless. What else than C FFI would you use it for? For all the other cases you can just use plain enums, or any of the other repr(X) variants if you want to ensure that it's stored as some kind of integer.

IMHO it would make sense to deprecated repr(C) enums in that case, and make definition of them a compiler warning at least.

@vlisivka
Copy link

vlisivka commented Nov 8, 2018

Is it possible to mark repr(C) enum's as unsafe, so access to them will be allowed in unsafe blocks only, but keep C-like behaviour in unsafe blocks? Or maybe we should add _Unknown_(i32) variant by default to repr(C) enum's somewhat.

This behaviour affects Prost: i32 type is used instead of enum type, because Protobuf defines that enum variable must be able to hold values outside of enum range to be compatible with future versions. As alternative solution, it's proposed to use _Unknonwn_(i32) variant, but this solution cannot be implemented in current version of Rust, because C-like enums cannot contain tags.

In turn, this problem hurts me as user of Prost. I want to have easy to use interface, out of box serialization/deserialization to JSON, and so on.

@sdroege
Copy link

sdroege commented May 30, 2019

Was there any update on this? It should probably also be mentioned in the unsafe code guidelines in the enums section.

Edit: Created an issue there rust-lang/unsafe-code-guidelines#137

@RalfJung
Copy link
Member

This is basically subsumed by rust-lang/unsafe-code-guidelines#69: any enum value must have a valid discriminant value.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants