Join GitHub today
GitHub is home to over 31 million developers working together to host and review code, manage projects, and build software together.
Sign upFormally define repr(u32, i8, etc...) and repr(C) on enums with payloads #2195
Conversation
joshtriplett
added
the
T-lang
label
Oct 31, 2017
This comment has been minimized.
This comment has been minimized.
|
I really like this, and I think it'll significantly improve FFI. A few bits of feedback:
|
This comment has been minimized.
This comment has been minimized.
jld
commented
Oct 31, 2017
|
There's another representation that's slightly different from the one proposed: a union of structs where each struct's first field is the discriminant. (For FFI there could be an extra union case with only the discriminant, but I believe it's valid C to assume a struct's first field is at offset 0 and access it by pointer casting.) The difference from the struct-of-tag-and-payload representation shows up in cases like this: #[repr(u8)]
enum MyEnum {
Word(u64),
Bytes([u8; 15]),
}If the whole payload is a union, it gets 8-aligned and the enum takes 24 bytes; if each variant is laid out separately, it takes 16 bytes. Yet this isn't an “enum optimization” in any of the ways that cause the problems identified in this RFC: the C/C++ interface is well-defined and fairly straightforward, and the discriminant's meaning is unaffected by the bits stored in the data fields. This is, incidentally, how rustc currently handles enums when no optimizations or special cases apply; note this comment, currently in /// General-case enums: for each case there is a struct, and they
/// all start with a field for the discriminant. |
This comment has been minimized.
This comment has been minimized.
|
@jld That's an interesting distinction. Such a representation would always have an advantage on size, but might potentially prove more annoying to define and access from non-C languages. On balance, though, since Rust already uses that layout, using the same layout for the Can I suggest switching the RFC to that representation, observing that Rust uses that representation by default, and mentioning the "two-element struct" representation as an alternative? |
This comment has been minimized.
This comment has been minimized.
|
I agree with @jld here, though it may turn out that C++ compatibility concerns are the deciding factor. |
arielb1
reviewed
Oct 31, 2017
| The correct C definition is essentially the same, but with the `enum class` replaced with a plain integer of the appropriate type. | ||
| In addition, it is defined for Rust programs to cast/reinterpret/transmute such an enum into the equivalent tag+union Rust definition above. Seperately manipulating the tag and payload is also defined. The tag and payload need only be in a consistent/initialized state when the value is matched on (which includes Dropping it). This means that the contents of a `#[repr(X)]` enum cannot be inspected by outer enums for storage optimizations -- `Option<MyEnum>` must be larger than `MyEnum`. |
This comment has been minimized.
This comment has been minimized.
arielb1
Oct 31, 2017
•
Contributor
This means that the contents of a
#[repr(X)]enum cannot be inspected by outer enums for storage optimizations --Option<MyEnum>must be larger thanMyEnum.
When is that needed? This breaks the rule that by-value enums must always be "valid", and I don't see you using it (e..g. if dest is initialized when the function ends, then an Option<MyEnum> would still work).
This comment has been minimized.
This comment has been minimized.
Gankro
Oct 31, 2017
Author
Contributor
Yes, having reflected on this more, I believe you're correct. I'd just like @bluss to verify that their original use-case only requires the payload to be opaque. And it would be fine for Option<Flag> to use the the Flag's tag to represent None.
This comment has been minimized.
This comment has been minimized.
Gankro
Oct 31, 2017
Author
Contributor
Having reflected on this even more, here is an interesting case where this matters:
I am recursively doing this in-place deserialization process, and at some point I need to deserialize Option<MyEnum> in place. To do so, I need to do:
*result = Some(mem::uninitialized());
if let Some(ref mut my_enum) = result {
deserialize_my_enum(my_enum);
}
That is, if we make MyEnum "super opaque" it means Option<MyEnum> effectively becomes a pseudo-repr(X) type, and we can defer initialization of the payload... except that this runs afoul of my "must be valid when you match" rule. It's possible we could create more fine-grained rules to allow it, though.
Alternatively we can declare this to be unsupported, and force people who want this to make a #[repr(u8)] OpaqueOption<T> { Some(T), None }.
This comment has been minimized.
This comment has been minimized.
bluss
Oct 31, 2017
•
Yes it seems entirely right, it's just the payload.
It's about crate nodrop and arrayvec's use of it to store an uninitialized and/or partly uninitialized payload.
Yes as I understand it's ok if the enclosing Option uses the repr(u8) enum's tag. @eddyb has in fact already implemented that in the big enum layout PR(!), and it works fine with nodrop, even if it broke a test I wrote. Because my test was buggy.
I was envisioning a ManuallyDrop with more guarantees or a MaybeUninitialized or stable fully generic unions (without Copy bound) would the way to bring the arrayvec and nodrop use case to sound ground. (I've been discussing ideas with a few others the past weeks (breadcrumby link).)
If the uninitialized-in-repr-enum use case gets blessed by way of this rfc I think it's great if it can do so on other merits. But crate nodrop provides a working proof of concept.
This comment has been minimized.
This comment has been minimized.
|
Also, the representation should be more of the form: type Tag = uX;
#[repr(C)]
union MyEnum {
tag: Tag,
variant_1: Variant1,
variant_2: Variant2
}
#[repr(C)]
struct Variant1 {
tag: Tag,
field1: Field1,
...
}
#[repr(C)]
struct Variant2 {
tag: Tag,
field1: Field1,
...
}Because e.g. #[repr(u8)]
pub enum X {
Y { a: u8, b: u32 },
Z,
}
fn main() {
assert_eq!(std::mem::size_of::<X>(), 8);
}where the enum is represented as
Where the first field of the enum is allowed to not have the same alignment as the enum. |
This comment has been minimized.
This comment has been minimized.
|
@jld @arielb1 @pythonesque That (union of variants each containing the tag and their payload) is indeed what Rust enums use, as an optimization, but it's not, AFAIK, typically used in C or C++, and harder to work with in those languages than "tag followed by union of payloads". I think that minimizing the padding through this method, just like field reordering, shouldn't apply to C-compatible types as I do not see the optimization benefits justifying the complexity. |
This comment has been minimized.
This comment has been minimized.
|
Would it be reasonable to allow syntax like |
This comment has been minimized.
This comment has been minimized.
|
@glaebhoerl I'm pretty sure that's how it already works (except for the union-of-(tag, payload) vs (tag, union-of-payloads)). |
mystor
reviewed
Oct 31, 2017
| * Probably should be a field to avoid conflicts with user-defined methods | ||
| * Might need `#[repr(pub(X))]` for API design reasons | ||
| * Compiler-generated definitions for the Repr types | ||
| * With inherent type aliases on the enum? (`MyEnum::Tag`, `MyEnum::Payload`, `MyEnum::PayloadA`, etc.) |
This comment has been minimized.
This comment has been minimized.
mystor
Oct 31, 2017
I think that these should probably be implemented separate from this RFC. I'd be inclined to instead allow the ecosystem to generate these declarations using a Custom Derive like #[derive(EnumRepr)] which would define an my_enum_repr (or similar) module with Tag, Payload, and structs for each payload.
Right now I don't think we have the tools to expose these in a nice way, especially if we consider a theoretical enum with a variant called Tag or Payload (this change would be breaking for those enums).
This comment has been minimized.
This comment has been minimized.
Gankro
Oct 31, 2017
Author
Contributor
Yes, I was willing to punt on these since serde and cbindgen could both generate these automatically if need be.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
mystor
commented
Oct 31, 2017
|
I'm worried that this has limited usefulness without the ability to define untagged unions with non- |
est31
reviewed
Oct 31, 2017
| # Drawbacks | ||
| [drawbacks]: #drawbacks | ||
| There aren't really any drawbacks: we already decided to do this. This is just making it official. |
This comment has been minimized.
This comment has been minimized.
est31
Oct 31, 2017
Contributor
Saying "it has been decided already" is neither a good attitude towards the RFC process, nor is it a substitute for a drawback section!
This comment has been minimized.
This comment has been minimized.
|
Would people like me to decouple the proposal so |
This comment has been minimized.
This comment has been minimized.
|
@Gankro That sounds reasonable, though it needs very careful documentation. And I don't think either one should inhibit optimization of |
This comment has been minimized.
This comment has been minimized.
|
Note that the Rust And I believe C doesn't actually care about accessing unions in weird ways, so it's also valid there. So the only reason to care about the |
This comment has been minimized.
This comment has been minimized.
RReverser
commented
Oct 31, 2017
Yeah, ergonomically having separate tag and actual value structs is more convenient. Otherwise you will have an extra wrapper struct for each union variant which already has a nested named struct type, and that |
Gankro
changed the title
Formally define repr(X) on enums with payloads
Formally define repr(u32, i8, etc...) and repr(C) on enums with payloads
Oct 31, 2017
This comment has been minimized.
This comment has been minimized.
|
I have significantly updated the contents of the RFC based on feedback:
In addition, the Drawbacks section has been properly filled out, and two Real "unresolved questions" have been added:
|
This comment has been minimized.
This comment has been minimized.
RReverser
commented
Nov 1, 2017
|
Just to reconfirm: this doesn't impose similar On one hand, such restrictions would help with "it's very bad" note in the parsing example, but on another, it might be not a sufficient reason to limit the usefulness of tagged enums (and it's a breaking change). |
This comment has been minimized.
This comment has been minimized.
|
This RFC only defines the layout. So you won't be able to write out a MyEnumRepr type for non-Copy payloads in Rust today, but when/if unions remove that restriction you will be able to. (and you can do whatever in C(++)) |
This comment has been minimized.
This comment has been minimized.
RReverser
commented
Nov 1, 2017
|
@Gankro Right, what I wanted to reconfirm is that this RFC doesn't impose additional restriction on variant contents. And since it doesn't, that "it would be very bad if we panicked past this point" comment seems a bit excessive, since for |
Gankro
referenced this pull request
Nov 1, 2017
Closed
Implement Support For repr(int) non-C-like enums #78
This comment has been minimized.
This comment has been minimized.
|
This specific enum isn't the best example of it, but e.g. |
This comment has been minimized.
This comment has been minimized.
RReverser
commented
Nov 1, 2017
Ah yes, indeed, but then the problem is much more generic than this proposal. |
This was referenced Nov 1, 2017
rkruppe
referenced this pull request
Dec 16, 2017
Closed
#[repr(C)] silently accepted on ADT enums #31692
This comment has been minimized.
This comment has been minimized.
upsuper
commented
Jan 18, 2018
|
If it is possible to get the tag value of enum, it would also help Servo to shrink some lookup functions which are currently compiled to a jump table or lookup table. See Gecko bug 1428285. |
This comment has been minimized.
This comment has been minimized.
|
@rfcbot fcp merge We discussed this today in the @rust-lang/lang meeting. This RFC is basically documenting the existing behavior of the compiler, if I understand correctly. Clearly, it's of use for projects that are attempting to do low-level optimizations that operate directly on enums and other Rust values. Having a defined layout here thus enables a lot of use cases and doesn't seem to have a lot of downside (at least none were brought up here that I saw in skimming the thread), and we've clearly iterated a bit on the precise iteration and wound up at a pretty flexible point. (Further, there is always room to add more repr flags and things in the future as needed, so the commitment here is low.) |
This comment has been minimized.
This comment has been minimized.
rfcbot
commented
Jan 18, 2018
•
|
Team member @nikomatsakis has proposed to merge this. The next step is review by the rest of the tagged teams: No concerns currently listed. Once these reviewers reach consensus, this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! See this document for info about what commands tagged team members can give me. |
rfcbot
added
the
proposed-final-comment-period
label
Jan 18, 2018
nikomatsakis
removed
the
I-nominated
label
Jan 18, 2018
sfackler
referenced this pull request
Jan 30, 2018
Closed
Unclear calling convention for returning an enum to C from `extern "C"` functions #47878
This comment has been minimized.
This comment has been minimized.
rfcbot
commented
Jan 31, 2018
|
|
rfcbot
added
final-comment-period
and removed
proposed-final-comment-period
labels
Jan 31, 2018
This comment has been minimized.
This comment has been minimized.
|
Some data points:
|
SimonSapin
referenced this pull request
Feb 6, 2018
Merged
Generate some PropertyDeclaration methods by hand 🐉🐲 #19956
nox
added a commit
to servo/servo
that referenced
this pull request
Feb 6, 2018
nox
added a commit
to servo/servo
that referenced
this pull request
Feb 6, 2018
bors-servo
added a commit
to servo/servo
that referenced
this pull request
Feb 6, 2018
moz-v2v-gh
pushed a commit
to mozilla/gecko-dev
that referenced
this pull request
Feb 6, 2018
julienw
pushed a commit
to julienw/mozilla-central
that referenced
this pull request
Feb 7, 2018
This comment has been minimized.
This comment has been minimized.
rfcbot
commented
Feb 10, 2018
|
The final comment period is now complete. |
aturon
merged commit 92f44db
into
rust-lang:master
Feb 14, 2018
This comment has been minimized.
This comment has been minimized.
|
This RFC has been merged! Thanks @Gankro! |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
golddranks
commented
Jun 3, 2018
|
The rendered link is broken. |


Gankro commentedOct 31, 2017
•
edited by mbrubeck
Formally define the enum
#[repr(u32, i8, etc..)]and#[repr(C)]attributes to force a non-C-like enum to have defined layouts. This serves two purposes: allowing low-level Rust code to independently initialize the tag and payload, and allowing C(++) to safely manipulate these types.Rendered