Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Formally define repr(u32, i8, etc...) and repr(C) on enums with payloads #2195

Merged
merged 5 commits into from Feb 14, 2018

Conversation

@Gankro
Copy link
Contributor

Gankro commented Oct 31, 2017

Formally define the enum #[repr(u32, i8, etc..)] and #[repr(C)] attributes to force a non-C-like enum to have defined layouts. This serves two purposes: allowing low-level Rust code to independently initialize the tag and payload, and allowing C(++) to safely manipulate these types.

Rendered

@joshtriplett joshtriplett added the T-lang label Oct 31, 2017

@joshtriplett

This comment has been minimized.

Copy link
Member

joshtriplett commented Oct 31, 2017

I really like this, and I think it'll significantly improve FFI.

A few bits of feedback:

  • I'd prefer the alternative where "The tag shouldn't be opaque to outer enums". I don't think that would break any reasonable assumptions made in C code. C can't make any assumptions about the layout of Option<MyEnum> (other than Option on pointer types mapping NULL to None), since Option doesn't use repr (and should not use repr since it would invalidate enum layout optimizations).
  • Please explicitly state that the documentation for repr(X) should explicitly state that native Rust code should not use that attribute unless it needs to manipulate an enum at the raw memory level or pass it to C code non-opaquely, as doing so inhibits useful optimizations.
  • Regarding "A field/method for the tag/payload", the former would be the discriminant_value function (still not stabilized). I don't know that we need a field for the payload, since it won't necessarily have a corresponding type.
  • Nice branch name. 👍
@jld

This comment has been minimized.

Copy link

jld commented Oct 31, 2017

There's another representation that's slightly different from the one proposed: a union of structs where each struct's first field is the discriminant. (For FFI there could be an extra union case with only the discriminant, but I believe it's valid C to assume a struct's first field is at offset 0 and access it by pointer casting.) The difference from the struct-of-tag-and-payload representation shows up in cases like this:

#[repr(u8)]
enum MyEnum {
  Word(u64),
  Bytes([u8; 15]),
}

If the whole payload is a union, it gets 8-aligned and the enum takes 24 bytes; if each variant is laid out separately, it takes 16 bytes. Yet this isn't an “enum optimization” in any of the ways that cause the problems identified in this RFC: the C/C++ interface is well-defined and fairly straightforward, and the discriminant's meaning is unaffected by the bits stored in the data fields.

This is, incidentally, how rustc currently handles enums when no optimizations or special cases apply; note this comment, currently in rustc::ty::layout, which dates back to rust-lang/rust@a301db7:

    /// General-case enums: for each case there is a struct, and they
    /// all start with a field for the discriminant.
@joshtriplett

This comment has been minimized.

Copy link
Member

joshtriplett commented Oct 31, 2017

@jld That's an interesting distinction.

Such a representation would always have an advantage on size, but might potentially prove more annoying to define and access from non-C languages. On balance, though, since Rust already uses that layout, using the same layout for the repr case seems reasonable to me.

Can I suggest switching the RFC to that representation, observing that Rust uses that representation by default, and mentioning the "two-element struct" representation as an alternative?

@pythonesque

This comment has been minimized.

Copy link
Contributor

pythonesque commented Oct 31, 2017

I agree with @jld here, though it may turn out that C++ compatibility concerns are the deciding factor.

The correct C definition is essentially the same, but with the `enum class` replaced with a plain integer of the appropriate type.
In addition, it is defined for Rust programs to cast/reinterpret/transmute such an enum into the equivalent tag+union Rust definition above. Seperately manipulating the tag and payload is also defined. The tag and payload need only be in a consistent/initialized state when the value is matched on (which includes Dropping it). This means that the contents of a `#[repr(X)]` enum cannot be inspected by outer enums for storage optimizations -- `Option<MyEnum>` must be larger than `MyEnum`.

This comment has been minimized.

@arielb1

arielb1 Oct 31, 2017

Contributor

This means that the contents of a #[repr(X)] enum cannot be inspected by outer enums for storage optimizations -- Option<MyEnum> must be larger than MyEnum.

When is that needed? This breaks the rule that by-value enums must always be "valid", and I don't see you using it (e..g. if dest is initialized when the function ends, then an Option<MyEnum> would still work).

This comment has been minimized.

@Gankro

Gankro Oct 31, 2017

Author Contributor

Yes, having reflected on this more, I believe you're correct. I'd just like @bluss to verify that their original use-case only requires the payload to be opaque. And it would be fine for Option<Flag> to use the the Flag's tag to represent None.

This comment has been minimized.

@Gankro

Gankro Oct 31, 2017

Author Contributor

Having reflected on this even more, here is an interesting case where this matters:

I am recursively doing this in-place deserialization process, and at some point I need to deserialize Option<MyEnum> in place. To do so, I need to do:

*result = Some(mem::uninitialized());
if let Some(ref mut my_enum) = result {
  deserialize_my_enum(my_enum);
}

That is, if we make MyEnum "super opaque" it means Option<MyEnum> effectively becomes a pseudo-repr(X) type, and we can defer initialization of the payload... except that this runs afoul of my "must be valid when you match" rule. It's possible we could create more fine-grained rules to allow it, though.

Alternatively we can declare this to be unsupported, and force people who want this to make a #[repr(u8)] OpaqueOption<T> { Some(T), None }.

This comment has been minimized.

@bluss

bluss Oct 31, 2017

Yes it seems entirely right, it's just the payload.

It's about crate nodrop and arrayvec's use of it to store an uninitialized and/or partly uninitialized payload.

Yes as I understand it's ok if the enclosing Option uses the repr(u8) enum's tag. @eddyb has in fact already implemented that in the big enum layout PR(!), and it works fine with nodrop, even if it broke a test I wrote. Because my test was buggy.

I was envisioning a ManuallyDrop with more guarantees or a MaybeUninitialized or stable fully generic unions (without Copy bound) would the way to bring the arrayvec and nodrop use case to sound ground. (I've been discussing ideas with a few others the past weeks (breadcrumby link).)

If the uninitialized-in-repr-enum use case gets blessed by way of this rfc I think it's great if it can do so on other merits. But crate nodrop provides a working proof of concept.

@arielb1

This comment has been minimized.

Copy link
Contributor

arielb1 commented Oct 31, 2017

Also, the representation should be more of the form:

type Tag = uX;

#[repr(C)]
union MyEnum {
    tag: Tag,
    variant_1: Variant1,
    variant_2: Variant2
}

#[repr(C)]
struct Variant1 {
    tag: Tag,
    field1: Field1,
    ...
}

#[repr(C)]
struct Variant2 {
    tag: Tag,
    field1: Field1,
    ...
}

Because e.g.

#[repr(u8)]
pub enum X {
    Y { a: u8, b: u32 },
    Z,
}

fn main() {
    assert_eq!(std::mem::size_of::<X>(), 8);
}

where the enum is represented as

   0     1     2     3     4     5     6     7
+-----+-----+-----+-----+-----+-----+-----+-----+
| tag |  a  | <padding> |  -------- b --------- |
+-----+-----+-----------+-----------------------+

Where the first field of the enum is allowed to not have the same alignment as the enum.

@eddyb

This comment has been minimized.

Copy link
Member

eddyb commented Oct 31, 2017

@jld @arielb1 @pythonesque That (union of variants each containing the tag and their payload) is indeed what Rust enums use, as an optimization, but it's not, AFAIK, typically used in C or C++, and harder to work with in those languages than "tag followed by union of payloads".

I think that minimizing the padding through this method, just like field reordering, shouldn't apply to C-compatible types as I do not see the optimization benefits justifying the complexity.

@glaebhoerl

This comment has been minimized.

Copy link
Contributor

glaebhoerl commented Oct 31, 2017

Would it be reasonable to allow syntax like #[repr(C, u8)] to specify both "must be C-compatible" and the size of the tag, instead of just assuming the former whenever the latter is done? Then whether to do the alignment optimization, as well as whether to do #[repr(C)] on the member structs as mentioned in the RFC, could both depend on whether or not C representation was requested (without taking away the ability specify the tag size at the same time). If not it's probably simplest to just always do the FFI-appropriate thing.

@eddyb

This comment has been minimized.

Copy link
Member

eddyb commented Oct 31, 2017

@glaebhoerl I'm pretty sure that's how it already works (except for the union-of-(tag, payload) vs (tag, union-of-payloads)).

* Probably should be a field to avoid conflicts with user-defined methods
* Might need `#[repr(pub(X))]` for API design reasons
* Compiler-generated definitions for the Repr types
* With inherent type aliases on the enum? (`MyEnum::Tag`, `MyEnum::Payload`, `MyEnum::PayloadA`, etc.)

This comment has been minimized.

@mystor

mystor Oct 31, 2017

I think that these should probably be implemented separate from this RFC. I'd be inclined to instead allow the ecosystem to generate these declarations using a Custom Derive like #[derive(EnumRepr)] which would define an my_enum_repr (or similar) module with Tag, Payload, and structs for each payload.

Right now I don't think we have the tools to expose these in a nice way, especially if we consider a theoretical enum with a variant called Tag or Payload (this change would be breaking for those enums).

This comment has been minimized.

@Gankro

Gankro Oct 31, 2017

Author Contributor

Yes, I was willing to punt on these since serde and cbindgen could both generate these automatically if need be.

This comment has been minimized.

@eddyb

eddyb Oct 31, 2017

Member

This sounds partly like #1450.

@mystor

This comment has been minimized.

Copy link

mystor commented Oct 31, 2017

I'm worried that this has limited usefulness without the ability to define untagged unions with non-Copy variants, which IIRC has its stability blocked on #1897 or similar.
I suppose it can still be used for copy variants and for C/C++ FFI right now.

# Drawbacks
[drawbacks]: #drawbacks
There aren't really any drawbacks: we already decided to do this. This is just making it official.

This comment has been minimized.

@est31

est31 Oct 31, 2017

Contributor

Saying "it has been decided already" is neither a good attitude towards the RFC process, nor is it a substitute for a drawback section!

@Gankro

This comment has been minimized.

Copy link
Contributor Author

Gankro commented Oct 31, 2017

Would people like me to decouple the proposal so repr(u32) gets the Rust union(tag) layout, and repr(C, u32) gets the tag, union layout?

@joshtriplett

This comment has been minimized.

Copy link
Member

joshtriplett commented Oct 31, 2017

@Gankro That sounds reasonable, though it needs very careful documentation.

And I don't think either one should inhibit optimization of Option<MyEnum>.

@Gankro

This comment has been minimized.

Copy link
Contributor Author

Gankro commented Oct 31, 2017

Note that the Rust union(tag) layout is explicitly declared valid to manipulate in C++ >= 11 (not sure about earlier versions):

screen shot 2017-10-31 at 4 21 52 pm

And I believe C doesn't actually care about accessing unions in weird ways, so it's also valid there. So the only reason to care about the (tag, union)-style is because it's a bit more ergonomic to manipulate in low-level code, and might make it easier to work with existing C(++) codebases.

@RReverser

This comment has been minimized.

Copy link

RReverser commented Oct 31, 2017

because it's a bit more ergonomic to manipulate in low-level code

Yeah, ergonomically having separate tag and actual value structs is more convenient. Otherwise you will have an extra wrapper struct for each union variant which already has a nested named struct type, and that tag in each of variants will be ignored anyway when it's not part of the union.

@Gankro Gankro changed the title Formally define repr(X) on enums with payloads Formally define repr(u32, i8, etc...) and repr(C) on enums with payloads Oct 31, 2017

@Gankro

This comment has been minimized.

Copy link
Contributor Author

Gankro commented Oct 31, 2017

I have significantly updated the contents of the RFC based on feedback:

  • repr(u32, i8, etc...) now uses the current Rust union(tag) layout for compatibility and performance reasons
  • repr(C) now forces the alternative (tag, union) layout that was originally proposed.

In addition, the Drawbacks section has been properly filled out, and two Real "unresolved questions" have been added:

  • Should Option<MyEnum> be forced to be larger than MyEnum? (should the tag be opaque?)
  • Should every payload in a repr(C) enum's union be a struct?
@RReverser

This comment has been minimized.

Copy link

RReverser commented Nov 1, 2017

Just to reconfirm: this doesn't impose similar Copy restrictions as union do?

On one hand, such restrictions would help with "it's very bad" note in the parsing example, but on another, it might be not a sufficient reason to limit the usefulness of tagged enums (and it's a breaking change).

@Gankro

This comment has been minimized.

Copy link
Contributor Author

Gankro commented Nov 1, 2017

This RFC only defines the layout. So you won't be able to write out a MyEnumRepr type for non-Copy payloads in Rust today, but when/if unions remove that restriction you will be able to. (and you can do whatever in C(++))

@RReverser

This comment has been minimized.

Copy link

RReverser commented Nov 1, 2017

@Gankro Right, what I wanted to reconfirm is that this RFC doesn't impose additional restriction on variant contents. And since it doesn't, that "it would be very bad if we panicked past this point" comment seems a bit excessive, since for Copy types (as currently limited by union) it's not scary at all.

@Gankro

This comment has been minimized.

Copy link
Contributor Author

Gankro commented Nov 1, 2017

This specific enum isn't the best example of it, but e.g. Option<u32>, bool, and char are all Copyable but have Undefined values that you could get by failing to complete the parse, effectively transmuting one variant into the other. This itself would only be a problem if the caller had a catch_unwind, though.

@RReverser

This comment has been minimized.

Copy link

RReverser commented Nov 1, 2017

This itself would only be a problem if the caller had a catch_unwind, though.

Ah yes, indeed, but then the problem is much more generic than this proposal.

@upsuper

This comment has been minimized.

Copy link

upsuper commented Jan 18, 2018

If it is possible to get the tag value of enum, it would also help Servo to shrink some lookup functions which are currently compiled to a jump table or lookup table. See Gecko bug 1428285.

@nikomatsakis

This comment has been minimized.

Copy link
Contributor

nikomatsakis commented Jan 18, 2018

@rfcbot fcp merge

We discussed this today in the @rust-lang/lang meeting. This RFC is basically documenting the existing behavior of the compiler, if I understand correctly. Clearly, it's of use for projects that are attempting to do low-level optimizations that operate directly on enums and other Rust values. Having a defined layout here thus enables a lot of use cases and doesn't seem to have a lot of downside (at least none were brought up here that I saw in skimming the thread), and we've clearly iterated a bit on the precise iteration and wound up at a pretty flexible point. (Further, there is always room to add more repr flags and things in the future as needed, so the commitment here is low.)

@rfcbot

This comment has been minimized.

Copy link

rfcbot commented Jan 18, 2018

Team member @nikomatsakis has proposed to merge this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once these reviewers reach consensus, this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot

This comment has been minimized.

Copy link

rfcbot commented Jan 31, 2018

🔔 This is now entering its final comment period, as per the review above. 🔔

@Gankro

This comment has been minimized.

Copy link
Contributor Author

Gankro commented Jan 31, 2018

Some data points:

  • webrender (and therefore servo and firefox-nightly-with-a-flag) is already relying on this RFC to optimize deserialization of the draw commands sent over IPC. This is done through a minor fork of serde_derive that I maintain.

  • cbindgen now supports generating C and C++ headers for both kinds of enum layout, which we intend to use to make the rust<->C++ FFI boundary in firefox more rust-idiomatic and clean.

nox added a commit to servo/servo that referenced this pull request Feb 6, 2018

Make LonghandId just be the discriminants of PropertyDeclaration 🐉🐲
This relies on the fact that `#[repr(u16)]` on enums ensure the discriminant will
be the very first field of the enum value. This has always been the case, but it
has been formally defined only since rust-lang/rfcs#2195.

https://bugzilla.mozilla.org/show_bug.cgi?id=1428285

nox added a commit to servo/servo that referenced this pull request Feb 6, 2018

Make LonghandId just be the discriminants of PropertyDeclaration 🐉🐲
This relies on the fact that `#[repr(u16)]` on enums ensure the discriminant will
be the very first field of the enum value. This has always been the case, but it
has been formally defined only since rust-lang/rfcs#2195.

https://bugzilla.mozilla.org/show_bug.cgi?id=1428285

bors-servo added a commit to servo/servo that referenced this pull request Feb 6, 2018

Auto merge of #19956 - servo:derive-all-the-things, r=emilio
Generate some PropertyDeclaration methods by hand 🐉🐲

We rely on rust-lang/rfcs#2195 to make some enums share the same discriminants by definition.

This fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1428285.

<!-- Reviewable:start -->
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/19956)
<!-- Reviewable:end -->

moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this pull request Feb 6, 2018

servo: Merge #19956 - Generate some PropertyDeclaration methods by hand
🐉🐲 (from servo:derive-all-the-things); r=emilio

We rely on rust-lang/rfcs#2195 to make some enums share the same discriminants by definition.

This fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1428285.

Source-Repo: https://github.com/servo/servo
Source-Revision: 9faf0cdce50379dfb8125d7a9c913babd56382e2

--HG--
extra : subtree_source : https%3A//hg.mozilla.org/projects/converted-servo-linear
extra : subtree_revision : 3a3b2399906a6d2fbb5c6235447dcf233619e461

julienw pushed a commit to julienw/mozilla-central that referenced this pull request Feb 7, 2018

servo: Merge #19956 - Generate some PropertyDeclaration methods by hand
🐉🐲 (from servo:derive-all-the-things); r=emilio

We rely on rust-lang/rfcs#2195 to make some enums share the same discriminants by definition.

This fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1428285.

Source-Repo: https://github.com/servo/servo
Source-Revision: 9faf0cdce50379dfb8125d7a9c913babd56382e2
@rfcbot

This comment has been minimized.

Copy link

rfcbot commented Feb 10, 2018

The final comment period is now complete.

@aturon aturon merged commit 92f44db into rust-lang:master Feb 14, 2018

@aturon

This comment has been minimized.

Copy link
Member

aturon commented Feb 14, 2018

This RFC has been merged! Thanks @Gankro!

@Gankro

This comment has been minimized.

Copy link
Contributor Author

Gankro commented Feb 15, 2018

👏🎊🎉
d0c084ab-e38a-4423-970e-4fc4d8b72fb3

@golddranks

This comment has been minimized.

Copy link

golddranks commented Jun 3, 2018

The rendered link is broken.

@Havvy

This comment has been minimized.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.