Allow arbitrary enums to have explicit discriminants #2363

Open
wants to merge 2 commits into
base: master
from

@nox
Contributor

commented Mar 16, 2018

Summary

This RFC gives users a way to control the discriminants of variants of all enumerations, not just the ones that are shaped like C-like enums (i.e. where all the variants have no fields).
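As a rough illustration of what the summary describes, here is a minimal sketch, assuming a compiler recent enough to accept explicit discriminants on variants with fields. The `Message` enum, its values and the `tag` helper are made up for this example and do not come from the RFC text.

```rust
// Hypothetical enum: every variant, with or without fields, gets an
// explicit discriminant. A primitive representation is required for this.
#[repr(u8)]
enum Message {
    Ping = 1,
    Data(Vec<u8>) = 4,
    Close(u16) = 9,
}

impl Message {
    /// Reads the numeric discriminant back out of the value.
    fn tag(&self) -> u8 {
        // SAFETY: a `#[repr(u8)]` enum is laid out with its `u8` tag as the
        // first field, so reading a `u8` through a pointer to the enum is
        // sound and does not touch the payload.
        unsafe { *(self as *const Self as *const u8) }
    }
}
```

The unsafe pointer read is only needed because there is no safe accessor for the numeric value on enums with fields; plain `as` casts remain limited to field-less enums.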

Thanks

Thanks to Mazdak Farrokhzad (@Centril) and Simon Sapin (@SimonSapin) for the reviews, and my local bakery for their delicious baguettes.


Rendered

@Centril added the T-lang label Mar 16, 2018

This introduces one more knob to the representation of enumerations.

# Rationale and alternatives
[alternatives]: #alternatives

@oli-obk

oli-obk Mar 16, 2018

Contributor

I think you have encountered the XY Problem. You are finding a solution to the problem that different enums don't give guarantees about their variant indices, when what you originally wanted was converting enums whose variants have fields into enums whose variants have no fields.

So here's my alternative:

Allow annotating enums with #[discriminant(OtherEnum)], where OtherEnum must have a variant for each variant in the annotated enum (with matching name). The annotated enum will then take all its discriminant values from OtherEnum.

as casts could then convert from annotated enums to their discriminant enum.

@eddyb

eddyb Mar 16, 2018

Member

That seems more convoluted, IMO. Definitely harder to implement, at least.

@oli-obk

oli-obk Mar 16, 2018

Contributor

Sure, but it has a path forward for having this operation in safe code.

Alternatively a procedural macro could generate said conversions with this RFC. So the compiler magic might not be necessary.
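A rough sketch of that idea, using a declarative `macro_rules!` macro as a stand-in for the procedural macro mentioned above. The macro name `enum_with_id`, the payload types and the generated `id` method are assumptions made for this example only.

```rust
// Hypothetical macro: defines a field-less id enum and a data-carrying enum
// whose discriminants are taken from the id enum, plus the conversion.
macro_rules! enum_with_id {
    (
        $name:ident, $id:ident {
            $( $variant:ident($payload:ty) ),+ $(,)?
        }
    ) => {
        #[derive(Clone, Copy, Debug, PartialEq, Eq)]
        #[repr(u16)]
        enum $id {
            $( $variant ),+
        }

        #[repr(u16)]
        enum $name {
            // Explicit discriminants on variants with fields.
            $( $variant($payload) = $id::$variant as u16 ),+
        }

        impl $name {
            fn id(&self) -> $id {
                // SAFETY: both enums are `#[repr(u16)]`, the data enum starts
                // with its `u16` tag, and every tag it can hold is a valid
                // discriminant of the id enum by construction.
                unsafe { *(self as *const Self as *const $id) }
            }
        }
    };
}

enum_with_id! {
    AnimationValue, LonghandId {
        Color(u32),
        Height(f32),
        TransformOrigin((f32, f32)),
    }
}
```

The unsafe block stays, but it lives in one audited place and callers only see the safe `id` method.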

@nox

nox Mar 16, 2018

Author Contributor

This would be great and nice, if only the real world allowed me to do that.

I simplified my use case for the sake of the RFC: PropertyDeclaration actually has more variants than LonghandId, thanks to custom CSS properties (and some other stuff that I don't need to mention here):

#[repr(u16)]
enum PropertyDeclaration {
    Color(Color),
    Height(Length),
    InlineSize(Length),
    TransformOrigin(TransformOrigin),
    Custom(CustomDeclaration),
}

struct CustomDeclaration {
    name: Name,
    value: Value,
}

pub enum PropertyDeclarationId<'a> {
    Longhand(LonghandId),
    Custom(&'a Name),
}

impl PropertyDeclaration {
    pub fn id(&self) -> PropertyDeclarationId {
        if let PropertyDeclaration::Custom(ref declaration) = *self {
            return PropertyDeclarationId::Custom(&declaration.name);
        }
        let id = unsafe { *(self as *const _ as *const LonghandId) };
        PropertyDeclarationId::Longhand(id)
    }
}

In general, I think this alternative is too specific to the exact use case described in the RFC, and cannot fulfil more intricate ones like the actual stuff I require in Servo, or use cases I could imagine for this feature with FFI. @Gankro may have an opinion in that regard.

Edit: I'll edit this RFC to include it nonetheless.
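A self-contained sketch of the pattern in the snippet above, assuming the explicit-discriminant syntax from this RFC is available. The payload types, the `Name` wrapper and the concrete discriminant values are placeholders, not Servo's real definitions.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u16)]
enum LonghandId {
    Color = 0,
    Height = 1,
    InlineSize = 2,
}

// Placeholder for the custom property name type.
struct Name(String);

#[repr(u16)]
enum PropertyDeclaration {
    // The shared variants reuse the discriminants of `LonghandId`.
    Color(u32) = LonghandId::Color as u16,
    Height(f32) = LonghandId::Height as u16,
    InlineSize(f32) = LonghandId::InlineSize as u16,
    // Extra variant with no counterpart in `LonghandId`.
    Custom(Name) = 3,
}

enum PropertyDeclarationId<'a> {
    Longhand(LonghandId),
    Custom(&'a Name),
}

impl PropertyDeclaration {
    fn id(&self) -> PropertyDeclarationId<'_> {
        // The extra variant must be handled before the cast, since its tag
        // is not a valid `LonghandId`.
        if let PropertyDeclaration::Custom(name) = self {
            return PropertyDeclarationId::Custom(name);
        }
        // SAFETY: both enums are `#[repr(u16)]`, the tag is the first field,
        // and every remaining variant has a tag that is a valid `LonghandId`.
        let id = unsafe { *(self as *const Self as *const LonghandId) };
        PropertyDeclarationId::Longhand(id)
    }
}
```

The early return is what makes the cast sound here: without it, a `Custom` value would be read as a `LonghandId` with an invalid discriminant.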

@Diggsey

Diggsey Mar 16, 2018

Contributor

You mention in the RFC that rust is generating a 4KB jump table for the required match expression - this seems excessive, even if the discriminants don't match exactly. How many variants does this enum have? Maybe part of the solution should be improving the code that rust generates in this case.

@nox

nox Mar 16, 2018

Author Contributor

@Diggsey There are >320 variants, so far. I welcome any rustc improvement, but it won't be able to remove all jump tables if the discriminants of the variants common to AnimationValue and PropertyDeclaration don't coincide.

@nox

nox Mar 16, 2018

Author Contributor

@Diggsey Also, if you are curious, here is the PR where PropertyDeclaration::id was slimmed down from 4KB to a mere 96 bytes, and there is the PR where I simplified AnimationValue::id.

@burdges

commented Mar 17, 2018

It'd be cool if this worked with ATCs to permit defining both discriminant and enum together:

trait Fields { type Id<T>; }
struct YesFields {}
impl Fields for YesFields { type Id<T> = T; }
struct NoFields {}
impl Fields for NoFields { type Id<T> = (); }

enum MyEnumInner<F: Fields> {
    Variant1(F::Id::<u64>) = MyDiscriminant::Variant1(()),
    Variant2(F::Id::<String>) = MyDiscriminant::Variant2(()),
    ...
}

pub type MyEnum = MyEnumInner<YesFields>;
pub type MyDiscriminant = MyEnumInner<NoFields>;

We've a circular definition above, but rustc could presumably break this cycle.

@nox

Contributor Author

commented Mar 17, 2018

@burdges That seems unrelated to the RFC described in this pull request.

@mark-i-m

Contributor

commented Mar 19, 2018

I can totally sympathize with this RFC. I have often wanted to go from an enum with fields to a value of another type.

My biggest two gripes against this RFC are

  1. The casting here is annoying: LonghandId::Color as u16,
  2. This really shouldn't require any unsafe code at all: unsafe { *(self as *const Self as *const LonghandId) }

In that light I propose the following variant/alternative:

// same as before
#[derive(Clone, Copy)]
#[repr(u16)]
enum LonghandId {
    Color,
    Height,
    InlineSize,
    TransformOrigin,
}

// explicitly mention that the given type is the discriminant (it can also be a struct!),
// but the compiler needs to be able to prove at compile time that two values of the
// discriminant type are actually different. Note: different means different bit patterns,
// not different in the `Eq` sense.
#[repr(LonghandId)]
enum AnimationValue {
    Color(Color) = LonghandId::Color, // no cast
    Height(Length) = LonghandId::Height, // also each discriminant assignment must be unique!
    InlineSize(Void) = LonghandId::InlineSize,
    TransformOrigin(TransformOrigin) = LonghandId::TransformOrigin,
}

impl AnimationValue {
    fn id(&self) -> LonghandId {
        (*self) as LonghandId // compiler can just return the discriminant! Completely safe!
    }
}
@burdges

commented Mar 20, 2018

We could support discriminants with their own fields too, making these enums into extensions. I suppose #[extends(LonghandId)] would be an alternative to #[repr(LonghandId)], but certainly #[repr(..)] makes sense.

@nox

Contributor Author

commented Mar 20, 2018

Please, I have already addressed that this doesn't fit my use case in #2363 (comment) and the RFC already includes such an encoding as an alternative, except with #[discriminant(Foo)] instead of #[repr(Foo)].

Removing the need for unsafe code for the cast is a nice thought, but again it is unrelated to this RFC.

@mark-i-m

Contributor

commented Mar 21, 2018

@nox Sorry if this is a stupid question, but I still don't understand how the two proposals are not functionally equivalent. It seems like everything you can do in one proposal, you can do in the other. What am I missing?

@RalfJung

Member

commented Mar 21, 2018

I assume this is restricted to enums with a non-Rust repr? Otherwise, doesn't the part that says "if not set explicitly, start with 0 and count up" clash with the niche-filling enum optimizations?

@SimonSapin

Contributor

commented Mar 21, 2018

@RalfJung I had the same reaction. nox and eddyb explained that this RFC controls the discriminant of each variant, but this is a separate concept from the tag that may or may not be part of the memory representation. #[repr] forces a tag to be present, containing the discriminant.

But yes, since mem::Discriminant is opaque this new syntax is only useful when also used with #[repr].
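To make the opacity concrete, here is a small sketch of what `mem::discriminant` does offer. The `Token` enum and the `same_variant` helper are invented for the example.

```rust
use std::mem;

// Hypothetical enum with the default representation.
enum Token {
    Number(i64),
    Ident(String),
}

/// Returns true when both values are the same variant, whatever their payload.
fn same_variant(a: &Token, b: &Token) -> bool {
    // `mem::Discriminant<T>` can be compared and hashed, but it has no
    // stable accessor that yields the underlying integer.
    mem::discriminant(a) == mem::discriminant(b)
}
```

Equality and hashing are all that is available here, which is why the numeric values chosen with the new syntax only become observable once a `#[repr]` pins down the tag in memory.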

@nox

Contributor Author

commented Mar 21, 2018

@mark-i-m #2363 (comment) One of the enums with fields has more variants than LonghandId. Your (and @oli-obk's) alternative is a strict subset of what is allowed by this RFC.

@RalfJung

Member

commented Mar 21, 2018

@SimonSapin I am not sure I understand, but if the feature is not useful for repr(Rust), maybe it'd be better to not allow it for now?

@nox

Contributor Author

commented Mar 21, 2018

@RalfJung I listed it in the unresolved questions. I could imagine rustc managing to use niche-filling optimisations for 2 different variants with an inner enum, if it could see that they use disjoint sets of discriminants and that there is an easy way to compute the discriminant from the niche value. This is quite theoretical though so I don't mind mandating a #[repr].

@Evrey

commented Mar 21, 2018

I really like the Variant(...) = ID, syntax. It keeps enums compact and prevents error-prone double typing.

With #[discriminant(X)] you'd have a bunch of problems and questions:

  • Is X truly just a numeric C-style enum?
  • Is X an enum at all?
  • Does X have the same #[repr({integer})]?
  • Does X have the very same number of variants?
  • Do X's variants have the exact same names?
  • Do X's variants have the exact same order?
  • Aaand having an additional X more than doubles your lines of code by adding redundant bloat.

Also, #[discriminant = {expr}] doubles your lines of code, or triples with readability line breaks, while having a more "magic"-ish syntax than just using the well-known Variant = {expr},.


There are two problems left for my taste:

  1. How does one name those explicit discriminants?
  2. How does one get those discriminants?

Now, about the first point... I especially like the Variant(...) = {expr}, proposal, because it looks just like C-style enums with Variant = {expr},. We do not want to match on magic numbers, especially not for huge enums, which is why Servo has those MyEnum and MyEnumId redundancies. It is much more understandable to match on MyEnumId::X than writing 0x1023. But wait, MyEnumId::X is how you get the discriminant of a C-style enum! So, why not allow this?

#[repr(u8)]
enum X {
    A(u32) = 0,
    B(f32) = 1,
}

assert!(X::A(42).id() == X::A);
assert!(0_u8 == X::A);

This could, however, mean trouble if type A = X::A is a thing. (It isn't.) Whatever. This way you do not need a second Id enum at all, while also not having to learn new syntax.

So... how do we get that discriminant? Writing a pointer-converting boilerplate function called fn id() all the time is no fun, especially for new magic syntax that is supposed to reduce the amount of boilerplate code. We already have core::mem::discriminant, but this one is of not much help here. You can only use it for equality checks, it always takes a tremendous 64 bits, no matter the #[repr({integer})], writing mem::discriminant(X::A) == mem::discriminant(x) is painfully verbose, and you cannot really get the integer representation out of it for e.g. FFI or RPC.

I'm not really up to date on the #[repr({integer})]-on-enums thing, but what about making this attribute implement a const fn id_{integer}(&self) -> {integer} on the enum? Or implement an EnumRepr<{integer}> trait on it or whatever. Okay, the name may be different. fn id() alone might clash too much with common function names, and fn discriminant() is so verbose that you might as well just impl Into<{integer}> on core::mem::Discriminant.

An alternative would be to treat enums logically as a (Discriminant, UnionOfVariants), or C-style enums as just (Discriminant). Then, instead of having to decide on a magic function name or trait names, we write this: X::A(42).0 == X::A. This has quite some downsides, however, like what does X::A(42).1 yield, do we want that, is it safe, what about non-#[repr({integer})]-enums, especially for those with highly optimised layouts like Option<bool>, etc.


Edit: Just noticed this:

  • under the default representation, the specified discriminants are interpreted as isize;

Under the default representation, there might be heavy layout optimisations at work merging discriminants of nested enums into a completely different numeric set. In addition to that core::mem::Discriminant chose u64 in its opaque implementation. I.e., applying this syntax to default representation enums is either nonsensical or must block layout optimisations. The exact numeric discriminant type is debatable.

@nox

Contributor Author

commented Mar 21, 2018

@Evrey I'll add a commit listing all your arguments against #[discriminant], thanks!


How does one name those explicit discriminants?

We can't interpret X::A as some sort of magical ID enum, because X::A is already a function of type u32 -> X in your sample. Note that in Servo we also store the LonghandId enum in various places, without any PropertyDeclaration or AnimationValue anywhere. Please also note that as I mentioned multiple times already, PropertyDeclaration has more variants than LonghandId, so we can't directly tie the two in any way. So even if there was some magical way to say Enum::Discriminant to refer to that enum's discriminant type, we would still have a LonghandId enum in Servo.

How does one get those discriminants?

I agree that a way to safely get the u8 discriminant of a #[repr(u8)] enum value would be useful, but that's orthogonal to this RFC: after all, #[repr(u8)] and #[repr(C, u8)] on enums landed without such an API.

@mark-i-m

Contributor

commented Mar 22, 2018

I see... but I still have the same two annoyances against the current form:

  • The casting here is annoying: LonghandId::Color as u16
  • This really shouldn't require any unsafe code at all: unsafe { *(self as *const Self as *const LonghandId) }

Especially the second one seems like an essential part of any reasonable solution IMHO...

@SimonSapin

Contributor

commented Mar 22, 2018

@mark-i-m I think that these two points are valid independently of this RFC. (Re unsafe code, see RFC 2195 in particular.) Improvements there can be proposed separately without necessarily blocking this RFC.

@nox

Contributor Author

commented Mar 22, 2018

The casting here is annoying: LonghandId::Color as u16

This really shouldn't require any unsafe code at all: unsafe { *(self as *const Self as *const LonghandId) }

Given what I said about PropertyDeclaration having more variants than LonghandId, the casting cannot be avoided in my particular use case. Do you see why?

@main--

commented Mar 22, 2018

This is unfortunately quite painful to use, given now all methods matching against AnimationValue need to have dummy arms for all of these variants:

Note that this is #1872. Feels like a quite elegant workaround to me - the unsafe cast asserts that the ! variants are dead and then you can just match without thinking about them.
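The workaround being discussed, placeholder variants with uninhabited payloads plus the dummy match arms they force, might be sketched like this. The `Void` type, the payload types and the `describe` function are illustrative assumptions.

```rust
// An uninhabited type: no value of it can ever be constructed.
enum Void {}

#[repr(u16)]
enum LonghandId {
    Color,
    Height,
    InlineSize,
    TransformOrigin,
}

#[repr(u16)]
enum AnimationValue {
    Color(u32),
    Height(f32),
    // Placeholder that only exists so that the implicit discriminants of
    // the later variants line up with `LonghandId`.
    InlineSize(Void),
    TransformOrigin((f32, f32)),
}

fn describe(value: &AnimationValue) -> &'static str {
    match value {
        AnimationValue::Color(_) => "color",
        AnimationValue::Height(_) => "height",
        // Dummy arm: unreachable in practice, but it still has to be written.
        AnimationValue::InlineSize(void) => match *void {},
        AnimationValue::TransformOrigin(_) => "transform-origin",
    }
}
```

Every function matching on `AnimationValue` needs an arm like the `InlineSize` one, which is the cost the RFC text complains about.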

@nox

Contributor Author

commented Mar 22, 2018

@main-- I know about this RFC, but it has been postponed and I don't think I should have to rely on that instead of just not having ! variants.

@main--

commented Mar 22, 2018

@nox I mean, you're essentially trying to come up with a way to define Rust enums with gaps in their discriminator space - this RFC proposes to explicitly specify the remaining ones whereas I suggested to list the missing ones. It's fundamentally the same thing; depending on the use case either one may be more convenient. Yet I favor the ! solution as it doesn't add new syntax and complexity to enums but instead simply relies on a fix to match behavior that should have happened a long time ago (in my opinion). It just feels like a much more generic approach to me that can help with this problem and several others too.

@nox

Contributor Author

commented Mar 22, 2018

Having to specifically define variants that are uninhabited for no other reason than to make 2 discriminants in 2 different enums coincide is a damn workaround that I will not qualify as "fundamentally the same thing" as being able to omit said variants. Why would I pollute my mind and my type definition with variants that I won't use?

@main--

commented Mar 22, 2018

I guess if most of your variants are dead it makes considerably less sense than, say, one of thirty. The thing is - comparing a language feature tailored specifically to this problem to a general mechanism that the language already offers (minus a papercut) just isn't fair, hence my reluctance to call it a workaround.

My point is that the alternative is not simply omitting the dead variants. Instead, you have to manually specify discriminants. When I say it's the same thing what I mean is that we're talking about two different ways to specify enum discriminants (same outcome). Your proposal introduces a new language feature which is clearly an improvement over the (mostly) existing workaround, I merely doubt that it's enough of an improvement to justify the complexity.

@nox

Contributor Author

commented Mar 22, 2018

This is a very small addition to the language, semantics-wise and implementation-wise. As far as I remember discussions with @eddyb, this will be easily implementable.

@SimonSapin

Contributor

commented Mar 22, 2018

I don’t believe added complexity is significant here, the same syntax already exists on field-less variants.

@mark-i-m

Contributor

commented Mar 23, 2018

I went back and re-read the RFC and I think my objections are more specific to the example. The actual proposal is just about allowing Variant(..) = Disc, which I think is quite reasonable. I don't think it's the best solution to the given example use case, but I'm not opposed to the RFC on its own merit...

@SimonSapin

Contributor

commented Mar 23, 2018

(Aside: there’s a bunch of things in Stylo that are objectionable ;) (I say this as the author of many of these things.) A lot of it can be improved but the devil is in the corner cases, and better solutions are often not as easy as they first seem.)

@clarfon

Contributor

commented Mar 30, 2018

I may not be reading this correctly, but does this work for repr(Rust) enums too? And would:

enum A {
   B = 0,
}

have size zero or one? Also, would:

enum B {
    B = 1,
    C(!) = 256,
} 

have size 0 or 2?

(These are super esoteric examples but it seems like a valid thing to ask considering how imho these don't make sense, but would be accepted.)
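For what it is worth, the size question has a definite answer once a primitive representation is requested, because the tag is then always stored. The sketch below only covers that case and assumes nothing about `repr(Rust)`, whose layout is unspecified. The enum `Wide` and the helper functions are made up, and the `C(!)` variant is left out because `!` is not usable in that position on stable Rust.

```rust
// With `#[repr(u8)]` the single variant still carries a one-byte tag.
#[repr(u8)]
enum A {
    B = 0,
}

// With `#[repr(u16)]` a discriminant of 256 fits and the tag is two bytes.
#[repr(u16)]
enum Wide {
    B = 1,
    C = 256,
}

fn size_of_a() -> usize {
    std::mem::size_of::<A>()
}

fn size_of_wide() -> usize {
    std::mem::size_of::<Wide>()
}
```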

@eddyb

Member

commented Sep 11, 2018

I think mem::discriminant should expose discriminants for things like ordering, or even just outright providing the value, all we need is a way to handle signedness (which we can do by having Discriminant<T> query T's "discriminant signedness" with an intrinsic or something).

There's an important distinction to make, though:

  • "discriminants" are the explicit values written by the user
    • when not provided explicitly, they're defined to be "previous + 1", starting at 0
  • "tags" are the implementation-defined in-memory encoding
    • (only for the cases where the implementation isn't using invalid values aka "niches")
    • they can differ from the discriminant (only for repr(Rust)) and are never exposed
    • they only match today for performance reasons (decoding is a noop)

(see also rust-lang/rust#49938)
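The "previous + 1, starting at 0" rule can be observed directly on a field-less enum, where `as` casts expose the discriminant. The `Level` enum and `value` helper are only an illustration.

```rust
#[derive(Clone, Copy)]
enum Level {
    Low,     // no explicit value and first variant: 0
    Mid = 5, // explicit value
    High,    // no explicit value: previous + 1, so 6
}

/// Returns the discriminant of a field-less enum value.
fn value(level: Level) -> isize {
    level as isize
}
```

The cast reports the discriminant, not the in-memory tag; the two are allowed to differ under the default representation, as described in the list above.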

@nox

Contributor Author

commented Dec 20, 2018

What is blocking this?

@nox

Contributor Author

commented Apr 9, 2019

Ping on this. How do we go forward for this RFC?

@eddyb

Member

commented Apr 9, 2019

cc @Centril @joshtriplett How would we start an FCP merge?

@eddyb added the I-nominated label Apr 9, 2019

@joshtriplett

Member

commented Apr 9, 2019

I like this, and it seems reasonable to me. I agree with @eddyb, let's talk about it in a lang team meeting. And I think we should fcp merge this if we get consensus in that meeting.

@Centril

Contributor

commented Apr 11, 2019

This RFC seems small and reasonable,

@rfcbot merge

@rfcbot

commented Apr 11, 2019

Team member @Centril has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@pnkfelix

Member

commented Apr 11, 2019

My only comment is that I wish the syntax, somehow, had the numeric value closer to the variant name itself.

But I don't actually have any good suggestion for how to accomplish this. The main options I considered were

enum ParisianSandwichIngredient {
    0 = Bread(BreadKind),
    1 = Ham(HamKind),
    2 = Butter(ButterKind),
}

or

enum ParisianSandwichIngredient {
    0: Bread(BreadKind),
    1: Ham(HamKind),
    2: Butter(ButterKind),
}

or

enum ParisianSandwichIngredient {
    Bread = 0(BreadKind),
    Ham = 1(HamKind),
    Butter = 2(ButterKind),
}

but none of these strike me as better. So really my complaint is not with this RFC, but rather my past self for not trying to incorporate this feature five years ago in the pre-1.0 days.

@eddyb

Member

commented Apr 12, 2019

@pnkfelix I'd say I prefer 0 => Bread(BreadKind), but I agree with your conclusion.

@joshtriplett

Member

commented Apr 12, 2019

@nox

Contributor Author

commented Apr 12, 2019

I don't like this syntax at all.

It's completely at odds with C-like enums, which means that if you currently have a C-like enum and then add a field to one of its variants, you must rewrite the entire type definition.

It also puts more emphasis on the discriminant rather than the damn variant, even though the discriminants most probably aren't the most important part of the enum. It's certainly not the most important part of my use case.

That syntax also makes the example from the RFC horrible:

#[repr(u16)]
enum AnimationValue {
    LonghandId::Color as u16 => Color(Color),
    LonghandId::Height as u16 => Height(Length),
    LonghandId::TransformOrigin as u16 => TransformOrigin(TransformOrigin),
}

It may also be a problem to macro authors, who may want to be passed an enum definition and emit explicit discriminants for them, and now they must first check whether any of the variants have fields.

Edit: But I do like a lot that @pnkfelix named his example ParisianSandwichIngredient.

@joshtriplett

Member

commented Apr 12, 2019

@nox

Contributor Author

commented Apr 12, 2019

I'd taken it as implicit that the same syntax would be permitted for C-like enums.

Oh I see. That part is fine then. I still don't think it makes for a readable type definition in presence of complicated discriminant expressions.

@joshtriplett

Member

commented Apr 12, 2019

@burdges

commented Apr 12, 2019

It's easier to make the other form work with { } although maybe this just adds confusion.

enum ParisianSandwichIngredient {
    Bread = 0 { bread_kind: BreadKind },
    Ham = 1 { ham_kind: HamKind },
    Butter(ButterKind) = 2,
}
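
For comparison, the syntax proposed in this RFC puts the discriminant after the fields for struct-like variants as well. A minimal sketch, with unit structs standing in for the ingredient kinds and a made-up `name` method:

```rust
// Placeholder payload types for the example.
struct BreadKind;
struct HamKind;
struct ButterKind;

#[repr(u8)]
enum ParisianSandwichIngredient {
    // Struct-like variants: the discriminant follows the closing brace.
    Bread { bread_kind: BreadKind } = 0,
    Ham { ham_kind: HamKind } = 1,
    // Tuple-like variant: the discriminant follows the closing parenthesis.
    Butter(ButterKind) = 2,
}

impl ParisianSandwichIngredient {
    fn name(&self) -> &'static str {
        match self {
            ParisianSandwichIngredient::Bread { .. } => "bread",
            ParisianSandwichIngredient::Ham { .. } => "ham",
            ParisianSandwichIngredient::Butter(_) => "butter",
        }
    }
}
```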
@withoutboats

Contributor

commented Apr 12, 2019

I'd prefer not to invent a new syntax for this feature and just keep with the obvious extension to what we have, even if it's not perfect.

@eddyb

Member

commented Apr 13, 2019

Yeah, I was only replying to @pnkfelix, but I'm not proposing/in favor of a syntax different than the one in this RFC.

@SimonSapin referenced this pull request Apr 15, 2019

Open

Discriminant bits #2684

@rfcbot

commented Apr 24, 2019

🔔 This is now entering its final comment period, as per the review above. 🔔
