RFC for anonymous variant types, a minimal ad-hoc sum type #2587

Open: wants to merge 6 commits into master
@eaglgenes101

eaglgenes101 commented Nov 5, 2018

Add anonymous variant types, a natural anonymous parallel to enums much like tuples are an anonymous parallel to structs.

This RFC is intentionally minimal to simplify implementation and reasoning about interactions, while remaining amenable to extensions through the ecosystem or through future proposals.

Rendered

Thanks to everyone that helped me identify points that I may have missed in the Internals thread and on Reddit.

Create 0000-anonymous-variants.md
We will go to space today!

@Centril Centril added the T-lang label Nov 5, 2018

@newpavlov

newpavlov commented Nov 5, 2018

UPD: See this Pre-RFC.

Not to undermine the work put into this RFC, but I think the proposed solution is quite sub-optimal, and we should instead pursue proper "anonymous union types" (i.e. types for which (u64 | u32 | u64) is the same type as (u32 | u64) and (u64 | u32)), with proper ergonomic additions to the matching syntax. Of course, that requires presenting a solution to the generic code matching problem.

One possibility for how this functionality could look is:

struct Err1;
struct Err2(u32);
fn foo() -> (u32 | () | A | B | C) { .. }
fn bar() -> Result<(), (Err1 | Err2)> { .. }

match foo() {
    // if result has type A, the value will be stored in the `a`
    a: A => { .. }
    // we can match with a value as well
    1: u32 => { .. }
    // we probably can allow omitting explicit type if it can be inferred
    () => { .. }
    // `b` will have type (A | B | C), probably we don't want to diverge too much
    // from how matching works today
    b => { .. }
}

match bar() {
    Ok(()) => { .. },
    Err(Err1) => { .. },
    Err(e: Err2) => { .. },
}

For the generic matching problem, I think we should just specify that match arms are tested in order, so if, on monomorphization of a match on (U | V), both types turn out to be the same (say u32), then we get the following code:

match uv_enum {
    v: u32 => { .. },
    v: u32 => { .. },
}

In other words, only the first match arm will be executed, and the second arm will be removed (though the compiler should probably emit an unreachable_patterns warning). Yes, it could lead to some bugs, but I think this behaviour will be easy to understand and find thanks to the warning.
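The "first arm wins" behaviour can be approximated in today's Rust. This is only a rough sketch: the helper `which_arm` is hypothetical and emulates the type-ascribed arms with `dyn Any` checks performed in order, which is not how the proposed feature would be implemented.

```rust
use std::any::Any;

/// Emulates `match uv { u: U => .., v: V => .. }` with arms tested in order.
/// Hypothetical helper for illustration; it goes through `dyn Any` because
/// anonymous union types do not exist in Rust today.
fn which_arm<U: 'static, V: 'static>(value: &dyn Any) -> &'static str {
    if value.is::<U>() {
        "first arm"
    } else if value.is::<V>() {
        "second arm"
    } else {
        "no arm"
    }
}

fn main() {
    // Distinct types: each arm is reachable.
    println!("{}", which_arm::<u32, f32>(&1.5_f32));
    // U == V == u32: the first arm shadows the second one.
    println!("{}", which_arm::<u32, u32>(&7_u32));
}
```

When both type parameters are instantiated with `u32`, the second branch can never be taken, which is the situation the unreachable_patterns warning would flag.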

Regarding the memory representation, the type (A | B | C) could desugar to something like this:

union __AnonUnionPayloadABC { f1: A, f2: B, f3: C }
struct __AnonUnionABC {
    discriminant: TypeId,
    payload: __AnonUnionPayloadABC,
}

It will make converting from (u32 | f32) to (u32 | f32 | f64) quite easy for the compiler (make sure that the destination is equal or bigger and just copy the bytes), and matching will be desugared to a comparison of TypeIds. As a drawback, you will always have a 64-bit discriminant, while in most cases 8 bits will be more than enough.
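As a rough illustration of that layout, here is a hand-written version for (u32 | f32) in current Rust. The names `Payload`, `U32OrF32`, and `describe` are made up for this sketch; a real implementation would be generated by the compiler.

```rust
use std::any::TypeId;

// Untagged storage shared by both alternatives.
union Payload {
    int: u32,
    float: f32,
}

// The tag is the TypeId of whichever alternative is currently stored.
struct U32OrF32 {
    discriminant: TypeId,
    payload: Payload,
}

impl U32OrF32 {
    fn from_u32(value: u32) -> Self {
        U32OrF32 {
            discriminant: TypeId::of::<u32>(),
            payload: Payload { int: value },
        }
    }

    fn from_f32(value: f32) -> Self {
        U32OrF32 {
            discriminant: TypeId::of::<f32>(),
            payload: Payload { float: value },
        }
    }

    // "Matching" is a chain of TypeId comparisons.
    fn describe(&self) -> String {
        if self.discriminant == TypeId::of::<u32>() {
            // SAFETY: the discriminant says the `int` field was written.
            format!("u32: {}", unsafe { self.payload.int })
        } else {
            // SAFETY: the only other constructor writes the `float` field.
            format!("f32: {}", unsafe { self.payload.float })
        }
    }
}

fn main() {
    println!("{}", U32OrF32::from_u32(7).describe());
    println!("{}", U32OrF32::from_f32(1.5).describe());
}
```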

Regarding how types like ((A | B) | (C | D)) should be handled, from the user perspective ideally it should be equivalent to (A | B | C | D), but I am not sure how exactly it should be implemented in compiler.

text/0000-anonymous-variants.md (outdated)
// And then match on it
match x {
(_ | _)::0(val) => assert_eq!(val, 1_i32),

@scottmcm

scottmcm Nov 5, 2018

Member

nit: (_) => is already stabilized as a valid (though warned-about) pattern in nightly, so I suspect this needs to be <(_ | _)>::0(val) => for the same reason you can't do ()::clone(&()).

@eaglgenes101

eaglgenes101 Nov 5, 2018

Changed, especially since the angled brackets are also required for tuple associated items. I'm not particularly fond of the kirby-boss syntax, but consistency helps ease implementation, which is my primary concern.

@eaglgenes101

eaglgenes101 Nov 7, 2018

Changed back, it's now an unresolved question, with me leaning towards the syntax depicted above.

@Centril

Contributor

Centril commented Nov 5, 2018

Assuming that we do want to provide structural coproducts...

As I noted on the internals thread I think that the most natural way to provide structurally typed coproducts is to take enums and then try to move minimally from them. To me that seems beneficial both for reuse of compiler machinery and to make the learning distance smaller. What does that entail practically? Keep the existing syntax and type system notions of variants but allow them to be summed together in an ad-hoc structural fashion. Two candidate syntaxes for that are (1):

type Foo = Bar | Baz(u8, f32) | Quux { field: String };
// This variant is inspired by or-patterns to give a sense of disjunction.
// Though this syntax is likely ambiguous if we have `pat ::= ... | pat ":" type ;`
// so we will instead need:
type Foo = (Bar | Baz(u8, f32) | Quux { field: String });

and (2):

type Foo = enum { Bar, Baz(u8, f32), Quux { field: String } };

Here (2) is the syntax that differs least from nominally typed enums. The only difference between (2) and the enums we have today is that the name has been dropped. For those who are concerned by the syntactic complication of Rust, this should be appealing, or at least the least bad option.

Another benefit of both (1) and (2) is that only the type grammar changes. Zero changes need to happen to the expression and pattern grammars; this is beneficial for making the language easier to learn. To pattern match and construct things, you simply write:

let a_foo: Foo = Baz(1, 1.0);

match a_foo {
    Bar => expr,
    Baz(x, y) => expr,
    Quux { field } => expr,
}
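For comparison, the same construction and match written against a nominal enum compile today. This is a sketch of the existing feature only, not of the proposed syntax; the `describe` function is a made-up helper.

```rust
#[derive(Debug, PartialEq)]
enum Foo {
    Bar,
    Baz(u8, f32),
    Quux { field: String },
}

// Bring the variants into scope so they can be used without the `Foo::` prefix,
// mirroring how the structural version would look.
use Foo::{Bar, Baz, Quux};

fn describe(foo: &Foo) -> String {
    match foo {
        Bar => "Bar".to_string(),
        Baz(x, y) => format!("Baz({}, {})", x, y),
        Quux { field } => format!("Quux {{ field: {} }}", field),
    }
}

fn main() {
    let a_foo: Foo = Baz(1, 1.5);
    println!("{}", describe(&a_foo));
}
```

The expression and pattern forms are identical to the ones above; only the type declaration would differ under syntax (1) or (2).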

Both of (1) and (2) also have nice properties:

  1. The variants commute, i.e. their order does not matter. This is nice in a mathematical beauty way... but what does it buy us practically? If you can reorder the variants freely, then it is more refactoring-friendly.

  2. Because the variants have names, you can have different unit variants, e.g.

fn foo() -> enum { A, B } { 
    ...
}

This is useful for the structural nature because it allows you to invent new "flag conditions" freely.

  3. It allows easy refactoring towards a nominal type: an IDE should easily be able to take enum { A, B } and turn it into a normal enum, because everything besides the name of the type is already there, so there's very little to fill in.
@H2CO3

H2CO3 commented Nov 5, 2018

@newpavlov I think that should be a different proposal on its own. I also think that sum types are way less problematic than union types exactly because they don't try to second guess the user and filter out duplicates structurally. That filtering quickly becomes a highly nontrivial issue as soon as you start adding e.g. generics (which was pointed out on the internals forum as a possibility).

@H2CO3

H2CO3 commented Nov 5, 2018

That said, even for anonymous variants / anonymous sum types, I don't find the motivation convincing enough and the gains in convenience sufficient compared to the extent to which it grows the language. I have already explained why in the internals thread. However, it seems that for opinions to be counted in an RFC at all, they have to be re-iterated here, so there we go.

@eaglgenes101

eaglgenes101 commented Nov 5, 2018

A large factor in my decision to use positions rather than identifiers to identify variants was that Rust currently lacks the ability to have placeholders for names and to be generic over names, and adding that would require extra groundwork. And similar proposals which had just a few more conveniences than this proposal have been shot down for complexity!

Here, if you don't believe me: Similar proposal, which is almost the same as this one besides being less detailed and using a more ergonomic syntax

eaglgenes101 added some commits Nov 5, 2018

@drXor

drXor commented Nov 5, 2018

A quick read of the grammar indicates that parentheses are not necessary. Is this correct?

Furthermore, we should be clear that a sum can contain unnameable summands, yes?

fn foo() -> impl Copy {
  if cond {
    <_>::0(|| 0)
  } else {
    <_>::1(1)
  }
}
@eaglgenes101

eaglgenes101 commented Nov 5, 2018

The parentheses are not necessary, but I am keeping an eye on the possibility of later extensions: if multi-field variants become a thing and commas are used to separate the fields of a variant, I don't want ambiguity to result from things like this:

(f32 | i32, i32 | f64); //  Tuple of two anonymous variant types, or an anonymous variant type whose second variant has multiple fields? 

And I'm pretty sure that if you make the number of variants clear and the type of each variant unambiguous, it should work. Your example as is wouldn't, but this would, as it specifies in one of the branches that there are two variants:

fn foo() -> impl Copy {
  if cond {
    <(_|_)>::0(|| 0)
  } else {
    <_>::1(1)
  }
}
@skysch

skysch commented Nov 5, 2018

And similar proposals which had just a few more conveniences than this proposal have been shot down for complexity!

Here, if you don't believe me: Similar proposal, which is almost the same as this one besides being less detailed and using a more ergonomic syntax

I don't think that the previous proposals were rejected on the grounds of complexity per se, but on the grounds that the complexity was too high with respect to the advantages offered. Personally, I don't think this proposal changes that in any significant way. (Certainly not if the intention is for follow-up proposals to put things into roughly the same state. If you can't afford the car, offering to purchase the parts and assembly separately does not make it cheaper.)

@eaglgenes101

eaglgenes101 commented Nov 6, 2018

If you can't afford the car, offering to purchase the parts and assembly separately does not make it cheaper.

That analogy kind of implies that the proposal will be useless unless all the doodads are in place, which is not the case here. I counter-propose the analogy of a car mortgage, where the payment might be somewhat more in the long run, but spreading out the costs makes it more affordable than paying it all at once. And this proposal is designed to be easily extensible by the ecosystem even in the short term, so the doodads can be hashed out by competing ecosystem solutions rather than being stuck in RFC discussions for months at a time and risking non-acceptance.

@eddyb

Member

eddyb commented Nov 6, 2018

I think the problem with this feature, and a few others, including the more union-y ones, is that it tries to preserve pattern-matching as a primitive even for unknown types.

A + B strictly monotonically increases information.
That is, if we use |A| to denote the number of possible values of a type, with |!| = 0 and |()| = 1, then |A + B| = |A| + |B|, which is greater than either |A| or |B| if neither is 0 (uninhabited).
This is because you can always tell them apart, e.g.: |T + T| = |(bool, T)| = 2 * |T|.
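The counting argument can be checked concretely for T = bool. In this small sketch the enum `Sum` and the helpers `all_values` and `to_tagged` are illustrative names, not part of any proposal.

```rust
use std::collections::HashSet;

// A tagged sum: the two sides stay distinguishable even when A == B.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Sum<A, B> {
    Left(A),
    Right(B),
}

// Every value of bool + bool, listed by hand.
fn all_values() -> Vec<Sum<bool, bool>> {
    vec![
        Sum::Left(false),
        Sum::Left(true),
        Sum::Right(false),
        Sum::Right(true),
    ]
}

// The isomorphism T + T -> (bool, T): the tag becomes the first component.
fn to_tagged(value: Sum<bool, bool>) -> (bool, bool) {
    match value {
        Sum::Left(x) => (false, x),
        Sum::Right(x) => (true, x),
    }
}

fn main() {
    let tagged: HashSet<(bool, bool)> = all_values().into_iter().map(to_tagged).collect();
    // |bool + bool| = 2 * |bool| = 4
    println!("{} distinct values", tagged.len());
}
```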


What I want to use an anonymous "choice of one type from several" for, is not pattern-matching, but static trait dispatch - which would be done automatically by the compiler, with an enum-like tag.
Note that it's still possible to allow pattern-matching if the types are known to be disjoint.

That is, A | B where |(A | B)| = |A ∪ B| = |{ x | x ∈ A ∨ x ∈ B }|
(with the {...} in that part being set notation).
For T | T, that's just T, each value shows up once, there's no semantic duplication or space waste.

We can probably even make e.g. if x { y } else { z } have type typeof(y) | typeof(z) that collapses for all the code that compiles today, into just one type (the current one).

EDIT: to give a concrete example of the usecase I'm talking about:

fn foo() -> impl Iterator<Item = X> {
    if cond() {
        bar()
    } else {
        baz()
    }
}

I want this to work without using Either (which doesn't scale beyond two cases).
The impl Trait examples in this thread look unergonomic to the point where it might be easier to come up with a library-based solution that encodes a number into a tree of Eithers.
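For reference, this is roughly what the manual Either workaround looks like today. It is a minimal sketch: the `Either` type is defined locally rather than taken from a crate, and `foo` takes `cond` as a parameter and returns `u32` items purely for illustration.

```rust
// A two-way sum of iterators that yields the same item type on both sides.
enum Either<L, R> {
    Left(L),
    Right(R),
}

impl<L, R> Iterator for Either<L, R>
where
    L: Iterator,
    R: Iterator<Item = L::Item>,
{
    type Item = L::Item;

    // Dispatch on the tag and forward to whichever iterator is stored.
    fn next(&mut self) -> Option<Self::Item> {
        match self {
            Either::Left(left) => left.next(),
            Either::Right(right) => right.next(),
        }
    }
}

// Both branches have different concrete iterator types, unified by `Either`.
fn foo(cond: bool) -> impl Iterator<Item = u32> {
    if cond {
        Either::Left(0..3_u32)
    } else {
        Either::Right(vec![10_u32, 20].into_iter())
    }
}

fn main() {
    println!("{:?}", foo(true).collect::<Vec<u32>>());
    println!("{:?}", foo(false).collect::<Vec<u32>>());
}
```

A third branch would need a nested `Either` or a new three-variant enum, which is the scaling problem described above.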

@eaglgenes101

eaglgenes101 commented Nov 6, 2018

Yes, I admit as much that this proposal doesn't reach into that area. It's similar to problems from not being able to unsize into a dynamic trait object an enum whose variants consist of a single field, all of which implement a particular trait, automatically. (At least not without the help of macros of some kind, the most recent reasonably popular one of which was implemented via macro_rules! and last updated over two years ago. Yes, the fact that there aren't more recent proc macro alternatives still confounds me.)

Anonymous variant types, by virtue of their similar semantics, should also benefit from any features added to enums. In particular, unsizing on enums into a trait object implemented by all of its variant fields will also help anonymous variant types resolve their most noticeable deficiency: the inability to dispatch over an anonymous variant type as a whole.

Perhaps this might be a feature that is worth the extra complexity to implement initially, but as of now, I'm not convinced that it won't sink the rest of the proposal under its own weight.

@eddyb

Member

eddyb commented Nov 6, 2018

Yes, I admit as much that this proposal doesn't reach into that area.

Fair enough.

I just don't see the point of a related RFC that doesn't tackle -> impl Trait ergonomics, I guess.

@burdges

burdges commented Nov 6, 2018

I donno if I understand @eddyb but yes traits sound key here: We could have enum Trait behave exactly like dyn Trait except that (a) all variants must be clear at compile time and (b) std does not expose the vtable pointer. As a result, rustc could eventually implement enum Trait as variants, not vtables. After that works then one could explore weakening trait object restrictions for enum Trait.

@eddyb

Member

eddyb commented Nov 6, 2018

FWIW, I never meant vtables, you'd still have tags but only where needed.

@eaglgenes101

eaglgenes101 commented Nov 6, 2018

I've changed the syntax of variants (for both calling and matching) from stuff like (_|_)::0 to stuff like <(_|_)>::0 for consistency with type-associated identifiers and to simplify the grammar, but I've since been told that the second would require extra work to be able to work with inferred type placeholders. Should I change it back? (I never felt good about the kirby-boss syntax anyway, and I liked the first one better for its consistency with enums. However, it would entail a few extra rules in the grammar, or so I've been told. Perhaps a technical trait is in order?)

@H2CO3

H2CO3 commented Nov 6, 2018

What I want to use an anonymous "choice of one type from several" for, is not pattern-matching, but static trait dispatch - which would be done automatically by the compiler, with an enum-like tag.
Note that it's still possible to allow pattern-matching if the types are known to be disjoint.

That is, A | B where |(A | B)| = |A ∪ B| = |{ x | x ∈ A ∨ x ∈ B }|

I.e., you want union types, not sum types. That is fine. They are problematic in a language with real, parametric generics though, for a number of reasons, and even in the absence of generics, they can carry surprises.

We can probably even make e.g. if x { y } else { z } have type typeof(y) | typeof(z) that collapses for all the code that compiles today, into just one type (the current one).

And then basically every if expression with two arbitrary (different) types in its two cases would typecheck? That sounds Bad™.

@Centril

Contributor

Centril commented Nov 6, 2018

@H2CO3 so to clarify, I believe @eddyb didn't want to add these union types to the surface language but rather as an implementation strategy behind the scenes for -> impl Trait.

@H2CO3

H2CO3 commented Nov 6, 2018

Aaah, okay – sorry, misunderstood that. That, I would support.

@eddyb

Member

eddyb commented Nov 6, 2018

@H2CO3 I'd actually be curious what you mean by interactions between union types and parametricity if you can't pattern-match on them (unless you have a proof of disjointness).

In fact, we can rely on lifetime parametricity to even do Invariant<'a> | Invariant<'static> (since no code that's generated could possibly depend on either of the cases being "active").
It's then effectively exists 'x.Invariant<'x>, which you could pass to some function/closure F: for<'a> FnOnce(Invariant<'a>).

@H2CO3

H2CO3 commented Nov 6, 2018

@eddyb I'm not sure what you mean about pattern matching. I've explained the problem in the post I linked. What should the following code do?

fn foo<T>(x: T | bool) {}

foo::<bool>(false);

I.e., the question is: if bool | bool == bool, then how can the compiler determine that and generate correct code (or an appropriate diagnostic) without performing additional type checking after monomorphization?

(I believe the codegen issue is much less serious if these union types are not exposed at the language level, because I could then imagine just "not doing anything", i.e. duplicating dispatch logic for every variant of the same type, which could take up more space but it would be otherwise 100% correct wrt. semantics, and probably let MIR optimization get rid of it.

But if you mean that the language should actually expose these types in a manner that they can be spelled out, other than behind an impl Trait like eg. closures, then the question of e.g. diagnostics and type checking in general still stands.)

@eddyb

Member

eddyb commented Nov 6, 2018

@H2CO3 foo::<bool> takes a bool because (T | bool)[bool/T] = (bool | bool) = bool.
Further checking isn't needed as long as you've already checked foo under the assumption that T could be anything, including bool.
So, e.g. you can't allow foo to check whether x is T. You can, however, let foo call x.clone() if you add a T: Clone bound because (T | bool): Clone can automatically be implemented.
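The kind of automatically implemented Clone described here can be written out by hand with a tagged enum. This is only an approximation: `TOrBool` is a hypothetical stand-in, and unlike the union (T | bool), it keeps two distinct variants even when T = bool.

```rust
// Hand-written stand-in for `T | bool`; the compiler would generate
// something equivalent and hide the tag from the user.
#[derive(Debug, PartialEq)]
enum TOrBool<T> {
    Generic(T),
    Bool(bool),
}

// Clone dispatches on the tag and clones whichever alternative is stored.
impl<T: Clone> Clone for TOrBool<T> {
    fn clone(&self) -> Self {
        match self {
            TOrBool::Generic(value) => TOrBool::Generic(value.clone()),
            TOrBool::Bool(flag) => TOrBool::Bool(*flag),
        }
    }
}

// `foo` never inspects which alternative it holds; it only uses `Clone`,
// so it is checked once for every possible `T`, including `bool`.
fn foo<T: Clone>(x: TOrBool<T>) -> (TOrBool<T>, TOrBool<T>) {
    (x.clone(), x)
}

fn main() {
    println!("{:?}", foo(TOrBool::Generic(String::from("a"))));
    println!("{:?}", foo::<bool>(TOrBool::Bool(false)));
}
```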

The conditions for <A as Trait<B>> to be auto-implemented when e.g. B = T | U are:

  • <A as Trait<T>>::X = <A as Trait<U>>::X for all associated consts/types X
  • all the required methods of Trait have a signature with:
    • exactly one occurrence of B in args' types, and at most one in the return type
    • the argument is of type B, &B, &mut B or Pin<&mut B>
    • the return type (if it includes B) is exactly B
      • longer-term, it would also be possible to support by-value wrappers e.g. Option<B>, similar to how we could have Option<T> coerce to Option<U> if T can coerce to U
  • default methods can still have a conforming signature, in which case the compiler can ignore the default (e.g. Iterator::size_hint, which would be useful for performance)

I believe that includes Iterator and Future, which are the primary usecases here.

@H2CO3

H2CO3 commented Nov 6, 2018

foo::<bool> takes a bool because (T | bool)[bool/T] = (bool | bool) = bool

Certainly. I know that and you know that. But how does the union operation itself manifest at the language surface if you are not allowed to look at the type signature after instantiating foo::<T> with T = bool?

As mentioned before, I'm not concerned about trait implementations alone. The low-level details can certainly be implemented in a number of ways, including the naive approach I described above. The problem is with situations which were, again, mentioned in the internals thread, for example if spelling out the union of a type with itself is to be disallowed. Enforcing that condition can only happen after generic instantiation.

@eddyb

Member

eddyb commented Nov 7, 2018

FWIW I would not use the word "enum" around union types, I feel like that would be confusing.

@RalfJung Okay maybe I should've stated my point more clearly:
All of these proposals, whether they are sum or union, have a problem in common that they can solve, and that's -> impl Trait with multiple return types.
(arguably error types, too, but that's more or less the same problem anyway)

Now, if there is a proposal in this area that does not intend to solve that problem, I feel like it should state that outright. IMO, this greatly reduces the benefit/cost ratio of the proposed feature.

@eaglgenes101

eaglgenes101 commented Nov 7, 2018

@newpavlov

In my opinion such enums are a great generalization of existing enums, which can be viewed as sugar for the implicit creation of new wrapper types to be used in the generalized enum. It is also logical considering the ability to import enum variants and maybe, in the future, to use them as stand-alone types. And they fit nicely into the type ascription in patterns proposal.

The variants already have types related to the enum type encompassing them: enum unit variants are already instances of their encompassing type, and enum variants with fields are already functions returning their encompassing type. (This leaves struct-like variants, which cannot be used as values by themselves, but must be immediately followed by field initializers within braces.)

enum SampleEnum {
    UnitVariant, // of type SampleEnum
    NiladicVariant(), // of type fn() -> SampleEnum
    MonadicVariant(()), // of type fn(()) -> SampleEnum
    DyadicVariant((), ()), // of type fn((), ()) -> SampleEnum
}
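The types noted in those comments can be checked directly, since tuple-like variant constructors coerce to function pointers. A small sketch (the derives are added here only so the values can be compared):

```rust
#[derive(Debug, PartialEq)]
enum SampleEnum {
    UnitVariant,
    NiladicVariant(),
    MonadicVariant(()),
    DyadicVariant((), ()),
}

fn main() {
    // A unit variant is already a value of the enum type.
    let unit: SampleEnum = SampleEnum::UnitVariant;
    // Tuple-like variants are constructor functions.
    let niladic: fn() -> SampleEnum = SampleEnum::NiladicVariant;
    let monadic: fn(()) -> SampleEnum = SampleEnum::MonadicVariant;
    let dyadic: fn((), ()) -> SampleEnum = SampleEnum::DyadicVariant;

    assert_eq!(unit, SampleEnum::UnitVariant);
    assert_eq!(niladic(), SampleEnum::NiladicVariant());
    assert_eq!(monadic(()), SampleEnum::MonadicVariant(()));
    assert_eq!(dyadic((), ()), SampleEnum::DyadicVariant((), ()));

    // Constructors can also be passed to higher-order functions.
    let wrapped: Vec<SampleEnum> = vec![(), ()]
        .into_iter()
        .map(SampleEnum::MonadicVariant)
        .collect();
    assert_eq!(wrapped.len(), 2);
}
```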
@glaebhoerl

Contributor

glaebhoerl commented Nov 7, 2018

@eddyb I think the two features are (or should be) almost entirely unrelated: #2414

IMO, this greatly reduces the benefit/cost ratio of the proposed feature.

I think it reduces both the cost and the benefit, while maintaining a good ratio. The tradeoffs are more or less exactly the same as with named structs vs. tuples. Could we get on just fine without tuples, only structs, and maybe a Pair<A, B> type in the standard library analogous to Result? Sure, we could. Are tuples nonetheless a nice little convenience to have available sometimes? Yes, they are. Are they fairly straightforward to implement? Also yes. Same things apply in the case of named sum types and anonymous sum types.

@dlight

dlight commented Nov 7, 2018

@eddyb

EDIT: to give a concrete example of the usecase I'm talking about:

fn foo() -> impl Iterator<Item = X> {
    if cond() {
        bar()
    } else {
        baz()
    }
}

I think this is confusing because, looking at the code without knowing bar and baz, I'd expect them to have the same type. I'd support this if there were syntax that made it clear that you're generating a union type behind the scenes, like (mock-up syntax)

fn foo() -> impl Iterator<Item = X> {
    if cond() {
        union bar()
    } else {
        union baz()
    }
}

Or even

fn foo() -> union impl Iterator<Item = X> {
    if cond() {
        bar()
    } else {
        baz()
    }
}
@eaglgenes101

eaglgenes101 commented Nov 7, 2018

Also, for the next individuals thinking of asking why I decided on enum-like semantics rather than algebraic union semantics for these types, let me quote some paragraphs from my own RFC proposal. I had a pre-RFC period, during which points such as this were brought up, so I have a whole section and a paragraph responding to such a well-trodden idea.

The reasoning behind choosing an anonymous sum type for the proposed type is twofold. First, it allows for almost all the compiler machinery already used for enums to be reused for anonymous variant types. Enums have a whole bunch of compiler machinery dedicated to making them work and optimizing them, and duplicating much of that work just to give different semantics to a new family of types would be quite a bit for a proposal that aims to minimize implementation complexity.

Second, such types have much simpler interactions with themselves and the rest of the type system.

It may seem to be intuitive for (T|T) to be equivalent to T or (T|), or to forbid it, but there are a number of ways which a user may unwittingly create such a type, which would have to be treated as a special case. Perhaps the type was actually (U|V), where at one particular point, U and V both had the same type of T for a particular monomorphization. Perhaps the type was generated through codegen, and it happened that the user wanted to combine two errors that happened to have the same type. Perhaps the type is in generic code, written by a programmer expecting that in all cases the second case occurs at some point, so a refactoring which changes types in a seemingly unrelated part of code causes hangs because the first case is now catching all the values.

(U|V) being equivalent to (V|U) would have similar problems: what if both U and V are T? One could specify that they want the variants the other way around by specifying (V|U), but how would one specify that they wanted the variants the other way around for (T|T)? Clearly, there are a number of details to consider for algebraic union types.

Algebraic sum types are simple in comparison: (T|T) is separate from (T|) which in turn is separate from T, (U|V) is distinct from (V|U) (but can be converted with a simple shim function that also works for (T|T)), and the variants will stay distinct in generic code no matter which types are used for the variants.
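The "simple shim function" can be sketched with a two-variant enum standing in for the anonymous variant type. `AnonSum` and `swap` are hypothetical names used only for this illustration.

```rust
// Stand-in for the positional anonymous variant type `(A | B)`.
#[derive(Debug, PartialEq)]
enum AnonSum<A, B> {
    First(A),
    Second(B),
}

// Converts `(A | B)` into `(B | A)` by moving each payload to the other position.
fn swap<A, B>(value: AnonSum<A, B>) -> AnonSum<B, A> {
    match value {
        AnonSum::First(a) => AnonSum::Second(a),
        AnonSum::Second(b) => AnonSum::First(b),
    }
}

fn main() {
    let mixed: AnonSum<u32, &str> = AnonSum::First(1);
    println!("{:?}", swap(mixed));

    // The shim also works for `(T | T)`: the variants stay distinct by position.
    let same: AnonSum<u32, u32> = AnonSum::First(5);
    println!("{:?}", swap(same));
}
```

Because variants are identified by position rather than by type, the generic `swap` needs no distinctness condition on `A` and `B`.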

The next alternative is algebraic union types, which are the primary rival to algebraic sum types that I found. Their semantics make it so that the type is implicitly flattened, and its constituent types are deduplicated and unordered. I considered this too, but I decided against it because they would require substantial groundwork in the compiler, and would require at least one of the following: type distinctness conditions, the ability to quantify over the constituent types generically, tolerating the potential for compile errors at a distance possibly very far removed from the place the types were declared, or tolerating the potential for confusing runtime semantics resulting from the first match arm of a match statement catching cases intended for other match arms.

If you still want algebraic union types, you're free to file a rival or companion RFC proposal to this one, but I looked into them already, and I decided that the details associated with making algebraic union types work predictably and without gotchas didn't fit the proposal I wanted to make, which aims to be minimal.

@eddyb

Member

eddyb commented Nov 7, 2018

First, it allows for almost all the compiler machinery already used for enums to be reused for anonymous variant types. Enums have a whole bunch of compiler machinery dedicated to making them work and optimizing them, and duplicating much of that work just to give different semantics to a new family of types would be quite a bit for a proposal that aims to minimize implementation complexity.

I will otherwise stop replying to this RFC but I wanted to point out that this is not the case: for code generation, a monomorphic and normalized A | B | C would be equivalent to A + B + C.

Pretty much all of the provided downsides of union types come from the desire to work with union types only by pattern-matching on them, instead of having that an optional feature predicated on the local disjointness of the types (which is almost trivial to check).

In general, I feel like statements made regarding the difficulty/scale of compiler implementations should be evaluated by people involved with the implementation, and not just accepted as handwaved.

@eaglgenes101

eaglgenes101 commented Nov 7, 2018

Pretty much all of the provided downsides of union types come from the desire to work with union types only by pattern-matching on them, instead of having that an optional feature predicated on the local disjointness of the types (which is almost trivial to check).

Which makes it nearly impossible to do matches in a generic context as generics currently are, or allows for the compiler to end up type erroring suddenly because two types declared some distance apart got combined at a third point far removed from both of them into a particular algebraic union type monomorphization. Hence, for a union type, one of the following has to be chosen: don't have useful matching, get groundwork for generic type distinctness down, get groundwork for generic quantification over variants down, allow for confusing compile-time type errors, or allow for confusing runtime semantics.

Also, almost all of the cases that aren't matching are going to either perform dispatch on the value as a whole to do something with the field, or just hand it back for the caller to deal with. I deliberately made my proposal highly amenable to ecosystem extensions to make up for its lack of features, so I expect the first can be done with a wrapper type that should be available for import from a crate, and the second is already trivially handled by this proposal as long as all the constituent variants are of types that implement Sized (and for types that don't, you can always box them up, since Box implements Sized).

@OvermindDL1

OvermindDL1 commented Nov 7, 2018

Just listing some ways that other languages handle this:

Of note, OCaml's Polymorphic Variants are essentially anonymous variant types in the global namespace. They have a tag and optional data and get compiled to the same machine code as a tuple. The data-less polymorphic variant of:

`SomeName

Gets compiled down into the tuple that is basically defined as (int, ()) where int is 56690559 (it uses a global hash/registry depending on the backend), which becomes basically just (56690559, ()) which is in assembly just the integer of 56690559. A polymorphic variant like:

`SomeName 42

Becomes compiled to (56690559, 42), and:

`SomeName (42, "string")

Becomes compiled to (56690559, (42, "string")) essentially. Although in OCaml since all types are known then it 'can' pack it pretty well (although the default backends tag all types for introspection so you can safely 'break' the type system at times, think unsafe in rust).

Another form of global variant type that does not store data is Erlang/BEAM's atom type, which is a string that is interned into a global registry and becomes just an integral index into that array everywhere (even the true/false boolean values are just atoms on the BEAM). They cannot hold data but if you want that then the common pattern is just a 'tagged tuple' like {atom, whatever, data, here} where the first value in the tuple is the atom and the other types are whatever as normal.
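The interning idea behind atoms can be sketched in a few lines of Rust. This is a loose illustration only: `AtomTable` is a made-up type with a local rather than global registry, and it does not reflect how the BEAM actually stores atoms.

```rust
use std::collections::HashMap;

// Maps each distinct name to a small integer index, and back.
#[derive(Default)]
struct AtomTable {
    names: Vec<String>,
    ids: HashMap<String, usize>,
}

impl AtomTable {
    // Returns the existing index for `name`, or registers a new one.
    fn intern(&mut self, name: &str) -> usize {
        if let Some(&id) = self.ids.get(name) {
            return id;
        }
        let id = self.names.len();
        self.names.push(name.to_string());
        self.ids.insert(name.to_string(), id);
        id
    }

    // Looks up the original spelling of an atom.
    fn name(&self, id: usize) -> Option<&str> {
        self.names.get(id).map(String::as_str)
    }
}

fn main() {
    let mut atoms = AtomTable::default();
    let ok = atoms.intern("ok");
    let error = atoms.intern("error");

    // A "tagged tuple": the atom index first, then arbitrary data.
    let tagged = (error, 42, "string");
    println!("{:?} -> {:?}", tagged, atoms.name(tagged.0));
    println!("ok is stable: {}", atoms.intern("ok") == ok);
}
```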

Both forms are common in many languages.

@Ixrec

Contributor

Ixrec commented Nov 7, 2018

I think it's worth mentioning that @eddyb's strawman

fn foo() -> union impl Iterator<Item = X> {
    if cond() {
        bar()
    } else {
        baz()
    }
}

is something that's come up in several past threads, usually under a name like enum impl Trait.
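For comparison, here is a minimal sketch of roughly what that strawman has to be written as by hand on stable Rust today. The `EitherIter` enum and the `cond` parameter are made up for illustration (they do not come from the RFC or from @eddyb's comment), and an actual enum impl Trait feature might well lower differently.

```rust
// Hypothetical hand-written stand-in for what `enum impl Trait` would generate.
enum EitherIter<A, B> {
    Left(A),
    Right(B),
}

// Forward `Iterator` to whichever variant is currently held.
impl<A, B, T> Iterator for EitherIter<A, B>
where
    A: Iterator<Item = T>,
    B: Iterator<Item = T>,
{
    type Item = T;

    fn next(&mut self) -> Option<T> {
        match self {
            EitherIter::Left(a) => a.next(),
            EitherIter::Right(b) => b.next(),
        }
    }
}

// Two branches with different concrete iterator types behind one `impl Iterator`.
fn foo(cond: bool) -> impl Iterator<Item = u32> {
    if cond {
        EitherIter::Left(0..3u32)
    } else {
        EitherIter::Right(vec![10u32, 20].into_iter())
    }
}
```

The wrapper enum and its forwarding impl are exactly the boilerplate that the marker syntax is meant to remove.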

I am personally also of the opinion that anonymous enums and structural records and so on are sorely lacking in motivation, while enum impl Trait would actually do something about the "one-off error enums are annoying" problem statement I assume most of us have in mind here. As far as I know, the only reason we stopped discussing it for a while was because we ran out of stuff to say and needed to wait for impl Trait to get stabilized (which it finally has!).

I believe the most recent discussion of that feature has been happening on #2414, where most of the debate seems to be over whether the "marker" (enum for me, union in eddyb's comment) should go on the return type or on every returned value (just like in eddyb's comment), and how this would all integrate with the ? operator.

@newpavlov

newpavlov commented Nov 7, 2018

I've written a draft with an alternative proposal.

@eaglgenes101

eaglgenes101 commented Nov 7, 2018

Since many viewers seem to have concerns about the syntax, I am thinking of adding some syntactic sugar as a concession to this reality. Here's the idea:

A variant selection pattern is added as syntactic sugar to identify a variant of an anonymous variant type (and perhaps may be extended to enums in future RFCs). It is entirely surface-level, only utilizing type information that is locally known, and has the form:

$bindpattern:pat : $varianttype:ty in $sumtype:ty

The effect of this statement slightly differs inside and outside a match context:

  • Outside of a match context, if $varianttype is a type or a type placeholder that identifies exactly one of the variants of $sumtype given type information locally known in the context it is written, then $bindpattern is bound to the field of the variant uniquely identified by the type. Otherwise, compilation fails. If the variant selection pattern corresponds to more than one type, then the user is directed to narrow down the type. If the variant selection pattern corresponds to no types, then this is identified, and the user is reminded that the variant selection pattern can only infer a variant based on locally known type information.
  • Inside of a match context, no match and one match act similarly. However, if there exists a single concrete type that $varianttype identifies that can be unified with multiple variants that match the type placeholder, then the match statement will match all of those variants, and on match, the field of the matching variant will be bound to $bindpattern. Otherwise compilation fails as usual.
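To make the match-context rule concrete, here is a sketch in stable Rust that uses a named enum as a stand-in for the anonymous type (u32 | u32 | String). The names `Sum3`, `V0`, `V1`, `V2` and `describe` are hypothetical; the point is only that a type-selecting arm which unifies with two variants behaves like an or-pattern over both.

```rust
// Hypothetical named stand-in for the anonymous type (u32 | u32 | String).
enum Sum3 {
    V0(u32),
    V1(u32),
    V2(String),
}

fn describe(v: Sum3) -> String {
    match v {
        // Roughly what `n: u32 in (u32 | u32 | String)` would mean inside a
        // match: both u32 variants are matched and `n` binds their field.
        Sum3::V0(n) | Sum3::V1(n) => format!("number {}", n),
        // Roughly `s: String in (u32 | u32 | String)`: exactly one variant.
        Sum3::V2(s) => format!("text {}", s),
    }
}
```

Outside of a match, the first arm's selection would be rejected under the proposed rule, because u32 does not identify exactly one variant.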

Should I go forward with this and incorporate it into my RFC?

@matthieu-m

matthieu-m commented Nov 7, 2018

As I mentioned on the pre-RFC, it seems to me that this RFC introduces two concepts at once:

  1. Anonymous enums: the ability to define an enum without a name.
  2. Anonymous variants: the ability, within an enum, to define variants without a name.

For me, those two features are orthogonal, and their benefits should be evaluated separately. They may, together, allow more than the sum of their parts, a nice advantage to mention, however in the meantime I would rather see their individual benefits listed separately.

Examples of separation of concerns:

//  Anonymous variants are to enums what tuple structs are to structs:
enum Foo { u8 | u16 | u32 }
struct Bar(u8, u16, u32);

Whereas:

//  Anonymous enums are to enums what impl Traits in return position are to traits.
-> enum { Ok(T), NetworkError(io::Network::Error), SyntaxError(String) }
-> impl Iterator<Item = T>

And of course:

//  Anonymous enums with anonymous variants are the sum type equivalent of tuples.
enum { u8 | u16 | u32 }
(u8, u16, u32)

On a personal note, I am not convinced by the appeal of anonymous variants, but find that anonymous enums could reduce a lot of the boilerplate involved in returning detailed errors from a function.
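As a point of reference for that boilerplate, this is a sketch of what the `-> enum { Ok(T), NetworkError(..), SyntaxError(String) }` example has to look like on stable Rust today. `FetchResult` and `fetch` are hypothetical names, and `std::io::Error` is used as a stand-in for the network error type in the example above.

```rust
use std::io;

// Hypothetical one-off enum that exists only to describe `fetch`'s return type.
#[derive(Debug)]
enum FetchResult<T> {
    Ok(T),
    NetworkError(io::Error),
    SyntaxError(String),
}

// Illustrative function: empty input plays the role of a network failure.
fn fetch(input: &str) -> FetchResult<u32> {
    if input.is_empty() {
        return FetchResult::NetworkError(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "no data",
        ));
    }
    match input.trim().parse::<u32>() {
        Ok(n) => FetchResult::Ok(n),
        Err(e) => FetchResult::SyntaxError(e.to_string()),
    }
}
```

With anonymous enums, the separate `FetchResult` item would disappear and the variants would be spelled directly in the signature.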

@glaebhoerl

Contributor

glaebhoerl commented Nov 7, 2018

Should I go forward with this and incorporate it into my RFC?

(Personally I would prefer to keep things simple and symmetric with tuples, and we don't have type-based projection from tuples either. We also, despite repeated requests, don't have type-based injection into Result, a feature which seems like it has similar tradeoffs to this one.)

I haven't followed all the discussion and don't have time for a detailed read-through of the RFC, but from a brief skim I don't see it mentioned (apart from the issue itself being linked): the syntax I came up with for anonymous sum literals and patterns way back when was positional, to mirror tuples. That is, the two variants of a two-way anonymous enum would be (x|!) and (!|x), with ! standing for all the positions where a value isn't; the variants of a three-way would be (x|!|!), (!|x|!), and (!|!|x); and so on. If the motivation for the above was that the currently-proposed numeric-variants syntax is felt to be awkward, has this maybe also been considered? (It seems, of course, unwieldy for large anonymous-sums, but as with tuples, small ones seem like they would be the majority of the use cases.)

@eaglgenes101

eaglgenes101 commented Nov 7, 2018

It would seem that it should be reasonable to implement such a syntax within a macro (unfortunately, such a syntax would be ambiguous with alternations in matches outside of such a macro):

anon_pat!(x|!|!|!); // expands to (_|_|_|_|)::0(x)

However, people seem to want to match by type, hence the direction I took with my idea for syntactic sugar.

@drXor

drXor commented Nov 7, 2018

Here's a table that summarizes what a bunch of people seem to want. I wrote it down in internals at some point, and it feels relevant.

                     Product           Sum
Type & field names   * Structs         * Enums
Type names           * Tuple-structs   enum K(A | B);
Field names          Records           enum { A(i32), }
Anonymous            * Tuples          Sums

Starred cells are in the language. This proposal intends to fix the bottom-right-most cell.

What @eddyb seems to want doesn't fit into this table; rather, he's interested in what is functionally equivalent to a C union with type-tags, which, I think, solves a completely different problem. Enum-based RPIT is a totally unrelated thing that can be implemented, somewhat sketchily, via this proposal. I think -> enum impl Trait should be a separate mechanism, and the compiler should warn when using a sum where -> enum impl Trait would do.

Further, re: matching on types, this interacts very unpleasantly with generics: https://internals.rust-lang.org/t/pre-rfc-sum-enums/8782/4?u=drxor

@matthieu-m

matthieu-m commented Nov 8, 2018

A quick question: syntax-wise, for anonymous variants, would _::0 be sufficient for pattern matching and constructing? Inference should be able to deduce the type, and _:: is there only to differentiate from a regular integer.

It would be a bit more lightweight than (_ | _ | _)::2 for example.

@eaglgenes101

eaglgenes101 commented Nov 8, 2018

Currently, Rust doesn't (can't?) infer identifier paths, so _::None can't be inferred to be Option::None. However, this could be special-cased for patterns of the form "_::" $variantnum:num so type inference can be directed to do its work and figure out the number of variants as well as the type of each variant. Is it worth it, though?

@earthengine

earthengine commented Nov 9, 2018

Also I hate having to write

match v {
    (_|_|_|_)::2(10) => ...
}

I would write

match v {
    (_*2|10|..) => ...
}
@KrishnaSannasi

KrishnaSannasi commented Nov 9, 2018

@earthengine I don't think we should introduce so much new syntax (* in pattern matching, which is not done anywhere else in Rust), it could just be this:

match v {
    (_|_|10|_) => ...
}

Personally, I think that we could use the enum keyword like this:

match v {
    enum::2(10) => ...
}

and that would be clear enough, that way we know that we are dealing with an anonymous enum, it is short, and it doesn't introduce ambiguity like this

match v {
    (20|_|10|_) => ...
}

It looks like it could be valid syntax because _ is supposed to be an "ignore me" placeholder. But this isn't valid syntax. I think this would be bad for newer users, as it would be rather confusing.

@earthengine

earthengine commented Nov 9, 2018

Ok I think I am with enum::2(10). Or, if necessary, enum::<4>2(10)?

@eaglgenes101

eaglgenes101 commented Nov 9, 2018

Currently, for ergonomics, I'm looking at lifting the anon_pat! macro I floated above to incorporation into the proposal, as quoted from my current working copy of the RFC proposal:

As a syntactic aid, the proc macro anon_pat! is provided in the standard library, which in many cases can be more convenient than typing out the type name and its variant number. Inside, the tokens consist of a number of vertical-bar-separated token sequences, which follow these rules:

  • All but one of these sequences consist of either a single exclamation mark, representing a single uninhabited variant, or an exclamation mark followed by an asterisk, then a number, representing that number of uninhabited variants.
  • The odd sequence out must be @, an expression, or a pattern. If it is an @, then the macro expands to the corresponding variant itself. Otherwise, the macro expands to put the expression or pattern in a pair of parentheses after the variant.
  • If there is any number of exceptional sequences other than one, then compilation fails. If multiple exceptional sequences are adjacent, then a proc macro diagnostic suggests parenthesizing the sequences together so they will be parsed as one.

Some examples of this macro are below:

// Expands to (_|_|_|)::2(5);
anon_pat!(!|!|5); 

// Expands to (_|_|_|_|_|_|_|)::3(0.0_f32);
anon_pat!(!*3|0.0_f32|!*3);

// Expands to (_|_|_|)::0;
anon_pat!(@|!*2);

// Syntax error, no inhabited variants
anon_pat!(!*4|!|!*5);

// Syntax error, multiple inhabited variants (parenthesization suggested)
anon_pat!(!|4_u64|5_u64|!);

This macro is not blessed in any way other than being provided in the Rust standard library, and only utilizes proc-macro APIs that are available to the rest of Rust.

Even though there exist operations that use the exclamation mark, the multiplication sign, and the vertical bar, I don't expect the specific patterns I chose to mark uninhabited variants to be confused with valid Rust expressions.
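As a rough feasibility check of the positional idea, here is a sketch that can be compiled on stable Rust. It is not the proposed anon_pat!: it uses macro_rules! instead of a proc macro, is fixed at three positions, expands to a hypothetical named stand-in enum `Sum3` rather than to `(_|_|_|)::N(..)`, supports neither `!*n` nor `@`, and only accepts a single token tree as the payload. All names here (`anon3`, `Sum3`, `middle`) are made up for illustration.

```rust
// Hypothetical named stand-in for a three-way anonymous variant type.
#[derive(Debug, PartialEq)]
enum Sum3<A, B, C> {
    V0(A),
    V1(B),
    V2(C),
}

// Positional macro: `!` marks the empty positions, one token tree is the payload.
// Rules with more literal `!` tokens come first so they are tried before the
// catch-all `tt` matchers.
macro_rules! anon3 {
    (! | ! | !) => { compile_error!("exactly one position must hold a value") };
    (! | ! | $x:tt) => { Sum3::V2($x) };
    (! | $x:tt | !) => { Sum3::V1($x) };
    ($x:tt | ! | !) => { Sum3::V0($x) };
}

// The same macro works in pattern position, binding `n` from the middle slot.
fn middle(v: Sum3<u8, i32, bool>) -> Option<i32> {
    match v {
        anon3!(! | n | !) => Some(n),
        _ => None,
    }
}
```

For example, `anon3!(! | 5 | !)` builds `Sum3::V1(5)`, and the pattern `anon3!(! | n | !)` destructures it again. A payload that is more than one token would need to be parenthesized in this simplified version.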

@Centril Centril added the data-types label Nov 10, 2018

@kestred

kestred commented Nov 11, 2018

FWIW~

I (last week) was feeling negatively towards this RFC, with my top two reasons being:

  1. Some iterations on the proposed syntax have been sigil heavy. Rust is well known for being (perhaps too) sigil heavy; which means that even though I definitely have clear uses for an Anonymous Discriminated Type, I know that I would really have never used it due to code legibility concerns.
  2. I'd been thinking about variations (such as per-type discriminants rather than order-based discriminants) and wasn't quite convinced we should commit to a design space that restricts that

As of today, I'm favorable towards the RFC because both reasons have separately been addressed.
I have been playing around in my existing codebases and in a new project with the hypothetical stabilization of this RFC, as well as its many variations, predecessors, and tip of the hat ideas; including both sum and union semantics.

  1. The (_|_|_) match syntax in the most recent iteration of the RFC feels very familiar to
    (and would fit right in with) many of the tuple (_, _) match patterns that are already in code.
    On the other hand <_>, <(_|_)>, and almost any other variation relying on <> (or {})
    noticeably reduced my code comprehension speed when reading match statements.

    Additionally, I expect the majority of the use cases where anonymous variants will be used
    is in 2- and 3- variant form, because as the type gets longer than a line or two,
    there is less motivation to avoid defining a named enum. Also, most functions that benefit
    from writing a quick return type aren't long enough to generate more than a small number
    of variants.

  2. In modifying my existing codebases and in a new project, I initially assumed that I would prefer
    anonymous union-types for both error handling and in generic code.

    For error handling, I found that union-typing turned out to be either no benefit or even that
    it was an anti-pattern where it encouraged me to conflate errors in a way in the mid-section
    of plumbing that made it difficult to handle errors correctly at the interface/application-level.

    In generic code, I mostly didn't have any sample code that cared either way. When there was
    a distinction, it always favored existing enum / sum-typing semantics; specifically:

    • When handling discriminated return types, it cared about which function the value
      returned from, and not what type it was.

    • When ingesting or passing-around discriminated types, it cared about what traits
      were impl'd for each variant which might be differently asserted for each variant
      even if they had the same type:

      fn encode<T, U>(out: &mut Out, data: (T | U))  where T: ToFoo, U: ToBar { .. }

    I think there are still valid use cases for anonymous unions, but I suspect they are
    niche enough to not need pleasant syntax.
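The point about differently asserted traits can be sketched on stable Rust with a named stand-in enum. Everything here is hypothetical: `Sum2` stands in for (T | U), `String` stands in for `Out`, and the `ToFoo`/`ToBar` methods are invented for the example. With T = U = u8 the two variants hold the same type, yet the position still decides which trait is used, which is what sum semantics preserves and union semantics would lose.

```rust
// Hypothetical traits; the method names are illustrative only.
trait ToFoo {
    fn to_foo(&self) -> String;
}
trait ToBar {
    fn to_bar(&self) -> String;
}

impl ToFoo for u8 {
    fn to_foo(&self) -> String {
        format!("foo:{}", self)
    }
}
impl ToBar for u8 {
    fn to_bar(&self) -> String {
        format!("bar:{}", self)
    }
}

// Named stand-in for the anonymous type (T | U).
enum Sum2<T, U> {
    V0(T),
    V1(U),
}

// `String` stands in for the `Out` type from the signature above.
fn encode<T, U>(out: &mut String, data: Sum2<T, U>)
where
    T: ToFoo,
    U: ToBar,
{
    match data {
        Sum2::V0(t) => out.push_str(&t.to_foo()),
        Sum2::V1(u) => out.push_str(&u.to_bar()),
    }
}
```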

Syntax Notes

(on other comments)

enum { bool, i32 }
match v {
   enum::2(10) => ...
}

I'm generally not in favor of this still because I feel it loses parity with tuples for anonymous types and would cause Rust to be harder to teach because of the inconsistency (i.e. if enum::2 why not struct::2 for tuples).

The (_|_|10|_)-type syntax seems reasonable, but doesn't feel motivating enough to add to the RFC; the current (_|_)::2 should be at least enough to get us on to nightly so we can start experimenting with code that actually compiles. It would have plenty of opportunities to change prior to stabilization, and it also seems very minimal to deprecate and rustfix if a better syntax is found for rust2021.

enum { A(i32), }
type _ = A(i32) | B | C;

I experimented with these as well, but found that either:

  • variant names were so short as to be meaning-minimal (as in ErrT, ErrU) and I would've been
    totally happy with auto-generated names
  • variant names quickly became too long and there was no good reason not to define a real enum

It seems that we might want to think about this style of data type in the same context as Record Types (at least in a rationale section), and I'd punt it to a different RFC.

match val {
   (!|!|x|!)
}

Not excited by this in match patterns, because it draws far too much attention. The bang character ! is explicitly an attention grabber, which is why it's used to draw attention to things like Hey this is a macro, or Hey this is block scoped not item scoped.

While I appreciate the natural extension of the never type (!), clustering so many bangs in one spot causes my eyes to glaze over.

† - edit, my reaction to this I think is overstated here, see followup #2587 (comment)

Errata

As I mentioned on the pre-RFC, it seems to me that this RFC introduces two concepts at once:

  • Anonymous enums: the ability to define an enum without a name.
  • Anonymous variants: the ability, within an enum, to define variants without a name.

For me, those two features are orthogonal, and their benefits should be evaluated separately. They may, together, allow more than the sum of their parts, a nice advantage to mention, however in the meantime I would rather see their individual benefits listed separately.

I definitely agree that the concepts are orthogonal (and did a 👍 ++), but since we don't have record types to go off of already in the language, I'm worried an anonymous enum-only RFC (the more valuable piece IMO) would only lead to more bike-shedding.

To me this RFC only reads as more complex because it includes all of the historical context, which is important; as @eaglgenes101 intended, it is as close to minimal as the feature could be while still providing a good cost/benefit ratio.

With type ascription in patterns appearing plausibly-going-to-happen, I continue to feel strongly that the following is the right way for "anonymous enums" to be consumed:

match x {
    y: Ipv4Addr => ...,
    z: Ipv6Addr => ...,
}

This reminded me of similar code I already write all the time with
tuples of options, which gives me more confidence that this feature is well motivated.

// maybe pattern syntax  (_, _)::0(y): Ipv4Addr is fine too though
match x {
   (y: Ipv4Addr | _) => { .. }
   (_ | z: Ipv6Addr)  => { .. }
}

// current tuples of options
match x {
   (Some(y), _) => { .. }
   (_, Some(z)) => { .. }
}
@glaebhoerl

Contributor

glaebhoerl commented Nov 11, 2018

So the issue with using (_|_|x|_) for patterns is that _ means "anything could be here", not that "nothing is here". It's a wildcard. Relatedly, it's probably ambiguous with disjunctive patterns.

I agree that (!|!|x|!) is not entirely ideal, but I'm not sure what a better option would be. Leaving them out, as in (||x|), just looks weird, and ASCII is unfortunately very finite. One other possibility I had considered way back when was (.|.|x|.), but I'm not enamored of it either.

@kestred

kestred commented Nov 11, 2018

Relatedly, it's probably ambiguous with disjunctive patterns.

For full reference:

  • The pattern (_|_) => {} is always a parser error.
  • On stable (_) => {} is an error.
  • On beta/nightly (_) => {} is a warning. (rust-lang/rust#48500)

However, it still isn't ambiguous with disjunctive patterns.
Consider the parallels with tuples.

match expr { (_)  => print!("Matches any `x` expression") }  // emits compiler warning
match (3,) { (_,) => print!("Matches only tuple expression") }
match (3|) { (_|) => print!("Matches only sum expression") }

† - note, I think (_|)::0(_) (the proposed syntax) is still a sufficient pattern matching syntax
to get this into the compiler as a nightly feature that we can experiment with; but I wanted
to point out that AFAIK the "in-band" placeholder is not ambiguous with existing syntax.

‡ - edit, I see the potential issue with (_|_), which is that it doesn't provide a way to
match on a particular variant (e.g. ::1) while ignoring the value.

I agree that (!|!|x|!) is not entirely ideal, but I'm not sure what a better option would be.
Leaving them out, as in (||x|), just looks weird, and ASCII is unfortunately very finite.

Fair, I think my reaction to this was too much from the gut; with spacing (_ | ! | !) and short tuples (_| !) it's not as noisy. This seems to further imply that we need basic support for anonymous variants on nightly so we could experiment through macros.

match_anon!( x {
   _*$&%@~  =>  ..
})
@RalfJung

Member

RalfJung commented Nov 12, 2018

How does it not help replace them?

If the intention is to replace enums, they should have the usual pattern matching. Union types cannot have that. enum { Left(T), Right(T) } should be able to distinguish the two sides.


@eddyb

All of these proposals, whether they are sum or union, have a problem in common that they can solve, and that's -> impl Trait with multiple return types.
(arguably error types, too, but that's more or less the same problem anyway)

Now, if there is a proposal in this area that does not intend to solve that problem, I feel like it should state that outright. IMO, this greatly reduces the benefit/cost ratio of the proposed feature.

I disagree. I do not think RFCs have to list the problems they do not want to solve. That seems like a rather strange requirement.

You may think this RFC should solve that problem and it does not, and that is fair criticism.


@glaebhoerl wrote

I think it reduces both the cost and the benefit, while maintaining a good ratio. The tradeoffs are more or less exactly the same as with named structs vs. tuples. Could we get on just fine without tuples, only structs, and maybe a Pair<A, B> type in the standard library analogous to Result? Sure, we could. Are tuples nonetheless a nice little convenience to have available sometimes? Yes, they are. Are they fairly straightforward to implement? Also yes. Same things apply in the case of named sum types and anonymous sum types.

and I have nothing to add to that, my thoughts exactly. :)


@Ixrec

I am personally also of the opinion that anonymous enums and structural records and so on are sorely lacking in motivation, while enum impl Trait would actually do something about the "one-off error enums are annoying" problem statement I assume most of us have in mind here.

I am not sure what you mean, but this would solve one-off enums, because this is literally proposing exactly that: anonymous, structural enums. If this got accepted I have a few functions that I would immediately change to return an anonymous enum instead of Result.


@matthieu-m

  1. Anonymous enums: the ability to define an enum without a name.
  2. Anonymous variants: the ability, within an enum, to define variants without a name.

Ah, good point. I had not thought about that.


In terms of pattern matching syntax, I have to say I think I will dislike anything that requires me to spell out the number of variants. Pattern matching on an enum is similar to projecting on a tuple, and so just like we can write tuple.2 without saying how many fields tuple has, we should be able to do a match and check if this is the 2nd enum variant without repeating how many variants the enum has.

If we follow the proposals here, then to check all cases of an n-variant anonymous enum, we would have to write n branches with n cases in the patterns each, leading to an overall pattern size of n². I do not think we should build in quadratic complexity this way.
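The n² figure can be checked with a small counting sketch. The helper names `arm_pattern` and `placeholders_in_full_match` are made up for this illustration; the generated text follows the `(_|_|_|)::2(x)` spelling used in examples earlier in this thread and is only a string, not something that compiles as a pattern.

```rust
// Build the text of one match arm's pattern for variant `k` of an
// `n`-variant anonymous type, e.g. "(_|_|_|)::2(x)" for n = 3, k = 2.
fn arm_pattern(n: usize, k: usize) -> String {
    let placeholders = "_|".repeat(n);
    format!("({})::{}(x)", placeholders, k)
}

// Count the `_` placeholders across an exhaustive match: n arms with
// n placeholders each, so n * n in total.
fn placeholders_in_full_match(n: usize) -> usize {
    (0..n)
        .map(|k| arm_pattern(n, k).matches('_').count())
        .sum()
}
```

For three variants that is 9 placeholders and for four it is 16, while a projection-style spelling that names only the variant index would grow linearly.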

@glaebhoerl

Contributor

glaebhoerl commented Nov 12, 2018

In terms of pattern matching syntax, I have to say I think I will dislike anything that requires me to spell out the number of variants. Pattern matching on an enum is similar to projecting on a tuple, and so just like we can write tuple.2 without saying how many fields tuple has, we should be able to do a match and check if this is the 2nd enum variant without repeating how many variants the enum has.

If we follow the proposals here, then to check all cases of an n-variant anonymous enum, we would have to write n branches with n cases in the patterns each, leading to an overall pattern size of n². I do not think we should build in quadratic complexity this way.

I agree the quadratic blowup would be unfortunate (though hope the impact would be limited in practice by the arities typically being small (though that may end up being a consequence rather than a cause!)).

The underlying asymmetry in the language which causes this issue to be fraught, though, is that Rust has type-based name resolution for struct fields -- I can say foo.bar without saying which type bar is a field of (a generalization of saying tuple.2 without saying how many fields tuple has) -- but not for enum variants: I can't match on a pattern like Bar(foo) without saying which enum Bar is a variant of!

(The meaning of "saying" here is slightly nuanced. It means "specifying at the level of name resolution, rather than at the level of type checking and inference".)
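A small stable-Rust illustration of that asymmetry, with hypothetical types `Point` and `Shape`: the field access never names the struct, while each variant pattern has to be resolved through a path (or a `use`) before type checking even starts.

```rust
struct Point {
    x: i32,
    y: i32,
}

enum Shape {
    Circle(i32),
    Square(i32),
}

// Field names are resolved from the type of `p`; no `Point::` is written.
fn field_sum(p: &Point) -> i32 {
    p.x + p.y
}

// Variant names are resolved by name lookup, so the enum must be spelled
// out (or imported with `use Shape::*`) even though the type of `s` is known.
fn size(s: &Shape) -> i32 {
    match s {
        Shape::Circle(r) => *r,
        Shape::Square(w) => *w,
    }
}
```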

That, in turn, descends from the grand old wart of interpreting patterns as either binding a new name or referencing an existing one based on whether an existing one is in scope. That was the rock upon which an earlier proposal to apply TDNR to enum variants (at least within match) foundered: #1949
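That wart can be demonstrated on stable Rust. In this sketch (module and item names are hypothetical) the two `match` expressions are textually identical, but the pattern `LIMIT` is a comparison against a constant in one module and a fresh catch-all binding in the other, purely because of what is in scope.

```rust
mod with_const {
    pub const LIMIT: u32 = 10;

    // `LIMIT` resolves to the constant, so this arm compares `n` against 10.
    pub fn is_limit(n: u32) -> bool {
        match n {
            LIMIT => true,
            _ => false,
        }
    }
}

mod without_const {
    // No `LIMIT` is in scope here, so the same pattern introduces a new
    // binding that matches every value. The lints are silenced only so the
    // example compiles without warnings.
    #[allow(non_snake_case, unused_variables, unreachable_patterns)]
    pub fn is_limit(n: u32) -> bool {
        match n {
            LIMIT => true,
            _ => false,
        }
    }
}
```

Here `with_const::is_limit(3)` is false while `without_const::is_limit(3)` is true, which is the ambiguity that inferring variant names from the scrutinee type would have to work around.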

Now, maybe we could make an exception for anonymous enums. After all, precisely since their variants don't have a name as such, their interpretation could be unambiguous. (But we might have to build out much of the general-case infrastructure for it, while only being able to benefit from it in the specific case.)

@burdges

burdges commented Nov 12, 2018

As an aside, the union Trait approach could support "additional trait matching" instead of or along with "pattern matching", so..

fn foo(...) -> union Iterator<Item=T> {
    ...
}

let x = foo(...);
if x satisfies DoubleEndedIterator { ... } 
else if x is Chain<_,_> { ... }
@eaglgenes101

eaglgenes101 commented Nov 13, 2018

It seems like the quadratic text space required, while worse than what could be, shouldn't be a concern in practice. For small numbers of cases, typing the correct number of underscores and vertical bars shouldn't take up much time and space. For larger numbers of cases, one can synthesize a proc macro with relative ease that generates an anonymous variant type placeholder from the number of variants desired, which clamps down the text space usage of a full match to linearithmic. If that isn't enough to not flood the screen with type declaration macros, then I say you're getting to the point where you should probably rethink what in the world you're doing with so many variants, and maybe consider refactoring to a proper named enum, a trait, or a struct/tuple of orthogonal enumerable parts.

@glaebhoerl

Contributor

glaebhoerl commented Nov 13, 2018

Yeah. Another thing that occurred to me right after I posted my previous comment was that there already is a quadratic, or at least n*k blowup, which exerts downward pressure on the number of components in tuples as well as anonymous sums: namely, you have to write out all of the components each time in type signatures, unlike with named types. Now, you could introduce a type synonym for it, but if you're writing a type definition then you may as well write a struct or enum definition.
