Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Anonymous sum types #294

Open
rust-highfive opened this issue Sep 24, 2014 · 64 comments
Open

Anonymous sum types #294

rust-highfive opened this issue Sep 24, 2014 · 64 comments
Labels
A-data-types RFCs about data-types A-structural-typing Proposals relating to structural typing. A-sum-types Sum types related proposals. A-syntax Syntax related proposals & ideas T-lang Relevant to the language team, which will review and decide on the RFC.

Comments

@rust-highfive
Copy link

Issue by glaebhoerl
Saturday Aug 03, 2013 at 23:58 GMT

For earlier discussion, see rust-lang/rust#8277

This issue was labelled with: B-RFC in the Rust repository


Rust has an anonymous form of product types (structs), namely tuples, but not sum types (enums). One reason is that it's not obvious what syntax they could use, especially their variants. The first variant of an anonymous sum type with three variants needs to be syntactically distinct not just from the second and third variant of the same type, but also from the first variant of all other anonymous sum types with different numbers of variants.

Here's an idea I think is decent:

A type would look like this: (~str|int|int). In other words, very similar to a tuple, but with pipes instead of commas (signifying or instead of and).

A value would have the same shape (also like tuples), with a value of appropriate type in one of the "slots" and nothing in the rest:

let foo: (~str|int|int) = (!|!|666);
match foo {
    (s|!|!) => println(fmt!("string in first: %?", s)),
    (!|n|!) => println(fmt!("int in second: %?", n)),
    (!|!|m) => println(fmt!("int in third: %?", m))
} 

(Nothing is a bikeshed, other possible colors for it include whitespace, ., and -. _ means something is there we're just not giving it a name, so it's not suitable for "nothing is there". ! has nothing-connotations from the negation operator and the return type of functions that don't.)

I'm not sure whether this conflicts syntax-wise with closures and/or negation.

Another necessary condition for this should be demand for it. This ticket is to keep a record of the idea, in case someone else has demand but not syntax. (If the Bikesheds section of the wiki is a better place, I'm happy to move it.)

SEE ALSO

@huonw
Copy link
Member

huonw commented Jul 16, 2015

cc #402, #514, #1154

@ticki
Copy link
Contributor

ticki commented Dec 10, 2015

What's the state of this?

@nrc nrc added the T-lang Relevant to the language team, which will review and decide on the RFC. label Aug 17, 2016
@Rufflewind
Copy link

Compared to tuples, anonymous enums would become increasingly tedious to use since a match statement would have N^2 pipe (|) characters. At the expense of type inference, it may be better to go with a syntax like:

let foo: enum(String, int, int) = enum 2(666);
match foo {
    enum 0(s) => println!("string in first: {:?}", s),
    enum 1(n) => println!("int in second: {:?}", n),
    enum 2(m) => println!("int in third: {:?}", m),
}

The syntax would be compatible with a future extension that allows enums to be declared with named choices:

let foo: enum { Red(String), Green(int), Blue(int) } = enum Blue(666);
match foo {
    enum Red(s) => println!("string in first: {:?}", s),
    enum Green(n) => println!("int in second: {:?}", n),
    enum Blue(m) => println!("int in third: {:?}", m),
}

@eddyb
Copy link
Member

eddyb commented Jan 27, 2017

I think the feature would be more useful without allowing matching, just doing trait dispatch. I guess it's a different feature, where T|T has T's representation, as opposed to one bit more.

@plietar
Copy link

plietar commented Feb 17, 2017

@eddyb I've been putting some thoughts into a feature like that
I've posted about it on irlo : https://internals.rust-lang.org/t/pre-rfc-anonymous-enum-which-automatically-implement-forwarding-traits/4806

@burdges
Copy link

burdges commented Feb 17, 2017

I'd think an Alloca<Trait> analog of Box<Trait> would provide the same functionality as this return enum expr extension of -> impl Trait idea, except there is dynamic dispatch in Alloca<Trait> so optimization suffers.

@OvermindDL1
Copy link

Passing by, but if you are curious in syntax's then OCaml has anonymous sum types called Polymorphic Variants. Basically they are just a name, like `Blah, which can have optional values. An example of the syntax:

# let three = `Int 3;;
val three : [> `Int of int ] = `Int 3
# let four = `Float 4.;;
val four : [> `Float of float ] = `Float 4.
# let nan = `Not_a_number;;
val nan : [> `Not_a_number ] = `Not_a_number
# let list = [three; four; nan];;
val list  : [> `Float of float | `Int of int | `Not_a_number ] list

The val lines are the types of the let assignments, left in to see how the typing works.

In the back-end at assembly time the names are given a globally unique integer (in the current implementation it is via hashing, a chance of collision but overall the chance is extremely low as well as warnings can be put in place to catch them), however I've seen talk of making a global registry so they just get incremented on first access efficiently.

A plain Polymorphic Variant with no data is represented internally as an integer:

`Blah

Becomes the integer 737303889 (yes I checked), and comparing those are trivial.
For Polymorphic variants that can hold data (either a single element or a tuple of elements) such as:

`Blah (42, 6.28)

Gets encoded internally as an array of two fields in assembly, the first is the above number as before, the second is the pointer to the data of the tuple (although in most cases these all get inlined into the same memory in OCaml due to inlining and optimization passes). In the typing system the above would be [> Blah of int * float ](in OCaml the types of a tuple are separated by*`).

However, about Polymorphic variants is that they can be opened or closed. Any system can pass any of them that they want, including passing through if you want. For example, a simple way to handle something like a generic event in OCaml would be like:

let f next x = match x with
  | `Blah x -> do_something_with x
  | `Foobar -> do_something_else ()
  | unhandled -> next unhandled

Which is entirely type safe, dependent on what each function handles down the chain and all.

The big thing on the typing system is that things can be open or close typed, I.E. they either accept any amount of Polymorphic Variants or a closed set of Polymorphic Variants. If something like anonymous sum type here were to be accepted then that concept would be exceedingly useful while being very easy and very fast to statically type.

@burdges
Copy link

burdges commented Mar 15, 2017

Anonymous sum types might interact with -> impl Trait : At present, this code snippet cannot compile because the iterators have different types :

match x {
    A(a) => once(a).chain(foo),
    B(b) => once(bar).chain(foo).chain(b),
}

You could make this make sense with an anonymous sum type of the form impl Iterator | impl Iterator, that itself becomes an Iterator, but inferring any type like that sounds like chaos.

One could do it in std with enums like :

enum TwoIterators<A,B> {
    IterA(A),
    IterB(B),
}

impl Iterator for TwoIterators where .. { .. }

so the above code becomes

match x {
    A(a) => TwoIterators::IterA( once(a).chain(foo) ),
    B(b) => TwoIterators::IterB( once(bar).chain(foo).chain(b) ),
}

I could imagine some enum Trait sugar that did basically this too. You cannot delegate associated types or constants to an enum at runtime like this, so an enum Trait must enforce that they all agree across all the variants.

@dobkeratops
Copy link

this might sound like a weird hack , but how about just making A|B sugar for 'Either', i suppose it might get even weirder to start composing A|B|C as Either<A,Either<B,C>> or have that mapping to something . What if there was some sort of general purpose 'operator overloading' in the 'type syntax' , allowing people code to experiment with various possibilities - see what gains traction
(i had yet another suggestion about allowing general purpose substitutions, e.g. type Either<A,Either<B,C>> = Any3<A,B,C> .. etc https://www.reddit.com/r/rust/comments/6n53oa/type_substitutions_specialization_idea/ now imagine recovering ~T === Box ~[T] ... type Box<RawSlice> = Vec .. through a completely general purpose means )

@strega-nil
Copy link

@dobkeratops I'd rather just have a variant style type, i.e., with variadics.

@Sgeo
Copy link

Sgeo commented Aug 7, 2017

I wrote some code that could potentially fit into a library now that type macros are stable: https://gist.github.com/Sgeo/ecee21895815fb2066e3

Would people be interested in this as a crate?

@Ekleog
Copy link

Ekleog commented Apr 22, 2018

I've just come upon this issue, while looking for a way to avoid having some gross code that simply doesn't want to go away (actually it's slowly increasing, started at 8 variants and passed by 9 before reaching 12):

use tokio::prelude::*;

pub enum FutIn12<T, E, F1, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11, F12>
where
    F1: Future<Item = T, Error = E>, // ...
{
    Fut1(F1), // ...
}

impl<T, E, F1, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11, F12> Future
    for FutIn12<T, E, F1, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11, F12>
where
    F1: Future<Item = T, Error = E>, // ...
{
    type Item = T;
    type Error = E;

    fn poll(&mut self) -> Result<Async<Self::Item>, Self::Error> {
        use FutIn12::*;
        match *self {
            Fut1(ref mut f) => f.poll(), // ...
        }
    }
}

I was thus thinking that it'd be great to have anonymous sum types that automatically derived the traits shared by all their variants, so that I could get rid of this code and just have my -> impl Future<Item = (), Error = ()> function return the futures in its various branches (with some syntax, that ideally wouldn't expose the total number of variants but let rustc infer it from the returning points, so that adding a branch doesn't require changing all the other return points), and have the anonymous sum type match the -> impl Future return type.

@glaebhoerl
Copy link
Contributor

As I wrote here I think this use case would be better addressed by something modelled after how closures work.

@alexreg
Copy link

alexreg commented Apr 22, 2018

I don’t think it would be wise to make anonymous sum types nominally typed, as you seem to suggest. Structural typing, as with tuples, is far more useful and less surprising to the programmer.

@Pauan
Copy link

Pauan commented Apr 22, 2018

@alexreg What they're saying is that the specific use-case of wanting to return impl Trait with different types in each branch is better handled by a secret nominal type, similar to how closures are implemented.

Therefore, anonymous sum types are separate (and mostly unrelated) from that use case.

@alexreg
Copy link

alexreg commented Apr 22, 2018

@Pauan Oh, well I agree with that. As long as we consider these things two separate features, fine. Thanks for clarifying.

@Ekleog
Copy link

Ekleog commented Apr 23, 2018

Oh indeed good point, thanks! Just opened #2414 to track this separate feature, as I wasn't able to find any other open issue|RFC for it :)

@eaglgenes101
Copy link

I'm planning to get out a pull request for this proposed RFC. Most of you following this thread probably know that a number of proposals like this were rejected for being too complex, so its focus is minimalism and implementation simplicity rather than ergonomics and features. Any words before I get it out? (I've asked this question in multiple other areas to try to collect as much feedback before getting the proposed RFC out, fyi)

https://internals.rust-lang.org/t/pre-rfc-anonymous-variant-types/8707/76

@Centril Centril added A-syntax Syntax related proposals & ideas A-structural-typing Proposals relating to structural typing. A-sum-types Sum types related proposals. A-data-types RFCs about data-types labels Nov 26, 2018
@Yokinman
Copy link

My two cents; I think the motivation for anonymous enums is broadly:

  • Prototyping.
  • Single-use return types.

For these use cases, I think it's important that anonymous enums are easy to refactor. In the case of anonymous structs, the transformation is pretty painless. You just introduce a new tuple struct, and prefix each tuple with the struct's name:

fn split(text: &'static str, at: usize) -> (&'static str, &'static str) {
    (&text[..at], &text[at..])
}

assert_eq!(split("testing", 4), ("test", "ing"));
+ #[derive(Debug, PartialEq)]
+ struct Split(&'static str, &'static str);

! fn split(text: &'static str, at: usize) -> Split {
!     Split(&text[..at], &text[at..])
  }

! assert_eq!(split("testing", 4), Split("test", "ing"));

On the other hand, I don't think structurally anonymous enums would be as helpful for prototyping since there isn't an equivalent in explicit enums. The transformation would likely involve tedious renaming and rearranging for every pattern:

fn add_one(value: String | i64) -> String | i64 {
   match value {
       mut x: String => { 
           x.push_str("1"); 
           x 
       }
       y: i64 => {
           y + 1
       }
   }
}

fn something(value: String | i64) {
    match value {
        x: String => println!("String: {x}"),
        y: i64 => println!("i64: {y}"),
    }
}
+ #[derive(Debug, PartialEq)]
+ enum AddOne {
+     String(String),
+     i64(i64),
+ }

! fn add_one(value: AddOne) -> AddOne {
      match value {
!         AddOne::String(mut x) => { 
              x.push_str("1"); 
!             AddOne::String(x) 
          }
!         AddOne::i64(y) => {
!             AddOne::i64(y + 1)
          }
      }
  }

! fn something(value: AddOne) {
      match value {
!         AddOne::String(x) => println!("String: {x}"),
!         AddOne::i64(y) => println!("i64: {y}"),
      }
  }

Not to mention that the names would likely be changed from String and i64 (either to add camel case, to be more descriptive, or to name syntactical types like: (T,), [T; N], fn(T) -> U).

I also think that anonymous enums should be able to represent more common stateful enum types similar to std::option::Option, std::cmp::Ordering, and std::ops::Bound. Without this, I think most would end up doing a similarly awkward transformation from an anonymous struct instead:

use std::cmp::Ordering;

fn max(a: i64, b: i64) -> (i64, Ordering) {
   match a.cmp(&b) {
       Ordering::Less => (b, Ordering::Less),
       Ordering::Equal => (a, Ordering::Equal),
       Ordering::Greater => (a, Ordering::Greater),
   }
}

assert_eq!(max(4, 7), (7, Ordering::Less));
  use std::cmp::Ordering;

+ enum SomeOrdering {
+     Less(i64),
+     Equal(i64),
+     Greater(i64),
+ }

! fn max(a: i64, b: i64) -> SomeOrdering {
     match a.cmp(&b) {
!        Ordering::Less => SomeOrdering::Less(b),
!        Ordering::Equal => SomeOrdering::Equal(a),
!        Ordering::Greater => SomeOrdering::Greater(a),
      }
  }

! assert!(matches!(max(4, 7), SomeOrdering::Less(7)));

I would prefer a more general syntax where the variants are explicitly named and referenced using something like return type notation. If the anonymous enum is converted into an explicit one in the future, any code referencing its variants could still function with zero refactoring required.

use std::cmp::Ordering;

fn max(a: i64, b: i64) -> enum {
    Less(i64),
    Equal(i64),
    Greater(i64),
} {
    match a.cmp(&b) {
        Ordering::Less => max()::Less(b),
        Ordering::Equal => max()::Equal(a),
        Ordering::Greater => max()::Greater(a),
    }
}

assert!(matches!(max(4, 7), max()::Less(7)));
  use std::cmp::Ordering;

+ enum SomeOrdering {
+     Less(i64),
+     Equal(i64),
+     Greater(i64),
+ }

! fn max(a: i64, b: i64) -> SomeOrdering {
      match a.cmp(&b) {
          Ordering::Less => max()::Less(b),
          Ordering::Equal => max()::Equal(a),
          Ordering::Greater => max()::Greater(a),
      }
  }

  assert!(matches!(max(4, 7), max()::Less(7)));

Alternatively, maybe a general syntax for explicit enums with anonymous variants could be defined, although it seems niche and awkward to me.

Your friend,
Yokin

@ModProg
Copy link

ModProg commented Jul 20, 2024

I found this issue while trying to do a struct where I wanted to contain either syn::Token![#] or syn::Token![$] which would have matched a union type, though I actually thought to make the variants + their order relevant (same as it is for tuples) to make it fully compatible with enums where two variants can contain the same value.

The syntax I imagined was

let mut value: <u8, u8, u8> = ::0(1);
value = ::1(2);
value = ::2(3);


match a {
    ::0(a) => println!("first: {a}"),
    ::1(b) | ::2(b) => println!("second or third: {b}"),
}

Using number indexes, inspired by how tuples work.

@nasso
Copy link

nasso commented Jul 21, 2024

i dont like the order being relevant... addition is commutative, so different permutations of the same sum type should be equivalent (like reordering fields in a struct). i dont want to have to convert the output of a function to pass it to another if they disagree on the order

i think what most of us want is a sum type like <a, b, c> that behaves like an enum for which the variants are named after the types, not their position in the "union". writing <a, a, a> wouldn't even be possible, or would just be equivalent to <a>

for a tuple, (a, b, c) is not the same as (c, b, a) because a tuple is a struct with each field named after the position of each type/value. it allows for (a, a, a)

though your suggestion is interesting because it's a nice parallel with how tuples work, i don't think this is what we're looking for

@programmerjake
Copy link
Member

I like having order matter since it fixes a hard problem:
for fn f<T, U>() -> <T, U> what happens when T = U?
It's also analogous to tuples in that both have their fields/variants be position-based numbers instead of being names.
There's also prior art:
C++'s std::variant behaves like this, where you can use std::get<N>(my_variant) to get the Nth alternative and throw an error if my_variant's current alternative isn't the Nth alternative. (you can also access by type but only if the type is unique)

@teohhanhui
Copy link

I like having order matter since it fixes a hard problem:
for fn f<T, U>() -> <T, U> what happens when T = U?

writing <a, a, a> wouldn't even be possible, or would just be equivalent to <a>

Doesn't that address the issue?

@programmerjake
Copy link
Member

programmerjake commented Jul 21, 2024

I like having order matter since it fixes a hard problem:
for fn f<T, U>() -> <T, U> what happens when T = U?

writing <a, a, a> wouldn't even be possible, or would just be equivalent to <a>

Doesn't that address the issue?

no, because you should be able to write <T, U> for generic T and U (since otherwise it is severely limited), and if you try to define <T, U> to somehow behave differently when you use that generic code with types where T = U then you run into unsoundness caused by the same reasons as why general specialization is unsound: it doesn't properly account for lifetimes, which allows you to write code that makes the compiler e.g. transmute a &'a u8 to &'static u8 so you can read after the memory has been freed, which is UB.

@programmerjake
Copy link
Member

if you try to define <T, U> to somehow behave differently when you use that generic code with types where T = U then you run into unsoundness caused by the same reasons as why general specialization is unsound: it doesn't properly account for lifetimes, which allows you to write code that makes the compiler e.g. transmute a &'a u8 to &'static u8 so you can read after the memory has been freed, which is UB.

this is because the compiler erases all lifetime information before it generates the de-genericified code where all generics are substituted by actual types, which means that it can't tell the difference between &'a u8 and &'static u8 at that point since they both end up as &'erased u8

@nasso
Copy link

nasso commented Jul 23, 2024

both are valid use cases but they really are different

you're describing C++'s std::variant (which is nice), but some of us want TypeScript's unions (but better)

i think it wouldn't be too difficult to implement std::variant when Rust eventually gets variadic generics. it would probably work like C++'s (similar to Either<L, R> but with an arbitrary number of generics)

but TypeScript-like unions, for which the order isn't relevant, would probably require compiler-level support (so that <a, b, c> is equivalent to <b, a, c> and all other permutations). i don't think it's possible in Rust today to write a generic type (with >1 parameter) for which changing the order of the type parameters doesn't change the identity of the type

i wanna be able to write a function that returns some Result<T, <E1, E2, ..., En>> and not have to worry about the order in which the errors are laid out in my error type. it's just an unordered set (bag) of errors. and ? just adds to that bag, wherever it feels like. and i can have some handle_error(e: <E1, E2, ..., En>) function. or maybe its handle_error(e: <E4, E2, E6, E8, ..., En>) etc... it just matches over the type of error, not the position in the sum type

@glaebhoerl
Copy link
Contributor

For the record, this particular issue is explicitly about anonymous sum types, not union types. I.e. with position/tag/discriminant-based matching (like Rust enums), not type-based matching (like TS). I don't know if there's already an issue for the latter, but it might be worth opening one if people are interested in it.

@teohhanhui
Copy link

@glaebhoerl I don't know if Wikipedia is wrong here, but:

In type theory, a union has a sum type; this corresponds to disjoint union in mathematics.

Maybe that's why the confusion.

I don't think anyone is asking for an untagged union: https://news.ycombinator.com/item?id=32018886

So I think we're actually asking for the same thing, i.e. tagged union, a.k.a. sum type.

@tesaguri
Copy link

tesaguri commented Jul 29, 2024

I don't think anyone is asking for an untagged union: https://news.ycombinator.com/item?id=32018886

The thread you quoted seems to be about Haskell-like languages, and I guess the untagged union in that context differs from what you imagine (maybe the union keyword of C and, well, Rust, which, unlike TypeScript, doesn't really have runtime type tags at all and is thus inherently unsafe?).

A union type is like a set union. It differs from sum type (corresponds to Rust's enum E { A(T), B(U) }) in that the union type of same types $T$ and $T$ equals $T$ (similar to $S \cup S = S$) while the sum type of same types $T$ and $T$ doesn't equal, nor is a supertype of $T$.

Some people in this issue have proposed the union type in this sense in addition to the sum type.

@Keavon
Copy link

Keavon commented Jul 29, 2024

Yeah, that Haskell-related discussion doesn't really make sense here. We already have tagged unions in Rust, they're called enums. Each tag is the variant's name. Our goal here is having untagged (or anonymous) ones so you can define T | U | V (like String | u64 | bool) and match based on each type without needing to previously declare (and in some cases import into the current scope) that tagged wrapper type (the enum type).

@Diggsey
Copy link
Contributor

Diggsey commented Jul 30, 2024

@Keavon you seem to be confusing tagged/untagged with named/anonymous, they are very different things. Rust already has untagged and tagged named unions, what it's missing are anonymous tagged unions.

"tag" refers to the enum discriminant. Untagged unions do not have a discriminant, and so are unsafe to access. See https://en.wikipedia.org/wiki/Tagged_union

The names of enum variants are not called tags - they have associated tags, for example:

enum Foo {
    A, // Might have tag 0
    B, // Might have tag 1
    C // Might have tag 2
}

Tags also exist for anonymous enums, since the compiler still needs to differentiate which variant is selected, for example:

type Foo = A /* might get tag 0 */ | B /* might get tag 1 */ | C /* might get tag 2 */

@Keavon
Copy link

Keavon commented Jul 30, 2024

I see, thanks for pointing out my terminology error. If I'm reading what you explained correctly, I think you're responding to this part of my second sentence above:

Our goal here is having untagged (or anonymous) ones...

(is that right?)

Rephrasing what I wrote above, then, I think that I was describing a goal of having the compiler's type system figure out the tags behind the scenes, allowing you to write code with anonymous variant names as well as anonymous enum types. So behind the scenes, it would be tagged (using your illustrations of // Might have tag 0, etc.) but those variant names shouldn't be given by the user, and neither should the entire type be given by the user either unless the user decides to typedef it with your final code block example:

type Foo = A /* might get tag 0 */ | B /* might get tag 1 */ | C /* might get tag 2 */

The result should be an equivalent to TypeScript's approach, however with the ability for the compiler to discriminate between the types at compile time so the code doesn't have to match based on something kind a kind: string field required in TS for its ability to work at runtime once it becomes JS. Is that roughly accurate now?

@Rufflewind
Copy link

The key distinction lies in this scenario:

type Foo<A, B> = A | B;

type Bar = Foo<i32, i32>; // !!

fn bar_consumer(bar: Bar) {
  match bar {
  // ??
  }
}

Option 1: Union type (like TypeScript)

In this case, Foo<A, B> flattens to a single-member union containing just i32 (or alternatively the compiler could treat it as indistinguishable to i32).

type Foo<A, B> = A | B;

type Bar = Foo<i32, i32>;

fn bar_consumer(bar: Bar) {
  match bar {
    i: i32 => ...,
    // No other options are possible.
  }
}

Option 2: Sum type (like standard Rust enums)

With a sum type, each choice in the anonymous enum must be assigned a locally unique name. Here, I chose to use the type parameter itself as the name. There are other alternatives of course, like giving them integer names analogous to tuples (.0, .1, .2, etc) but I imagine the ergonomics would be poor.

type Foo<A, B> = A | B;

type Bar = Foo<i32, i32>;  // Remains a two-member union

fn bar_consumer(bar: Bar) {
  match bar {
    A(i: i32) => ...,
    B(j: i32) => ...,
  }
}

@Keavon
Copy link

Keavon commented Jul 30, 2024

Thanks for laying that out clearly @Rufflewind, that's easy to understand.

Perhaps I come from a biased perspective as a TypeScript user with only a cursory understanding of type theory, but to me option 1 seems like the obvious, no-brainer, only good solution. But I must be missing the perspective to make a fair comparison. Are there downsides to option 1, and upsides to option 2? Is there a compelling reason to consider option 2 or are people just discussing it because it's technically the name of this issue?

@programmerjake
Copy link
Member

Perhaps I come from a biased perspective as a TypeScript user with only a cursory understanding of type theory, but to me option 1 seems like the obvious, no-brainer, only good solution. But I must be missing the perspective to make a fair comparison. Are there downsides to option 1,

yes there are downsides, imo critically so for Rust, since Rust has lifetimes where each different lifetime makes references different types, but Rust is intentionally designed such that when generating the final compiled code, it never knows what lifetimes anything has (lifetimes have been erased). This means that using the TypeScript-style Union approach can't work for the following code:

fn do_match<T, F>(v: <T | F>) -> Result<T, F> {
    // this requires Rust to be able to distinguish T and F when generating code for this function,
    // which happens after lifetimes are erased. So, if `T = &'a u8` and `F = &'b u8`,
    // then when Rust is generating code for this function it sees
    // `T = &'erased u8` and `F = &'erased u8`, so are they the same type or not?
    //
    // If Rust decides they're the same type, then does Rust return Ok?
    // if so, then this can be used to convert one lifetime `'b` into another
    // lifetime `'a` in safe code, which is unsound since `'a` could live for
    // much longer than `'b` so code can use memory after it has been freed,
    // which is Undefined Behavior (aka. rustc is allowed to generate code that
    // does anything at all, such as crash, overwrite unrelated memory, delete
    // all your code, do exactly what you wanted, format your harddrive, etc.).
    // If Rust decides to return Err instead, you can still convert lifetimes,
    // just the other way around, so it's just as bad.
    //
    // If Rust decides they're not the same type, then when some other
    // function tries to call this function where `'a` and `'b` are actually the
    // same lifetime, then the caller will think it's passing a TypeScript-style
    // Union type with only one variant, but `do_match`'s code will have been
    // generated for a TypeScript-style Union type with two variants, which is
    // also unsound and Undefined Behavior!
    match v {
        T(t) => Ok(t),
        F(f) => Err(f),
    }
}

@Keavon
Copy link

Keavon commented Jul 30, 2024

Interesting... I suppose there will always have to be some level of compromises but it's a matter of picking the least bad ones, while also ensuring that we don't use small downsides as a reason against even attempting the larger gains. (The "but sometimes..." fallacy coined by Technology Connections in his video about LED stoplights being worse at melting snow so people don't want to switch to them at all, or engineer mitigations.)

So I suppose I have a couple questions:

  • Could it require that equal lifetimes be supplied to all generic types?
  • Could it require that at most one generic type be used?
    • Maybe with trait constraints that require they be non-overlapping to have more than one generic

@Luca-spopo
Copy link

Luca-spopo commented Jul 31, 2024

I think two entirely unrelated features are being talked about concurrently

One would be just syntactic sugar to make anonymous "sum" types behave the same way as enums do currently. I would like to see this in the language.

My understanding:

  • Enums are something that exist at runtime, they contain a tag at runtime
  • An enum has its own API, you cannot treat it as if it is its contents (even if it only contains one possible value)
  • (So, you can't add two objects of type EnumA even if you know EnumA contains a number)
  • You can use an enum in a match statement etc.

The second features being talked about are a sum type (I will call it Union type).

My understanding:

  • A union type isn't something that exists at runtime, it's a type
  • An object of type Union<A, B> is either an A or a B. You can cast it and treat it as A or B if you know the underlying type.
  • The order of A and B are not significant. Union<A, B> is the same as Union<B, A>
  • Union<A, A> indistinguishable from A
  • This is a niche already served by traits. If we have a trait U and we know that the only types that implement U are A and B, then U is already a sum type of A and B.

I think what's missing for this second feature is making traits closed to extension. Right now we can have a trait U and two conforming structs A and B, but we cannot be 100% certain (usually) that A and B are the only structs that can conform to this trait.
We could abuse the orphan rules to make this the case today, but that's not a nice way to do it, and I dont think the compiler will let us exploit the knowledge that A and B are the only structs that implement trait U.

Just a syntax to say "Hey, the only structs allowed to implement this trait are this list I am defining" should be able to serve the niche of a union type. We should be able to declare that dyn& U is going to be either an A or a B because there's no other possibilities.

@kaivol
Copy link

kaivol commented Jul 31, 2024

Rust has lifetimes where each different lifetime makes references different types, but Rust is intentionally designed such that when generating the final compiled code, it never knows what lifetimes anything has (lifetimes have been erased).

@programmerjake
Aren't all types effectively erased at runtime? To distinguish lifetimes we would use the same mechanism as to distinguish types (so probably a discriminant).

If Rust decides they're not the same type, then when some other
function tries to call this function where 'a and 'b are actually the
same lifetime, then the caller will think it's passing a TypeScript-style
Union type with only one variant, but do_match's code will have been
generated for a TypeScript-style Union type with two variants, which is
also unsound and Undefined Behavior!

I'm not exactly familiar with TypeScript unions, so could you expand on this? What exactly would cause Undefined Behavior here?


Beyond that, I agree that anonymous sum types including generics are a bit awkward.
It probably shouldn't even be possible to match on Generic type parameters, but only on their bounds.

@8573
Copy link

8573 commented Jul 31, 2024

a sum type (I will call it Union type). ... Union<A, A> indistinguishable from A

This is a union type but not a sum type: a characteristic feature of a sum type A+B is that its number of possible values is the sum of the numbers of possible values of the summed types A and B, even if A = B. For example, enum E { A(u8), B(u8) } is a sum type u8+u8 and has 512 (= 2⁸ * 2) possible values, whereas a union type u8|u8 has only 256 (= 2⁸) possible values (because the two u8 types are equal and get collapsed into one).

@programmerjake
Copy link
Member

Rust has lifetimes where each different lifetime makes references different types, but Rust is intentionally designed such that when generating the final compiled code, it never knows what lifetimes anything has (lifetimes have been erased).

@programmerjake Aren't all types effectively erased at runtime? To distinguish lifetimes we would use the same mechanism as to distinguish types (so probably a discriminant).

sort of (e.g. wasm can still distinguish between passing a i32 and a i64), but that isn't the relevant difference, which is that Rust, when generating the final code (at the stage where it substitutes the actual types into generics and generates separate functions for each combination of types used, so it does know the exact types then except without lifetime information).

If Rust decides they're not the same type, then when some other
function tries to call this function where 'a and 'b are actually the
same lifetime, then the caller will think it's passing a TypeScript-style
Union type with only one variant, but do_match's code will have been
generated for a TypeScript-style Union type with two variants, which is
also unsound and Undefined Behavior!

I'm not exactly familiar with TypeScript unions, so could you expand on this? What exactly would cause Undefined Behavior here?

The key part of TypeScript-style unions here is that they act like enums except they collapse duplicate types all into one variant for each set of duplicate types. The part that causes Undefined Behavior is where the calling function tries to pass effectively an enum with one variant (so it usually just passes the variant's value without any data on which variant is passed since there's only one option), but the called function expects effectively an enum with two variants so is expecting data on which variant is active ... this ABI mismatch makes the function call Undefined Behavior.

@programmerjake
Copy link
Member

Beyond that, I agree that anonymous sum types including generics are a bit awkward. It probably shouldn't even be possible to match on Generic type parameters, but only on their bounds.

I disagree, I think that anonymous sum types should be like tuples, where you select which one you want based on position, and selecting which one you want based on type should be syntax sugar at most -- so by the time any code is generated and/or any generics are substituted with actual types, all such matching is already converted to matching based on position so doesn't need to know if types are the same or not.

@Dessix
Copy link

Dessix commented Jul 31, 2024

Positional selection does very little to help us solve the primary ergonomics problems in Rust - most notably, error handling being atrocious. Sadly, reviewing the opening post, it does describe a positional selector mechanism, which ... While novel, leaves us in the doldrums with regard to solving existing, practical problems.

We should probably file or contribute to an RFC for anonymous / TypeScript-style (but automatically tagged, and thus safely matchable back to original type!) unions. Whether such an RFC works through sum types in the compiler or not is a matter of design and implementation, and - while it may have Rust-ABI implications in the long term - does not actually matter all that much for a description of intended usage and syntax.

@burdges
Copy link

burdges commented Aug 1, 2024

I long proposed TinyBox<dyn Error> for error handling, but someone argued convincingly that returning Result<T,[usize; 2]> incurs too much overhead, so tagged unions likely incur too much overhead too.

I suspect error handling requires one first considers the actual calling convention. At present Rust only has variance from lifetimes, but we could consider either complicating variance or else providing implicit impl Froms for variant types, so either way Result<T,SpecificErrorZST> work, and compiles like Option<T> under the hood.

We'll likely need attributes that constructed the specific variant types, or at least tell the compiler what's going on.

pub enum MyErrors {
    #[variant_type_zst]
    ErrorX,
    #[variant_type_zst]
    ErrorY,
    #[variant_type]
    #[unique_non_zst]
    ErrorZ(&'static str),
}

fn foo(..) -> Result<T,MyErrors::ErrorX> { .. }

Along similar lines, you could imagine explicitly merging enums, which only works within one crate and adjusts discriminants, like

pub enum CrateError union {
    module1::Error,
    module2::Error,
    module3::Error,
    // AnotherCrateError // Invalid since discriminant defined in another compilation unit
}

Anyways, there are perforamance concerns which require care, so anonymous sum types might fit poorly.

@Dessix
Copy link

Dessix commented Aug 1, 2024

... someone argued convincingly that returning Result<T,[usize; 2]> incurs too much overhead, so tagged unions likely incur too much overhead too.

If they can't afford to use tagged unions in their errors, they can choose not to do so. The point here is to supplant the need for layers of wrapping thiserror or even boxing / heap-allocating newtypes that do nothing but bog down user codebases.

The whole point of bringing this up was to make it possible to avoid writing all of that boilerplate, since all of that extra work just gets us ... Java's old Checked Exceptions, which were too boilerplatey for anyone to actually use back then either, despite being less work than our current state of affairs.

@Rufflewind
Copy link

  • TypeScript-style unions are not a good fit for Rust as explained previously: Anonymous sum types #294 (comment)

  • Anonymous sum types with positional choices are a better fit in Rust, but positional choices are not ergonomic.

  • Instead, I think it's better to consider anonymous sum types with labeled choices. They still behave like sum types (i.e. A + A != A), but labels are more readable than numbers.


A hypothetical syntax for anonymous sum types with labeled choices might be:

fn foo(...) -> Result<T, Err(enum { X, Y(i32), Z(String) })> { ... }

fn bar(e: enum { X, Y(i32), Z(String) }) {
    match e {
        enum::X => { ... },
        enum::Y(i) => { ... },
        enum::Z(s) => { ... },
    }
}

which could desugar into, say:

enum __AutogeneratedEnum735fc68a {
  X         = 0x4b68ab38,
  Y(i32)    = 0x18f5384d,
  Z(String) = 0xbbeebd87,
}

fn foo(...) -> Result<T, __AutogeneratedEnum735fc68a> { ... }

fn bar(e: __AutogeneratedEnum735fc68a) {
    match e {
        __AutogeneratedEnum735fc68a::X => { ... },
        __AutogeneratedEnum735fc68a::Y(i) => { ... },
        __AutogeneratedEnum735fc68a::Z(s) => { ... },
    }
}

The discriminants are generated deterministically from the labels "X", "Y", and "Z" respectively. (This toy example uses sha256.) Deterministic discriminants would allow for some degree of cheap enum-to-enum coercion, e.g. perhaps Err(enum { X, Y(i32), Z(String) } could be upcasted to Err(enum { W(...), X, Y(i32), Z(String) }.

@Dessix
Copy link

Dessix commented Aug 1, 2024

Why do we erase type identity prior to solving which type we're passing via which paths? This seems to propose that we can't fix that - or even patch it over with a rule like "matching types combine to the shortest lifetime"?
Giving up on ergonomics because our type system throws information away too early and instead making the user resolve the typing, whether by position or by name, seems like a rather golang direction to take.

@glaebhoerl
Copy link
Contributor

I'll give this one more try:

I originally opened this issue back in 2013. It is about anonymous sum types with positional matching, for symmetry with tuples. No extra type-based magic, just as with tuples. I agree that type-based features also seem potentially valuable, but please, I beg, take those discussions to other issues and threads, because they are separate features.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-data-types RFCs about data-types A-structural-typing Proposals relating to structural typing. A-sum-types Sum types related proposals. A-syntax Syntax related proposals & ideas T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

No branches or pull requests