Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auto-generated sum types #2414

Open
Ekleog opened this Issue Apr 23, 2018 · 191 comments

Comments

Projects
None yet
@Ekleog
Copy link

Ekleog commented Apr 23, 2018

First, the idea was lain down by @glaebhoerl here.

The idea is basically to have a way to tell the compiler “please auto-generate me a sum trait for my -> impl Trait” function, that would automatically derive Trait based on the implementations of the individual members.

There would, I think, be a need for a syntax specially dedicated to this, so that people are not auto-generating these without being aware of it. Currently, the two syntaxes I can think of are either |value| (without a following {}), running on the idea of “this is auto-generated, just like closures” (I don't like it much, but it could make sense), and the other idea is to make it use a keyword. I'd have gone with auto, but it appears to not be reserved, and become or enum would likely be the best choices.

(Edit: @Pauan pointed out that the |…| syntax would be ambiguous in certain cases, so please disregard it)

So the syntax would, I think, look like this:

fn foo(x: bool) -> impl Iterator<Item = u8> {
    if x { return become b"foo".iter().cloned() }
    become b"hello".iter().map(|c| c + 1)
}
// Or:
fn foo(x: bool) -> impl Iterator<Item = u8> {
    if x { return enum b"foo".iter().cloned() }
    enum b"hello".iter().map(|c| c + 1)
}
// Or:
fn foo(x: bool) -> impl Iterator<Item = u8> {
    if x { return |b"foo".iter().cloned()| }
    |b"hello".iter().map(|c| c + 1)|
}

The major advantage of the || syntax is that it doesn't raise the issue of parenthesizing, as it's already made of parenthesis.

What do you think about this idea, especially now that impl Trait is landing and the need is getting stronger?

@alexreg

This comment has been minimized.

Copy link

alexreg commented Apr 23, 2018

Thanks for creating a separate issue for this.

To be honest, I'm not sure we need explicit syntax for this. It's more of an (important) implementation detail than a user-facing feature, no? But if one does want to make the syntax explicit, then I suggest putting something like impl enum Trait in the function signature.

@Nemo157

This comment has been minimized.

Copy link
Contributor

Nemo157 commented Apr 23, 2018

To be honest, I'm not sure we need explicit syntax for this. It's more of an (important) implementation detail than a user-facing feature, no? But if one does want to make the syntax explicit, then I suggest putting something like impl enum Trait in the function signature.

@alexreg one reason to make it explicit is it does have performance implications, each time you call a method there will have to be a branch to call into the current type (hmm, unless these were implemented as some form of stack-box with a vtable instead, either way that still changes the performance). My first thought was also to make it a modifier on the impl Trait syntax to avoid having to repeat it on each return (impl sum Trait for the insiders pun of the some Trait proposed keyword instead of impl). But, as you mention this is an implementation detail, so that should not be exposed in the signature (could just have rustdoc hide it I suppose).

@Pauan

This comment has been minimized.

Copy link
Member

Pauan commented Apr 23, 2018

I might be wrong, but wouldn't the |...| cause parsing ambiguities, since it's already used for closures?

@Ekleog

This comment has been minimized.

Copy link
Author

Ekleog commented Apr 23, 2018

@Pauan Oh indeed, I was thinking EXPR { } was not valid syntax, but that's not the case in eg. if. Then, in if the |...| syntax should not be allowed anyway in if conditions, but that'd complicate for no reason the parser.

@Nemo157, @alexreg The issue with putting this as a modifier in the type signature is the fact it wouldn't work well inside a function:

fn bar() -> Option<LinkedList<char>> { /* ... */ }
// This is allowed
fn foo() -> impl enum Iterator<Item = char> {
    match bar() {
        Some(x) => x.iter(),
        None => "".iter(),
    }
}
// Either this is not allowed, or the loss in performance is not explicit
fn foo() -> impl enum Iterator<Item = char> {
    let mut tmp = match bar() {
        Some(x) => x.iter(),
        None => "".iter(),
    };
    let n = tmp.next();
    match n {
        Some(_) => tmp,
        None => "foo bar".iter(),
    }
}
@est31

This comment has been minimized.

Copy link
Contributor

est31 commented Apr 23, 2018

Haven't you just invented dynamic dispatch??

@alexreg

This comment has been minimized.

Copy link

alexreg commented Apr 23, 2018

Yeah, fair point about the performance hit. It’s a small one, but it wouldn’t be in the spirit of Rust to hide it from the user syntactically.

@Nemo157

This comment has been minimized.

Copy link
Contributor

Nemo157 commented Apr 23, 2018

@Ekleog you can use the exact same syntax for inside a function, somewhere you need to mention the trait that you're generating a sum type for anyway:

fn foo() -> impl enum Iterator<Item = char> {
    let mut tmp: impl enum Iterator<Item = char> = match bar() {
        Some(x) => x.iter(),
        None => "".iter(),
    };
    let n = tmp.next();
    match n {
        Some(_) => tmp,
        None => "foo bar".iter(),
    }
}

@est31 a constrained form of dynamic dispatch that could potentially get statically optimized if the compiler can prove only one or the other case is hit. Or, as I briefly mentioned above it could be possible for this to be done via a union for storage + a vtable for implementation, giving the benefits of dynamic dispatch without having to use the heap. (Although, if you have wildly different sizes for the different variants then you pay the cost in always using the size of the largest.)

One thing that I think might be important is to benchmark this versus just boxing and potentially have a lint recommending switching to a box if you have a large number of variants (I'm almost certain that a 200 variant switch would be a lot slower than dynamically dispatching to one of 200 implementors of a trait, but I couldn't begin to guess at what point the two crossover in call overhead, and there's the overhead of allocating the box in the first place).

@Ekleog

This comment has been minimized.

Copy link
Author

Ekleog commented Apr 23, 2018

@Nemo157 Thanks for explaining things better than I could!

I'd just have a small remark about your statement: I don't think a 200-variant switch would be a lot slower than dynamic dispatch: the switch should be codegen'd as a jump table, which would give something like (last time I wrote assembler is getting a bit long ago so I'm not sure about the exact syntax) mov rax, 0xBASE_JUMP_TABLE(ENUM_DISCRIMINANT,3); jmp rax, while the dynamic dispatch would look like mov rax, METHOD_INDEX(VTABLE); jmp rax. As BASE_JUMP_TABLE and METHOD_INDEX are constants, they're hardcoded in the assembly, and so both cases end up being 1/ a memory load (either in JUMP_TABLE or VTABLE, so there could be cache impact here depending on whether you always call different methods on the same object or whether you always call the same method on different objects), and 2/ a jump (with a dependency between these instructions).

So the mere number of implementors shouldn't matter much in evaluating the performance of this dispatch vs. a box. The way of using them does have an impact, but this will likely be hard to evaluate from the compiler's perspective.

However, what may raise an issue about performance is nesting of such sum types: if you have a sum type of a sum type of etc., then you're going to lose quite a bit of time going through all these jump tables. But the compiler may detect that one member of the sum type is another sum type and just flatten the result, so I guess that's more a matter of implementation than specifiction? :)

@est31

This comment has been minimized.

Copy link
Contributor

est31 commented Apr 23, 2018

a constrained form of dynamic dispatch that could potentially get statically optimized if the compiler can prove only one or the other case is hit.

LLVM is capable of doing devirtualisation.

as I briefly mentioned above it could be possible for this to be done via a union for storage + a vtable for implementation, giving the benefits of dynamic dispatch without having to use the heap

That's a point admittedly. Dynamically sized stack objects are a possibility but they have certain performance disadvantages.

@burdges

This comment has been minimized.

Copy link

burdges commented Apr 23, 2018

Is there anything wrong with the bike shed color -> enum Trait? There are still folk who want impl Trait to be replaced by some Trait and any Trait so maybe just enum Trait catches the "make this for me" better.

I think procedural macros could generate this right now. If coersions develop further then maybe doing so would becomes quite simple even. Right now, there are Either types that do this for specific types like Iterator and Future. I'm unsure why they do not even implement From.

@Ekleog

This comment has been minimized.

Copy link
Author

Ekleog commented Apr 23, 2018

So if we are going to start painting the bike shed (there seems to be little opposition right now, even though it has only been like a day since initial posting), I think there is a first question to answer:

Should the indication of the fact the return is an anonymous sum type lie in the return type or at the return sites?

i.e.

fn foo(x: T) -> MARKER Trait {
    match x {
        Bar => bar(),
        Baz => baz(),
        Quux => quux(),
    }
}
// vs.
fn foo(x: T) -> impl Trait {
    match x {
        Bar => MARKER bar(),
        Baz => MARKER baz(),
        Quux => MARKER quux(),
    }
}

Once this question will have been answered, we'll be able to think about the specifics of what MARKER should be.

@Ekleog

This comment has been minimized.

Copy link
Author

Ekleog commented Apr 23, 2018

So, now, my opinion: I think the fact the return is an anonymous sum type should lie at the return site, for two reasons:

  • mostly because the caller has no need of knowing that the return type is an anonymous sum type, only that it's some type that implements Trait
  • but also because it will have a nicer syntax when used inside a function with something that is not actually a return, but eg. let x = match or similar (depending on how effective type inference of the result will be, I think taking the intersection of the traits matched by all the return branches should do it, but…)

On the other hand, the only argument I could think of in favor of putting the marker in the return type is that it makes for less boilerplate, but I'm not really convinced, so I'm maybe not pushing forward the best arguments.

@alexreg

This comment has been minimized.

Copy link

alexreg commented Apr 24, 2018

@Ekleog Yeah, I'm with you on return site actually, even though I proposed the return type syntax above. As you say, it reflects the fact that it's more of an implementation detail that consumers of the function don't need to (shouldn't) care about. Also, I think the analogy to box syntax is a good one, and it would be used in a similar way superficially.

@Nemo157

This comment has been minimized.

Copy link
Contributor

Nemo157 commented Apr 24, 2018

I think procedural macros could generate this right now.

For a closed set of traits, sure, but to allow this to be used for any trait requires compiler support for getting the methods of the traits. Delegation plus some of its extensions might enable this to be fully implemented as a procedural macro.

I'm tempted to try and write a library or procedural macro version of this, I am currently manually doing this for io::{Read, Write} so even if it only supports those traits it would be useful for me. If it works that could be a good way to get some experience with using it in real code to inform the RFC.

@Ixrec

This comment has been minimized.

Copy link
Contributor

Ixrec commented Apr 24, 2018

Various forms of enum impl Trait sugar for autogenerated anonymous enums have been discussed a few times in the past. For some reason I can't find a centralized discussion of them now, but I think the points that came up before but haven't been fully raised yet are:

  • The original motivation in past discussions was to make it possible to introduce a new error type to a function without it "virally infecting"/requiring code changes up the entire call stack. I still believe that's a far more compelling benefit than anything to do with performance (after all, this is just sugar over defining your own enum).
  • It's probably better to put the marker for this feature on the signature, i.e. fn foo(x: T) -> MARKER Trait
    • As far as I know, the only options here are one marker in the signature, and one marker at every return point. A marker for every return point is just too noisy (especially when the return point is just a ?), and seems to go against the motivation of making it trivial to introduce a new error type.
    • The main argument against having it in the signature is that it shouldn't and doesn't affect the function's API/contract, so it's not something clients should know about. While a good rule of thumb, it's been pointed out in the past that this is already not an ironclad rule (I believe mut on arguments was the counterexample that's stable today), so rustdoc having to hide the marker isn't a new layer of complexity.
  • The syntax probably should include impl Trait because "returning an impl Trait" is the function's API/contract as far as clients are concerned. Hence, I slightly prefer enum impl Trait over enum Trait or impl enum Trait.

I believe the main thing preventing these discussions from going anywhere is that making it even easier to use impl Trait further accentuates the only serious problem with impl Trait: that it encourages making types unnameable. But now that the abstract type proposal exists, I don't personally think that's a big concern anymore.


@Nemo157 Hmm, what delegation extensions do you think we'd need? The "desugaring" I've always imagined is just a delegate *, so we'd need enum delegation (technically an extension iirc), and then whatever it takes for a delegate * to "just work" on most traits, which might mean delegation of associated types/constants, and I think that's it.

Though I think we should probably not block this on delegation, since it could just be compiler magic.

@Nemo157

This comment has been minimized.

Copy link
Contributor

Nemo157 commented Apr 24, 2018

From the current delegation RFC the extensions required are "delegating for an enum where every variant's data type implements the same trait" (or "Getter Methods" + "Delegate block") and "Delegating 'multiple Self arguments' for traits like PartialOrd" (although, this could be implemented without it and this feature would be in the same state as normal delegation until it's supported).

One thing I just realised is that delegation won't help with unbound associated types, required to support use cases like:

#[enumified]
fn foo() -> impl IntoIterator<Item = u32> {
    if true {
        vec![1, 2]
    } else {
        static values: &[u32] = &[3, 4];
        values.iter().cloned()
    }
}

would need to generate something like

enum Enumified_foo_IntoIterator {
    A(Vec<u32>),
    B(iter::Cloned<slice::Iter<'static>>),
}

enum Enumified_foo_IntoIterator_IntoIter_Iterator {
    A(vec::Iter<u32>),
    B(iter::Cloned<slice::Iter<'static>>),
}

impl IntoIterator for Enumified_foo {
    type Item = u32;
    type IntoIter = Enumified_foo_IntoIterator_IntoIter_Iterator;

    fn into_iter(self) -> self::IntoIter {
        match self {
            Enumified_foo_IntoIterator::A(a)
                => Enumified_foo_IntoIterator_IntoIter_Iterator::A(a.into_iter()),
            Enumified_foo_IntoIterator::B(b)
                => Enumified_foo_IntoIterator_IntoIter_Iterator::B(b.into_iter()),
        }
    }
}

impl Iterator for Enumified_foo_IntoIterator_IntoIter_Iterator {
    ...
}
@Ekleog

This comment has been minimized.

Copy link
Author

Ekleog commented Apr 25, 2018

@Ixrec Oh indeed, I didn't think of the Result<Foo, impl ErrorTrait> use case (my use case being really futures and wanting not to allocate all the time), that would make a syntax with the marker at return site much less convenient, with ?.

However, this could be “fixed” by piping ?'s definition through sed 's/return/return MARKER/'.

This wouldn't change the behaviour for existing code (as adding MARKER when there is a single variant would be a no-op). However, an argument could be raised that it does the reducing of performance without explicit marker, but for ? the boat has sailed and it's already using From magically.

So I still think that the ease of using MARKER through type-inferred branches (like let x = match { ... }), and not only at typed boundaries (like function-level return or let x: MARKER Trait = match { ... }), is a net enough win in favor of MARKER at the return site, with an implicit MARKER for ?.

However, as a downside of MARKER at return site, I must add that having it at return site will likely make implementation harder: what is the type of { MARKER x }? I'd say it could be handled by having a “TypeInProgress” type that would slowly be intersected with all the other types, and MARKER basically lifts Type to TypeInProgress(Type), and then typing rules follow, eg.

fn bar() -> bool {…}
struct Baz {} fn baz() -> Baz {…}
struct Quux {} fn quux() -> Quux {…}
struct More {} fn more() -> More {…}

trait Trait1 {} impl Trait1 for Baz {} impl Trait1 for Quux {} impl Trait1 for More
trait Trait2 {} impl Trait2 for Baz {} impl Trait2 for Quux {}

fn foo() -> impl Trait1 {
    let x = match bar() {
        true => MARKER baz(), //: TypeInProgress(Baz)
        false => MARKER quux(), //: TypeInProgress(Quux)
    }; //: TypeInProgress(Baz, Baz) & TypeInProgress(Quux, Quux)
    // = TypeInProgress(Trait1 + Trait2, enum { Baz, Quux })
    // (all the traits implemented by both)
    if bar() {
        MARKER x //: TypeInProgress(Trait1 + Trait2, enum { Baz, Quux })
    } else {
        MARKER more() //: TypeInProgress(More, More)
    } // TypeInProgress(Trait1 + Trait2, enum { Baz, Quux }) & TypeInProgress(More, More)
    // = TypeInProgress(Trait1, enum { Baz, Quux, More })
    // (all the types implemented by both)
}

And, once this forward-running phase has been performed, the actual type can be observed (ie. here enum { Baz, Quux, More }), generated, and backwards-filled into all the TypeInProgress placeholders.
Obviously, this requires that at the time of MARKER, the type is already known.

On the other hand, for a return-site-level MARKER setup, the scheme I can think of is the following:

// (skipping the same boilerplate)

fn foo() -> MARKER Trait1 {
    let x: MARKER Trait1 = match bar() {
        true => baz(),
        false => quux(),
    }; // Here we need to infer MARKER Trait1.
    // Observing all the values that come in, we see it must be enum { Baz, Quux }
    if bar() {
        x
    } else {
        more()
    } // Here we need to infer MARKER Trait1.
    // Observing all the values that come in, it must be enum { enum { Baz, Quux }, More }
}

I personally find the syntax of the second example less convenient (it forces writing down exactly which trait(s) we want to have, not letting type inference do its job) and the end-result less clean (two nested enums will be harder to optimize). It also will likely not detect common subtrees, eg. if the more() of the else branch was replaced by if bar() { more() } else { baz() }, the first type inference algorithm would infer enum { Baz, Quux, More }, because TypeInProgress(Trait1, enum { Baz, Quux, More }) is the intersection of TypeInProgress(Trait1 + Trait2, enum { Baz, Quux }) and TypeInProgress(Trait1, enum { More, Baz }) ; while the second type inference algorithm would be forced to complete the typing at the let x: MARKER Trait1, and would thus have to infer enum { enum { Baz, Quux }, enum { More, Baz } } (or maybe enum { enum { Baz, Quux }, More, Baz }, and in either case hitting exactly the performance issue of nesting sum types discussed before).

What do you think about the idea of having ? return MARKER x.unwrap_err()? (I agree with you that without it it's most likely best to have MARKER at the return type)

@Ixrec

This comment has been minimized.

Copy link
Contributor

Ixrec commented Apr 25, 2018

I personally find the syntax of the second example less convenient (it forces writing down exactly which trait(s) we want to have, not letting type inference do its job)

For me, the primary use case is returning an enum impl Trait from a function, in which case you already need to write down the traits you want in the signature. So I never saw this as a disadvantage. In fact, it didn't even occur to me to apply enum impl Trait on a regular let binding until today.

I'm not sure I understand the sentiment behind "not letting type inference do its job". Both variations of this idea involve us writing an explicit MARKER to say we want an autogenerated anonymous enum type. In both cases, type inference is only gathering up all the variants for us, and never inferring the need for an anon enum type in the first place. In both cases, the variable x needs to have a type of some kind, at least for type-checking purposes, and in both cases that type may very well disappear during compilation/optimization. So, what's the job that type inference doesn't do in the MARKER-on-signature case?

and the end-result less clean (two nested enums will be harder to optimize). It also will likely not detect common subtrees

I'm not sure I buy either of these claims. To the extent that "detecting common subtrees" is important, I would expect the existing enum layout optimizations to effectively take care of that for free. We probably need an actual compiler dev to comment here, but my expectation would be that the actual optimization-inhibiting difficulties would come from having "all traits" implemented by the anon enums, instead of just the traits you need.

And to me, the autogenerated anonymous enum type implementing more traits than I need/want it to is "less clean". I guess that's one of those loaded terms that's not super helpful.

I'm not seeing the significance of the TypeInProgress stuff; that seems like machinery you could use in implementing either variation of the syntax, and I don't see what it buys you other than perhaps guaranteeing the type of x goes away. This is probably another thing we need compiler people to comment on, but trying to make some variables not have a type sounds to me like it would be a non-starter, its motivation better addressed my optimizations after type-checking, and entirely orthogonal to the question of what the surface syntax should be anyway.

What do you think about the idea of having ? return MARKER x.unwrap_err()? (I agree with you that without it it's most likely best to have MARKER at the return type)

I think "the idea of having ? return MARKER x.unwrap_err()" is also strictly an implementation detail that's not really relevant to the surface syntax debate, especially since ? is already more than just sugar over a macro.


To clarify, I believe the real, interesting issue here is whether we want these anonymous enum types to implement only the traits we explicitly ask for, or all the traits they possibly could implement. Now that this question has been raised, I believe it's the only outstanding issue that really needs to get debated to make a decision on whether MARKER goes at every return site or only once in the signature/binding.

My preference is of course for the traits to be listed explicitly, since I believe the primary use case to be function signatures where you have to list them explicitly anyway, and I also suspect that auto-implementing every possible trait could lead to unexpected type inference nuisances, or runtime behavior, though I haven't thought about that much.

Let's make the type inference nuisance thing concrete. Say Trait1 and Trait2 both have a foo method, and types A and B both implement both traits. Then you want to write a function that, as in your last two examples, returns enum impl Trait1 and has a let binding on a match with two branches. If we go with your variation, the let binding infers the equivalent of enum impl Trait1+Trait2 and a foo() call later in the function becomes ambiguous, while in my variation you have to explicitly write enum impl Trait1 so a call to foo() just works. That's a real disadvantage of auto-implementing all possible traits, right?

@Ekleog

This comment has been minimized.

Copy link
Author

Ekleog commented Apr 25, 2018

I think "the idea of having ? return MARKER x.unwrap_err()" is also strictly an implementation detail that's not really relevant to the surface syntax debate, especially since ? is already more than just sugar over a macro.

Well, I added it to answer your concern that it would be painful to have to add MARKER at return sites like ? :)

Let's make the type inference nuisance thing concrete. Say Trait1 and Trait2 both have a foo method, and types A and B both implement both traits. Then you want to write a function that, as in your last two examples, returns enum impl Trait1 and has a let binding on a match with two branches. If we go with your variation, the let binding infers the equivalent of enum impl Trait1+Trait2 and a foo() call later in the function becomes ambiguous, while in my variation you have to explicitly write enum impl Trait1 so a call to foo() just works. That's a real disadvantage of auto-implementing all possible traits, right?

That's true. However, the same could be said with regular types: if I return a single value (so no MARKER anywhere), that implements Trait1 + Trait2 and put it in an un-typed variable, then calls to foo() later will be ambiguous. So that's consistent with what we currently have, and I don't think that's a real disadvantage: it's still possible to explicitly type with return-site marker, if you want to implement only a single trait and/or type inference fails: let x: impl Trait1 = if foo() { MARKER bar() } else { MARKER baz() } (the marking of a specific type would “close” the TypeInProgress type and realize it)

I'm not seeing the significance of the TypeInProgress stuff; that seems like machinery you could use in implementing either variation of the syntax, and I don't see what it buys you other than perhaps guaranteeing the type of x goes away.

Well, apart from the end-result being cleaner (and I don't think enum layout optimizations could optimize enum { enum { Foo, Bar }, Bar, Quux } into enum { Foo, Bar, Quux }, at least with named enums, as the tag could have significance), I don't know about rustc specifically, but typing is usually done on the AST. And on an AST, I think it'd be easier to go forward and slowly complete the type of a variable, than to try to go backwards from the return point to the return sites, and from there check all the possible types that could be returned .

Actually, I'd guess that's how rustc currently does type inference:

fn foo() -> Vec<u8> {
    let res = Vec::new; //: TypeInProgress(Vec<_>)
    bar();
    res // Here we know it must be Vec<u8>, so the _ from above is turned into u8
}

This is probably another thing we need compiler people to comment on, […]

Completely agree with you on this point :)

@nielsle

This comment has been minimized.

Copy link

nielsle commented May 2, 2018

Would it be practical to use a procedural macro to derive a specialized iterator for each word? (It seems possible, but a little verbose)

#[derive(IntoLetterIter)]
#[IntoLetterIterString="foo"]
struct Foo;

#[derive(IntoLetterIter)]
#[IntoLetterIterString="hello"]
struct Hello;

fn foo(x: bool) -> impl IntoIterator<Item = u8> {
    if x {  
        Foo 
    } else {
        Hello
    }
}
@joshtriplett

This comment has been minimized.

Copy link
Member

joshtriplett commented May 2, 2018

I'm concerned with the degree to which this seems to combine the implementation details of this specific optimization with the code wanting to use that optimization. It seems like, despite impl Trait itself being a relatively new feature, we're talking about extending it to include a form of reified vtables as an optimization, and exposing that particular choice of optimization with new syntax. And we're doing that without any performance numbers to evaluate that optimization.

I also wonder to what degree we could detect the cases where this makes sense (e.g. cases where we can know statically which impl gets returned) and handle those without needing the hint. If the compiler is already considering inlining a function, and it can see that the call to the function will always result in the same type implementing the Trait, then what prevents it from devirtualizing already?

I'd suggest, if we want to go this route, that we need 1) an implementation of this that doesn't require compiler changes, such as via a macro, 2) benchmarks, and 3) some clear indication that we can't already do this with automatic optimization. And even if we do end up deciding to do this, I'd expect it to look less like a marker on the return type or on the return expressions, and more like an #[optimization_hint] of some kind, similar to #[inline]

@maplant

This comment has been minimized.

Copy link

maplant commented May 2, 2018

Just to add my thoughts to this without clutter, here is my version of the optimization: https://internals.rust-lang.org/t/allowing-multiple-disparate-return-types-in-impl-trait-using-unions/7439
Automatically generating an enum is one way to devirtualize, but without inlining a lot of redundant match statements would be generated.
I'm interested in seeing what performance gains can be gleaned from this, if any.

I think that automatic sum type generation should be left to procedural macros

@Nemo157

This comment has been minimized.

Copy link
Contributor

Nemo157 commented May 2, 2018

@joshtriplett I don’t believe the only reason to want this is as an optimisation. One of the major reasons I want this is to support returning different implementations of an interface based on runtime decisions without requiring heap allocation, for use on embedded devices. I have been able to avoid needing this by sticking to compile time decisions (via generics) and having a few manually implemented delegating enums, but if this were supported via the language/a macro somehow that would really expand the possible design space.

I do agree that experimenting with a macro (limited to a supported set of traits, since it’s impossible for the macro to get the trait method list) would be the way to start. I’ve been meaning to try and throw something together myself, but haven’t found the time yet.

@maplant

This comment has been minimized.

Copy link

maplant commented May 2, 2018

@joshtriplett to address part of your comment, i.e. benchmarks, I created a repository that uses my method and benchmarks it against Box. Although I only have one test case and it is somewhat naive, it seems that my method is about twice as fast as Box. Repo here: https://github.com/DataAnalysisCosby/impl-trait-opt

@joshtriplett

This comment has been minimized.

Copy link
Member

joshtriplett commented May 2, 2018

@Nemo157 I don't think you need heap allocation to use -> impl Trait, with or without this optimization.

But in any case, I would hope that if it's available as an optimization hint, it would have an always version just like inline does.

@Ekleog

This comment has been minimized.

Copy link
Author

Ekleog commented May 2, 2018

@joshtriplett Let's look at this example (here showing what we want to do):

trait Trait {}
struct Foo {} impl Trait for Foo {}
struct Bar {} impl Trait for Bar {}

fn foo(x: bool) -> impl Trait {
    if x {
        Foo {}
    } else {
        Bar {}
    }
}

(playground)

This doesn't build. In order to make it build, I have a choice: either make it a heap-allocated object:

fn foo(x: bool) -> Box<Trait> {
    if x {
        Box::new(Foo {})
    } else {
        Box::new(Bar {})
    }
}

(playground)

Or I do it with an enum:

enum FooBar { F(Foo), B(Bar) }
impl Trait for FooBar {}
fn foo(x: bool) -> impl Trait {
    if x {
        FooBar::F(Foo {})
    } else {
        FooBar::B(Bar {})
    }
}

(playground)

The aim of this idea is to make the enum solution actually usable without a lot of boilerplate.

Is there another way to do this without heap allocation that I'd have missed?

As for the idea of making it an optimization, do you mean “just return a Box and have the compiler optimize-box-away(always)”? If so, how would it handle no_std systems, that don't (IIRC, my last use of such a system was ~a year ago) actually have Box::new?

@joshtriplett

This comment has been minimized.

Copy link
Member

joshtriplett commented May 3, 2018

@Ekleog Ah, thank you for the clarification; I see what you're getting at now.

@nielsle

This comment has been minimized.

Copy link

nielsle commented May 3, 2018

Regarding the third playground example, you can use derive_more to derive Foo.into(), or alternatively you can use derive-new to derive a constructor for FooBar.These libraries do not solve the complete problem in the RFC, but they may help a little.

AFAICS a procedural macro on the following form could potentially solve the complete problem

#[derive(IntoLetterIter)]
enum FooBar {
    #[format="foo"]
    Foo,
    #[format="hello"]
    Hello,
}
@Boscop

This comment has been minimized.

Copy link

Boscop commented Sep 4, 2018

@Ekleog Maybe the two ideas could be called "marker at unification" and "marker at expression"?

"marker at unification" requires writing the keyword only once, so it'd be more concise, but "marker at unification" should not be mistaken to mean "marker at return type".
I think "marker at unification" makes the most sense, but not at return type.
E.g. it could look like this:

fn bar() -> Option<LinkedList<char>> { /* ... */ }

fn foo() -> impl Iterator<Item = char> {
    impl enum match bar() {
        Some(x) => x.iter(),
        None => "".iter(),
    }
}

fn foo() -> impl Iterator<Item = char> {
    let mut tmp = impl enum match bar() { // first unification, of the match arms
        Some(x) => x.iter(),
        None => "".iter(),
    };
    let n = tmp.next();
    impl enum match n { // second unification
        Some(_) => tmp,
        None => "foo bar".iter(),
    } // this should not generate a 2-level nested enum as return type because it can be flattened
}

With "marker at expression" it would be much more verbose, requiring the impl enum keyword at every match arm (often you have many match arms).

("marker at return type" would only allow unification for return expressions but we want to allow it for expressions inside function bodies, too.)

@Ekleog

This comment has been minimized.

Copy link
Author

Ekleog commented Sep 5, 2018

@Boscop I like your names :) however, I still am not convinced by marker-at-unification: how would you have it handle this example (from somewhere above in the thread)? There is no way to unify after the Vec has been built here without a silent reallocation (which would be the case with marker-at-unification), while with marker-at-expression the compiler does the job of unifying at the right place (ie. inside map's argument lambda).

fn bar() -> bool { /* ... */ }
fn baz() -> Vec<Baz> { /* ... */ }
fn quux() -> Vec<Quux> { /* ... */ }
// Baz and Quux implement Trait
fn foo() -> Vec<impl Trait> {
    if bar() {
        baz().map(|x| marker x).collect()
    } else {
        quux().map(|x| marker x).collect()
    }
}
@Boscop

This comment has been minimized.

Copy link

Boscop commented Sep 6, 2018

@Ekleog The location of the marker does not influence where the compiler actually unifies!
Since the return type is Vec<impl Trait>, it knows it has to do the unification before the collect, since it can't do a unifying conversion for Vec<A> and Vec<B> (or any Foo<A> and Foo<B> when A and B both impl a trait).
So it will do the unification at the same location, no matter if we use "marker at expression" or "marker at unification".

But this shows that maybe the name "marker at unification" should be changed to "marker before branches" (aka "marker at sub-tree root"), because the actual unifying conversion always happens somewhere in the branches (in this case before the .collect(), in other cases at the end of each branch).
(But with the name "marker at unification" I meant that the logical "unification" point (that creates the need for unification) is the root of the sub-tree. The location of the inserted unifying conversion then is the "physical" act of unification.)

So the syntax is independent of the unification location, we just need to express which sub-tree should be unified. "marker before branches" (aka "marker at sub-tree root") would only require this keyword once (at the root of the sub-tree, before the (syntactical) branches), whereas "marker at expressions" (aka "marker inside branches" / "marker at sub-tree leaves") requires the keyword in every branch/sub-tree leaf..

With both syntaxes, the compiler would choose the same point for the actual unification: The last point where single values of type T: Trait occur in each branch (where you placed the marker above).


So to make the naming clearer, maybe we should just use "marker at tree root" vs "marker at leaves"..

@Ekleog

This comment has been minimized.

Copy link
Author

Ekleog commented Sep 9, 2018

@Boscop The problem is that if I expect this to work with marker-at-tree-root: (just taking the previous example and moving the marker keyword, and adding .into_iter()s I had forgotten)

fn foo() -> Vec<marker impl Trait> {
    if bar() {
        baz().into_iter().map(|x| x).collect()
    } else {
        quux().into_iter().map(|x| x).collect()
    }
}

then I would also naturally expect this to work, because .map(|x| x) is a no-op:

fn foo() -> Vec<marker impl Trait> {
    if bar() {
        baz().into_iter().collect()
    } else {
        quux().into_iter().collect()
    }
}

and then the .into_iter().collect() looks really like a no-op, so I would expect this to work:

fn foo() -> Vec<marker impl Trait> {
    if bar() {
        baz()
    } else {
        quux()
    }
}

at which point, if the compiler actually implements this in a way so that it works, there is an implicit reallocation.

Hence my not really liking this option :)

@Pauan

This comment has been minimized.

Copy link
Member

Pauan commented Sep 9, 2018

and then the .into_iter().collect() looks really like a no-op, so I would expect this to work:

Intuitively, I do not expect baz().into_iter().collect() to be equivalent to baz(), because the former is transforming the value into a new Vec based upon the expected return type, whereas the latter is not.

Consider this simple example:

let x: Vec<char> = baz();
let x: String = baz().into_iter().collect();

As you can see, baz().into_iter().collect() is absolutely not the same as baz()! So it is not a no-op at all: in this specific situation your intuition is wrong. You can think of it as being similar to how foo and |x| foo(x) are not equivalent.

My understanding is that the auto-coercion of a value into the anonymous enum only applies to the base type.

So a T can be auto-coerced into a marker impl Trait, but a Vec<T> cannot be auto-coerced into a Vec<marker impl Trait> (on the other hand, a Vec<T> can be auto-coerced into a marker impl Trait).

So I would expect that your final example will give a compile error, because it will refuse to unify Vec<T> and Vec<marker impl Trait>. I would expect it to give the same compile error regardless of whether it's using "marker at expression" or "marker at return type".

In other words, this should also be a compile error:

fn foo() -> Vec<impl Trait> {
    if bar() {
        marker baz()
    } else {
        marker quux()
    }
}
@Ekleog

This comment has been minimized.

Copy link
Author

Ekleog commented Sep 10, 2018

What you say about .into_iter().collect() not being a no-op is true indeed, however it is only changing the type of the container, not of the contents, currently as far as I know. Hence my saying it really looks like a no-op (to me currently) when going from Vec<_> to Vec<_> :)

Also, even baz().into_iter().collect() in my previous example cannot be unified as per the rules you raise up by the compiler, and the .map(|x| x) is actually necessary: baz().into_iter() is a impl Iterator<Item = Baz>, and the expected return type of the .collect() is Vec<marker impl Trait>, which cannot be made from a impl Iterator<Item = Baz> using method .collect().

Also, the .map(|x| x) would be required to give the compiler a chance to coerce inside the impl Iterator<Item = …>, as impl Iterator<Item = …> cannot be cast into a marker-ed one more than a Vec<…> could. (I'm not at all sure whether I'm putting my thoughts clearly here, please tell me if you find me hard to understand!). And I'm almost sure that currently considering .map(|x| x) is a no-op is actually correct, and not having it be a no-op would be very confusing :)

As for your last example about marker-at-leafs, we completely agree that this should not work, and it's why I had written baz().into_iter().map(|x| marker x).collect(), in the same way I would have written baz().into_iter().map(|x| MyHandRolledEnum::Variant1(x)).collect() :)

@Pauan

This comment has been minimized.

Copy link
Member

Pauan commented Sep 13, 2018

Also, even baz().into_iter().collect() in my previous example cannot be unified as per the rules you raise up by the compiler, and the .map(|x| x) is actually necessary

That's probably true, yeah (at least without major compiler trickery), in which case regardless of whether it uses "marker at expression" or "marker at type", the code will be identical, the only difference is where the marker is placed.

So the power of the two approaches should be identical, it's purely personal preference whether you prefer to explicitly mark where the coercion happens, or whether you prefer the convenience of only needing to specify the marker once.

I don't have a strong preference either way, but perhaps I'm leaning very slightly toward "marker at expression", simply because it makes it more obvious that .map(|x| marker x) is not a no-op.

And I'm almost sure that currently considering .map(|x| x) is a no-op is actually correct, and not having it be a no-op would be very confusing :)

That's not necessarily true, depending on the implementation of map.

I agree that intuitively it feels like it should be a no-op (just like how foo and |x| foo(x) intuitively feels like they should be identical), but that isn't necessarily true in reality.

As for your last example about marker-at-leafs, we completely agree that this should not work

Great, I'm glad we're in agreement about that.

@Boscop

This comment has been minimized.

Copy link

Boscop commented Sep 16, 2018

@Ekleog @Pauan I think we already agreed that "marker at return type" is not what we want, because we also want to be able to unify sub-expression trees within a function body.
So we have to decide between "marker at expr-tree root" and "marker at expr-tree leafs".


Also, even baz().into_iter().collect() in my previous example cannot be unified as per the rules you raise up by the compiler, and the .map(|x| x) is actually necessary

At least an implicit .map(|x| T::from(x)) would have to be inserted by the compiler but I'm not sure if we should make the compiler aware of iterators just so that it can insert this "magic". If we special-cased it for iterators, other wrapper types would be second-class citizens..

I don't have a strong preference either way, but perhaps I'm leaning very slightly toward "marker at expression", simply because it makes it more obvious that .map(|x| marker x) is not a no-op.

Yes, if we agree that we don't want the compiler to insert a magic .map(|x| T::from(x)) for iterators, we have to go with "marker at expression".


Btw, with "marker at expression", when we get impl Trait for let, your example could be written like this:

fn foo() -> Vec<impl Trait> {
    let iter: impl Iterator<Item = impl Trait> = if bar() {
        baz().into_iter().map(|x| marker x)
    } else {
        quux().into_iter().map(|x| marker x)
    };
    iter.collect() // collect() after the unified expression tree
}

And when we get impl Trait for expressions even if not bound by let, it could be written like this:

fn foo() -> Vec<impl Trait> {
    if bar() {
        baz().into_iter().map(|x| marker x)
    } else {
        quux().into_iter().map(|x| marker x)
    }.collect() // collect() after the unified expression tree
}

so you wouldn't even have to write .collect() twice :)
(But you still need to write .map(|x| marker x) in each leaf expr because the iterator Item type has to be the same.)

@Pauan

This comment has been minimized.

Copy link
Member

Pauan commented Sep 16, 2018

I think we already agreed that "marker at return type" is not what we want, because we also want to be able to unify sub-expression trees within a function body.

I don't see how "marker at type" prevents that. You can do stuff like this:

fn foo() {
    let x: marker impl Trait = ...;
}

This will unify the various types within ... into the marker impl Trait type.

At least an implicit .map(|x| T::from(x)) would have to be inserted by the compiler but I'm not sure if we should make the compiler aware of iterators just so that it can insert this "magic". If we special-cased it for iterators, other wrapper types would be second-class citizens..

I don't think I ever implied that. My proposal was that whenever converting an expression of type T to marker impl Trait it would do the automatic conversion. So it will work in any situation (not just with Iterators).

It just so happens that with Iterators the simplest way to create a T -> marker impl Trait expression is with .map(|x| x) (specifically, the x expression within the closure).

My comment about "compiler trickery" was with regards to the compiler doing the automatic conversion without .map(|x| x) (since in that case there wouldn't be any T -> marker impl Trait expressions).

@Boscop

This comment has been minimized.

Copy link

Boscop commented Sep 17, 2018

I don't see how "marker at type" prevents that. You can do stuff like this:

fn foo() {
    let x: marker impl Trait = ...;
}

@Pauan But we shouldn't require that the expression that we want to unify has to be bound with a let binding and given an explicit "type" (impl Trait)!

I don't think I ever implied that. My proposal was that whenever converting an expression of type T to marker impl Trait it would do the automatic conversion. So it will work in any situation (not just with Iterators).

The compiler can't do it automatically for wrapper types without knowing how to unwrap and re-wrap them though! It would need special-cased "magic" which would make custom wrapper types second-class citizens.

It just so happens that with Iterators the simplest way to create a T -> marker impl Trait expression is with .map(|x| x) (specifically, the x expression within the closure).

But if we don't use "marker at expression" and then require people to write .map(|x| x) (in the above example) just so that the compiler can insert the conversion there, it looks like a No-Op to humans, which we should avoid. Someone will remove it, thinking that it's a No-Op, and will get weird compiler errors. That's why we should use "marker at expression" (.map(|x| marker x)) so that it will be clear that this is not a No-Op, and this location is where the conversion happens!


Can we agree that "marker at expression" would satisfy all our requirements? It works for sub-expressions in function bodies (not just return expressions), it doesn't require a (typed) let binding, and it makes it clear where the conversion happens!

@Pauan

This comment has been minimized.

Copy link
Member

Pauan commented Sep 18, 2018

But we shouldn't require that the expression that we want to unify has to be bound with a let binding and given an explicit "type" (impl Trait)!

But that's already a requirement for "marker at expression":

fn foo() {
    let x: impl Trait = ... marker ...;
}

The type is needed, because without the type annotation the Rust compiler doesn't know what trait you're trying to convert it into.

The two systems are fundamentally equivalent, the only difference is the syntax (whether marker is at the type or expression).


The compiler can't do it automatically for wrapper types without knowing how to unwrap and re-wrap them though!

Yes it can, very easily. Why do you say it can't?

It would need special-cased "magic" which would make custom wrapper types second-class citizens.

Yes, which is the same as how closures are handled, but that's not a problem.

To be clear, "marker at expression" also has that problem, you act like it's a problem only with "marker at type", but it's not.

Regardless of which system is used, the compiler will generate an anonymous type, wrap the values into the type, and implement the trait for the type.

They are exactly the same system, the only difference is the syntax for where the marker goes. There is no fundamental difference between them.


But if we don't use "marker at expression" and then require people to write .map(|x| x) (in the above example) just so that the compiler can insert the conversion there, it looks like a No-Op to humans, which we should avoid. Someone will remove it, thinking that it's a No-Op, and will get weird compiler errors.

I agree, which is why I said in an earlier post that I'm very slightly in favor of "marker at expression".

But even then, the differences between them are minor. Both systems have benefits and drawbacks.


Can we agree that "marker at expression" would satisfy all our requirements?

Sure, but "marker at type" also satisfies the requirements.

It works for sub-expressions in function bodies (not just return expressions)

This is irrelevant, because both systems work fine with sub-expressions.

it doesn't require a (typed) let binding

Except that it does.

and it makes it clear where the conversion happens!

Yes, that is the primary benefit of it (which I already admitted to in an earlier post, I'm not sure why you keep trying to convince me of this).

The primary downside is needing to specify marker multiple times, and also the fact that when looking at the type it's not clear that there is an extra performance cost (compared to a regular impl Trait).


It just occurred to me that there is a third option: have the marker at both the type and expressions. This is even more verbose, but it makes everything extremely explicit.

@Ekleog

This comment has been minimized.

Copy link
Author

Ekleog commented Sep 19, 2018

@Pauan

But that's already a requirement for "marker at expression":

fn foo() {
    let x: impl Trait = ... marker ...;
}

The type is needed, because without the type annotation the Rust compiler doesn't know what trait you're trying to convert it into.

Well, this is potentially not true: the compiler could infer impl Trait from the uses of x, in the same way it currently infers u16 for x if I do fn foo(_: u16) {} let x = 42; foo(x); . See the machinery proposed at #2414 (comment) for how this could be implemented (basically, similar to the current handling of {integer}, I think).

It would need special-cased "magic" which would make custom wrapper types second-class citizens.

Yes, which is the same as how closures are handled, but that's not a problem.

I think @Boscop was speaking of making Iterator special-cased for automatically inserting the said .map(|x| x), which would sound like a bad idea to me too, because then eg. Streams become awkward to use (the same code that works for Iterator doesn't work for Stream)

The primary downside is needing to specify marker multiple times, and also the fact that when looking at the type it's not clear that there is an extra performance cost (compared to a regular impl Trait).

Well, a regular impl Trait can already be a manually-generated sum type or a trait object, so I'm not sure the extra performance cost should actually be shown here, as it's not necessarily slower than a non-markered type :) and so for this reason (as well as the fact that variable types could be inferred by the compiler) I don't think the explicitness at type location is actually required.

@Ixrec

This comment has been minimized.

Copy link
Contributor

Ixrec commented Sep 19, 2018

These examples involving containers or iterators of autogenerated enum types are starting to feel subtle enough that I'm not entirely sure it's worth going out of our way to support them. After all, whatever we end up using for marker, the whole point of marker impl Trait is to be a sugar for simply writing your own enum (and updating it every time your function gains/loses a return path). The other day I tried to catch up on this discussion and it took me several minutes just to make sense out of all this talk of why map(|x| marker x) was or was not necessary under various counterproposals. When understanding the intended semantics of a sugar feature becomes a challenge in and of itself, that feels like we've put the cart before the horse and it no longer qualifies as a "sugar".

@Pauan

This comment has been minimized.

Copy link
Member

Pauan commented Sep 19, 2018

@Ekleog the compiler could infer impl Trait from the uses of x

I'm not sure if that's true. It seems to me that inferring trait usage is quite a lot different (and more complicated) than inferring types like {integer}. But I haven't worked on the rustc compiler, so perhaps I'm wrong.


I think @Boscop was speaking of making Iterator special-cased for automatically inserting the said .map(|x| x), which would sound like a bad idea to me too

Oh, you're right, it does sound like they were saying that, my mistake. Though I had only mentioned compiler trickery in one small off-hand remark (and I wasn't in favor of it), and all of the discussions since then have been about requiring the .map(|x| x).

When I said this:

My proposal was that whenever converting an expression of type T to marker impl Trait it would do the automatic conversion.

I was not referring to some magic Iterator system, just an extremely simple type unification: when the compiler sees .map(|x| x) it knows that the x expression has the type T, and it tries to unify it with marker impl Trait, so it then inserts the wrapper around the x expression.

This works with any expression, anywhere (as long as it is being unified with marker impl Trait). It has nothing at all to do with Iterator. It's no different from ordinary type unification, or Into / From. It's not complicated at all, there's no compiler magic.

And it certainly would not allow for things like baz().into_iter().collect() or baz() (because there aren't any expressions of type T -> marker impl Trait).


Well, a regular impl Trait can already be a manually-generated sum type or a trait object, so I'm not sure the extra performance cost should actually be shown here, as it's not necessarily slower than a non-markered type :)

That is a good point, yeah.


@Ixrec These examples involving containers or iterators of autogenerated enum types are starting to feel subtle enough that I'm not entirely sure it's worth going out of our way to support them.

I'm not sure why the conversation has veered so far into Iterator. I don't think anybody is in favor of Iterator special-casing, and none of the proposals are adding in special support for Iterator (or any other sort of special-casing).

The two primary proposals are:

  1. Have the compiler insert the wrapper when unifying an <EXPR> expression of type T with type marker impl Trait.

  2. Have the compiler insert the wrapper when using marker <EXPR> (where <EXPR> is an expression of type T unifying with type impl Trait).

The differences are purely in the syntax (the type unification is the same, the anonymous type is the same, etc.)

In either case Iterators do not get any sort of special treatment, they're the same as anything else. So you will need to use foo.iter().map(|x| x) (with the first proposal) or foo.iter().map(|x| marker x) (with the second proposal).

And the same applies to Stream, Future, Option, Result, and anything else. No special-casing, no magic.


When understanding the intended semantics of a sugar feature becomes a challenge in and of itself, that feels like we've put the cart before the horse and it no longer qualifies as a "sugar".

This is the exploration / design phase, it's normal for things to get complicated and confusing, because many people are exploring many different possible options. The goal right now isn't to "decide" on some pristine final solution, the goal is to explore, and try out different ideas.

In addition, people are trying to explain their ideas (and also clarify any misconceptions), and people also change their mind, or new proposals are added, etc. So, some messiness is unavoidable.

However, that doesn't mean that the final sugar will be complicated or confusing. The two proposals right now are both quite simple (to understand, and also to implement).

I expect that after the exploration phase, a proper RFC will be written, in a clear and understandable way, without any of the current confusion or complications.

@Ixrec

This comment has been minimized.

Copy link
Contributor

Ixrec commented Sep 19, 2018

I'm not sure why the conversation has veered so far into Iterator. I don't think anybody is in favor of Iterator special-casing, and none of the proposals are adding in special support for Iterator (or any other sort of special-casing).

Sorry, I didn't mean to imply I thought anyone was proposing Iterator-specific magic, or that the on-paper description of the syntax had ever stopped being trivial.

What I meant was that people are making arguments for what the syntax ought to be based on what would be less weird for TypeConstructor<marker impl Trait>, but in these nested use cases every proposed variation of this syntax seems like it'd be more of a "clever" trick than a helpful self-evident sugar (to a future reader of the code that's not actively thinking about this specific feature like we are), and I'd rather just write out the enum myself at that point, so I don't think we should be encouraging or designing for these use cases (unless we somehow find a way to make even these cases similarly self-evident).

However, that doesn't mean that the final sugar will be complicated or confusing. The two proposals right now are both quite simple (to understand, and also to implement).

It's the need for map(|x| maybe-marker x) in these Iterator/Vec/etc cases that I found confusing. Yes, the proposals are still trivial to describe on paper, but that doesn't make their application to these use cases trivial to understand, even to a reader like me that's familiar with and actively thinking about this sugar.

@Ekleog

This comment has been minimized.

Copy link
Author

Ekleog commented Sep 19, 2018

@Ixrec The case for map(|x| marker x) came up when I noticed it just wouldn't be possible to handle unification at this point (with a reasonable syntax, I consider having map(|x| x) not be a no-op would be something that'd easily get in a “WTF Rust‽” talk and that we should avoid it) for marker-at-root-expression. Having a syntax that works only for a subset of cases would be sad, even though the unsupported subset of cases is rare there would necessarily come a point where someone would want to do it and be annoyed by such a limitation. Off-my-hat example: impl Stream<Item = impl Future> where the impl Future is going to be an AGST.

That said, my main argument in favor of marker-at-leaf-expressions still remains (well, after the original flip-flop) that it's how things would be done with a manually-written enum: just replace Either::A(stuff) with marker stuff and everything else gets done automatically.

Now, I must also say that my original use case for this was for huge futures. With async/await moving forward, this use case will likely become a lot less relevant, so I'm no longer sure this should actually get syntactic sugar. Just to say that I won't be pushing this forward until we gain some experience with async/await, to know whether there is still a real need for this RFC (for me) after async/await. :)

@Ixrec

This comment has been minimized.

Copy link
Contributor

Ixrec commented Sep 19, 2018

Now, I must also say that my original use case for this was for huge futures.

For (historical?) context, way way back when I first got into this topic and I suggested we call this enum impl Trait and we hadn't all collectively decided to always say marker for fear of bikeshed tangents, the motivating use case was autogenerating error enums. The problem statement back then was that whenever you change a function's implementation in a way that produces a new early return/error codepath, that new error type was typically "viral" in the sense that the function signature and all of its callers and their callers had to be updated to use a new slightly different set of error enums. With just regular impl Trait stabilized, the "viral" aspect is now fixed, but there's still the nuisance of bouncing between the "real" function and its dedicated error enum.

So that's part of the reason the recent discussion of nested things like Iterator<marker impl Trait> seemed undermotivated to me. Returning a collection or iterator of errors is very unusual. But you're right that looking at this from a Futures/Streams/etc angle completely changes that.

@Nemo157

This comment has been minimized.

Copy link
Contributor

Nemo157 commented Sep 19, 2018

@Ixrec (aside, this is the first time I've noticed that's a capital I not a lowercase l)

error types have some of the exact same issues with wanting to have the conversion happen inside closures though, for example you might want to write something like

use std::error::Error;

fn foo() -> Result<u32, Error1> where Error1: Error { ... }
fn bar(baz: u32) -> Result<u64, Error2> where Error2: Error { ... }

fn qux() -> Result<u64, marker impl Error> {
    let baz = foo().map_err(|e| marker e)?;
    let result = bar(baz).map_err(|e| marker e)?;
    Ok(result)
}

to make the short way to write it work, somehow From::from would need to be supported as an alternative to the marker

fn qux() -> Result<u64, marker impl Error> {
    Ok(bar(foo()?)?)
}
@Ixrec

This comment has been minimized.

Copy link
Contributor

Ixrec commented Sep 19, 2018

Yeah, in older discussions I think it was always assumed that if this feature happened ? would be integrated well enough with it to enable the "short way".

@Boscop

This comment has been minimized.

Copy link

Boscop commented Sep 20, 2018

@Ixrec Once we have impl enum baked into the language we can make ? work with it, too, because it's also baked in..

@Ekleog @Pauan Yes, there was some confusion. I'm not in favor of special-casing unwrapping/rewrapping for Iterators or any other type, that's why I prefer "marker at expression" so that we'd write .map(|x| marker x) etc.
Btw, it should be possible to infer the trait that should be used (more often than not), thus "marker at expression" wouldn't always require a let binding, e.g.:

foo(if cond { marker a } else { marker b });

fn foo<T: Bar>(x: T) { ... } // or written as: fn foo(x: impl Bar) { ... }

it allows the compiler to infer that the trait to use for unification is Bar.

Even in cases where it can't be inferred, it should work with a : impl Trait type ascription after the expression, still not requiring a let binding :)

Another argument for keeping the marker away from the impl Trait is that the trait itself is not affected by the unification. It's just a "coincidence" that impl Trait is written at the expression tree root in some cases (let bindings and return values). impl enum Trait would make it weird because then the marker would occur in a syntactic location that should only be used for specifying trait names.

But the strongest argument for "marker at expression" IMO is .map(|x| marker x). So I'm really in favor of "marker at expression"..

@burdges

This comment has been minimized.

Copy link

burdges commented Sep 20, 2018

I like roughly the enum_derive_delegate! macro idea meaning do this via procedural macros. We'll want enums that delegate traits like that anyways.

Also, there are seemingly many parallels to trait objects here, which we could leverage via improvements in DST handling:

fn foo(x: bool) -> impl Trait {
    trait MyTrait = Trait<AsssocType = SomeType, const C = some_value>; 
    if x { return enum_from_trait_object!(bar() : dyn MyTrait) }
    return enum_from_trait_object!(baz() : dyn MyTrait) }
}

In this, enum_from_trait_object! is a procedural macro that creates and returns an enum type by delegation from a trait object value realized by only a fixed number of concrete types, all with exactly the same associated types or constants for the trait.

We might need procedural macros to interact with type inference for that exact syntax to work, as the different invocations of enum_from_trait_object! must be linked together. We could avoid this with slightly more limited syntax:

fn foo(x: bool) -> impl Trait {
    trait MyTrait = Trait<AsssocType = SomeType, const C = some_value>; 
    enum_from_trait_object!(if x { 
        bar() : dyn MyTrait 
    } else { 
        baz() : dyn MyTrait 
    })
}

You could still simplify this syntax with a convenience macro though:

#[anonymous_delegated_enum]
fn foo(x: bool) -> impl Trait {
    if x { return bar() : dyn Trait }
    return bar() : dyn Trait }
}
@Ekleog

This comment has been minimized.

Copy link
Author

Ekleog commented Sep 20, 2018

@Boscop (disclaimer: I'm in favor of marker-at-expression too)

IMO, there is still an argument in favor of marker-at-type, and it is the question of ?: should it automatically add in marker to the error value, or not?

The answer to this question is highly non-obvious to me.


@burdges The enum_from_trait_object! can be written iff. the enum_derive_delegate proc macro is added to the language (as that would require special compiler support to generate the correct list of functions).

Also, the issues with using only the enum_derive_delegate to actually implement this idea have been listed here. In particular, it cannot be integrated with ?, nor with custom try!-like macros, because it can't say that if a single type is found then there should be no wrapper. This absence of integration makes it really unlikely to be used for errors IMO, which is likely becoming the primary use case for it, with async/await coming into existence.

@Boscop

This comment has been minimized.

Copy link

Boscop commented Sep 20, 2018

@Ekleog Btw, ? also works in do/try-catch blocks (which are expressions that don't necessarily have a type ascription) not just for return values. But ? already calls .into(), so it could also insert the marker but then we'd need a marker at the root of the expression tree (not necessarily at the type of this expression tree because there might not be a type ascription).
So to satisfy both constraints (1. .map(|x| marker x) and 2. ? uses marker when necessary) we'd either need to use a marker at expr AND at tree root, or require at least one of both (but not necessarily both).
Cases like (1) would require the marker at expr but cases like (2) would require it at the tree root / type.

@burdges

This comment has been minimized.

Copy link

burdges commented Sep 21, 2018

You first do not want ? to wrap errors when all types have identical error types, while you second do want to wrap errors into a second enum when they differ, right?

We cannot have syntactic ambiguity between the first and second cases here of course since they create different associated types. I only suggested requiring identical associated types, which supports the first case but completely forbids this second case.

We might call this second form "associated type erasure safe", especially in the trait object context. We have many object safe traits like AddAssign<&'a T> which do not support erasure of the associated type T, either by enum or reference. Any associated type being used in argument, constant, or constraint position seemingly creates this problem.

Anyways..

An anonymous enum type is seemingly a special case of a trait object, so the syntax enum Trait could be implemented by creating a trait object, except that (a) rustc identifies all variants used, (b) assigns size from the largest variant, and (c) places the vtable reference inside the type as a discriminant. Implementing with actual enums sounds like an optimization.

We should thus explore type erasure for trait objects before worrying about the enum special cases being discussed here. Is there any serious proposal even for generic tooling to say supersede the erased-serde crate? Associated type erasure safety, and merging error types, goes way beyond that, but that's your starting point.

@mqudsi

This comment has been minimized.

Copy link

mqudsi commented Oct 9, 2018

See also the discussion in #2261, which has been closed in favor of this issue.

@newpavlov

This comment has been minimized.

Copy link

newpavlov commented Nov 7, 2018

I've wrote a pre-RFC which includes functionality requested in this issue.

@taiki-e

This comment has been minimized.

Copy link

taiki-e commented Jan 24, 2019

I wrote this feature with procedural macro: auto_enums.
This analyzes obvious branches (match, if, return, etc..) and assigns variants.
In many cases, macros and methods for identification are unnecessary.

#[auto_enum(Iterator)]
fn foo(x: i32) -> impl Iterator<Item = i32> {
    match x {
        0 => 1..10,
        _ => vec![5, 10].into_iter(),
    }
}

Generated code

Several other points:

@alexreg

This comment has been minimized.

Copy link

alexreg commented Jan 24, 2019

@taiki-e Good stuff! This looks like a well-engineered approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.