Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unified coroutines a.k.a. Generator resume arguments #2781

Open
wants to merge 17 commits into
base: master
from

Conversation

@semtexzv
Copy link

semtexzv commented Oct 10, 2019

This RFC outlines a way to unify existing implementation of the Generator feature with the Fn* family of traits and closures in general. Integrating these 2 concepts allows us to simplify the language, making the generators 'just pinned closures', and allows implementation of new patterns due to addtional functionality of generators accepting resume arguments.

Generator resume arguments are a sought after feature, since in their absence the implementation of async-await was forced to utilize thread-local storage, making it no_std.

Rendered

The RFC builds upon original coroutines eRFC

Main contention points:

  • Syntax of arguments changing between yields (explicit vs implicit) and the interaction with lifetimes.
  • Use of tuples and connection to closures, and the interaction with yield being an expression which resolves to arguments passed to resume.

Examples of new patterns designed with proposed design of the generator trait & Feature:

```
The example looks like we aren't assigning to the `name` binding, and therefore upon the second yield we should return the value which was passed into the first resume function, but the implementation of such behavior would be extremely complex and probably would not correspond to what user wanted to do in the first place. Another problem is that this design would require making `yield` an expression, which would remove the correspondence of `yield` statement with the `return` statement.

The design we propose, in which the generator arguments are mentioned only at the start of the generator most closely resembles what is hapenning. And the user can't make a mistake by not assigning to the argument bindings from the yield statement. Only drawback of this approach is, the 'magic'. Since the value of the `name` is magically changed after each `yield`. But we pose that this is very similar to a closure being 'magically' transformed into a generator if it contains a `yield` statement and as such is acceptable amount of 'magic' behavior for this feature.

This comment has been minimized.

Copy link
@rpjohnst

rpjohnst Oct 10, 2019

Another downside of this approach is that a generator can't "naturally" hold on to passed-in values from previous invocations- they have to move them into other bindings if they want them to live across suspension points.

The let (name,) = yield name; syntax is arguably more correct- you can shadow the argument name, but you can also name it something else, or even overwrite the existing binding. The key is that the yield expression evaluates to the passed-in values, as it does in most coroutine implementations. That is, this should be valid:

let gen = |foo: T, bar: U| {
    /* foo and bar are the arguments to the initial resume */
    let (baz, qux) = yield ..;
    /* foo and bar are *still* the arguments to the initial resume, but probably dead here */
    /* baz and qux are the arguments to the second resume */
    do_something(yield ..);
    /* foo, bar, baz, and qux are still the arguments to the initial and second resume respectively */
    /* do_something is passed a tuple of the arguments to the third resume */
};

The root of what makes this awkward is the fact that generators don't run any code until their first resumption, which means there is no yield expression for the first set of passed-in values. This also potentially causes problems with lifetimes (see below), so it may be worth trying to come up with alternative non-closure syntax to declare the argument types.

This comment has been minimized.

Copy link
@semtexzv

semtexzv Oct 10, 2019

Author

Yes, the pre-rfc I linked to tried to solve this by introducing different set of parameters for the Start and Resume, which would correspond to this syntactic choice.

Well, what is more correct behavior ? Take a generator, which has 10 yield points, each returing 3 values. In the approach in which the default behavior is to store passed arguments inside the generator, this generator has suddenly grown to contain 30 fields, even though user did not request this behavior. We might optimize them out, but conceptually, they are still stored inside the generator.

I believe not storing these values is more correct choice. This is the same choice, which is picked by FnMut closures. But the issue of lifetimes is still present.

But this issue is mostly present here, on the source code presentation.Wouldn't dropping the arguments before yielding( in MIR representation) be a natural choice ?

On the applicable issues, I lean to the 'What's the behavior of closures ?' question to determine aproprate answer.

This comment has been minimized.

Copy link
@rpjohnst

rpjohnst Oct 10, 2019

We already optimize out dead values in generators, that is not an issue. The problem is that your approach prevents the user from storing them even if they wanted to. (Without going out of their way to copy them somewhere else.)

This comment has been minimized.

Copy link
@bwo

bwo Oct 12, 2019

The root of what makes this awkward is the fact that generators don't run any code until their first resumption, which means there is no yield expression for the first set of passed-in values.

it also means that you need to pass in the same type of value with which the generator is resumed in order to start it—this approach seems to make impossible, or at least syntactically complicated, starting the generator with arguments other than the resumption type, including starting it with no arguments.


- Python & Lua coroutines - They can be resumed with arguments, with yield expression returning these values [usage](https://www.tutorialspoint.com/lua/lua_coroutines.htm).

These are interesting, since they both adopt a syntax, in which the yield expression returns values passed to resume. We think that this approach is right one for dynamic languages like Python or lua but the wrong one for Rust. The reason is, these languages are dynamically typed, and allow passing of multiple values into the coroutine. The design proposed here is static, and allows passing only a single argument into the coroutine, a tuple. The argument tuple is treated the same way as in the `Fn*` family of traits.

This comment has been minimized.

Copy link
@rpjohnst

rpjohnst Oct 10, 2019

I don't believe dynamic typing really changes things here. It's true that in dynamic languages you can come up with elaborate protocols where each resume takes a different set of argument types, but that's rare and confusing- most use cases stick with a single type, just as we are enforcing in Rust generators.

Letting yield evaluate to the passed-in values, with the expression's type determined by the closure-syntax argument types, shouldn't cause any problems. Indeed, it simplifies things by introducing fewer new semantics (see above and below).

This comment has been minimized.

Copy link
@semtexzv

semtexzv Oct 10, 2019

Author

Well, it does not introduce new cognitive load but introduces fallible syntax, and makes the default/shorter code the wrong one in many cases.

This comment has been minimized.

Copy link
@eaglgenes101

eaglgenes101 Oct 10, 2019

We already have falliable syntax.

let x = {
    let mut i = 2;
    loop {
        // May not necessarily add to i
        i += if i > 16 {
            i/2
        } else {
            // Does not return a value to increment i by, 
            // instead breaking out of the loop and causing it
            // to evaluate to i
            break i 
        };
    }
};

- Do we unpack the coroutine arguments, unifying the behavior with closures, or do we force only a single argument and encourage the use of tuples ?

- Do we allow non `'static` coroutine arguments ? How would they interact with the lifetime of the generator, if the generator moved the values passed into `resume` into its local state ?

This comment has been minimized.

Copy link
@rpjohnst

rpjohnst Oct 10, 2019

If we do allow non-'static coroutine arguments, some use cases will want arguments that live only for a single call to resume. In this case, the syntactic approach proposed here doesn't work very well- the arguments would change lifetime (and thus type!) throughout the execution of the generator.

If instead the arguments are provided as the value of a yield expression, each one could have a separate set of lifetimes limited by the actual live range of the expression's result. This fits much more naturally with how lifetimes work in non-generator code, especially with NLL.

This comment has been minimized.

Copy link
@semtexzv

semtexzv Oct 10, 2019

Author

Yes, this is the primary issue with my approach.

The generators in this case assume lifetime issues are not resolved at the source code / AST level, but rather at the control-flow / MIR level.

This means that in this case:

1. |a| {
2.    loop {
3.       println!("a : {:?}", a);
4.       let (a,) = yield 1;
5.       println!("b : {:?}", a);
6.       let (a,) = yield 2;
7.    }        
8. }

The lifetime of a both starts at the line 6 and ends at the line 4, and starts at line 1, and ends at line 4 depending on the entry point. But, these are 2 different values of a, the behavior should be similar to the behavior which would be observed if we have written the generator by hand as a match statment, just like in the RFC.

fn take(a : &str, b : &str) {
    match a {
        "0" => {
            take(b)
        },
        "0" => {
            take(b)
        }
    }
}

In this case the b has 2 exit points, and therefore its lifetime is extended to cover both, if i'm not mistaken.

But yes, the flow of lifetimes backwards is weird. But I think you can't escape it with re-entrant generators.

But the issue is still present with the generators that store the values by default . In these generators, the lifetime of the arguments would necessariliy have to include the lifetime of the generator as a whole, since they could be stored inside it ?

I was under the assumption that the borrowck, is MIR based, not AST based, and therefore this approach would be usable. Or is the checking of lifetimes performed on the AST level ? Need more info from compiler team.

This comment has been minimized.

Copy link
@Nemo157

Nemo157 Oct 10, 2019

Contributor

This is a requirement for one of the primary usecases: futures. They will need to take Args = (&mut core::task::Context<'_>,) which contains two lifetimes that are only valid during the current resume call.

pythoneer and others added 2 commits Oct 10, 2019
fixed small typos and formatting
Copy link

KrishnaSannasi left a comment

There were a few changes not related to the content of the RFC, but cleaned it up a bit

text/0000-unified_coroutines.md Outdated Show resolved Hide resolved
text/0000-unified_coroutines.md Outdated Show resolved Hide resolved
text/0000-unified_coroutines.md Outdated Show resolved Hide resolved
```
How the similar state machines are implemented today:
```rust
enum State { Empty, First, Second }

This comment has been minimized.

Copy link
@KrishnaSannasi

KrishnaSannasi Oct 10, 2019

Suggested change
enum State { Empty, First, Second }
enum State { First, Second }

You don't need Empty to move out of sate in the for match statement

This comment has been minimized.

Copy link
@semtexzv

semtexzv Oct 11, 2019

Author

This code is copied from existing state machines I found in the wild. I'd rather keep it that way.

enum Event { A, B }
fn machine(state: &mut State, event: Event) -> &'static str {
match (mem::replace(state, State::Empty), event) {

This comment has been minimized.

Copy link
@KrishnaSannasi

KrishnaSannasi Oct 10, 2019

Related to my last comment, you don't need to replace state here with State::Empty

@KrishnaSannasi

This comment has been minimized.

Copy link

KrishnaSannasi commented Oct 10, 2019

Another view of generators is that they are a more general form of Iterator. With this in mind, under future possibilities could we add combinators that can pass the output of one generator/iterator to he input of another generator? Basically baking the input arguments.

semtexzv added 2 commits Oct 10, 2019
Update 0000-unified_coroutines.md
@KrishnaSannasi

This comment has been minimized.

Copy link

KrishnaSannasi commented Oct 10, 2019

@semtexzv
I was thinking something along the lines of, this

@Nemo157

This comment has been minimized.

Copy link
Contributor

Nemo157 commented Oct 10, 2019

Similar to core::future::Future experimenting with combinators outside the standard library as part of futures I believe it would make sense to leave Generator combinators up to an external library to start (and by the time Generator might be approaching stabilisation we'll hopefully have some examples/experience with moving combinators into the standard library with Future).

semtexzv added 2 commits Oct 11, 2019
text/0000-unified_coroutines.md Outdated Show resolved Hide resolved
text/0000-unified_coroutines.md Outdated Show resolved Hide resolved
@SimonSapin

This comment has been minimized.

Copy link
Contributor

SimonSapin commented Oct 11, 2019

I feel that the RFC as currently written spends a lot of “weirdness budget” in a way that is not necessary. In “normal” closures, the bindings introduced for arguments (between the two | pipes) exist from the very start of the closure’s body, which matches their location in source code. This RFC proposes the same location for introducing names that are not available until the first yield, and then can change value at each yield. (This leads to questions like what if the previous value was borrowed?)

How about this alternative?

  • Like in the current RFC, the Generator trait gains a new type parameter. Except I’ll call it ResumeArg (singular)

  • ResumeArg can be any type, not only a tuple. (Although it might default to (), and using a tuple would be an idiomatic way to pass multiple values.)

  • yield is an expression of type ResumeArg. That’s it. It is not concerned at all with name bindings. You can use that expression in a let $pattern = $expr;, but that’s not necessary. It could be passed as a function argument, or anything that accepts an expression.

  • There is no way to specify the ResumeArg type as part of generator literal syntax. (Nothing is allowed between the pipes when yield is used.) This RFC and its proposed alternatives seem to try hard to find such a way, but that may not be necessary: it can be a type variable left for inference to resolve (based both on what the generator’s body does with its yield expression, and how the generator is used).

    There is nearby precedent for this: generator literals don’t have syntax to specify the Generator::Yield associated type either. And the -> $type syntax that specifies Generator::Return is optional.

@semtexzv

This comment has been minimized.

Copy link
Author

semtexzv commented Oct 11, 2019

@SimonSapin You raise an important point. But, like closures, the arguments are available from the start of the generator, since generator is created suspended at the start, like closures, and the first resume starts the generator from the start. The only difference compared to FnMut is that generator has multiple return points, and before each return, it stores its state.

Instead of a generator as it stands today, think about a pinned FnMut closure.

@SimonSapin

This comment has been minimized.

Copy link
Contributor

SimonSapin commented Oct 11, 2019

Hmm I may have been mislead by this part of the RFC:

Notice that the argument to first resume call was unused,

I took this to mean that the proposed language semantics are that the argument to the first resume call is always dropped, because that call doesn’t have a corresponding yield expression. But maybe instead you only meant that this particular example generator does nothing with the argument of the first resume call?

If we consider that this first value passed to resume doesn’t need to be dropped, then indeed the “closure argument” syntax would be the appropriate way to make them available to the generator’s body.

So I think preference is closest to what is currently listed as alternative # 2. I’ll quote it below and comment inline:

  1. Creating a new binding upon each yield

I’d still phrase this as: yield is an expression, whose value may or may not be bound to a name with let.

let gen = |name :&'static str| {

    let (name, ) = yield "hello";
    let (name, ) = yield name;
}

We are creating a new binding upon each yield point, and therefore are shadowing earlier bindings.

Yes about shadowing. But again, creating a binding doesn’t have to be mandatory.

This would mean that by default, the generator stores all of the arguments passed into it through the resumes.

Not necessarily, even with a binding. The value won’t be stored in the generator if it has been moved or dropped before the next yield. Like any other value manipulated on the stack of a generator.

Another issue with these approaches is, that they require progmmer, to write additional code to get the default behavior. In other words: What happens when user does not perform the required assignment ? Simply said, this code is permitted, but nonsensical:

let gen = |name: &static str| {
    yield "hello";
    let (name, ) = yield name;
}

This is not nonsensical at all. Like any $expr; statement, the first yield simply drops its resume value.

Another issue is, how does this work with loops ? What is the value assigned to third in following example ?

let gen = |a| {
    loop {
        println!("a : {:?}", a);
        let (a,) = yield 1;
        println!("b : {:?}", a);
        let (a,) = yield 2;
    }        
}
let first = gen.resume(("0"));
let sec = gen.resume(("1"));
let third = gen.resume(("2"));

third is the integer 1. But I think the intended question was: what line is printed third? The answer to that is a : "0". The let bindings shadowed a for the rest of the lexical scope (which is the rest of the loop). When the loop does its next iteration, control goes back out of the scope of shadowing a’s and the initial a is visible again. The same would happen with a similar loop outside of a generator.

@SimonSapin

This comment has been minimized.

Copy link
Contributor

SimonSapin commented Oct 11, 2019

In short, I feel strongly that yield should be a stand-alone expression, not necessarily tied to name bindings.


Separately from the above, I feel there is a choice between:

  • (As in the current RFC), the value passed to resume has to be a tuple. For the first resume call, this maps very nicely to closure argument syntax with any number of argument. However for subsequent calls, in the case of a 1-tuple we’d need somewhat-awkward unpacking like let (name,) = yield; foo(name).

  • Or, the value passed to resume can be any type, and yield expressions have that type. let name = yield; foo(name) or even foo(yield) looks much nicer without tuple unpacking. However this forces the “closure argument” syntax to only have one argument. || {…} (zero argument) could be syntactic sugar for |_: ()| {…} (one argument of type unit-tuple). (Only for generators of course, not normal closures.) It’s for the initial bindings of multiple values as one tuple argument that this becomes awkward: |(foo, bar): (u32, &str)| {…} instead of |foo: u32, bar: &str| {…}

  • Trying to reconcile both nice-to-haves creates inconsistencies or discontinuities that are undesirable IMO. For example, does |foo: u32| { yield } implement Generator<u32> or Generator<(u32,)>? Why?

@semtexzv

This comment has been minimized.

Copy link
Author

semtexzv commented Oct 11, 2019

Well, the argument HAS to be a tuple, and it HAS to be unpacked inside the argument list of the generator, since it is the behavior of closures, and deviating from this behavior would most certainly be a mistake. I again, point out the Fn* family of traits.

As for the third point, the argument would most certainly have to be a tuple. (Interaction of Fn traits and closures).

But yes, yield being an expression is one approach. The issue I have with it, is that it does not provide unified way to accept arguments upon the generator start, and it's resume. You have argument list at the start, and then a tuple at the resume. If we had Tuple unpacking/packing, we could resolve this, and I think that solution would be one of the best.

But there is something to be said about the default choice. Is it a good default to drop the values passed into resume ?

I accept syntactic inconvenience, but I think, the deviation from concepts introduced in closures is a huge mistage.

@Nemo157

This comment has been minimized.

Copy link
Contributor

Nemo157 commented Oct 11, 2019

Not necessarily, even with a binding. The value won’t be stored in the generator if it has been moved or dropped before the next yield. Like any other value manipulated on the stack of a generator.

This would imply you need to scope every yield call (even moving the value out is not enough, see rust-lang/rust#57478), generators are like normal code and drop their values at the end of scope. (Although there are optimizations applied to drop values early if that is not observable).

@SimonSapin

This comment has been minimized.

Copy link
Contributor

SimonSapin commented Oct 11, 2019

@semtexzv

HAS to be

Yes, the goal of making generators and closures as close to each other as possible leads to arguments being a tuple. But I’m personally not very attached to that goal in the first place.

Closures have a whole family of traits with Fn, FnMut, and FnOnce. They are useful for reasons that doesn’t apply to generators. Fn taking &self doesn’t work for generators since resume always mutates (at least to track initial state v.s. each yield point v.s. returned). FnOnce defeats the point of having a generator in the first place.

@Nemo157 rust-lang/rust#57478 is an implementation bug to be fixed, right?

@Nemo157

This comment has been minimized.

Copy link
Contributor

Nemo157 commented Oct 11, 2019

Even if rust-lang/rust#57478 is fixed I would expect there to be a lot of generators that just shadow their existing bindings without moving out of them, having those all be kept alive to be dropped at the end would not be good, e.g.

|arg: String| {
    let arg = yield;
    let arg = yield;
    let arg = yield;
}

would have to keep all 4 strings live until dropped when the generator completes.

@SimonSapin

This comment has been minimized.

Copy link
Contributor

SimonSapin commented Oct 12, 2019

Going back to the example and modifying it slightly the following questions arise:

let gen = |name :&'static str| {
    let (name1, ) = yield "hello";
    // Is name here still valid?
    let (name2, ) = yield name1;
    // Is name2 here still valid?
}

I think the answer is no. Otherwise the requirements regarding lifetimes for callers of resume() would be extremely hard to describe. Everything which is passed to resume() at one point of time would need to be marked as borrowed for the lifetime of the generator. Which is most likely not what we want. Rather we want things to be only borrowed for the duration of the resume call.

Yes, in this example all variables should be available until the end of the scope. Why wouldn’t they be? And yes, this means that the generator type needs to have a lifetime parameter and borrow the resume arguments. (If they have lifetimes other than 'static, unlike this example.) A way to enforce that would be to make ResumeArg an associated type rather than a type parameter of the trait:

impl Generator for MyGenerator {
    type ResumeArg = &'a str;
    // …
}

The above cannot use 'a because it is not in scope, so it needs to be added to the impl block:

impl<'a> Generator for MyGenerator {
    type ResumeArg = &'a str;
    // …
}

This is also now allowed because the 'a has an unused parameter. If the Generator trait doesn’t accept parameters, the only way to make that parameter be used is therefore to parameterize the type:

enum MyGenerator<'a> {
    Initial,
    Yield1 { name: &'a str },
    Yield2 { name: &'a str, name1: &'a str },
    Returned,
}

impl<'a> Generator for MyGenerator<'a> {
    type ResumeArg = &'a str;
    // …
}
@semtexzv

This comment has been minimized.

Copy link
Author

semtexzv commented Oct 12, 2019

Edit: Added an alternative design found on rust forums, please re-read the RFC.

For multiple different yield poins you'd have to add multiple lifetime arguments, so I don't think this is the right choice. Previous attempts went this way, and met the same issues, ultimately resolving to using the same hacks to solve the lack of GATs which could be used to solve lifetime issues.

semtexzv added 2 commits Oct 12, 2019
@Matthias247

This comment has been minimized.

Copy link
Contributor

Matthias247 commented Oct 12, 2019

Edit: What is here is wrong. See clarifications from @rpjohnst below.

Yes, in this example all variables should be available until the end of the scope. Why wouldn’t they be?

I think this doesn't work great for a variety of reasons:

1. I don't see how this would look the caller, and how it would interact with the borrow-checker.

All arguments that are passed to resume() need to be permanently consumed - which is not something that is a known concept. And things like the following code wouldn't be valid:

let mut data: String = "hello".to_owned();
gen.resume(&data); // Now gen internally captures a reference to the original string
data += " world"; // This invalidates the reference
gen.resume(&data); // Now gen would have an invalidated and a new reference

2. It's unclear how this would interact with control flow in generators.

E.g. something along

let gen = |name :&'static str| {
    let (name, done) = yield "hello";
    while !done  {
        let (name, done) = yield name;
    }
}

If this follows the rule to capture everything, then the generators size would be infinite.

If the binding is the same for all iterations of the loop so that only 2 variants are captured, then the behavior on the resumer side is unclear. For some resumes the argument is only borrowed for the duration of the resume, for others for a few resume calls until it is released again, and yet for others it is the whole lifetime of the generator.

This also makes any change inside the generator non API compatible anymore, since introducing any control flow would change the lifetimes.

3. It increases generator size, even if applications don't require parameters across yield points

Lots of applications for resume args, e.g. passing Context for Futures, don't require any parameters across yield points. Passing them again is totally fine - or even expected, since things might have changed. Storing more - for the sake that some applications might require it - seems to be the wrong default. If some generators need data again they could just pass it again into the next resume call.

We could argue that if things are not used across yield points they are optimized away by the compiler. However in practice that seems to be tricky. For async fn there were already lots of questions when parameters are allowed to be dropped, in order to be semantically compatible with non async code. Here the latest changes were to keep the parameters as long as possible, which increases the Future sizes.

@rpjohnst

This comment has been minimized.

Copy link

rpjohnst commented Oct 12, 2019

The "yield evaluates to the resume argument" approach doesn't have any of those problems. And it doesn't have them precisely because it matches the behavior of normal Rust code more closely!

And things like the following code wouldn't be valid:

let mut data: String = "hello".to_owned();
gen.resume(&data); // Now gen internally captures a reference to the original string
data += " world"; // This invalidates the reference
gen.resume(&data); // Now gen would have an invalidated and a new reference

You can already write this in stable, non-generator Rust, and it handle both the cases of the generator keeping the argument and the generator dropping the argument. This simply needs to be encoded into the type of resume:

let gen = |name :&'static str| {
    let (name, done) = yield "hello";
    while !done  {
        let (name, done) = yield name;
    }
}

If this follows the rule to capture everything, then the generators size would be infinite.

No, the bindings created in the loop body are dropped at the end of the iteration, and thus don't live across any yield point. (This also doesn't do what you want, for the same reason.)

If the binding is the same for all iterations of the loop so that only 2 variants are captured, then the behavior on the resumer side is unclear. ...

This also makes any change inside the generator non API compatible anymore, since introducing any control flow would change the lifetimes.

This is already the case for async/await, and is why Pin exists. This is just a fundamental part of how structs-with-references work in Rust today, you can't get around that. Even with the "magic mutating arguments" approach, the generator author has the same decision to make, because they can move the value into a local that does live across yield points.

Fortunately, as above, the resumer-side behavior can be determined by looking at the type of resume in the particular generator impl.

It increases generator size, even if applications don't require parameters across yield points

It does not, please stop making this claim. The counterexamples you bring up all have to do with Drop, which simply does not apply to reference arguments like Future's Context. And when a generator argument does implement Drop, matching sync code is good and a large part of why this design makes sense. If size is a problem here the author can simply drop the argument manually, just like they do in sync code.

@Matthias247

This comment has been minimized.

Copy link
Contributor

Matthias247 commented Oct 13, 2019

Thanks for the clarifications. I wasn't aware of this kind of borrowing, and that it's intended to configure the borrowing mode via explicit type annotations.

I think more complete examples how that would look like could be helpful. I still can't fully follow where those things should go to (e.g. the example of @SimonSapin had a MyGenerator struct, but I imagined the actual struct to be compiler-generated).

@Nemo157

This comment has been minimized.

Copy link
Contributor

Nemo157 commented Oct 13, 2019

More accurately these signatures wouldn't be on the resume function as that's specified by the trait, but instead on the impl; impl<'a> Generator<(&'a str,)> for Self<'a> and impl<'a, 'b> Generator<(&'b str,)> for Self<'a>.

Problematically the former is saying that the argument must live for the same lifetime as the generator, there is no way I know of to say "the argument must be valid from now until the generator is dropped". For example this fails even though it could be valid, there's just (AFAIK) no way to write a set of signatures that satisfies the borrow checker.

This is all very straightforward from the caller side (it's identical to how the Fn* traits are used today), and works the same with all suggested syntaxes as they use the same trait definition. One issue with all the syntaxes is they don't have any way to actually indicate these lifetimes, if we start with the running example:

let gen = |_: &'static str| {
    let (name,) = yield "hello";
    let (_,) = yield name;
};

this is easily inferring the type impl Generator<(&'static str,), Yield = &'static str, Return = ()>.

We want to also be able to write a generator matching impl for<'a> Generator<(&'a str,), Yield = &'a str, Return = ()> (i.e. a generator that does not capture its input value) along with for<'a> impl Generator<(&'a str,), Yield = &'a str, Return = ()> + 'a (i.e. a generator that captures its input value, and so cannot outlive it). The syntax with an implicit lifetime could easily refer to either of these:

let gen = |_: &str| {
    let (name,) = yield "hello";
    let (_,) = yield name;
};

I think this could be valid under either interpretation, depending on whether you can propagate a lifetime between the previous yield-return and the current yield-argument. But, trying to store a value across a yield:

let gen = |name: &str| {
    let (_,) = yield "hello";
    let (_,) = yield name;
};

This could only be valid under the second interpretation, where implicit lifetimes are captured into the generator, rather than being only valid for the single resume call.

@SimonSapin

This comment has been minimized.

Copy link
Contributor

SimonSapin commented Oct 14, 2019

For multiple different yield poins you'd have to add multiple lifetime arguments

Why?

For example this fails even though it could be valid, there's just (AFAIK) no way to write a set of signatures that satisfies the borrow checker.

To me it’s obvious that this example could not be valid for exactly the reason given by rustc’s error message. data is dropped at the end of foo, but still borrowed by gen.

@rpjohnst

This comment has been minimized.

Copy link

rpjohnst commented Oct 14, 2019

To me it’s obvious that this example could not be valid for exactly the reason given by rustc’s error message. data is dropped at the end of foo, but still borrowed by gen.

But gen is also dropped at the end of foo. If you remove foo from the program, the exact same construct works in the body of main. The problem is that, as a parameter to foo, there is no lifetime you can give it to make it behave like a local binding.

@SimonSapin

This comment has been minimized.

Copy link
Contributor

SimonSapin commented Oct 14, 2019

Hmm I see. This seems to be a general issue with language expressivity of lifetimes in function signatures, not at all related to generators. Here is the same with Vec: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=262406bc79c51a44aeda0d90b27b9b0e

I don’t think this means that generators should not be allowed to borrow from their resume arguments.

@spunit262

This comment has been minimized.

Copy link

spunit262 commented Oct 15, 2019

Simply adding a rebinding changing the parameter into a local causes both example to compile, proving that is just a problem with lifetimes declared in function signatures.

@Nemo157

This comment has been minimized.

Copy link
Contributor

Nemo157 commented Oct 15, 2019

I don’t think this means that generators should not be allowed to borrow from their resume arguments.

Neither do I, the usecases I care about only need transient per-resume borrows (G: for<'a> Generator<&'a FOO>. It is just something that seems much more likely to be useful as part of a generator, and if it were somehow possible to specify at the user/trait interface it might affect the syntax for declaring argument lifetimes.

Simply adding a rebinding changing the parameter into a local causes both example to compile, proving that is just a problem with lifetimes declared in function signatures.

So it does. I have absolutely no idea why this works, I guess it's somehow related to Gen being (co/contra?)variant over the lifetime, but why does that only apply when it's rebound to a local variable instead of as an argument?

let gen = |name: &'static str| {
// name = "Not used"
yield "Hello"; // name is also dropped here
// name = "World

This comment has been minimized.

Copy link
@pcpthm

pcpthm Oct 17, 2019

Dropped / not dropped is not precise terms because name is Copy. Maybe change it to a non-copy type?

@newpavlov

This comment has been minimized.

Copy link
Contributor

newpavlov commented Oct 21, 2019

I also don't like how this proposal spends the weirdness budget and I prefer the @SimonSapin's approach.

But I would like to take a step back and ask the following question: do we really want to reuse the closure syntax for generators? Honestly I find it really strange and worrying that we have to look into a "closure" body to understand if we have a true closure or generator on our hands.

I would strongly prefer to have a dedicated syntax for generators which would allow to immediately see that we create a generator only by looking at a given "signature" and which possibly would allow to specify argument, yield and return types. Something like this (a very-very rough draft, feel free to propose better alternatives):

// `generator` here is a keyword, so we will have to wait for the next edition

// a simple generator, return and yield types are inferred
// if not provided explicitly, `ResumeArg` is always set to `()`
// Yield=u8, Return=u32, ResumeArg=()
let gen1 = generator {
    yield 1u8;
    42u32
};
// specifies all generator types explicitly
// ResumeArg=&str, Yield=u8, Return=u32
let gen2 = generator[&str -> u8] -> u32 { ... };
// specifies only some types
// Yield=u8, Return=u32, ResumeArg=()
let gen3 = generator[u8] -> u32 { ... };
// yield type is inferred, ResumeArg=()
let gen4 = generator -> u32 { ... };
// return type is inferred, ResumeArg=(), Yield=u8
let gen5 = generator[u8] { ... };
// return type is inferred, ResumeArg=&str, Yield=u8
let gen6 = generator[&str -> u8] { ... };
// return and yield types are inferred, ResumeArg=(u8, &str)
let gen7 = generator[(u8, &str) -> _ ] { ... };

We may even add something like generator fn:

generator fn foo(a: u8, b: u8) [&str -> u8] -> u32 {
    let s1 = yield 0;
    let s2 = match s1 {
        "ok" => yield a,
        _ => yield b,
    };
    42
}
let gen = foo(1, 2);
let y1 = gen.resume("foo");
let y2 = gen.resume("ok");
let r = gen.resume("bar");

But it's debatable whether there is a sufficient motivation for it.

@samsartor

This comment has been minimized.

Copy link
Contributor

samsartor commented Nov 12, 2019

I think I've found a good way to generalize coroutines in Rust that could be helpful here. Check out my post! I also included some comparison of the different proposals that people have made. I hope it can give us a common language to make this debate a little easier. For example,

generator fn foo(a: u8, b: u8) [&str -> u8] -> u32 {
    let s1 = yield 0;
    let s2 = match s1 {
        "ok" => yield a,
        _ => yield b,
    };
    42
}

is the same as

fn foo(a: u8, b: u8) -> impl FnPinMut(u8) -> GeneratorState<u32> {
    coroutine {
        let _ = ().yield; // first argument is lost
        let s1 = Yielded(0).yield;
        let s2 = match s1 {
            "ok" => Yielded(a).yield,
            _ => Yielded(b).yield,
        };
        Returned(42).yield;
        panic!()
    }
}

and

let gen = |name: &str| {
    let (_,) = yield "hello";
    let (_,) = yield name;
};

is the same as

let gen = coroutine {
    let (name,) = ().yield;
    let (_,) = Yielded("hello").yield;
    let (_,) = Yielded(name).yield;
    Returned(()).yield;
    panic!()
};

This can also describe odd kinds of coroutines that haven't come up yet. For example,

let future = ...;
let fused_future = coroutine {
    let ctx: &mut Context = ().yield;
    let value = await!(future, ctx);
    yield Ready(value);
    loop { yield Pending; }
};

or

let sum_every_other: impl FnPinMut(u32) -> u32 = coroutine {
    ().yield;
    let mut sum = 0;
    loop {
        sum += sum.yield;
        sum.yield;
    }
};
@samsartor

This comment has been minimized.

Copy link
Contributor

samsartor commented Nov 13, 2019

After getting a bunch of feedback on by post, I realized we can extend the magic mutation syntax to be completely general. We don't go as far as requiring "no return for generators" but we do require return and yield to take the same type. return x can then be desugared into yield x; unreachable!(). The user has to decide how to signal return themselves but I think giving the user that choice will have a lot of benefits:

  • a generator is just a pinned closure
  • generators can act like iterators by returning None
  • generators can act like futures by returning Poll::Ready
  • generators which use ? can signal return with Result::Err
  • generators can use .await so long as they take &mut Context and return Poll
  • an async { ... } block can desugar as |&mut Context| Poll::Ready(...)

The other option is to make every generator loop so that return desugars as yield x; continue. I think this is too weird to take seriously but it does have the unusual benefit that closure syntax and generator syntax become one and the same so long as pinning is only required once the generator creates an internal reference. That is, something like |x| x + 1 has the same exact meaning whether interpreted as a closure or a generator.

@bwo

This comment has been minimized.

Copy link

bwo commented Nov 13, 2019

but we do require return and yield to take the same type.

I think there's a lot of value in allowing a generator to return a different type than it yields. This seems to be a very common pattern in Haskell streaming libraries, for leftovers, eg.

@Lokathor

This comment has been minimized.

Copy link

Lokathor commented Nov 13, 2019

that's just an enum situation it seems like.

@SimonSapin

This comment has been minimized.

Copy link
Contributor

SimonSapin commented Nov 13, 2019

GeneratorState is not any other enum. When GeneratorState::Complete is returned, this signals at the language level that it is an error to call resume again.

@samsartor

This comment has been minimized.

Copy link
Contributor

samsartor commented Nov 13, 2019

@SimonSapin The same could be said of Option in the context of iterators or Poll in the context of futures. Result gets used that way for things like streaming parsers. I could conceivably create a fuse implementation for generators that produce GeneratorState just as easily as for iterators or futures and it could be just as useful.

@CAD97

This comment has been minimized.

Copy link

CAD97 commented Nov 13, 2019

In other words, GeneratorState::Complete signals at the library level that it is a logical error to poll the generator again, just like Option::None does for Iterator. Nothing in the language prevents you from polling again, or even providing an implementation of the trait that promises behavior after that yield (such as fuse).

That library convention is baked into the language via desugaring of await/for, but it's still "just" a trait around a function that returns an enum.

@SimonSapin

This comment has been minimized.

Copy link
Contributor

SimonSapin commented Nov 13, 2019

The same could be said of Option in the context of iterators

Yes, exactly. And it’s important that Iterator::next returns Option<Self::Item>, not an arbitrary Self::Item type.

@samsartor

This comment has been minimized.

Copy link
Contributor

samsartor commented Nov 13, 2019

Yah, that's because iterators are really high level. They have a semantic meaning of their own. I'm saying that people don't care if something is a "generator". They care if something is an iterator, stream, sink, future, reader, writer, etc. A generator is only valuable as a way of implementing one of those. In short, we have an opportunity here to make generators a lower level syntax which is useful for building up higher level objects! It makes me really really excited! Why create yet another convention when we can use all the ones that already exist?

@rpjohnst

This comment has been minimized.

Copy link

rpjohnst commented Nov 13, 2019

Why create yet another convention when we can use all the ones that already exist?

Because people need to be able to compose them. This is the reason async/await is still a distinct language feature from generators- for example, you should eventually be able to write a single function that is both async and an iterator, or in other words a stream.

The same logic applies to the iterator and closure traits- they may have the same "shape," but they are used for different purposes and can be composed, so keeping them separate is valuable not only for readability but for expressiveness.

@CAD97

This comment has been minimized.

Copy link

CAD97 commented Nov 13, 2019

It's important to detach the "coroutine" and "generator" ideals.

"coroutine" is the "FnPin", where you yield some item on every poll.

"generator" is "just" a "coroutine that returns GeneratorState"; it yields an item on each poll then returns some final value; continuing after that is a logical error.

"async" is "just" a "coroutine taking an async context that yields blockers and returns an item".

So the generalized approach @samsartor is arguing for is a coroutine, but it would still behoove the language to have an idea of and integrate with generators even while they're built on top of stackless coroutines.

Approximately,

trait FnPinMut<Args> { // generalized stackless coroutine
    type Output;
    extern "rust-call" fn call_pin(self: Pin<&mut Self>, args: Args) -> Self::Output;
}

type Generator<Args, Yield, Return> = FnPinMut<Args, Output=GeneratorState<Yield, Return>>;
type Future<Output> = FnPinMut<&mut Context, Output=Poll<Self::Output>>;
type AsyncGenerator<Yield, Return> = FnPinMut<&mut Context, Output=Poll<GeneratorState<Yield, Return>>>; // ignore resume args for simplicity
}

Once we agree on terminology, then we can have a productive conversation. It's not exactly useful when one party wants to build the generalized approach and another wants the more specialized, while thinking they're discussing the same thing.

@samsartor

This comment has been minimized.

Copy link
Contributor

samsartor commented Nov 13, 2019

It isn't that I disagree with the Generator trait in principle. Its just that I think Rust would benefit from a common way of building coroutines. If people later want a less explicit way of building generators specifically, I'm cool with that. However, @CAD97 is right that serving "I want an escape hatch to make more complex iterators/closures/futures/streams." vs serving "I want a generator syntax specifically." is really a separate debate. This proposal is just so close to serving the first that I figured I might as well suggest so.

@kpp

This comment has been minimized.

Copy link

kpp commented Nov 13, 2019

you should eventually be able to write a single function that is both async and an iterator, or in other words a stream.

https://boats.gitlab.io/blog/post/2018-01-25-async-i-self-referential-structs/:

    Iterators: A generator which yields T and returns () can be treated as an Iterator of T.
    Futures: A generator which yields () and returns Result<T, E> can be treated as a Future of T and E.

Combining with: an async { ... } block can desugar as |&mut Context| Poll::Ready(...)

You get:

  • Iterator<T>: A generator which yields T and returns ()
  • Future<T>: A generator which takes &mut Context yields () and returns T
  • Stream<T>: A generator which takes &mut Context yields T and returns ()
  • Future<Iterator<T>>: A generator which takes &mut Context yields () and returns a generator-Iterator
  • Iterator<Future<T>>: A generator which yields a generator-Future and returns ()
  • ...
@samsartor

This comment has been minimized.

Copy link
Contributor

samsartor commented Nov 13, 2019

So the question is:

Do we add some generator creation syntax plus utilities to convert between generators and iterator, future, stream, closure, etc. or do we add some coroutine creation syntax which can produce any of those types (including generators) directly? I see pros and cons to each but neither totally eliminates any possibilities.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
You can’t perform that action at this time.