
Unified coroutines a.k.a. Generator resume arguments #2781

Open · wants to merge 17 commits into base: master
Conversation

@semtexzv commented Oct 10, 2019

This RFC outlines a way to unify the existing implementation of the Generator feature with the Fn* family of traits and closures in general. Integrating these two concepts lets us simplify the language, making generators 'just pinned closures', and enables new patterns thanks to the additional capability of generators accepting resume arguments.

Generator resume arguments are a sought-after feature: in their absence, the implementation of async/await was forced to use thread-local storage, preventing it from working in no_std environments.

Rendered

The RFC builds upon the original coroutines eRFC.

Main contention points:

  • Syntax of arguments changing between yields (explicit vs implicit) and the interaction with lifetimes.
  • Use of tuples and connection to closures, and the interaction with yield being an expression which resolves to arguments passed to resume.

Examples of new patterns enabled by the proposed design of the generator trait and feature:

The example looks like we aren't assigning to the `name` binding, and therefore upon the second yield we should return the value that was passed into the first resume call. But implementing such behavior would be extremely complex and probably would not correspond to what the user wanted in the first place. Another problem is that this design would require making `yield` an expression, which would remove the correspondence of the `yield` statement with the `return` statement.

The design we propose, in which the generator arguments are mentioned only at the start of the generator, most closely resembles what is happening, and the user can't make a mistake by not assigning to the argument bindings from the yield statement. The only drawback of this approach is the 'magic', since the value of `name` is silently changed after each `yield`. But we argue this is very similar to a closure being 'magically' transformed into a generator when it contains a `yield` statement, and as such is an acceptable amount of 'magic' for this feature.
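To make the implicit-rebinding semantics concrete, here is a rough sketch (all names invented for illustration) of the kind of state machine the compiler might generate for a two-yield generator under this design. Note that the resume argument is only ever a parameter of `resume`; it is never stored inside the generator:

```rust
// Hypothetical hand-desugaring of something like
//     |name: &str| { yield name.len(); yield name.len() + 1; }
// under the implicit-rebinding design: `name` is rebound on every
// call to `resume` and never persisted in `Gen`.
enum Gen { Start, AfterFirstYield, Done }

impl Gen {
    fn resume(&mut self, name: &str) -> Option<usize> {
        match *self {
            // First resume: run up to the first yield.
            Gen::Start => { *self = Gen::AfterFirstYield; Some(name.len()) }
            // Second resume: `name` is the *new* argument, magically rebound.
            Gen::AfterFirstYield => { *self = Gen::Done; Some(name.len() + 1) }
            Gen::Done => None,
        }
    }
}
```

Because nothing is saved across yields, the generator object is a bare discriminant; storing a resume argument would require adding a field to one of the variants.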


Another downside of this approach is that a generator can't "naturally" hold on to passed-in values from previous invocations: it has to move them into other bindings if it wants them to live across suspension points.

The `let (name,) = yield name;` syntax is arguably more correct: you can shadow the argument name, but you can also name it something else, or even overwrite the existing binding. The key is that the `yield` expression evaluates to the passed-in values, as it does in most coroutine implementations. That is, this should be valid:

```rust
let gen = |foo: T, bar: U| {
    /* foo and bar are the arguments to the initial resume */
    let (baz, qux) = yield ..;
    /* foo and bar are *still* the arguments to the initial resume, but probably dead here */
    /* baz and qux are the arguments to the second resume */
    do_something(yield ..);
    /* foo, bar, baz, and qux are still the arguments to the initial and second resume respectively */
    /* do_something is passed a tuple of the arguments to the third resume */
};
```

The root of what makes this awkward is the fact that generators don't run any code until their first resumption, which means there is no yield expression for the first set of passed-in values. This also potentially causes problems with lifetimes (see below), so it may be worth trying to come up with alternative non-closure syntax to declare the argument types.

@semtexzv (Author) Oct 10, 2019


Yes, the pre-RFC I linked to tried to solve this by introducing a different set of parameters for start and resume, which would correspond to this syntactic choice.

Well, which behavior is more correct? Take a generator with 10 yield points, each returning 3 values. In the approach where the default is to store passed arguments inside the generator, this generator has suddenly grown to contain 30 fields, even though the user did not request this behavior. We might optimize them out, but conceptually they are still stored inside the generator.

I believe not storing these values is the more correct choice. It is the same choice made by FnMut closures. But the issue of lifetimes is still present.

However, this issue is mostly present here, at the source-code level. Wouldn't dropping the arguments before yielding (in the MIR representation) be a natural choice?

On the applicable issues, I lean on the question 'What is the behavior of closures?' to determine the appropriate answer.


We already optimize out dead values in generators, that is not an issue. The problem is that your approach prevents the user from storing them even if they wanted to. (Without going out of their way to copy them somewhere else.)

@bwo Oct 12, 2019


The root of what makes this awkward is the fact that generators don't run any code until their first resumption, which means there is no yield expression for the first set of passed-in values.

It also means that you need to start the generator by passing a value of the same type used to resume it. This approach seems to make it impossible, or at least syntactically complicated, to start the generator with arguments other than the resumption type, including starting it with no arguments.

@tmandry (Contributor) Jan 2, 2020


I think forcing the user to move the values to other bindings might be a good thing. Saving a resume arg across yield points introduces a new runtime cost because that arg now has to be stored inside the generator object. (If it's not saved, it behaves like a function parameter to resume and only lives for the duration of that call.)

That said, I strongly prefer having an explicit assignment at each yield point, rather than an implicit one as this RFC proposes.


- Python & Lua coroutines: they can be resumed with arguments, with the yield expression returning these values ([usage](https://www.tutorialspoint.com/lua/lua_coroutines.htm)).

These are interesting, since they both adopt a syntax in which the yield expression returns the values passed to resume. We think this approach is the right one for dynamic languages like Python or Lua, but the wrong one for Rust. These languages are dynamically typed and allow passing multiple values into the coroutine. The design proposed here is static, and allows passing only a single argument into the coroutine: a tuple. The argument tuple is treated the same way as in the `Fn*` family of traits.


I don't believe dynamic typing really changes things here. It's true that in dynamic languages you can come up with elaborate protocols where each resume takes a different set of argument types, but that's rare and confusing: most use cases stick with a single type, just as we are enforcing in Rust generators.

Letting yield evaluate to the passed-in values, with the expression's type determined by the closure-syntax argument types, shouldn't cause any problems. Indeed, it simplifies things by introducing fewer new semantics (see above and below).

@semtexzv (Author) Oct 10, 2019


Well, it does not introduce new cognitive load but introduces fallible syntax, and makes the default/shorter code the wrong one in many cases.

@eaglgenes101 Oct 10, 2019


We already have fallible syntax:

```rust
let x = {
    let mut i = 2;
    loop {
        // May not necessarily add to i
        i += if i > 16 {
            i / 2
        } else {
            // Does not return a value to increment i by;
            // instead breaks out of the loop, causing it
            // to evaluate to i
            break i;
        };
    }
};
```


- Do we unpack the coroutine arguments, unifying the behavior with closures, or do we force only a single argument and encourage the use of tuples?

- Do we allow non-`'static` coroutine arguments? How would they interact with the lifetime of the generator, if the generator moved the values passed into `resume` into its local state?


If we do allow non-'static coroutine arguments, some use cases will want arguments that live only for a single call to resume. In that case, the syntactic approach proposed here doesn't work very well: the arguments would change lifetime (and thus type!) throughout the execution of the generator.

If instead the arguments are provided as the value of a yield expression, each one could have a separate set of lifetimes limited by the actual live range of the expression's result. This fits much more naturally with how lifetimes work in non-generator code, especially with NLL.

@semtexzv (Author) Oct 10, 2019


Yes, this is the primary issue with my approach.

The generators in this case assume lifetime issues are not resolved at the source code / AST level, but rather at the control-flow / MIR level.

This means that in this case:

```
1. |a| {
2.     loop {
3.         println!("a : {:?}", a);
4.         let (a,) = yield 1;
5.         println!("b : {:?}", a);
6.         let (a,) = yield 2;
7.     }
8. }
```

The lifetime of `a` either starts at line 6 and ends at line 4, or starts at line 1 and ends at line 4, depending on the entry point. But these are two different values of `a`; the behavior should be similar to what we would observe if we had written the generator by hand as a match statement, just like in the RFC.

```rust
fn take(a: &str, b: &str) {
    match a {
        "0" => {
            println!("{}", b)
        }
        "1" => {
            println!("{}", b)
        }
        _ => {}
    }
}
```

In this case, `b` has two exit points, and therefore its lifetime is extended to cover both, if I'm not mistaken.

But yes, the flow of lifetimes backwards is weird. I think you can't escape it with re-entrant generators, though.

The issue is still present with generators that store the values by default. In those generators, the lifetime of the arguments would necessarily have to include the lifetime of the generator as a whole, since they could be stored inside it.

I was under the assumption that borrowck is MIR-based, not AST-based, and that this approach would therefore be usable. Or is the checking of lifetimes performed at the AST level? We need more info from the compiler team.

@Nemo157 (Member) Oct 10, 2019


This is a requirement for one of the primary use cases: futures. They will need to take `Args = (&mut core::task::Context<'_>,)`, which contains two lifetimes that are only valid during the current resume call.

@RustyYato left a comment

These are a few changes not related to the content of the RFC, but they clean it up a bit.

How similar state machines are implemented today:

```rust
enum State { Empty, First, Second }
```


Suggested change:

```diff
- enum State { Empty, First, Second }
+ enum State { First, Second }
```

You don't need `Empty` to move out of the state in the match statement.

@semtexzv (Author) Oct 11, 2019


This code is copied from existing state machines I found in the wild. I'd rather keep it that way.

```rust
enum Event { A, B }
fn machine(state: &mut State, event: Event) -> &'static str {
    match (mem::replace(state, State::Empty), event) {
```


Related to my last comment, you don't need to replace `state` with `State::Empty` here.
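For context, a runnable completion of the quoted fragment might look like this (the variant payloads and transitions are invented for illustration). The `mem::replace` with `State::Empty` is only needed because the match consumes the old state by value; with payload-free variants you could match on `&*state` and drop `Empty` entirely, which is the point being made:

```rust
use std::mem;

// Illustrative completion of the RFC's hand-written state machine.
enum State { Empty, First(String), Second(String) }
enum Event { A, B }

fn machine(state: &mut State, event: Event) -> &'static str {
    // `mem::replace` moves the old state out so its payload can be
    // consumed by value; `State::Empty` is the placeholder left behind.
    match (mem::replace(state, State::Empty), event) {
        (State::First(s), Event::A) => { *state = State::Second(s); "first -> second" }
        (State::Second(s), Event::B) => { *state = State::First(s); "second -> first" }
        (old, _) => { *state = old; "no transition" }
    }
}
```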

@RustyYato commented Oct 10, 2019

Another view of generators is that they are a more general form of Iterator. With this in mind, under future possibilities could we add combinators that pass the output of one generator/iterator to the input of another generator? Basically baking in the input arguments.

@RustyYato commented Oct 10, 2019

@semtexzv
I was thinking something along the lines of this

@Nemo157 (Member) commented Oct 10, 2019

Similar to how core::future::Future had combinators developed outside the standard library as part of futures, I believe it would make sense to leave Generator combinators to an external library to start. By the time Generator approaches stabilisation, we'll hopefully have some examples of and experience with moving combinators into the standard library from the Future ecosystem.

@SimonSapin (Contributor) commented Oct 11, 2019

I feel that the RFC as currently written spends a lot of “weirdness budget” in a way that is not necessary. In “normal” closures, the bindings introduced for arguments (between the two | pipes) exist from the very start of the closure’s body, which matches their location in source code. This RFC proposes the same location for introducing names that are not available until the first yield, and then can change value at each yield. (This leads to questions like what if the previous value was borrowed?)

How about this alternative?

  • Like in the current RFC, the Generator trait gains a new type parameter. Except I’ll call it ResumeArg (singular)

  • ResumeArg can be any type, not only a tuple. (Although it might default to (), and using a tuple would be an idiomatic way to pass multiple values.)

  • yield is an expression of type ResumeArg. That’s it. It is not concerned at all with name bindings. You can use that expression in a let $pattern = $expr;, but that’s not necessary. It could be passed as a function argument, or anything that accepts an expression.

  • There is no way to specify the ResumeArg type as part of generator literal syntax. (Nothing is allowed between the pipes when yield is used.) This RFC and its proposed alternatives seem to try hard to find such a way, but that may not be necessary: it can be a type variable left for inference to resolve (based both on what the generator’s body does with its yield expression, and how the generator is used).

    There is nearby precedent for this: generator literals don’t have syntax to specify the Generator::Yield associated type either. And the -> $type syntax that specifies Generator::Return is optional.
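A minimal sketch of the trait shape described above (names assumed, and `Pin` omitted to keep it self-contained; the real trait would take `Pin<&mut Self>`) might look like:

```rust
// Sketch of a Generator trait with a single, non-tuple ResumeArg type
// parameter, defaulting to (). `yield` would be an expression of type
// ResumeArg inside the generator body.
enum GeneratorState<Y, R> { Yielded(Y), Complete(R) }

trait Generator<ResumeArg = ()> {
    type Yield;
    type Return;
    fn resume(&mut self, arg: ResumeArg) -> GeneratorState<Self::Yield, Self::Return>;
}

// A hand-written impl standing in for something like
//     |n: u32| { let m = yield n + 1; m * 2 }
struct Doubler { yielded: bool }

impl Generator<u32> for Doubler {
    type Yield = u32;
    type Return = u32;
    fn resume(&mut self, arg: u32) -> GeneratorState<u32, u32> {
        if !self.yielded {
            self.yielded = true;
            GeneratorState::Yielded(arg + 1) // first resume arg, used directly
        } else {
            GeneratorState::Complete(arg * 2) // second resume arg, via "yield"
        }
    }
}
```

Nothing here requires `ResumeArg` to be a tuple; inference could resolve it from the body and use sites, as the comment suggests.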

@semtexzv (Author) commented Oct 11, 2019

@SimonSapin You raise an important point. But, like closures, the arguments are available from the start of the generator, since the generator is created suspended at the start, and the first resume starts it from the beginning. The only difference compared to FnMut is that a generator has multiple return points, and before each return it stores its state.

Instead of a generator as it stands today, think about a pinned FnMut closure.

@SimonSapin (Contributor) commented Oct 11, 2019

Hmm, I may have been misled by this part of the RFC:

Notice that the argument to first resume call was unused,

I took this to mean that the proposed language semantics are that the argument to the first resume call is always dropped, because that call doesn’t have a corresponding yield expression. But maybe instead you only meant that this particular example generator does nothing with the argument of the first resume call?

If we consider that this first value passed to resume doesn’t need to be dropped, then indeed the “closure argument” syntax would be the appropriate way to make them available to the generator’s body.

So I think my preference is closest to what is currently listed as alternative #2. I'll quote it below and comment inline:

  1. Creating a new binding upon each yield

I’d still phrase this as: yield is an expression, whose value may or may not be bound to a name with let.

```rust
let gen = |name: &'static str| {
    let (name,) = yield "hello";
    let (name,) = yield name;
};
```

We are creating a new binding upon each yield point, and therefore are shadowing earlier bindings.

Yes about shadowing. But again, creating a binding doesn’t have to be mandatory.

This would mean that by default, the generator stores all of the arguments passed into it through the resumes.

Not necessarily, even with a binding. The value won’t be stored in the generator if it has been moved or dropped before the next yield. Like any other value manipulated on the stack of a generator.

Another issue with these approaches is that they require the programmer to write additional code to get the default behavior. In other words: what happens when the user does not perform the required assignment? Simply put, this code is permitted, but nonsensical:

```rust
let gen = |name: &'static str| {
    yield "hello";
    let (name,) = yield name;
};
```

This is not nonsensical at all. Like any $expr; statement, the first yield simply drops its resume value.

Another issue is: how does this work with loops? What is the value assigned to third in the following example?

```rust
let gen = |a| {
    loop {
        println!("a : {:?}", a);
        let (a,) = yield 1;
        println!("b : {:?}", a);
        let (a,) = yield 2;
    }
};
let first = gen.resume(("0",));
let sec = gen.resume(("1",));
let third = gen.resume(("2",));
```

third is the integer 1. But I think the intended question was: what line is printed third? The answer to that is `a : "0"`. The `let` bindings shadowed `a` for the rest of the lexical scope (which is the rest of the loop body). When the loop does its next iteration, control goes back out of the scope of the shadowing `a`s and the initial `a` is visible again. The same would happen with a similar loop outside of a generator.
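The shadowing behavior described here can be checked with ordinary, non-generator code:

```rust
// A `let` shadow only lasts until the end of the enclosing lexical
// scope, so each loop iteration starts with the original binding
// visible again.
fn shadow_demo() -> Vec<&'static str> {
    let a = "outer";
    let mut seen = Vec::new();
    for _ in 0..2 {
        seen.push(a);     // original `a` visible at the top of each iteration
        let a = "shadow"; // shadows `a` for the rest of this iteration only
        seen.push(a);
    }
    seen
}
```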

@SimonSapin (Contributor) commented Oct 11, 2019

In short, I feel strongly that yield should be a stand-alone expression, not necessarily tied to name bindings.


Separately from the above, I feel there is a choice between:

  • (As in the current RFC), the value passed to resume has to be a tuple. For the first resume call, this maps very nicely to closure argument syntax with any number of arguments. However, for subsequent calls, in the case of a 1-tuple we'd need somewhat awkward unpacking like let (name,) = yield; foo(name).

  • Or, the value passed to resume can be any type, and yield expressions have that type. let name = yield; foo(name) or even foo(yield) looks much nicer without tuple unpacking. However this forces the “closure argument” syntax to only have one argument. || {…} (zero argument) could be syntactic sugar for |_: ()| {…} (one argument of type unit-tuple). (Only for generators of course, not normal closures.) It’s for the initial bindings of multiple values as one tuple argument that this becomes awkward: |(foo, bar): (u32, &str)| {…} instead of |foo: u32, bar: &str| {…}

  • Trying to reconcile both nice-to-haves creates inconsistencies or discontinuities that are undesirable IMO. For example, does |foo: u32| { yield } implement Generator<u32> or Generator<(u32,)>? Why?

@semtexzv (Author) commented Oct 11, 2019

Well, the argument HAS to be a tuple, and it HAS to be unpacked inside the argument list of the generator, since that is the behavior of closures, and deviating from it would almost certainly be a mistake. I again point to the Fn* family of traits.

As for the third point, the argument would almost certainly have to be a tuple (interaction of Fn traits and closures).

But yes, yield being an expression is one approach. The issue I have with it is that it does not provide a unified way to accept arguments at the generator's start and at its resume: you have an argument list at the start, and then a tuple at each resume. If we had tuple unpacking/packing, we could resolve this, and I think that solution would be one of the best.

But there is something to be said about the default choice. Is it a good default to drop the values passed into resume?

I accept the syntactic inconvenience, but I think deviating from the concepts introduced by closures is a huge mistake.

@Nemo157 (Member) commented Oct 11, 2019

Not necessarily, even with a binding. The value won’t be stored in the generator if it has been moved or dropped before the next yield. Like any other value manipulated on the stack of a generator.

This would imply that you need to scope every yield call (even moving the value out is not enough, see rust-lang/rust#57478); generators are like normal code and drop their values at the end of scope. (Although optimizations are applied to drop values early when that is not observable.)

@SimonSapin (Contributor) commented Oct 11, 2019

@semtexzv

HAS to be

Yes, the goal of making generators and closures as close to each other as possible leads to arguments being a tuple. But I’m personally not very attached to that goal in the first place.

Closures have a whole family of traits: Fn, FnMut, and FnOnce. They are useful for reasons that don't apply to generators. Fn taking &self doesn't work for generators, since resume always mutates (at least to track initial state vs. each yield point vs. returned). FnOnce defeats the point of having a generator in the first place.

@Nemo157 rust-lang/rust#57478 is an implementation bug to be fixed, right?

@Nemo157 (Member) commented Oct 11, 2019

Even if rust-lang/rust#57478 is fixed I would expect there to be a lot of generators that just shadow their existing bindings without moving out of them, having those all be kept alive to be dropped at the end would not be good, e.g.

```rust
|arg: String| {
    let arg = yield;
    let arg = yield;
    let arg = yield;
}
```

would have to keep all 4 strings live until dropped when the generator completes.

@cynecx commented Feb 5, 2020

JFYI: rust-lang/rust#68524

@tema3210 commented Feb 27, 2020

Also, this allows an efficient and natural actor system, where an actor receiving a message is just resume with the value, and the send operation is yielding a certain variant of some enum. Combined with macros, this would give us an easy-to-use actor system which is also parallel and efficient, being built on coroutines rather than threads.
P.S. A Smalltalk-style DSL?

@eddyb (Member) commented Mar 4, 2020

I'm surprised I'm not seeing any mention of Generator<Yield = !>, because that should be pretty close to a closure, including the fact that GeneratorState<!, R> should be the same size as R.

Well, a FnPinMut closure, which doesn't really exist today.
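This size claim can be checked on stable Rust today by using `std::convert::Infallible` as a stand-in for `!`. Note that the layout optimization relied on here is what current rustc does in practice, not a documented guarantee:

```rust
use std::convert::Infallible;
use std::mem::size_of;

// Stand-in for the unstable Generator support enum.
enum GeneratorState<Y, R> { Yielded(Y), Complete(R) }

// With an uninhabited Yield type, the Yielded variant can never be
// constructed, so rustc lays the enum out as if only Complete existed.
fn never_yielding_is_free() -> bool {
    size_of::<GeneratorState<Infallible, u64>>() == size_of::<u64>()
}
```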

@tema3210 commented Mar 6, 2020

I'm surprised I'm not seeing any mention of Generator<Yield = !>, because that should be pretty close to a closure, including the fact that GeneratorState<!, R> should be the same size as R.

Well, a FnPinMut closure, which doesn't really exist today.

Generator<Yield = !> is a closure expressed in terms of generators. In terms of the proposed FnPinMut, a closure is a coroutine with one resume point and one yield point; it has dedicated storage for captured values and is thus separated by an object boundary. fn(RT) -> YT is the same except for the absence of the object boundary (an fn doesn't capture anything from a non-'static environment and thus doesn't need to be instantiated before use).
Now we really have an opportunity to generalize the fn family with generators (and all their kinds).

```rust
println!("{:?}", gen.resume(("Not used")));
println!("{:?}", gen.resume(("World")));
println!("{:?}", gen.resume(("Done")));
```
@kennytm (Member) Mar 20, 2020


`(x)` is not a tuple. `(x,)` is.

Suggested change:

```diff
- println!("{:?}", gen.resume(("Not used")));
- println!("{:?}", gen.resume(("World")));
- println!("{:?}", gen.resume(("Done")));
+ println!("{:?}", gen.resume(("Not used",)));
+ println!("{:?}", gen.resume(("World",)));
+ println!("{:?}", gen.resume(("Done",)));
```

@tema3210 commented May 3, 2020

There is also the question: how does the `?` operator work inside a coroutine?

@pitaj commented Jun 2, 2020

I'd like to help the discussion along by summarizing the syntax discussion. I won't really address anything surrounding the implementation with trait definitions and whatnot. It seems the main issues of contention for syntax are the following:

0. Coroutine vs Generator

For some, both terms refer to the same syntactic and behavioral concept. For others, coroutine refers to the generic behavior, and generator refers to the syntax allowing easy definition of a specific kind of coroutine.

The original RFC (2033) refers to them interchangeably. I will use the term "generators" to refer to the syntax and associated behavior that we are discussing, as that seems to be the least confusing.

1. Just a Closure with yield?

The original RFC introduced closures-with-yield as the syntax for defining generators. Some think that reusing the same syntax is a mistake and causes too many issues due to the limitations of closure syntax. Others ask whether any given closure is also a valid generator, or whether that applies only to closures without arguments, or not at all. Would it not be better to have an entirely different syntax if generator syntax would merely look similar to, but behave differently from, normal closures?

My Thoughts

I think that the reuse of "just" closure syntax and the behavior we want out of generators are impossible to reconcile, especially if we want to be able to specify types within generator declarations. Issues with using "just closure" syntax:

  • no way to force an arbitrary closure-like declaration to be a generator
  • no way to fully specify input and output types at declaration
  • confusing difference in behavior between normal closures and generators
  • magic of inferring a generator when yield is present

I like the fully-qualified syntax `|let (a, b): (A, B) = yield<Y>| -> R`, with `|let (a, b) = yield|` as the shorthand with type inference, and just `yield` to ignore resume args (at least for the first resume).

  • `(A, B)`: example resume-argument tuple type
  • `Y`: yield output type
  • `R`: return output type

Unlike other previous suggestions, it is unambiguous versus `<variable> <logical or> <code block>`. It also echoes later `yield` usage, making it very clear that yield is bound to those variables at the beginning of the scope.

2. Resume Types, Yield Types, Return Types

Should we use the |...| field to specify the resume argument type? Should we use the return type field -> T to specify the yield output type? Should there be separate yield and return types, or should they be the same? Or should there be no return type at all ()? Some think that these types shouldn't even be specifiable, and should always rely on type inference.

My Thoughts

There are use cases for separate return and yield types, and this already exists in nightly. I don't think it makes much sense to artificially limit this capability. I do think there absolutely needs to be some way of manually specifying all of these types, which the current syntax does not provide.

I don't see why resume types should necessarily have to be tuples, either. There's no contractual requirement in the Fn* traits (that I can see) that requires that Args be a tuple.

3. Binding of Resume Arguments

The main question here seems to be whether to allow arbitrary binding of the yield arguments, or to always require that they use the same names and cannot be carried across yield points without explicitly doing so.

My Thoughts

I'm in agreement with @SimonSapin: yield should act like a normal expression which can be destructured, matched, etc. any way people see fit. This seems to make the most sense and looks the best to me.

@pitaj commented Jun 4, 2020

Another thing to consider is the existing implementation, which works much like @SimonSapin has proposed:

```rust
let mut g = |mut one_two: &str| -> &str {
    println!("{}", one_two);
    one_two = yield 0; // rebind a mut variable / argument
    println!("{}", one_two);
    let three = yield 1; // bind to a new variable
    three
};

println!("first resume: {:?}", Pin::new(&mut g).resume("one"));
println!("second resume: {:?}", Pin::new(&mut g).resume("two"));
println!("third resume: {:?}", Pin::new(&mut g).resume("three"));
```

Given that this syntax will keep working going forward (not necessarily true), it resolves the issues of defining types for resume args, binding resume args, and having separate types for yield and return. I think the idea of having only a single argument works well here, because we can only pass a single value into or out of the generator at a time. It avoids always needing to destructure a tuple or convert a list of arguments into a tuple.

Remaining Issue: Do we provide a way to define the yield value type in the generator definition syntax? If so, how?

I think we must provide a way to do so. Some options:

  1. |arg: ResumeType = yield YieldType| -> ReturnType {
  2. |yield YieldType -> arg: ResumeType| -> ReturnType {
  3. |arg: ResumeType| -> ReturnType yield YieldType {
  4. |arg: ResumeType| yield YieldType -> ReturnType {
  5. yield YieldType |arg: ResumeType| -> ReturnType {

Also consider variations wrapping YieldType or yield YieldType with <> or using the word yields instead of yield. My favorite option is the first, as it mirrors the actual assignment syntax of resume args and uses the already-reserved keyword yield. It also doesn't interrupt the return type area, and can be easily left out without causing much confusion when one wants to allow for type inference.

Any of these would solve the following two issues:

  • ability to force an arbitrary closure-like declaration to be a generator
  • ability to fully specify input and output types at declaration

And could solve the following two as well if we make the yield part mandatory to define a generator:

  • confusing difference in behavior between normal closures and generator
  • magic of inferring a generator when yield is present

@samsartor
Copy link
Contributor

@samsartor samsartor commented Jun 5, 2020

I've burned a lot of brain cells working on coroutine craziness since last November. Both on my own and with @CAD97 & @pcpthm in a draft RFC. I'd like to say I have something concrete to show for it but, thanks to my own distractability and lack of time, it has mostly resulted in a few out-of-date rustc branches, a massive WIP blog post, etc. But I do think I have a good response to @pitaj's argument that "the re-use of 'just' closure syntax and the behavior we want out of generators are impossible to reconcile".

Those things can totally be reconciled! I can even explain such a proposal in two sentences:

The statement yield x; is allowed in closures. It behaves the same as return x; except the next time the closure is called, execution resumes immediately following the statement instead of at the top of the block.

To get "yield closures" in Rust you don't need any fancy Generator trait, wild type declaration syntax, or even a new syntax to distinguish between the yield/return types. All you need is a new FnPin trait for the occasional closure that requires address stability because of live borrows across yield statements (plus the MIR transform and some type checking shenanigans). In fact, there is no need even for the "magic of inferring a generator when yield is present". yield could be simply a feature that mutable closures have. There would be no effect to adding, for example, if false { yield unreachable!(); } at the beginning of every existing closure.
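A rough illustration of that model: a hypothetical yield closure like `|| { yield 1; yield 2; 3 }` would behave like the hand-written `FnMut` state machine below, which runs on stable Rust today. The desugaring shown is illustrative, not the compiler's actual MIR transform:

```rust
// Hand-rolled equivalent of a hypothetical yield closure:
//     let mut counter = || { yield 1; yield 2; 3 };
// Each call resumes after the last `yield`; once the body finishes,
// further calls keep hitting the final `return` arm.
fn make_counter() -> impl FnMut() -> i32 {
    let mut state = 0u8; // which yield point we are suspended at
    move || match state {
        0 => { state = 1; 1 } // first call: run to `yield 1`
        1 => { state = 2; 2 } // resume after first yield, hit `yield 2`
        _ => 3,               // resume after second yield: `return 3`
    }
}

fn main() {
    let mut counter = make_counter();
    println!("{} {} {}", counter(), counter(), counter()); // 1 2 3
}
```

The captured `state` variable is exactly the bookkeeping that the proposed `yield`-in-closures feature would generate automatically.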

I really really want to find time to evangelize this more. Talk about how it makes "magic mutation" kinda obvious and very sane, how it makes async streams practically trivial, how you might integrate await into the mix to do coroutine delegation, how it makes implementing new Future combinators really easy, how much rustc could be simplified by truly unifying "generators" and "closures", even about the remaining challenges with those semantics and how I think I've solved them. But me shouting "hey everyone! I love how yield could be totally ordinary" isn't really the point.

The point is that any kind of new generator syntax (now available anywhere Rust is sold!) makes a lot of sense as a sugar. @pitaj's proposed fully qualified syntax could be implemented with something like:

macro_rules! gyield {
    ($x:expr) => { {
        yield GeneratorState::Yielded($x);
        __coarg
    } }
}

macro_rules! generator {
    (|let $ap:pat: $at:ty = yield<$yt:ty>| -> $rt:ty $x:expr) => { |__coarg: $at| {
        let $ap = __coarg;
        yield GeneratorState::Complete($x);
        loop { yield panic!("generator resumed after completion") }
    } }
}

And there are cases where that sugar is quite useful! Having your different return/yield types automatically packed into an enum is pretty darn ergonomic. People want GeneratorState-wrapping for the same reason they want Ok-wrapping. But coroutines don't really need to be their own enormous feature with all these kinds of syntactic considerations. Unification is possible!

@pitaj
Copy link

@pitaj pitaj commented Jun 5, 2020

@samsartor I don't have any problem with the idea of closures and generators actually being the same thing. That sounds like a novel and interesting idea, and one that would reconcile them.

What I was saying isn't reconcilable is having completely different behavior depending on whether a closure has yield or not. And it seems we agree on that point.

However I don't see how you could have different yield and return types without also having a new syntax for specifying the yield output type. (Or you could just force everyone to use type inference for it, which I don't agree with).

That's a minor issue and one easily solved. In fact, I'd say it makes sense that if the initial arguments into a generator are the same as the resume args, then it makes sense that the yield output type would be the same as the return type.

I like your idea though. Essentially the idea is that any given closure is not necessarily complete when it returns a value. The result of calling it the next time is dependent on whether that return value was provided by a yield or by a return (implicit or explicit).

Then you can reuse .call(...) for generators and use the same x(args...) syntax for generators as well.

This could also prevent some overhead when used for async functions, since it can return the future variants directly without them needing to be wrapped in a generator variant.

It's more flexible, re-uses existing syntax, and is implemented as a fairly easy to understand extension to existing behavior.

Edit: another thing to consider is what happens if the generator is exhausted. Does the next call restart at the beginning, or does it panic? I think it would be most consistent to restart at the beginning, since just calling any other type of closure won't panic AFAIK.

@Nemo157
Copy link
Member

@Nemo157 Nemo157 commented Jun 23, 2020

I finally got round to revisiting my attempt at working out how to shim an argument accepting generator into a Sink now that we have resume arguments in nightly, though it's still blocked on rust-lang/rust#68923.

One thing I have realised is that the "yield returns value" approach makes this macro-rules attempt impossible: because of the changing lifetime on each yield, they would need new bindings to store into. It also means that fixing rust#68923 wouldn't allow the async/await transform to be a trivial, safe proc-macro-level transform. Even for a simple case like this I can't work out a possible expansion without needing to involve the existing lifetime erasure (which should be needed only to work around rust#68923):

async {
  loop {
    foo().await;
  }
}

The "single binding updated by yield" approach mentioned in this RFC makes a macro-rules attempt of the core Future async/await syntax possible (though still not nice) and a proc-macro transform trivial.

@tema3210
Copy link

@tema3210 tema3210 commented Aug 28, 2020

However I don't see how you could have different yield and return types without also having a new syntax for specifying the yield output type. (Or you could just force everyone to use type inference for it, which I don't agree with).

I find it quite similar to ... -> Result<T,E> vs ... -> T throws E. Your proposed syntax reminded me of checked-exceptions syntax. We could just do better naming for the GeneratorState<Y,R> type, like Yield<Y,R>; we can denote the absence of a yield or return type with Yield<Y,!> and Yield<!,R>, the latter coroutine output type being effectively a plain return without a yield.
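A sketch of that renaming on stable Rust, with `std::convert::Infallible` standing in for `!` (the variant and alias names below are my own, not from any RFC):

```rust
use std::convert::Infallible;

// Sketch of renaming GeneratorState<Y, R> to Yield<Y, R>.
enum Yield<Y, R> {
    Yielded(Y),
    Returned(R),
}

// Degenerate cases, with Infallible standing in for `!` on stable Rust:
type OnlyYields<Y> = Yield<Y, Infallible>;   // never returns
type OnlyReturns<R> = Yield<Infallible, R>;  // a plain return without a yield

fn main() {
    let step: OnlyYields<u32> = Yield::Yielded(1);
    if let Yield::Yielded(n) = step {
        println!("yielded {n}");
    }
}
```

Because `Infallible` has no variants, a `Yield<Y, Infallible>` value is statically known to be `Yielded`, which is the point of the `Yield<Y,!>` shorthand.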

Also, the interaction with ?, the error-propagation operator, is totally unclear: where do we want the error type to go, to return, to yield, or to an entirely different branch? We could look for the type which implements the Try trait, but what if both the Yield and Return types implement Try? This is the territory of effect systems.

@tema3210
Copy link

@tema3210 tema3210 commented Sep 1, 2020

I have a bunch of ideas, but they involve more than generators feature.
So, I propose following:

  1. Rename GeneratorState<Y,R> to Yield<Y,R> to align well with Result<T,E> and reduce typing overhead.
  2. Add a trait to allow piping iterator output into coroutine input, a.k.a. resume arguments. Proposed name: Fuser.
  3. ? operator in coroutines: make it always terminate the coroutine. For the possibility of propagating the error to the yield type there are several problems: 1) which resume point will be selected by the next resume call after such a propagation? 2) how do we maintain correctness considering 1)? 3) even if we go with the magic-mutation approach and resume directly at the propagation site, there is no way to recover after propagation, because recovery would require a correct value in place of the error, which we can't provide in the general case.
  4. Add dedicated Stream and IntoStream traits. This is because Stream is neither Iterator<Item=impl Future> nor Future<Output=impl Iterator>. Symmetrically to Iter and IntoIter, this should be consumed by for loops.
  5. Add a second, async form of the for loop. The precise syntax is for x.await in stream {...} and for x.await? in y.stream() {...}: the first simply processes the stream, the second additionally propagates any error up to the caller.
  6. A bit off topic, but we could do for x? in iter in sync code, if iter's Item is Try.

For working with non-terminating errors there is the try_block feature, and a lot of other stuff. The reason for the Fuser trait is that we might want to iterate over coroutines, at least partially; here we have a problem with leftovers: what if the data streamed into some coroutine was not sufficient for it to complete? In that case we want a way to get the coroutine back for further processing and to get its return value when it's ready.

@RustyYato
Copy link

@RustyYato RustyYato commented Dec 21, 2020

A bit off topic, but we could do for x? in iter in sync, if iter's Item is Try.

This is unnecessary, you can do that propagation in the loop body. Similarly with .await?, it's not needed, do it in the body.

@rkjnsn
Copy link
Contributor

@rkjnsn rkjnsn commented Feb 15, 2021

I came across this RFC today, and, after reading through the discussion, the thing I feel most strongly about is that simply introducing yield shouldn't magically change the return type or behavior from the caller's perspective, and we shouldn't try to cram additional types into the current closure syntax. I do think there are reasonable solutions on either side of that, though.

Option 1: Closure-like, creates regular FnMut or FnPinMut:

On the near side, I liked @samsartor's suggestion in #2781 (comment) of making yield a small extension to the existing closure syntax: you still get an FnMut (or an FnPinMut where necessary), and the only difference between yield and return is where execution resumes on subsequent calls (after the yield versus back at the top). In this case, yield is just making it nicer to do something you can already do with FnMut. One appealing thing about this approach is because it's a relatively small addition, several questions that have come up have fairly obvious answers.

  1. Should generators return values of a special type (GeneratorState)? Can yield and return provide different types?

    No. yield and return must both provide an expression matching the closure's return type, which is the same type the caller receives when invoking it. However, this can be an enumeration (Yield<Y, R>, Option<T> for an iterator-like coroutine, et cetera), with yield and return providing different variants.

  2. Should starting a coroutine be different from resuming it, with different parameters?

    No. Both the initial and subsequent invocations use the same call operation and the same parameters (but potentially different values), just like invoking an FnMut multiple times, today.

  3. What about inputs that need to be provided once up front and held by the coroutine?

    They should be captured by the closure, rather than passed as arguments.

  4. What are the lifetime requirements for values passed in when resuming?

    Exactly the same as calling a standard closure with the same signature. This means that if the coroutine wants to store arguments across yield points, it needs to specify an appropriately long lifetime for the respective parameters.

  5. How does a coroutine receive the arguments passed when called after a yield?

    This one could probably still be debated, but given the mental model of this syntax, and with the answer to (4) above, having the closure parameters reflect arguments passed to the current invocation makes the most sense, regardless of whether execution is starting from the top or after a yield. I admit this can be a little weird, in some cases:

    let mut coroutine = |x: i32| {
      dbg!(x);  // 1
      {
        let x = x;  // Copy and shadow x
        yield;
        dbg!(x);  // 1 (prints shadowing copy)
      }
      dbg!(x);  // 2 (parameter no longer shadowed, has value from current call)
    };
    coroutine(1);
    coroutine(2);

    However, given the suggested mental model, I don't think it's that weird, and actually makes sense once you understand what's going on. I also think, given (4), the alternative (yield returning the new arguments) could lead to frustrating borrow check errors if the parameters contain lifetimes and the user, by storing the result in the obvious way, causes the arguments to remain live across a subsequent yield.

  6. Can you yield from an async closure? How do you use it to implement a Stream?

    This seems to be one of the less obvious questions to answer for any of the proposals (including this one) where every invocation of the coroutine (including the first) takes the same parameters. This is rather different than Futures and Streams, where the initial invocation serves more or less as a constructor, with the passed arguments getting stored for later, and subsequent calls to poll or poll_next don't provide any additional input to the function.

    One solution might be to create a ResumableFuture trait that gets automatically implemented for any async closures that can be invoked multiple times (i.e., closures that don't consume their captures) regardless of whether they use yield or not. (Another option might be to put a resume method directly on Future and have the associated ResumeArgs type be ! for FnOnce-style closures.) To turn this into a stream, ResumeArgs would have to be (), which means any needed state (e.g., a TcpSocket to listen on) would need to be captured by the closure.

    Aside: I originally thought it might be possible to handle async+yield by invoking the coroutine multiple times, with each invocation returning a different future. Unfortunately, this runs into two problems. First, the future generated by async takes ownership of the needed state, rather than borrowing the originating closure. As such, driving the future can't update state in the closure. Second, even if that were changed, there's nothing to stop the user from driving the future only half way to completion, dropping it, and calling the closure again, which could result in the coroutine state being inconsistent.
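As a concrete instance of answer (1): an iterator-like coroutine whose yield and return both produce the closure's single return type, here `Option<u32>`, can already be approximated today with a plain `FnMut`. This hand-written equivalent is illustrative, not a proposed API:

```rust
// Emulating answer (1) on stable Rust: `yield` and `return` would both
// produce the closure's single return type, Option<u32> for an
// iterator-like coroutine, with None playing the role of `return`.
// The hypothetical yield version of this would be roughly:
//     let mut next = || { while n > 0 { n -= 1; yield Some(n + 1); } None };
fn countdown(mut n: u32) -> impl FnMut() -> Option<u32> {
    move || {
        if n == 0 {
            None // the "return" variant: coroutine is done
        } else {
            n -= 1;
            Some(n + 1) // the "yield" variant
        }
    }
}

fn main() {
    let mut next = countdown(3);
    while let Some(x) = next() {
        println!("{x}"); // 3, 2, 1
    }
}
```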

Option 2: async-like, generates impl Generator

On the far side, I like the idea of having a distinct keyword, analogous to today's async, and attaching any additional types to that. Of course, this leaves all of the other questions open for debate. If we go this route, I think we should try to be as similar to async as possible to reduce the novelty of the feature, leading to the following constraints (names of traits and types are placeholders):

  1. There should be a keyword (which I'll refer to as generator for now) that can be applied to functions, closures, and blocks.
  2. When a generator function or closure is invoked, it returns an impl Generator capturing the arguments and state. No code from the body is executed until the first resume call.
  3. A generator block immediately evaluates to an impl Generator.
  4. Calling resume takes a value of the associated ResumeArg type, and returns a GenerateState enum containing either Yield(YieldType) or a Return(ReturnType).
  5. It is an error to call resume again after receiving Return.

I think (2) is the trickiest of the requirements, and indeed I have no solution for it. The requirement is definitely necessary, as, like a Future, a Generator needs to be moveable until the first resume call. On the other hand, it's unclear what should happen to the value passed to the first call to resume. Dropping it on the floor definitely seems wrong, but providing some bespoke method to obtain it (unlike future resumes, which can just return it from yield) feels gross.

Setting that aside for the moment, I think such a syntax could look something like the following, partially inspired by @newpavlov's suggestion in #2781 (comment)):

// Free function. Initial call takes two &str args, takes an i32 for each resume,
// yields usize, and finally returns a String.
generator[i32 -> usize] fn(arg1: &str, arg2: &str) -> String {
  // …
}

// Closure. 
// Yield type and return type can be omitted and inferred.
generator[i32] |arg1: &str, arg2: &str| {
  //…
}

// Async block. Immediately evaluates to an `impl Generator`.
generator[i32] {
  //…
}
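For reference, the trait such a `generator` keyword would target can be modeled on stable Rust by defining the trait locally. The names below mirror the nightly `Generator` trait but everything here is a self-contained sketch; the impl shows the kind of state machine a hypothetical `generator[i32 -> i64] fn(...) -> String` item might compile into:

```rust
use std::pin::Pin;

// Local model of a Generator trait with resume arguments (the real
// nightly trait differs in details; this is only a sketch).
enum GeneratorState<Y, R> {
    Yielded(Y),
    Complete(R),
}

trait Generator<Arg> {
    type Yield;
    type Return;
    fn resume(self: Pin<&mut Self>, arg: Arg)
        -> GeneratorState<Self::Yield, Self::Return>;
}

// Yields the running sum of the i32s passed to `resume`, completing
// with a message after three resumes.
struct RunningSum { total: i64, resumes: u8 }

impl Generator<i32> for RunningSum {
    type Yield = i64;
    type Return = String;
    fn resume(mut self: Pin<&mut Self>, arg: i32)
        -> GeneratorState<i64, String> {
        self.total += i64::from(arg);
        self.resumes += 1;
        if self.resumes < 3 {
            GeneratorState::Yielded(self.total)
        } else {
            GeneratorState::Complete(format!("done: {}", self.total))
        }
    }
}

fn main() {
    let mut gen = RunningSum { total: 0, resumes: 0 };
    let mut gen = Pin::new(&mut gen);
    if let GeneratorState::Yielded(t) = gen.as_mut().resume(5) {
        println!("partial sum: {t}");
    }
}
```

Note that, as discussed in (2) above, nothing in this hand-written version runs before the first `resume` call, which is exactly the property the keyword syntax would need to preserve.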

My opinion

If we can find an elegant solution for passing the initial resume argument for option 2, I think I'd lean toward that option, as it would make both compiler state-machine transforms very similar, and it would be very easy to understand one after having learned the other.

However, if this turns out not to be possible, or even just requires consequential divergence from the way async functions work, then I very strongly favor option 1, as it basically just provides a nicer way to write something that FnMut can already do today, with FnPinMut being a very straightforward extension to allow closure state to contain internal references. While I admit having the arguments be different after a yield is a little weird at first glance, I find it makes sense after I think about what's actually going on. My closure has been called again with different arguments; it's just that I've opted (by using yield instead of return) for execution to jump down to after the yield instead of starting at the top. In any case, I find the cognitive load added by this weirdness to be significantly less than that of having a third construct that acts neither like a closure nor like async.

@samsartor
Copy link
Contributor

@samsartor samsartor commented Feb 15, 2021

@rkjnsn Wow, this is a really great breakdown of these two options, and tracks pretty well with my understanding of the subject!

The only correction I'd like to make is on "shadowing resume arguments" vs "reassigning resume arguments". Under MCP-49 (Option 1 above), the previous resume arguments are not shadowed. Instead they are completely dropped when entering a yield and the new arguments take their places when exiting it. This does prevent users from naively holding references to prior arguments, but the user <--> compiler feedback loop in that case looks fairly nice:

=> |x| {
    let y = &x;
    yield;
    dbg!(y, x);
}

error[E0506]: cannot assign `x` on yield because it is borrowed
 --> src/lib.rs:3:4
  |
2 |     let y = &x;
  |             -- borrow of `x` occurs here
3 |     yield;
  |     ^^^^^ assignment to borrowed `x` occurs here
4 |     dbg!(y, x);
  |          - borrow later used here
  |
  = help: consider moving `x` into a new binding before borrowing

=> |x| {
    let a = x;
    let y = &a;
    yield;
    dbg!(y, x);
}

As you mentioned, shadowing previous resume arguments can be quite confusing. But more importantly, it makes implementing poll functions needlessly difficult by introducing context reuse bugs in code like the following:

std::future::from_fn(|ctx| {
  if is_blocked() {
    register_waker(ctx);
    yield Pending;
  }

  while let Pending = task.poll(ctx) { .. }
})

In my experience, it is fairly rare that corountine authors want to reuse past arguments. And when they do, creating a new binding is a clear and obvious workaround. Hence, I'm pretty solidly in the reassignment camp.

Can you yield from an async closure?

In my mind, no. @CAD97, @pitaj, @pcpthm, and myself had quite a vigorous debate around this when drafting MCP-49 and basically came to the conclusion that there was no obvious way to accomplish it, at least for the time being.

I think the reasoning is simple: coroutines are all about giving the user control over per-resume input, but Futures (and async blocks in general) already know what per-resume input they want, an std::task::Context. Trying to satisfy both the desires of the async keyword and the user simultaneously is a losing battle. It sounds like you quite thoroughly explored the consequences of that!

How do you use it to implement a Stream?

Even without much syntactic sugar, it isn't too hard under MCP-49. A Stream::then implementation looks roughly like:

std::stream::from_fn(|ctx| {
  while let Some(item) = await_with!(inner.next(), ctx) {
    yield Ready(Some(await_with!(func(item), ctx)));
  }
  Ready(None)
})

Although we can all agree that a generator sugar makes this look nicer!

async gen {
  while let Some(item) = inner.next().await {
    yield func(item).await;
  }
}

So generators or yield closures?

Por que no los dos? (Why not both?)

This is just my opinion, but I think these are really two completely different features with two different goals. Trying to pack both into one syntax just creates a mess that is worse for both cases.

Generators want only one thing: to easily implement Iterator and Stream. Generators do not give a damn about resume arguments. Once you realize that, the syntax becomes pretty trivial.

If you want to hear my thoughts on generalized coroutines, how they might support generators (as either a language feature or proc-macro crate), and what else they might have to offer (yield closures + iterator combinators = unlimited power), check out the design notes I wrote for the language team.

@rkjnsn
Copy link
Contributor

@rkjnsn rkjnsn commented Feb 15, 2021

The only correction I'd like to make is on "shadowing resume arguments" vs "reassigning resume arguments"

To clarify my point on shadowing, it was indeed my understanding with option 1 that the existing arguments would be dropped on yield, with the parameters getting set to the new arguments on resume. Indeed, I think that's one of the key benefits of this approach over some other suggestions for generalized coroutines discussed in this thread. My point about shadowing was that if the user explicitly shadowed a parameter (the let x = x; line in my example), then the parameter would still be shadowed upon resumption, even though the previous argument would have been dropped and a new one passed. Once execution leaves the scope with the shadowing binding, however, the parameter (now holding the new argument) would be visible again.

I was merely pointing out that this could seem a little weird at first ("how is the parameter reassigned when it isn't even visible?"), but that (to me, at least) it made sense after thinking about it for a bit. So, basically I was agreeing with you. 🙂

Can you yield from an async closure? In my mind, no.

This seems reasonable. While I think you could probably get it to do something (basically, you'd have poll and resume as separate operations, and you'd have to poll to ready before you could resume), it wouldn't exactly be nice (it's basically two separate, commingled coroutines with a weird interface at that point), and the from_fn+await_with! seems much better. Especially considering…

So generators or yield closures? Por que no los dos?

Agreed. I was evaluating both options from a perspective of "assuming we want generalized coroutines, which syntax makes sense?"

Given that resume arguments are where async-like syntax falls short, and given that iterators and streams don't care about that functionality, I certainly have no objection to providing async-like syntax for those use cases, either as a stop gap before general coroutines land, or as a user-friendly facade over them for the common case.

@samsartor
Copy link
Contributor

@samsartor samsartor commented Feb 15, 2021

My point about shadowing was that if user explicitly shadowed a parameter, then the parameter would still be shadowed upon resumption.

Ah, I totally misread your code example. It's actually kind of an interesting case I hadn't thought about, thanks for sharing! Thankfully, I think it is behavior that's kind of hard to get in the first place, and that users familiar with Rust's shadowing rules shouldn't be too surprised by it.

It sounds like we're pretty much on the same page! 😁

@Nemo157
Copy link
Member

@Nemo157 Nemo157 commented Feb 15, 2021

Generators want only one thing: easily implement Iterator and Stream. Generators do not give a damn about resume arguments. Once you realize that, the syntax becomes pretty trival.

I disagree with this; I would really like generator-based Sink/AsyncRead/AsyncWrite implementations too, to replace the state-machine contortions you currently need to deal with them.

@jan-hudec
Copy link

@jan-hudec jan-hudec commented Feb 15, 2021

So generators or yield closures? Por que no los dos?

Agreed. I was evaluating both options from a perspective of "assuming we want generalized coroutines, which syntax makes sense?"

Given that resume arguments are where async-like syntax falls short, and given that iterators and streams don't care about that functionality, I certainly have no objection to providing async-like syntax for those use cases, either as a stop gap before general coroutines land, or as a user-friendly facade over them for the common case.

The first option is more general. It helps write any state machine, including the same ones produced by async, at the cost of requiring the state enum to be spelled out. The second option, on the other hand, is just a different special case similar to async.

In that light, I would:

  • say that the first option fits the purpose of this specific RFC better,
  • vote for doing the first option first, and
  • suggest creating a separate RFC to bikeshed the second (the first one is fairly minimal, so there fortunately isn't much to bikeshed).

@samsartor
Copy link
Contributor

@samsartor samsartor commented Feb 15, 2021

I would really like generator based Sink/AsyncRead/AsyncWrite implementations too to replace the state machine contortions you need to deal with them currently.

We all want a nicer syntax for those traits! But I would say you are looking for a generalized coroutine syntax. For example, here is a sketch of using MCP-49's yield closures to decode a base64 stream:

let mut decoder = |sextet: u8, octets: &mut ReadBuf| {
    let a = sextet; // witness a, b, and c sextets for later use
    yield;
    let b = sextet;
    octets.append_one(a << 2 | b >> 4); // aaaaaabb
    yield;
    let c = sextet;
    octets.append_one((b & 0b1111) << 4 | c >> 2); // bbbbcccc
    yield;
    octets.append_one((c & 0b11) << 6 | sextet) // ccdddddd
};

io::read_from_fn(move |ctx, octet_buffer| {
     // do some prep work during the first call to poll_read
    pin!(inner);
    let mut sextet_buffer = ReadBuf::new([MaybeUninit::uninit(); 1024]);

    'read: loop {
        // wait for the inner reader to provide some bytes
        await_with!(AsyncRead::poll_read, &mut inner, ctx, &mut sextet_buffer)?;

        for &byte in sextet_buffer.filled() {
            while octet_buffer.remaining() == 0 {
                // the given buffer is filled, the poll_read function should
                // return Ready
                yield Ready(Ok(()));
                // pick up where we left off
            }

            // pass a byte of input to our decoder
            decoder(match byte {
                b'A'..=b'Z' => byte - b'A' + 0,
                b'a'..=b'z' => byte - b'a' + 26,
                b'0'..=b'9' => byte - b'0' + 52,
                b'+' | b'-' => 62,
                b'/' | b',' | b'_' => 63,
                b'=' => return Ready(Ok(())),
                e => yield Ready(Err(InvalidChar(e).into())),
            }, octet_buffer);
        }
    }
})

Unlike yield closure implementations of Iterator and Stream, which benefit from some sugar on top of this, this AsyncRead impl is pretty hard to improve upon. You could eliminate a little verbosity by finding some way to make the ctx resume arg implicit, and get back the .await syntax, but it is a fairly small gain for quite a lot of magic.
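For contrast, the four-state `decoder` closure in the sketch corresponds to the explicit state machine below, which compiles on stable Rust today. It pushes into a `Vec<u8>` rather than a `ReadBuf`; that substitution, and all the names, are mine:

```rust
// The four-state base64 decoder, written as the explicit state machine
// that the `decoder` yield closure abbreviates.
#[derive(Clone, Copy)]
enum DecodeState {
    Start,
    HaveA(u8), // holds sextet a
    HaveB(u8), // holds sextet b; the first octet has been emitted
    HaveC(u8), // holds sextet c; the second octet has been emitted
}

struct Decoder { state: DecodeState }

impl Decoder {
    fn push_sextet(&mut self, sextet: u8, octets: &mut Vec<u8>) {
        self.state = match self.state {
            DecodeState::Start => DecodeState::HaveA(sextet),
            DecodeState::HaveA(a) => {
                octets.push(a << 2 | sextet >> 4); // aaaaaabb
                DecodeState::HaveB(sextet)
            }
            DecodeState::HaveB(b) => {
                octets.push((b & 0b1111) << 4 | sextet >> 2); // bbbbcccc
                DecodeState::HaveC(sextet)
            }
            DecodeState::HaveC(c) => {
                octets.push((c & 0b11) << 6 | sextet); // ccdddddd
                DecodeState::Start
            }
        };
    }
}

fn main() {
    // "TWFu" in base64 is the sextets [19, 22, 5, 46], decoding to "Man".
    let mut d = Decoder { state: DecodeState::Start };
    let mut out = Vec::new();
    for s in [19u8, 22, 5, 46] {
        d.push_sextet(s, &mut out);
    }
    println!("{}", String::from_utf8(out).unwrap()); // prints "Man"
}
```

The `DecodeState` enum and the `self.state = match self.state { ... }` dance are exactly the bookkeeping that the `let a = sextet; yield; let b = sextet; ...` closure hides.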

@Nemo157
Copy link
Member

@Nemo157 Nemo157 commented Feb 15, 2021

And here's an example using async-io-macros (full example):

async_io_macros::async_read! {
    futures::pin_mut!(input);

    loop {
        let mut bytes = [0; 4];
        let len = input.read(&mut bytes).await?;
        if len == 0 {
            break;
        }
        input.read_exact(&mut bytes[len..]).await?;
        for byte in &mut bytes {
            *byte = match *byte {
                b'A'..=b'Z' => *byte - b'A',
                b'a'..=b'z' => *byte - b'a' + 26,
                b'0'..=b'9' => *byte - b'0' + 52,
                b'+' | b'-' => 62,
                b'/' | b',' | b'_' => 63,
                b'=' => b'=',
                _ => return Err(std::io::Error::new(std::io::ErrorKind::Other, "invalid char")),
            }
        }
        let out = [
            bytes[0] << 2 | bytes[1] >> 4,
            bytes[1] << 4 | bytes[2] >> 2,
            bytes[2] << 6 | bytes[3],
        ];
        let mut out = if bytes[2] == b'=' { &out[..1] } else if bytes[3] == b'=' { &out[..2] } else { &out[..] };
        while !out.is_empty() {
            yield |buffer| {
                let len = buffer.len().min(out.len());
                buffer[..len].copy_from_slice(&out[..len]);
                let (_, tail) = out.split_at(len);
                out = tail;
                Ok(len)
            };
        }
    }

    Ok(())
}

I'm not 100% happy with the looped yielded closure for output, but I haven't had time to try and come up with a better syntax.

Ah, I think I see the disconnect here: when you say generator you're specifically talking about something used for iterator/stream-like use cases? While I see generator (in Rust land) as being generalized coroutine syntax, since that's what it is currently on nightly.

@samsartor
Copy link
Contributor

@samsartor samsartor commented Feb 15, 2021

Ah, I think I see the disconnect here, when you say generator you're specifically talking about something used for iterator/stream-like usecases? While I see generator (in Rust land) as being generalized coroutine syntax, since that's what it is currently on nightly.

Yep. The generalized coroutine design notes go over this in detail, but the term "generator" is a bit overloaded in the Rust language design space right now. I tend to use it more to describe the generators in RFC-2996 (gen fn) than the ones in RFC-2033 (the current Generator trait). Neither is strictly wrong, but since I was making a point about including both yield closures (which are fairly similar to generators today) and gen functions (which are more like the async-stream or propane macros), the distinction was important.

@tema3210
Copy link

@tema3210 tema3210 commented Jun 15, 2021

What if we make yield have the syntax receiver.yield expr, meaning that the receiver mutable binding will be assigned the next resume argument?

This makes things explicit about where the next resume argument will be stored, so that the magic mutation is no longer too confusing.

The UX around borrows is likely to improve as well: if there is a borrow of some binding, users cannot assign to it, neither with a plain assignment nor by yielding to it (hence the requirement that the receiver be a mutable binding).

And if desired, we could easily allow yield expr to also be an expression; this proposal doesn't interfere with that in any way.

@SimonSapin
Copy link
Contributor

@SimonSapin SimonSapin commented Jun 15, 2021

I don’t see what foo.yield bar brings over foo = yield bar where yield bar is an expression.

@tema3210
Copy link

@tema3210 tema3210 commented Jul 4, 2021

Another idea about the topic: we can simply say that the resume argument isn't initialized before the first yield; that behavior would be surprising yet correct.

@tema3210
Copy link

@tema3210 tema3210 commented Dec 1, 2021

After a lot of time, and after reading the design notes many times, I got the feeling that there are two entirely distinct levels of abstraction involved:

  • The first is what people often want for implementing iterators and streams: the so-called gen fns from RFC 2996. These are high-level concepts useful in a lot of places (for example, the proposed async for);
  • The second is the level of protocol state machines and other coroutines.

I think that instead of trying to fit one feature to both uses, we have to design distinct UIs for the feature:

The high-level case I imagine is closer to RFC 2996: here we allow functions to contain only yield and to produce impls of either Stream or Iterator. When exhausted, we default to restarting unless there is a move capture - in that case the coroutine becomes poisoned.

The low-level case and API is described in MCP-49. On the behavioural side, we don't do implicit restarting and instead always poison the coroutine when the final state is reached (we mention this in the docs, and ask users to explicitly loop the internals of their coroutines if needed). return is allowed and can have a type other than that of yield; if both are used in the same closure, the resulting type implements Generator and wraps its output in GeneratorState. If a closure has only yields, it implements only the most basic FnPinMut trait.

We need to somehow distinguish all these kinds. I think that the more common high-level use (the one involving fewer types) deserves its own top-level syntax, as async does. My bet is gen fn syntax for the high-level case; just reuse closures for the low-level case.

The traits are as follows:

  • Iterator<Item=Item> ~= FnPinMut(()) -> Option<Item> + Unpin - this is the case for gen fns and gen closures;
  • Stream<Item=Item> ~= FnPinMut(&mut Context) -> Poll<Option<Item>> - async gen closures and fns;
  • Generator<R, Yield=Item, Return=Return> ~= FnPinMut(R) -> GeneratorState<Item, Return> - closures with both yield and return;
  • AsyncGenerator<R, Yield=Item, Return=Return> ~= FnPinMut(R, &mut Context) -> Poll<GeneratorState<Item, Return>> - async generators.

Today, only FnPinMut and AsyncGenerator are absent. Also, we can implement these traits for FnPinMut types with the right bounds and make the state-machine transform just produce impl FnPinMut types.

Edit: And given that we can have two UIs, we can pick both sides of the first-resume problem:

  • The high-level case gets yield as an expression, with the exception of the first resume, whose argument gets assigned to the argument's binding;
  • The low-level case gets magic mutation for the sake of correctness.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment