New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A `yield` construct in the vein of C# #388

Open
rust-highfive opened this Issue Oct 12, 2014 · 73 comments

Comments

Projects
None yet
@rust-highfive

rust-highfive commented Oct 12, 2014

Issue by bstrie
Friday Jul 12, 2013 at 13:46 GMT

For earlier discussion, see rust-lang/rust#7746

This issue was labelled with: A-an-interesting-project, E-hard, I-wishlist in the Rust repository


@thestinger has mentioned this several times as a wishlist item. This is a feature that greatly simplifies the implementation of external iterators by allowing them to be compiled to state machines at compile time (did I get that right?). See http://msdn.microsoft.com/en-us/library/vstudio/9k7k7cf0.aspx for reference.

While we shouldn't expect this for 1.0, it might be prescient to reserve the keyword right now.

Nominating for far-future milestone.

@errordeveloper

This comment has been minimized.

Show comment
Hide comment
@errordeveloper

errordeveloper Oct 12, 2014

I believe yield conveys pretty much the same meaning in other languages, including Python, Ruby and Lua. I think it's very appropriate to remove "C#" from the title.

errordeveloper commented Oct 12, 2014

I believe yield conveys pretty much the same meaning in other languages, including Python, Ruby and Lua. I think it's very appropriate to remove "C#" from the title.

@thestinger

This comment has been minimized.

Show comment
Hide comment
@thestinger

thestinger Oct 12, 2014

@errordeveloper: It's not implemented with a compile-time state machine in Python and Ruby. C# in the title is definitely appropriate, because (AFAIK) there isn't another language with a comparable feature. The proposal is not about generators / coroutines built on context switches.

thestinger commented Oct 12, 2014

@errordeveloper: It's not implemented with a compile-time state machine in Python and Ruby. C# in the title is definitely appropriate, because (AFAIK) there isn't another language with a comparable feature. The proposal is not about generators / coroutines built on context switches.

@errordeveloper

This comment has been minimized.

Show comment
Hide comment
@errordeveloper

errordeveloper Oct 12, 2014

@thestinger sure, I didn't fully realise that.

errordeveloper commented Oct 12, 2014

@thestinger sure, I didn't fully realise that.

@sfackler

This comment has been minimized.

Show comment
Hide comment
@sfackler

sfackler Oct 14, 2014

Member

A decently in depth discussion of yield support in C++: http://isocpp.org/files/papers/N4134.pdf

Member

sfackler commented Oct 14, 2014

A decently in depth discussion of yield support in C++: http://isocpp.org/files/papers/N4134.pdf

@errordeveloper

This comment has been minimized.

Show comment
Hide comment
@errordeveloper

errordeveloper commented Oct 14, 2014

@sfackler great find!

@errordeveloper

This comment has been minimized.

Show comment
Hide comment
@errordeveloper

errordeveloper Oct 14, 2014

If Am I understanding correct that someone needs to make a pull-request with an actual RFC write-up, and if so, has anyone here already started on that or at least intending to do so?

errordeveloper commented Oct 14, 2014

If Am I understanding correct that someone needs to make a pull-request with an actual RFC write-up, and if so, has anyone here already started on that or at least intending to do so?

@sinistersnare

This comment has been minimized.

Show comment
Hide comment

sinistersnare commented Oct 14, 2014

@zwarich

This comment has been minimized.

Show comment
Hide comment
@zwarich

zwarich Oct 14, 2014

Contributor

@errordeveloper I have started thinking about the implications of a 'yield' construct for borrow checking, which is something that past proposals have sort of glossed over. I haven't been too anxious to write up an RFC, because there is no chance that it will be considered before 1.0.

Contributor

zwarich commented Oct 14, 2014

@errordeveloper I have started thinking about the implications of a 'yield' construct for borrow checking, which is something that past proposals have sort of glossed over. I haven't been too anxious to write up an RFC, because there is no chance that it will be considered before 1.0.

@Lucretiel

This comment has been minimized.

Show comment
Hide comment
@Lucretiel

Lucretiel Nov 19, 2014

Is the fact that it's implemented as a compile-time state machine significant? The usage pattern to the developer is the same as in any other language, albeit with explicit typing.

Lucretiel commented Nov 19, 2014

Is the fact that it's implemented as a compile-time state machine significant? The usage pattern to the developer is the same as in any other language, albeit with explicit typing.

@errordeveloper

This comment has been minimized.

Show comment
Hide comment
@errordeveloper

errordeveloper Nov 25, 2014

It is, as it makes this feature most useful. State machines generated at
build time can be validated/safety-checked as well as optimised by the
compiler plugin that implements it.

On Wed, 19 Nov 2014 20:46 Lucretiel notifications@github.com wrote:

Is the fact that it's implemented as a compile-time state machine
significant? The usage pattern to the developer is the same as in any other
language, albeit with explicit typing.


Reply to this email directly or view it on GitHub
#388 (comment).

errordeveloper commented Nov 25, 2014

It is, as it makes this feature most useful. State machines generated at
build time can be validated/safety-checked as well as optimised by the
compiler plugin that implements it.

On Wed, 19 Nov 2014 20:46 Lucretiel notifications@github.com wrote:

Is the fact that it's implemented as a compile-time state machine
significant? The usage pattern to the developer is the same as in any other
language, albeit with explicit typing.


Reply to this email directly or view it on GitHub
#388 (comment).

@Ericson2314

This comment has been minimized.

Show comment
Hide comment
@Ericson2314

Ericson2314 Nov 30, 2014

Contributor

That is an interesting bunch of libraries there! It looks like your monad macro has bind as a primitive like Haskell's. I wonder if performance could be improved by using map and join alone as primitives to remove some "administrative lambdas".

Either way (as pipes will contain FnOnces even if the core monad parts don't), it would be nice if there was some way to add an #[inline] to the function the once closure sugar desguars into. Maybe that might help with code locality.

Contributor

Ericson2314 commented Nov 30, 2014

That is an interesting bunch of libraries there! It looks like your monad macro has bind as a primitive like Haskell's. I wonder if performance could be improved by using map and join alone as primitives to remove some "administrative lambdas".

Either way (as pipes will contain FnOnces even if the core monad parts don't), it would be nice if there was some way to add an #[inline] to the function the once closure sugar desguars into. Maybe that might help with code locality.

@gnzlbg

This comment has been minimized.

Show comment
Hide comment
@gnzlbg

gnzlbg Apr 17, 2015

Contributor

Here the latest Kohlhoff paper "Resumable Expressions", a new lower-level c++ alternative to await that allows implementing yield at the library level and combines stackless and stackfull coroutines:
http://open-std.org/JTC1/SC22/WG21/docs/papers/2015/n4453.pdf

Contributor

gnzlbg commented Apr 17, 2015

Here the latest Kohlhoff paper "Resumable Expressions", a new lower-level c++ alternative to await that allows implementing yield at the library level and combines stackless and stackfull coroutines:
http://open-std.org/JTC1/SC22/WG21/docs/papers/2015/n4453.pdf

@Lucretiel

This comment has been minimized.

Show comment
Hide comment
@Lucretiel

Lucretiel Apr 17, 2015

Lucretiel commented Apr 17, 2015

@steveklabnik steveklabnik referenced this issue Apr 21, 2015

Closed

Async IO #1081

@steveklabnik

This comment has been minimized.

Show comment
Hide comment
@steveklabnik

steveklabnik May 6, 2015

Member

Worth noting: Python's PEP process just accepted an async/await PEP: https://www.python.org/dev/peps/pep-0492/

Member

steveklabnik commented May 6, 2015

Worth noting: Python's PEP process just accepted an async/await PEP: https://www.python.org/dev/peps/pep-0492/

@ryanhiebert

This comment has been minimized.

Show comment
Hide comment
@ryanhiebert

ryanhiebert May 6, 2015

PEP 492 is pretty cool, I'm excited to try the next pre-release of Python 3.5.

In addition to generators and async functions in Javascript, there's also some work being done on async generators as well: https://github.com/jhusain/asyncgenerator. The author gave a talk about some related concepts in this talk, though the syntax is hidden on the last slide and not really explained: https://www.youtube.com/watch?v=gawmdhCNy-A.

ryanhiebert commented May 6, 2015

PEP 492 is pretty cool, I'm excited to try the next pre-release of Python 3.5.

In addition to generators and async functions in Javascript, there's also some work being done on async generators as well: https://github.com/jhusain/asyncgenerator. The author gave a talk about some related concepts in this talk, though the syntax is hidden on the last slide and not really explained: https://www.youtube.com/watch?v=gawmdhCNy-A.

@gnzlbg

This comment has been minimized.

Show comment
Hide comment
@gnzlbg

gnzlbg May 7, 2015

Contributor

@Lucretiel i find that being able to reuse existing functions_as is_ is a huge advantage of the "resumable expressions" proposal. The other proposals split the world and that is in my opinion a very bad thing.

Contributor

gnzlbg commented May 7, 2015

@Lucretiel i find that being able to reuse existing functions_as is_ is a huge advantage of the "resumable expressions" proposal. The other proposals split the world and that is in my opinion a very bad thing.

@Lucretiel

This comment has been minimized.

Show comment
Hide comment
@Lucretiel

Lucretiel May 8, 2015

The trouble is that you then lose what I think is one of the primary benefits of async programming: explicit context switch points. When literally any function can cause a context switch, you're basically not any better than mutlithreading- constantly having to use locks and other primitives to ensure all your state is safe. God help you if you get it wrong and have to try to debug it. With explict async, you know that no other coroutine will execute between one yield and the next, so you're safe to do whatever needs to be done, and be guaranteed that it will appear atomic to other coroutines.

I'll grant that being able to reuse everything has some surface appeal, but I think it's a problem waiting to happen. The reality is that concurrency requires a fundamentally different model of thinking- look at all the classic multithreading problems to see why- and explicit async reflects that model.

Just my two cents, though. It could be that rust, with its emphasis on static memory safety, doesn't need to worry as much as, say, C++.

Lucretiel commented May 8, 2015

The trouble is that you then lose what I think is one of the primary benefits of async programming: explicit context switch points. When literally any function can cause a context switch, you're basically not any better than mutlithreading- constantly having to use locks and other primitives to ensure all your state is safe. God help you if you get it wrong and have to try to debug it. With explict async, you know that no other coroutine will execute between one yield and the next, so you're safe to do whatever needs to be done, and be guaranteed that it will appear atomic to other coroutines.

I'll grant that being able to reuse everything has some surface appeal, but I think it's a problem waiting to happen. The reality is that concurrency requires a fundamentally different model of thinking- look at all the classic multithreading problems to see why- and explicit async reflects that model.

Just my two cents, though. It could be that rust, with its emphasis on static memory safety, doesn't need to worry as much as, say, C++.

@jtolds

This comment has been minimized.

Show comment
Hide comment
@jtolds

jtolds Sep 17, 2015

Just to provide a counterpoint, introducing async/await keywords is (IMO) a growing language smell. It means that you're effectively partitioning all code into two competing classes (blocking/async), and that really causes some significant problems for large codebases. Please see http://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/ for more.

jtolds commented Sep 17, 2015

Just to provide a counterpoint, introducing async/await keywords is (IMO) a growing language smell. It means that you're effectively partitioning all code into two competing classes (blocking/async), and that really causes some significant problems for large codebases. Please see http://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/ for more.

@gnzlbg

This comment has been minimized.

Show comment
Hide comment
@gnzlbg

gnzlbg Sep 17, 2015

Contributor

Just to provide a counterpoint, introducing async/await keywords is (IMO)
a growing language smell. It means that you're effectively partitioning all
code into two competing classes (blocking/async)

Yes, that basically splits the world.

On Thu, Sep 17, 2015 at 7:17 AM, JT Olds notifications@github.com wrote:

Just to provide a counterpoint, introducing async/await keywords is (IMO)
a growing language smell. It means that you're effectively partitioning all
code into two competing classes (blocking/async), and that really causes
some significant problems for large codebases. Please see
http://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/
for more.


Reply to this email directly or view it on GitHub
#388 (comment).

Contributor

gnzlbg commented Sep 17, 2015

Just to provide a counterpoint, introducing async/await keywords is (IMO)
a growing language smell. It means that you're effectively partitioning all
code into two competing classes (blocking/async)

Yes, that basically splits the world.

On Thu, Sep 17, 2015 at 7:17 AM, JT Olds notifications@github.com wrote:

Just to provide a counterpoint, introducing async/await keywords is (IMO)
a growing language smell. It means that you're effectively partitioning all
code into two competing classes (blocking/async), and that really causes
some significant problems for large codebases. Please see
http://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/
for more.


Reply to this email directly or view it on GitHub
#388 (comment).

@ryanhiebert

This comment has been minimized.

Show comment
Hide comment
@ryanhiebert

ryanhiebert Sep 17, 2015

Just to provide a counterpoint, introducing async/await keywords is (IMO) a growing language smell. It means that you're effectively partitioning all code into two competing classes (blocking/async)

I completely understand your point of view, having debated it a great deal myself, but I have to disagree. It is a totally different model with different semantics, and you cannot reasonably pretend to do async work synchronously and have be something that a mediocre programmer, such as myself, can reason about.

While a simple function is just a function, an async function isn't a function at all. The flow is like a function, but the async and await keywords allow us to reason about what race conditions may happen in this function in a sane way, because we know where suspension will happen. The similarity to a function is on purpose, but they aren't the same thing, and for very good reason.

The fact that the later we have explicit async and await the more we will have libraries that end up with two versions of different functions, one synchronous and one asynchronous, is an indication that it's important to solve this sooner rather than later, so that library authors can have the tools they need to make that decision sooner rather than later. Then maybe they can choose to have only one version, whichever is appropriate for their work.

Because of Rust's memory safety guarantees, it's easy to conclude that we don't need explicit suspension points, and that Rust's ownership rules will sort out this mess, as it does for memory access. But there are plenty of race conditions that aren't memory-access data races, and programmers need as much help with those as they do with memory races.

So far, I haven't seen how Rust's ownership could make that part of explicit async programming unnecessary. If somebody can solve that, then I'd probably consider further whether explicit async and await keywords were really necessary. For now, I'm convinced that they are.

ryanhiebert commented Sep 17, 2015

Just to provide a counterpoint, introducing async/await keywords is (IMO) a growing language smell. It means that you're effectively partitioning all code into two competing classes (blocking/async)

I completely understand your point of view, having debated it a great deal myself, but I have to disagree. It is a totally different model with different semantics, and you cannot reasonably pretend to do async work synchronously and have be something that a mediocre programmer, such as myself, can reason about.

While a simple function is just a function, an async function isn't a function at all. The flow is like a function, but the async and await keywords allow us to reason about what race conditions may happen in this function in a sane way, because we know where suspension will happen. The similarity to a function is on purpose, but they aren't the same thing, and for very good reason.

The fact that the later we have explicit async and await the more we will have libraries that end up with two versions of different functions, one synchronous and one asynchronous, is an indication that it's important to solve this sooner rather than later, so that library authors can have the tools they need to make that decision sooner rather than later. Then maybe they can choose to have only one version, whichever is appropriate for their work.

Because of Rust's memory safety guarantees, it's easy to conclude that we don't need explicit suspension points, and that Rust's ownership rules will sort out this mess, as it does for memory access. But there are plenty of race conditions that aren't memory-access data races, and programmers need as much help with those as they do with memory races.

So far, I haven't seen how Rust's ownership could make that part of explicit async programming unnecessary. If somebody can solve that, then I'd probably consider further whether explicit async and await keywords were really necessary. For now, I'm convinced that they are.

@jtolds

This comment has been minimized.

Show comment
Hide comment
@jtolds

jtolds Sep 17, 2015

While a simple function is just a function, an async function isn't a function at all. The flow is like a function, but the async and await keywords allow us to reason about what race conditions may happen in this function in a sane way, because we know where suspension will happen. The similarity to a function is on purpose, but they aren't the same thing, and for very good reason.

Because of Rust's memory safety guarantees, it's easy to conclude that we don't need explicit suspension points, and that Rust's ownership rules will sort out this mess, as it does for memory access. But there are plenty of race conditions that aren't memory-access data races, and programmers need as much help with those as they do with memory races.

This seems like a surprising position to take. This doesn't strike me as a very good reason, but more akin to rationalization. As long as there's other threads running, you definitely don't have explicit suspension points. Your argument against yourself here is a good one. Rust is designed to make explicit suspension points unnecessary, which is a good thing. Single-threaded event loops don't make good use of multi-core parallelism.

The fact that the later we have explicit async and await the more we will have libraries that end up with two versions of different functions, one synchronous and one asynchronous, is an indication that it's important to solve this sooner rather than later, so that library authors can have the tools they need to make that decision sooner rather than later. Then maybe they can choose to have only one version, whichever is appropriate for their work.

Again, a surprising position to take. If the language handles async IO for you, then every library always works for you, and there's no bifurcation. Imagine the following scenario: you're writing some small app at low scale. It's a killer app, it does well, and you need to scale it up. Bam, you need to switch to an async IO model (as indicated by https://news.ycombinator.com/item?id=10232131). Well shoot, that sucks. Okay, so you start porting everything, but you start hitting the long-tail of libraries. There's only one Rust UPNP library perhaps, and its author didn't need to scale like you do, and it blocks. Great, so now you either get to write a new library or make a threadpool. But it's not just the UPNP library, there's 20 other little libraries like this. You switch to the async version of some database library and everything starts looking good. A few weeks later you notice your app just keeps blocking up and no longer responding. Turns out one small error case in the database library uses the blocking version of the library and now your entire event loop is hosed.

This entire problem could be avoided completely.

jtolds commented Sep 17, 2015

While a simple function is just a function, an async function isn't a function at all. The flow is like a function, but the async and await keywords allow us to reason about what race conditions may happen in this function in a sane way, because we know where suspension will happen. The similarity to a function is on purpose, but they aren't the same thing, and for very good reason.

Because of Rust's memory safety guarantees, it's easy to conclude that we don't need explicit suspension points, and that Rust's ownership rules will sort out this mess, as it does for memory access. But there are plenty of race conditions that aren't memory-access data races, and programmers need as much help with those as they do with memory races.

This seems like a surprising position to take. This doesn't strike me as a very good reason, but more akin to rationalization. As long as there's other threads running, you definitely don't have explicit suspension points. Your argument against yourself here is a good one. Rust is designed to make explicit suspension points unnecessary, which is a good thing. Single-threaded event loops don't make good use of multi-core parallelism.

The fact that the later we have explicit async and await the more we will have libraries that end up with two versions of different functions, one synchronous and one asynchronous, is an indication that it's important to solve this sooner rather than later, so that library authors can have the tools they need to make that decision sooner rather than later. Then maybe they can choose to have only one version, whichever is appropriate for their work.

Again, a surprising position to take. If the language handles async IO for you, then every library always works for you, and there's no bifurcation. Imagine the following scenario: you're writing some small app at low scale. It's a killer app, it does well, and you need to scale it up. Bam, you need to switch to an async IO model (as indicated by https://news.ycombinator.com/item?id=10232131). Well shoot, that sucks. Okay, so you start porting everything, but you start hitting the long-tail of libraries. There's only one Rust UPNP library perhaps, and its author didn't need to scale like you do, and it blocks. Great, so now you either get to write a new library or make a threadpool. But it's not just the UPNP library, there's 20 other little libraries like this. You switch to the async version of some database library and everything starts looking good. A few weeks later you notice your app just keeps blocking up and no longer responding. Turns out one small error case in the database library uses the blocking version of the library and now your entire event loop is hosed.

This entire problem could be avoided completely.

@sinistersnare

This comment has been minimized.

Show comment
Hide comment

sinistersnare commented Sep 17, 2015

@glaebhoerl

This comment has been minimized.

Show comment
Hide comment
Contributor

glaebhoerl commented Sep 17, 2015

@rpjohnst

This comment has been minimized.

Show comment
Hide comment
@rpjohnst

rpjohnst Sep 17, 2015

Implementation-wise, it is not hard to convert an async function back to a blocking one with little to no overhead. It may be too late to do anything like this in Rust, but it would be interesting to make everything async, and then let typical blocking calls use some syntactically lightweight (perhaps even invisible) way to force them to be synchronous.

This would give people the benefits of Go-like concurrency, without taking away the control and flexibility of async/await.

rpjohnst commented Sep 17, 2015

Implementation-wise, it is not hard to convert an async function back to a blocking one with little to no overhead. It may be too late to do anything like this in Rust, but it would be interesting to make everything async, and then let typical blocking calls use some syntactically lightweight (perhaps even invisible) way to force them to be synchronous.

This would give people the benefits of Go-like concurrency, without taking away the control and flexibility of async/await.

@Ericson2314

This comment has been minimized.

Show comment
Hide comment
@Ericson2314

Ericson2314 Sep 18, 2015

Contributor

Heh, parts of this rant have been a long time coming :)

To those that want to combine everything, I am sympathetic, but it's very important to remember that these different implementation strategies have different tradeoffs, and that matter for a language like Rust. Much talk of continuation, threads, etc misses out that. For example, we can use:

The solution is not to try to paper over these distinction, but provide ways to be agnostic/polymorphic to which is used. For a first step, it's good to make higher-order functions agnostic in their arguments. For example a suspended thread in my forked libfringe can be treated as a normal closure if the current continuation is prepended as the first argument. Even better however, would to write polymorphic code that itself can be specialized to one of those types. We can go either the monadic or the algebraic affects route for this.

For those that care, I'd argue the root of the problem is that Rust (and C) implicitly manipulate the call stack, unlike a hypothetical typed assembly language. In such a typed assembly language, a contiguous call stack, a segmented call stack, webs of dynamically dispatched closures, finite state machines, etc, can all be on equal footing. Then everything we want here could be left to libraries and not the compiler, and all would be well.

It's infeasible to ditch LLVM and get really fancy dependent types in the at-all-near future (let me not suggest otherwise), so we will need compiler extensions/plugins. But I think with enough cleverness, we can integrate that with langauge-level abstractions like monad trait / effect system. I have some hunches on how that might be done, but they're pretty WIP, and I've already said some odd stuff, so I'll call it quits here.

Contributor

Ericson2314 commented Sep 18, 2015

Heh, parts of this rant have been a long time coming :)

To those that want to combine everything, I am sympathetic, but it's very important to remember that these different implementation strategies have different tradeoffs, and that matter for a language like Rust. Much talk of continuation, threads, etc misses out that. For example, we can use:

The solution is not to try to paper over these distinction, but provide ways to be agnostic/polymorphic to which is used. For a first step, it's good to make higher-order functions agnostic in their arguments. For example a suspended thread in my forked libfringe can be treated as a normal closure if the current continuation is prepended as the first argument. Even better however, would to write polymorphic code that itself can be specialized to one of those types. We can go either the monadic or the algebraic affects route for this.

For those that care, I'd argue the root of the problem is that Rust (and C) implicitly manipulate the call stack, unlike a hypothetical typed assembly language. In such a typed assembly language, a contiguous call stack, a segmented call stack, webs of dynamically dispatched closures, finite state machines, etc, can all be on equal footing. Then everything we want here could be left to libraries and not the compiler, and all would be well.

It's infeasible to ditch LLVM and get really fancy dependent types in the at-all-near future (let me not suggest otherwise), so we will need compiler extensions/plugins. But I think with enough cleverness, we can integrate that with langauge-level abstractions like monad trait / effect system. I have some hunches on how that might be done, but they're pretty WIP, and I've already said some odd stuff, so I'll call it quits here.

@ryanhiebert

This comment has been minimized.

Show comment
Hide comment
@ryanhiebert

ryanhiebert Sep 18, 2015

@rpjohnst:

Implementation-wise, it is not hard to convert an async function back to a blocking one with little to no overhead. It may be too late to do anything like this in Rust, but it would be interesting to make everything async, and then let typical blocking calls use some syntactically lightweight (perhaps even invisible) way to force them to be synchronous.

This ability to run async code from synchronous code is somewhat assumed. The split the world issue still exists, though, and this doesn't do anything to solve it, unfortunately. The link (shared twice above by different people) titled What Color is Your Function?, explains that this correspondence exists. Though it could be a neat language feature if we did end up with async / await keywords.


@jtolds:

As long as there's other threads running, you definitely don't have explicit suspension points. Your argument against yourself here is a good one. Rust is designed to make explicit suspension points unnecessary, which is a good thing.

Using threads implies implicit suspension points. That doesn't mean there aren't some explicit suspension points in there too, but you have to consider both complexities together when you do that, so it's something you likely wouldn't want to do without serious consideration.

Rust isn't designed to make explicit suspension points unnecessary AFAIK, it simply doesn't have the feature. A post-1.0 feature. Perhaps there's some future enhancement that would make it unnecessary, but I haven't heard of it yet.

Single-threaded event loops don't make good use of multi-core parallelism.

You are correct that single threaded event loops can't use multi-core parallelism. That's the whole point. And if you're using them, they don't need to, because the CPU isn't your bottleneck, IO is.

Again, a surprising position to take. If the language handles async IO for you, then every library always works for you, and there's no bifurcation.

bifurcation. And I learn a new word, thank you. 😃

It depends, of course, on what you mean by "handles it for you". Rust's type system can make things memory-safe, but there are other conditions that you have to solve yourself, because Rust does not solve it for you.

Consider the dual lock deadlock. In single-threaded explicit-suspension-points world, handling such a situation is trivial (python-ish pseudo-code, I'm overlooking some problems Rust saves you from to show you one it doesn't):

import get_big_thing  # A big IO bound task.

# Both locks are required simultaneously to safely run get_big_thing
mutex1, mutex2 = Mutex(), Mutex()

async def this_way():
    lock1 = mutex1.lock()
    lock2 = mutex2.lock()
    return (await get_big_thing())
    # Locks released at end of scope

async def that_way():
    lock2 = mutex2.lock()
    lock1 = mutex1.lock()
    return (await get_big_thing())
    # Locks released at end of scope

# Some run loop runs these two coroutines

The problem here is a dead-lock. If a synchronous version of this code is run on multiple threads to make it parallel, it would be easy for the the first lock to be obtained by this_way, then the second lock to be obtained by that_way, and then neither would be able to continue, because both of their locks are already taken.

Using a single thread to run these coroutines (because our bottleneck is IO, not CPU) works really well, because the explicit await points mean that the programmer can easily identify the places where something else might happen first.

It's important to note that the purpose of this is not to make it easy to convert synchronous code to async code. The purpose is to allow the author to reason about async code. The explicit suspension points help the author to do that.

This point is so important that Python chose not to implement the awaitable protocol on the futures module, which uses threads / processes to resolve them. The async / await model only really works well when you're dealing inside a single thread, but it can be invaluable in that context.

Imagine the following scenario [snip long demonstration of the long-tail effect]

Yes, the long-tail effect really exists. The longer we wait before providing the choice, the worse it gets, because there's more code that has to be considered retroactively. I expect this to be a big burden in the Python community going forward. They still added the feature.

Unless there's another alternative that can make these deadlocks and race conditions that the borrow checker does not solve easier to reason about, this solution does indeed help programmers reason about what will happen in their program. Unless there is a better solution to the problem, all we would do by waiting is exacerbate the problem.

ryanhiebert commented Sep 18, 2015

@rpjohnst:

Implementation-wise, it is not hard to convert an async function back to a blocking one with little to no overhead. It may be too late to do anything like this in Rust, but it would be interesting to make everything async, and then let typical blocking calls use some syntactically lightweight (perhaps even invisible) way to force them to be synchronous.

This ability to run async code from synchronous code is somewhat assumed. The split the world issue still exists, though, and this doesn't do anything to solve it, unfortunately. The link (shared twice above by different people) titled What Color is Your Function?, explains that this correspondence exists. Though it could be a neat language feature if we did end up with async / await keywords.


@jtolds:

As long as there's other threads running, you definitely don't have explicit suspension points. Your argument against yourself here is a good one. Rust is designed to make explicit suspension points unnecessary, which is a good thing.

Using threads implies implicit suspension points. That doesn't mean there aren't some explicit suspension points in there too, but you have to consider both complexities together when you do that, so it's something you likely wouldn't want to do without serious consideration.

Rust isn't designed to make explicit suspension points unnecessary AFAIK, it simply doesn't have the feature. A post-1.0 feature. Perhaps there's some future enhancement that would make it unnecessary, but I haven't heard of it yet.

Single-threaded event loops don't make good use of multi-core parallelism.

You are correct that single threaded event loops can't use multi-core parallelism. That's the whole point. And if you're using them, they don't need to, because the CPU isn't your bottleneck, IO is.

Again, a surprising position to take. If the language handles async IO for you, then every library always works for you, and there's no bifurcation.

bifurcation. And I learn a new word, thank you. 😃

It depends, of course, on what you mean by "handles it for you". Rust's type system can make things memory-safe, but there are other conditions that you have to solve yourself, because Rust does not solve it for you.

Consider the dual lock deadlock. In single-threaded explicit-suspension-points world, handling such a situation is trivial (python-ish pseudo-code, I'm overlooking some problems Rust saves you from to show you one it doesn't):

import get_big_thing  # A big IO bound task.

# Both locks are required simultaneously to safely run get_big_thing
mutex1, mutex2 = Mutex(), Mutex()

async def this_way():
    lock1 = mutex1.lock()
    lock2 = mutex2.lock()
    return (await get_big_thing())
    # Locks released at end of scope

async def that_way():
    lock2 = mutex2.lock()
    lock1 = mutex1.lock()
    return (await get_big_thing())
    # Locks released at end of scope

# Some run loop runs these two coroutines

The problem here is a dead-lock. If a synchronous version of this code is run on multiple threads to make it parallel, it would be easy for the the first lock to be obtained by this_way, then the second lock to be obtained by that_way, and then neither would be able to continue, because both of their locks are already taken.

Using a single thread to run these coroutines (because our bottleneck is IO, not CPU) works really well, because the explicit await points mean that the programmer can easily identify the places where something else might happen first.

It's important to note that the purpose of this is not to make it easy to convert synchronous code to async code. The purpose is to allow the author to reason about async code. The explicit suspension points help the author to do that.

This point is so important that Python chose not to implement the awaitable protocol on the futures module, which uses threads / processes to resolve them. The async / await model only really works well when you're dealing inside a single thread, but it can be invaluable in that context.

Imagine the following scenario [snip long demonstration of the long-tail effect]

Yes, the long-tail effect really exists. The longer we wait before providing the choice, the worse it gets, because there's more code that has to be considered retroactively. I expect this to be a big burden in the Python community going forward. They still added the feature.

Unless there's another alternative that can make these deadlocks and race conditions that the borrow checker does not solve easier to reason about, this solution does indeed help programmers reason about what will happen in their program. Unless there is a better solution to the problem, all we would do by waiting is exacerbate the problem.

@jtolds

This comment has been minimized.

Show comment
Hide comment
@jtolds

jtolds Sep 18, 2015

You are correct that single threaded event loops can't use multi-core parallelism. That's the whole point. And if you're using them, they don't need to, because the CPU isn't your bottleneck, IO is.

Not necessarily. Event loop/promise processing adds overhead, and there's a maximum amount of overhead a single core can handle, which limits the number of (possibly small) IO operations that can be handled. This isn't academic; the reason my company switched from Python+Twisted to Go was this reason. We were CPU bound when we should have been IO bound. That said, it was Python + Twisted, so I admit the overhead was unnecessarily large.

async def this_way():
    lock1 = mutex1.lock()
    lock2 = mutex2.lock()
    return (await get_big_thing())
    # Locks released at end of scope

In this example, I assume you mean to have lines like lock1 = await mutex1.lock() ? Surely you don't mean to use actual mutexes but instead mean async mutexes in your event loop code. Even with your description I admit I am baffled how async helps with deadlocks here. Either you mean real mutexes, in which case you have a blocked event loop waiting to happen and you didn't need the mutexes to begin with (cause there's no actual parallel multi-core contention), or you mean async mutexes, in which case you still get an application-level deadlock.

It's important to note that the purpose of this is not to make it easy to convert synchronous code to async code. The purpose is to allow the author to reason about async code. The explicit suspension points help the author to do that.

Not to belabor the point but so far I've only seen more confusion.

Yes, the long-tail effect really exists. The longer we wait before providing the choice, the worse it gets, because there's more code that has to be considered retroactively. I expect this to be a big burden in the Python community going forward. They still added the feature.

The initial release of Twisted was 12 years ago. I know they're finally adding deferreds and event loops to the standard lib, but there's been prior art and experience with this stuff for a long time.

Unless there's another alternative that can make these deadlocks and race conditions that the borrow checker does not solve easier to reason about...

I know this isn't news or anything, but it is impossible to solve all deadlocks, in a mathematical proof sort of way.

Anyway, I guess I should point out I understand the reasons why Rust decided to not do green threads and why that results in people discussing async IO, but it's just super disappointing. When something is so close to perfect you really care about those last few blemishes.

jtolds commented Sep 18, 2015

You are correct that single threaded event loops can't use multi-core parallelism. That's the whole point. And if you're using them, they don't need to, because the CPU isn't your bottleneck, IO is.

Not necessarily. Event loop/promise processing adds overhead, and there's a maximum amount of overhead a single core can handle, which limits the number of (possibly small) IO operations that can be handled. This isn't academic; the reason my company switched from Python+Twisted to Go was this reason. We were CPU bound when we should have been IO bound. That said, it was Python + Twisted, so I admit the overhead was unnecessarily large.

async def this_way():
    lock1 = mutex1.lock()
    lock2 = mutex2.lock()
    return (await get_big_thing())
    # Locks released at end of scope

In this example, I assume you mean to have lines like lock1 = await mutex1.lock() ? Surely you don't mean to use actual mutexes but instead mean async mutexes in your event loop code. Even with your description I admit I am baffled how async helps with deadlocks here. Either you mean real mutexes, in which case you have a blocked event loop waiting to happen and you didn't need the mutexes to begin with (cause there's no actual parallel multi-core contention), or you mean async mutexes, in which case you still get an application-level deadlock.

It's important to note that the purpose of this is not to make it easy to convert synchronous code to async code. The purpose is to allow the author to reason about async code. The explicit suspension points help the author to do that.

Not to belabor the point but so far I've only seen more confusion.

Yes, the long-tail effect really exists. The longer we wait before providing the choice, the worse it gets, because there's more code that has to be considered retroactively. I expect this to be a big burden in the Python community going forward. They still added the feature.

The initial release of Twisted was 12 years ago. I know they're finally adding deferreds and event loops to the standard lib, but there's been prior art and experience with this stuff for a long time.

Unless there's another alternative that can make these deadlocks and race conditions that the borrow checker does not solve easier to reason about...

I know this isn't news or anything, but it is impossible to solve all deadlocks, in a mathematical proof sort of way.

Anyway, I guess I should point out I understand the reasons why Rust decided to not do green threads and why that results in people discussing async IO, but it's just super disappointing. When something is so close to perfect you really care about those last few blemishes.

@jtolds

This comment has been minimized.

Show comment
Hide comment
@jtolds

jtolds Sep 18, 2015

Also, I want to point out that I totally give you that the async style does allow the programmer to see when she has full control of the thread of execution. That is a benefit. But everything is tradeoffs, and if I had to order my priorities that benefit would be very far down the priority list, certainly underneath being able to use all the third party libraries I need.

Additionally, I have seen subtle synchronization bugs introduced when someone has some ordered logic depending on complete control of the thread of execution, and then some later refactor comes in and turns a synchronous CPU-bound call into an async call, and then the synchronization logic isn't correspondingly updated. I think relying on having full control of the CPU is the wrong kind of thing to rely on, and personally I'd rather not fool people into thinking about it.

But yes, it is all about tradeoffs.

jtolds commented Sep 18, 2015

Also, I want to point out that I totally give you that the async style does allow the programmer to see when she has full control of the thread of execution. That is a benefit. But everything is tradeoffs, and if I had to order my priorities that benefit would be very far down the priority list, certainly underneath being able to use all the third party libraries I need.

Additionally, I have seen subtle synchronization bugs introduced when someone has some ordered logic depending on complete control of the thread of execution, and then some later refactor comes in and turns a synchronous CPU-bound call into an async call, and then the synchronization logic isn't correspondingly updated. I think relying on having full control of the CPU is the wrong kind of thing to rely on, and personally I'd rather not fool people into thinking about it.

But yes, it is all about tradeoffs.

@erickt

This comment has been minimized.

Show comment
Hide comment
@erickt

erickt Jul 31, 2016

@camlorn: stateful is coming along quite nicely with the help of @NSIL. We are getting pretty close to a 0.1 release, we just need to implement a few more things on dealing with temporaries across yield points. I'd love any help getting it ready :)

erickt commented Jul 31, 2016

@camlorn: stateful is coming along quite nicely with the help of @NSIL. We are getting pretty close to a 0.1 release, we just need to implement a few more things on dealing with temporaries across yield points. I'd love any help getting it ready :)

@eddyb

This comment has been minimized.

Show comment
Hide comment
@eddyb

eddyb Jul 31, 2016

Member

@camlorn FWIW, an official implementation would have to be done on MIR to be optimal - where it should be easier, too, as the MIR is already a CFG.

Member

eddyb commented Jul 31, 2016

@camlorn FWIW, an official implementation would have to be done on MIR to be optimal - where it should be easier, too, as the MIR is already a CFG.

@erickt

This comment has been minimized.

Show comment
Hide comment
@erickt

erickt Jul 31, 2016

@eddyb: I know you already know this, but for everyone else, it should be pretty straightforward to port Stateful's state machine transformations to Mir, since Stateful is heavily inspired by Mir. Really the tricky thing is really waiting on impl trait in order to implement this without boxing the iterator type.

erickt commented Jul 31, 2016

@eddyb: I know you already know this, but for everyone else, it should be pretty straightforward to port Stateful's state machine transformations to Mir, since Stateful is heavily inspired by Mir. Really the tricky thing is really waiting on impl trait in order to implement this without boxing the iterator type.

@eddyb

This comment has been minimized.

Show comment
Hide comment
@eddyb

eddyb Jul 31, 2016

Member

@erickt There are two reasons why a port is unnecessary/undesirable:

  • the "state machine transform" on the MIR is just changing Yield(next) to Return and pointing a branch of the entry point to next, no AST is involved at any point
  • generating the AdtDef for the state enum is tricky - it depends on when you do the state machine transform, if any optimizations are involved, then the layout of the type may depend on compiler options and any introspection of it would introduce hidden dependencies and compatibility hazards; if you do it too early, you risk being suboptimal (@alexcrichton ran into some perf issues with nested enums in futures-rs)
Member

eddyb commented Jul 31, 2016

@erickt There are two reasons why a port is unnecessary/undesirable:

  • the "state machine transform" on the MIR is just changing Yield(next) to Return and pointing a branch of the entry point to next, no AST is involved at any point
  • generating the AdtDef for the state enum is tricky - it depends on when you do the state machine transform, if any optimizations are involved, then the layout of the type may depend on compiler options and any introspection of it would introduce hidden dependencies and compatibility hazards; if you do it too early, you risk being suboptimal (@alexcrichton ran into some perf issues with nested enums in futures-rs)
@camlorn

This comment has been minimized.

Show comment
Hide comment
@camlorn

camlorn Aug 1, 2016

How are recursive generators handled?

In languages that aren't Rust, the generator can make another instance of itself and then yield from it using a for loop. I think lifetimes can be used to prevent it, but maybe it's something we want to allow.

Python has a yield from, but I think they added it for I/O. Perhaps this isn't necessary. It might also be a good use for the become keyword which I read was reserved, but that could also be used for tail calls in future, so possibly not because of the double meaning. The implication being that you can "become" another generator.

@erickt
I'll take a look. Not sure my Rust is good enough to help, but it might be.

camlorn commented Aug 1, 2016

How are recursive generators handled?

In languages that aren't Rust, the generator can make another instance of itself and then yield from it using a for loop. I think lifetimes can be used to prevent it, but maybe it's something we want to allow.

Python has a yield from, but I think they added it for I/O. Perhaps this isn't necessary. It might also be a good use for the become keyword which I read was reserved, but that could also be used for tail calls in future, so possibly not because of the double meaning. The implication being that you can "become" another generator.

@erickt
I'll take a look. Not sure my Rust is good enough to help, but it might be.

@eddyb

This comment has been minimized.

Show comment
Hide comment
@eddyb

eddyb Aug 1, 2016

Member

@camlorn Such recursion would become a type recursion and require boxing to avoid the state having an "infinite" size.

Member

eddyb commented Aug 1, 2016

@camlorn Such recursion would become a type recursion and require boxing to avoid the state having an "infinite" size.

@camlorn

This comment has been minimized.

Show comment
Hide comment
@camlorn

camlorn Aug 1, 2016

@eddyb
O, so it would.

camlorn commented Aug 1, 2016

@eddyb
O, so it would.

@erickt

This comment has been minimized.

Show comment
Hide comment
@erickt

erickt Aug 1, 2016

@camlorn: stateful doesn't do something like yield from We could implement some basic sugar that converts yield_from!(some_generator()) just generates for item in some_generator() { yield_!(item) }. So this ends up adding O(k) jumps to sub-iterators, where k is the depth of the generator stack. That should be the same cost if you implement these generators by hand where each generator is opaque.

I bet a sufficiently smart optimizer could inline these generators to get you down to O(1) jumps.

erickt commented Aug 1, 2016

@camlorn: stateful doesn't do something like yield from We could implement some basic sugar that converts yield_from!(some_generator()) just generates for item in some_generator() { yield_!(item) }. So this ends up adding O(k) jumps to sub-iterators, where k is the depth of the generator stack. That should be the same cost if you implement these generators by hand where each generator is opaque.

I bet a sufficiently smart optimizer could inline these generators to get you down to O(1) jumps.

@l0calh05t

This comment has been minimized.

Show comment
Hide comment
@l0calh05t

l0calh05t Aug 1, 2016

Considering that Rust is built on LLVM, have you been following the coroutine discussion on LLVM-Dev?

https://reviews.llvm.org/rL276513

l0calh05t commented Aug 1, 2016

Considering that Rust is built on LLVM, have you been following the coroutine discussion on LLVM-Dev?

https://reviews.llvm.org/rL276513

@camlorn

This comment has been minimized.

Show comment
Hide comment
@camlorn

camlorn Aug 1, 2016

@erickt
My question, sufficiently answered by @eddyb, was what happens in the case of that stack being infinite. You end up with iterators inside their own iterator structs.

camlorn commented Aug 1, 2016

@erickt
My question, sufficiently answered by @eddyb, was what happens in the case of that stack being infinite. You end up with iterators inside their own iterator structs.

@camlorn

This comment has been minimized.

Show comment
Hide comment
@camlorn

camlorn Aug 1, 2016

Sorry to double comment, but this page might be more useful as to LLVM 4.0 coroutines: http://llvm.org/docs/Coroutines.html

camlorn commented Aug 1, 2016

Sorry to double comment, but this page might be more useful as to LLVM 4.0 coroutines: http://llvm.org/docs/Coroutines.html

@erickt

This comment has been minimized.

Show comment
Hide comment
@erickt

erickt Aug 1, 2016

@l0calh05t / @camlorn: Yeah, I'm somewhat aware of llvm's proposal. As best as I can tell, under the covers they're effectively doing the same transformations that stateful is doing - compiling down functions into state machines and a reified stack to store stack variables. One thing that's slightly dodgy from my first glance is that they seem to require that the coroutine frame might need to be allocated, but that's not required for stateful (once we get impl trait).

Really it comes down where we'd get the best advantage. If LLVM implement some crazy optimizations that we don't want to duplicate, it might make sense to use LLVM's implementation. However, it might be faster, or integrate better with our type system and borrow checker if we lower this transformation on the Rust side. I'm not sure what'd be best.

erickt commented Aug 1, 2016

@l0calh05t / @camlorn: Yeah, I'm somewhat aware of llvm's proposal. As best as I can tell, under the covers they're effectively doing the same transformations that stateful is doing - compiling down functions into state machines and a reified stack to store stack variables. One thing that's slightly dodgy from my first glance is that they seem to require that the coroutine frame might need to be allocated, but that's not required for stateful (once we get impl trait).

Really it comes down where we'd get the best advantage. If LLVM implement some crazy optimizations that we don't want to duplicate, it might make sense to use LLVM's implementation. However, it might be faster, or integrate better with our type system and borrow checker if we lower this transformation on the Rust side. I'm not sure what'd be best.

@gnzlbg

This comment has been minimized.

Show comment
Hide comment
@gnzlbg

gnzlbg Aug 2, 2016

Contributor

One thing that's slightly dodgy from my first glance is that they seem to require that the coroutine frame might need to be allocated,

The proposal includes a "heap allocation elision" optimization (CoroElide) pass that will store the coroutine in its parent stackframe when possible. So if you only create such coroutines you never get heap allocations. It is also planed to allow this kind of optimizations across ABI boundaries.

If anything it could be worth it to chim in there and clarify a bit more under which conditions heap allocation might be necessary

Contributor

gnzlbg commented Aug 2, 2016

One thing that's slightly dodgy from my first glance is that they seem to require that the coroutine frame might need to be allocated,

The proposal includes a "heap allocation elision" optimization (CoroElide) pass that will store the coroutine in its parent stackframe when possible. So if you only create such coroutines you never get heap allocations. It is also planed to allow this kind of optimizations across ABI boundaries.

If anything it could be worth it to chim in there and clarify a bit more under which conditions heap allocation might be necessary

@DemiMarie

This comment has been minimized.

Show comment
Hide comment
@DemiMarie

DemiMarie Oct 3, 2016

@gnzlbg That only works for OS stuff if you can guarantee that the heap allocation will never happen (and get a compile-time error when it cannot be elided).

DemiMarie commented Oct 3, 2016

@gnzlbg That only works for OS stuff if you can guarantee that the heap allocation will never happen (and get a compile-time error when it cannot be elided).

@gnzlbg

This comment has been minimized.

Show comment
Hide comment
@gnzlbg

gnzlbg Oct 3, 2016

Contributor

Do we want to be able to put DSTs on a coroutines stack frame?(I want to)
If yes, can that be done without heap allocation in general?

Do we want to store coroutine stack frames in e.g. a Vec? (I want to) If
yes there must at least be a way to opt into heap allocation.

So while I agree it is useful to be able to ensure that in some cases
coroutines do not heap allocate, I want to be able to use them also in
those situations in which you might need to heap allocate.

On Monday, 3 October 2016, Demi Marie Obenour notifications@github.com
wrote:

@gnzlbg https://github.com/gnzlbg That only works for OS stuff if you
can guarantee that the heap allocation will never happen (and get a
compile-time error when it cannot be elided).


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#388 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AA3NpouTy411wikk-iObCrZ0xl61b3p0ks5qwGs-gaJpZM4CtyWG
.

Contributor

gnzlbg commented Oct 3, 2016

Do we want to be able to put DSTs on a coroutines stack frame?(I want to)
If yes, can that be done without heap allocation in general?

Do we want to store coroutine stack frames in e.g. a Vec? (I want to) If
yes there must at least be a way to opt into heap allocation.

So while I agree it is useful to be able to ensure that in some cases
coroutines do not heap allocate, I want to be able to use them also in
those situations in which you might need to heap allocate.

On Monday, 3 October 2016, Demi Marie Obenour notifications@github.com
wrote:

@gnzlbg https://github.com/gnzlbg That only works for OS stuff if you
can guarantee that the heap allocation will never happen (and get a
compile-time error when it cannot be elided).


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#388 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AA3NpouTy411wikk-iObCrZ0xl61b3p0ks5qwGs-gaJpZM4CtyWG
.

@camlorn

This comment has been minimized.

Show comment
Hide comment
@camlorn

camlorn Oct 4, 2016

@gnzlbg
You can always use Box for a dst. With the state machine transformation you can always box the generator and stick it in a Vec that way, should it be dynamically sized.

Boxing the generator is as efficient as having the generator be a pointer to a stack frame, as far as I'm aware. Boxing the dsts shouldn't be that much of an overhead either. Can you provide an example of code that puts a dst on the stack that you think might be both important and problematic?

To be honest, I find the points about avoiding allocation to be more convincing than otherwise; if you don't care about such things, use a higher level language with GC and whatnot and call it a day.

If being able to allocate dsts next to each other for cache friendliness is important, I can think of ways to write libraries that try to do this, as well. They are slightly less convenient than it just working, but maintain the same sorts of advantages.

camlorn commented Oct 4, 2016

@gnzlbg
You can always use Box for a dst. With the state machine transformation you can always box the generator and stick it in a Vec that way, should it be dynamically sized.

Boxing the generator is as efficient as having the generator be a pointer to a stack frame, as far as I'm aware. Boxing the dsts shouldn't be that much of an overhead either. Can you provide an example of code that puts a dst on the stack that you think might be both important and problematic?

To be honest, I find the points about avoiding allocation to be more convincing than otherwise; if you don't care about such things, use a higher level language with GC and whatnot and call it a day.

If being able to allocate dsts next to each other for cache friendliness is important, I can think of ways to write libraries that try to do this, as well. They are slightly less convenient than it just working, but maintain the same sorts of advantages.

@DemiMarie

This comment has been minimized.

Show comment
Hide comment
@DemiMarie

DemiMarie Oct 4, 2016

In kernel code, or hard real-time code, you might not be allowed to allocate

On Oct 4, 2016 17:09, "Austin Hicks" notifications@github.com wrote:

@gnzlbg https://github.com/gnzlbg
You can always use Box for a dst. With the state machine transformation
you can always box the generator and stick it in a Vec that way, should it
be dynamically sized.

Boxing the generator is as efficient as having the generator be a pointer
to a stack frame, as far as I'm aware. Boxing the dsts shouldn't be that
much of an overhead either. Can you provide an example of code that puts a
dst on the stack that you think might be both important and problematic?

To be honest, I find the points about avoiding allocation to be more
convincing than otherwise; if you don't care about such things, use a
higher level language with GC and whatnot and call it a day.

If being able to allocate dsts next to each other for cache friendliness
is important, I can think of ways to write libraries that try to do this,
as well. They are slightly less convenient than it just working, but
maintain the same sorts of advantages.


You are receiving this because you commented.
Reply to this email directly, view it on GitHub
#388 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AGGWB6E3FGkrF-ufw9pa9iUlPqdoglP_ks5qwsCngaJpZM4CtyWG
.

DemiMarie commented Oct 4, 2016

In kernel code, or hard real-time code, you might not be allowed to allocate

On Oct 4, 2016 17:09, "Austin Hicks" notifications@github.com wrote:

@gnzlbg https://github.com/gnzlbg
You can always use Box for a dst. With the state machine transformation
you can always box the generator and stick it in a Vec that way, should it
be dynamically sized.

Boxing the generator is as efficient as having the generator be a pointer
to a stack frame, as far as I'm aware. Boxing the dsts shouldn't be that
much of an overhead either. Can you provide an example of code that puts a
dst on the stack that you think might be both important and problematic?

To be honest, I find the points about avoiding allocation to be more
convincing than otherwise; if you don't care about such things, use a
higher level language with GC and whatnot and call it a day.

If being able to allocate dsts next to each other for cache friendliness
is important, I can think of ways to write libraries that try to do this,
as well. They are slightly less convenient than it just working, but
maintain the same sorts of advantages.


You are receiving this because you commented.
Reply to this email directly, view it on GitHub
#388 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AGGWB6E3FGkrF-ufw9pa9iUlPqdoglP_ks5qwsCngaJpZM4CtyWG
.

@gnzlbg

This comment has been minimized.

Show comment
Hide comment
@gnzlbg

gnzlbg Oct 4, 2016

Contributor

You can always use Box for a dst.
[...]
Boxing the dsts shouldn't be that much of an overhead either.

If I need to Box the generator I do not need to box the DST as well. Having to box both is not a zero-cost abstraction. I think this is bad.

If I cannot use DSTs inside a coroutine, you are essentially splitting the language into two parts, the part that can be used within a coroutine and the part that cannot be used within a coroutine. I think this is also bad.

To be honest, I find the points about avoiding allocation to be more convincing than otherwise

I think that avoiding allocation is very important but in case somebody has a coroutine that must allocate, that should be a zero-cost abstraction as well. I think it would be bad to have to come up with a completely different coroutine system to address this use case, and hence why I am raising this issue.


@DemiMarie

In kernel code, or hard real-time code, you might not be allowed to allocate

Then don't use language features that allocate memory (like box), libraries that allocate memory, or coroutines that allocate memory.

I am sure we can find ways of disabling coroutines that allocate memory from kernel code such that coroutines that do not allocate can be used safely.

Contributor

gnzlbg commented Oct 4, 2016

You can always use Box for a dst.
[...]
Boxing the dsts shouldn't be that much of an overhead either.

If I need to Box the generator I do not need to box the DST as well. Having to box both is not a zero-cost abstraction. I think this is bad.

If I cannot use DSTs inside a coroutine, you are essentially splitting the language into two parts, the part that can be used within a coroutine and the part that cannot be used within a coroutine. I think this is also bad.

To be honest, I find the points about avoiding allocation to be more convincing than otherwise

I think that avoiding allocation is very important but in case somebody has a coroutine that must allocate, that should be a zero-cost abstraction as well. I think it would be bad to have to come up with a completely different coroutine system to address this use case, and hence why I am raising this issue.


@DemiMarie

In kernel code, or hard real-time code, you might not be allowed to allocate

Then don't use language features that allocate memory (like box), libraries that allocate memory, or coroutines that allocate memory.

I am sure we can find ways of disabling coroutines that allocate memory from kernel code such that coroutines that do not allocate can be used safely.

@camlorn

This comment has been minimized.

Show comment
Hide comment
@camlorn

camlorn Oct 5, 2016

@DemiMarie @gnzlbg
The point about dsts may be pointless anyway, because according to the book you can't have one in a variable. I don't know for sure, but I can't figure out how you get one on the stack to start with and the internet is widely saying this isn't possible or at least very difficult in safe Rust. They don't make sense in stack variables because they don't have compile-time sizes. I'm basically certain you can't return them from functions directly. Etc. Can you show code that works now that you think might not work in a generator?

The thing is that someone has to know the size eventually. Even with Box, the thing that goes into Box::new is always sized. Assuming there is no counterexample, therefore, the generator has the same limitations as the stack anyway, and you can always convert it to a struct with a state field and a known size.

camlorn commented Oct 5, 2016

@DemiMarie @gnzlbg
The point about dsts may be pointless anyway, because according to the book you can't have one in a variable. I don't know for sure, but I can't figure out how you get one on the stack to start with and the internet is widely saying this isn't possible or at least very difficult in safe Rust. They don't make sense in stack variables because they don't have compile-time sizes. I'm basically certain you can't return them from functions directly. Etc. Can you show code that works now that you think might not work in a generator?

The thing is that someone has to know the size eventually. Even with Box, the thing that goes into Box::new is always sized. Assuming there is no counterexample, therefore, the generator has the same limitations as the stack anyway, and you can always convert it to a struct with a state field and a known size.

@Lucretiel

This comment has been minimized.

Show comment
Hide comment
@Lucretiel

Lucretiel Oct 5, 2016

I'm not understanding why you can't know the size of a generator/coroutine at compile time? Assuming it's non-recursive, of course. If it makes function calls, those can just live on the normal call stack since they'll only be invoked when control is inside the generator. If it invokes other generators/coroutines, those would just live in the static memory of the coroutine itself. Circular dependencies would be detected the same way they're detected in structs. The size of a coroutine would just be pessimistically the minimum size needed to hold all the variables in the local control flow stack, excluding function calls.

My understanding is it'd look something like this:

// Pseudocody
gen fn my_generator(flag: bool) -> i32 {
    let val1: i32 = 10;
    let val2: i32 = 20;
    if flag {
        let val3: i64 = 30;
        yield val1; yield val2; yield val3 as i32;
    } else {
        let val3: i16 = 30;
        yield val1; yield val2; yield val3;
    }
    // The stack space for this function call isn't part of my_generator's
    // storage, because it can only be invoked when control flow is here.
    let val4 = function_returning_i32();
    yield val4;

    // The storage required for this generator IS part of the storage, though.
    let mut sub_gen = my_sub_generator();
    for value in sub_gen {
        yield value;
    }
}

In this example, the storage required for this generator (before compiler optimizations, and ignoring alignments) would be:

i32 + i32 + max(i64, i16) + i32 + sub_gen + value

Plus a small, fixed amount of extra space to store where the last yield was so we can resume iteration.

So I don't understand the argument that we can't know the size of a generator at compile time. It's true that the size of a generator might be large at compile time, to account for deeply-nested but non-recursive generators, but that's no different than the deeply nested struct that would be required to implement the same pattern.

I've elided ownership & lifetime concerns here, but I don't see any obvious reasons in principle that it would be different than a lambda or a struct.

Lucretiel commented Oct 5, 2016

I'm not understanding why you can't know the size of a generator/coroutine at compile time? Assuming it's non-recursive, of course. If it makes function calls, those can just live on the normal call stack since they'll only be invoked when control is inside the generator. If it invokes other generators/coroutines, those would just live in the static memory of the coroutine itself. Circular dependencies would be detected the same way they're detected in structs. The size of a coroutine would just be pessimistically the minimum size needed to hold all the variables in the local control flow stack, excluding function calls.

My understanding is it'd look something like this:

// Pseudocody
gen fn my_generator(flag: bool) -> i32 {
    let val1: i32 = 10;
    let val2: i32 = 20;
    if flag {
        let val3: i64 = 30;
        yield val1; yield val2; yield val3 as i32;
    } else {
        let val3: i16 = 30;
        yield val1; yield val2; yield val3;
    }
    // The stack space for this function call isn't part of my_generator's
    // storage, because it can only be invoked when control flow is here.
    let val4 = function_returning_i32();
    yield val4;

    // The storage required for this generator IS part of the storage, though.
    let mut sub_gen = my_sub_generator();
    for value in sub_gen {
        yield value;
    }
}

In this example, the storage required for this generator (before compiler optimizations, and ignoring alignments) would be:

i32 + i32 + max(i64, i16) + i32 + sub_gen + value

Plus a small, fixed amount of extra space to store where the last yield was so we can resume iteration.

So I don't understand the argument that we can't know the size of a generator at compile time. It's true that the size of a generator might be large at compile time, to account for deeply-nested but non-recursive generators, but that's no different than the deeply nested struct that would be required to implement the same pattern.

I've elided ownership & lifetime concerns here, but I don't see any obvious reasons in principle that it would be different than a lambda or a struct.

@camlorn

This comment has been minimized.

Show comment
Hide comment
@camlorn

camlorn Oct 6, 2016

@Lucretiel
That's it, yes. That's why I'm kind of confused as well. The only thing which can break it is if you actually can get a DST on the stack somehow. Admittedly there's alloca (the Linux function, not the LLVM op), but other than that the size is known and the transformation is safe.

Now, the really fun part: work is going on to eliminate unused variables. I think your example might actually only have to include one of val1 through val3. At the very least, val4 shouldn't ever exist in the storage. SO the best case is even better than that, potentially by a whole lot in some cases.

I'm starting to think I should throw an RFC out which does what we can do and leaves room for streaming in future whenever we get HKT.

I'm also looking at LLVM, and probably it's best to do this as LLVM coroutines if we can. There's a pretty long todo list, however, including a few things which might be deal breakers (specifically alignment being ignored by the allocation/freeing intrinsics). This documentation isn't currently very good, but it looks like the pointer for the dynamically allocated frame is supplied by you, not just made by LLVM. So probably it could be worked out such that Rust generates a struct with a [u8; sizeof(coroutine's frame)] in it for that case,. Alternatively, maybe Rust would need a new kind of type internally to make this work.

camlorn commented Oct 6, 2016

@Lucretiel
That's it, yes. That's why I'm kind of confused as well. The only thing which can break it is if you actually can get a DST on the stack somehow. Admittedly there's alloca (the Linux function, not the LLVM op), but other than that the size is known and the transformation is safe.

Now, the really fun part: work is going on to eliminate unused variables. I think your example might actually only have to include one of val1 through val3. At the very least, val4 shouldn't ever exist in the storage. SO the best case is even better than that, potentially by a whole lot in some cases.

I'm starting to think I should throw an RFC out which does what we can do and leaves room for streaming in future whenever we get HKT.

I'm also looking at LLVM, and probably it's best to do this as LLVM coroutines if we can. There's a pretty long todo list, however, including a few things which might be deal breakers (specifically alignment being ignored by the allocation/freeing intrinsics). This documentation isn't currently very good, but it looks like the pointer for the dynamically allocated frame is supplied by you, not just made by LLVM. So probably it could be worked out such that Rust generates a struct with a [u8; sizeof(coroutine's frame)] in it for that case,. Alternatively, maybe Rust would need a new kind of type internally to make this work.

@camlorn

This comment has been minimized.

Show comment
Hide comment
@camlorn

camlorn Oct 6, 2016

Actually I take that back. I bet you can't move the frame once you allocate it, which would sadly make this heap only in some cases. The only saving grace is that you can tell beforehand.

camlorn commented Oct 6, 2016

Actually I take that back. I bet you can't move the frame once you allocate it, which would sadly make this heap only in some cases. The only saving grace is that you can tell beforehand.

@Lucretiel

This comment has been minimized.

Show comment
Hide comment
@Lucretiel

Lucretiel Oct 6, 2016

I bet you can't move the frame once you allocate it

Sure, because of dangling pointers, but that's how everything in Rust is, right? Like, that's the whole point of the ownership / borrow / mutable borrow system. It's no different than being unable to move a stack-allocated struct once you've allocated it on the stack; you can move it by bitwise copy, but only when there are no living borrows of that struct in existence. The point is that a generator "frame" is essentially the same thing as a struct, in terms of ownership, lifetime, and sizing constraints.

Lucretiel commented Oct 6, 2016

I bet you can't move the frame once you allocate it

Sure, because of dangling pointers, but that's how everything in Rust is, right? Like, that's the whole point of the ownership / borrow / mutable borrow system. It's no different than being unable to move a stack-allocated struct once you've allocated it on the stack; you can move it by bitwise copy, but only when there are no living borrows of that struct in existence. The point is that a generator "frame" is essentially the same thing as a struct, in terms of ownership, lifetime, and sizing constraints.

@Lucretiel

This comment has been minimized.

Show comment
Hide comment
@Lucretiel

Lucretiel Oct 6, 2016

The only problem I can see is that there might be some interior references, like this:

gen fn my_gen() -> int {
    let i = 10;
    let ri = &i;
    ...
}

Which would make it impossible to move the struct in a way that the borrow will accept (again, before compiler optimizations or a potentially more relaxed borrow checker).

But it's still essentially syntax sugar over a struct. The challenge will be communicating borrow checker violations to the user in a useful way, but I don't see any reasons in principle that it's different than a struct frame.

Lucretiel commented Oct 6, 2016

The only problem I can see is that there might be some interior references, like this:

gen fn my_gen() -> int {
    let i = 10;
    let ri = &i;
    ...
}

Which would make it impossible to move the struct in a way that the borrow will accept (again, before compiler optimizations or a potentially more relaxed borrow checker).

But it's still essentially syntax sugar over a struct. The challenge will be communicating borrow checker violations to the user in a useful way, but I don't see any reasons in principle that it's different than a struct frame.

@DemiMarie

This comment has been minimized.

Show comment
Hide comment
@DemiMarie

DemiMarie Oct 7, 2016

Yikes. Sounds like any borrows of stack allocated variables at a yield
site would cause the transformation to fail.

Obviously, this would be a major usability problem.

On Oct 6, 2016 3:35 PM, "Lucretiel" notifications@github.com wrote:

The only problem I can see is that there might be some interior
references, like this:

gen fn my_gen() -> int {
let i = 10;
let ri = &i;
...
}

But it's still essentially syntax sugar over a struct. The challenge will
be communicating borrow checker violations to the user in a useful way, but
I don't see any reasons in principle that it's different than a struct
frame.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#388 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AGGWB-aTdDU9dwEaVO7ctUSTXUWn0vyeks5qxU2VgaJpZM4CtyWG
.

DemiMarie commented Oct 7, 2016

Yikes. Sounds like any borrows of stack allocated variables at a yield
site would cause the transformation to fail.

Obviously, this would be a major usability problem.

On Oct 6, 2016 3:35 PM, "Lucretiel" notifications@github.com wrote:

The only problem I can see is that there might be some interior
references, like this:

gen fn my_gen() -> int {
let i = 10;
let ri = &i;
...
}

But it's still essentially syntax sugar over a struct. The challenge will
be communicating borrow checker violations to the user in a useful way, but
I don't see any reasons in principle that it's different than a struct
frame.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#388 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AGGWB-aTdDU9dwEaVO7ctUSTXUWn0vyeks5qxU2VgaJpZM4CtyWG
.

@camlorn

This comment has been minimized.

Show comment
Hide comment
@camlorn

camlorn Oct 7, 2016

I'm betting that you can't even move LLVM coroutines by bitwise copy safely. The implementation assumes that you allocate the pointer for it in some cases, but doesn't seem to allow you to set the pointer if it changes. I.e. if you use alloca for it either directly or indirectly it can only live as long as the stack frame that made it, even if it's a case wherein the struct would live longer. I.e. you maybe can't return a generator from a function using Rust's model if you use the LLVM coroutine intrinsics because it invalidates a pointer you can't set. Or store them in a struct that gets passed around, for that matter. This needs further investigation.

This is fixable if you just allocate the frame on the heap, but having Rust language features allocate implicitly is way against the language's philosophy, imho.

In terms of references, What doesn't work is specifically borrowing a variable on the "stack" of the generator and then yielding the reference. You can borrow them all you want, you just can't yield the reference. This is the streaming iterator problem, and the latest RFC is #1598.

As for borrow checker errors, you do that first as well. If the coroutine tries to yield anything with a lifetime less than the return type of the coroutine, you error. And then you make the lifetime elision rules apply like it's a function. I.e. you're allowed to yield references to/from any argument to the generator, but not from inside the generator.

In general, you do any safety checks before lowering the generator to a struct and iterator implementation.

We might need to wait for #1598, though, if only because hitting the non-streaming case and the streaming case at the same time might be necessary. I sort of favor dedicated syntax for a streaming generator: we can implicitly detect that it needs to be streaming but the difference would be one tiny change in the code, anywhere at all. This would lead to cryptic errors that are hard to fix.

camlorn commented Oct 7, 2016

I'm betting that you can't even move LLVM coroutines by bitwise copy safely. The implementation assumes that you allocate the pointer for it in some cases, but doesn't seem to allow you to set the pointer if it changes. I.e. if you use alloca for it either directly or indirectly it can only live as long as the stack frame that made it, even if it's a case wherein the struct would live longer. I.e. you maybe can't return a generator from a function using Rust's model if you use the LLVM coroutine intrinsics because it invalidates a pointer you can't set. Or store them in a struct that gets passed around, for that matter. This needs further investigation.

This is fixable if you just allocate the frame on the heap, but having Rust language features allocate implicitly is way against the language's philosophy, imho.

In terms of references, What doesn't work is specifically borrowing a variable on the "stack" of the generator and then yielding the reference. You can borrow them all you want, you just can't yield the reference. This is the streaming iterator problem, and the latest RFC is #1598.

As for borrow checker errors, you do that first as well. If the coroutine tries to yield anything with a lifetime less than the return type of the coroutine, you error. And then you make the lifetime elision rules apply like it's a function. I.e. you're allowed to yield references to/from any argument to the generator, but not from inside the generator.

In general, you do any safety checks before lowering the generator to a struct and iterator implementation.

We might need to wait for #1598, though, if only because hitting the non-streaming case and the streaming case at the same time might be necessary. I sort of favor dedicated syntax for a streaming generator: we can implicitly detect that it needs to be streaming but the difference would be one tiny change in the code, anywhere at all. This would lead to cryptic errors that are hard to fix.

@gnzlbg

This comment has been minimized.

Show comment
Hide comment
@gnzlbg

gnzlbg Oct 10, 2016

Contributor

offtopic: The cppcon2016 video about LLVM coroutines is up: https://www.youtube.com/watch?v=8C8NnE1Dg4A

Contributor

gnzlbg commented Oct 10, 2016

offtopic: The cppcon2016 video about LLVM coroutines is up: https://www.youtube.com/watch?v=8C8NnE1Dg4A

@Lisoph

This comment has been minimized.

Show comment
Hide comment
@Lisoph

Lisoph May 23, 2017

I want to breathe some fresh air into this RFC, with my yield function to state machine (as an iterator) concept. Gist. It's not very good - it was quickly hacked together, but it actually compiles - you can play around with it!

Explanation: The commented-out function foo at the top is our yielding function / coroutine. All the other code below, is what could be generated by the compiler. It's similar to what Roslyn (C#) would do, I think. The return type of the coroutine becomes the data type of the values which the coroutine yields.
FooStack is a container for all the stack variables declared in the coroutine.
YieldsFnFoo is the actual iterator generated by the compiler.
The main function shows how the coroutine can be called.


I didn't follow this RFC much, but it appears that the current discussion is mostly about LLVM's built in coroutine support. I guess my concept could be considered an opposite to LLVM coroutines, since my concept could be implemented at a very high compiler level, possibly even procedural macros.

Feedback very welcome!

Lisoph commented May 23, 2017

I want to breathe some fresh air into this RFC, with my yield function to state machine (as an iterator) concept. Gist. It's not very good - it was quickly hacked together, but it actually compiles - you can play around with it!

Explanation: The commented-out function foo at the top is our yielding function / coroutine. All the other code below, is what could be generated by the compiler. It's similar to what Roslyn (C#) would do, I think. The return type of the coroutine becomes the data type of the values which the coroutine yields.
FooStack is a container for all the stack variables declared in the coroutine.
YieldsFnFoo is the actual iterator generated by the compiler.
The main function shows how the coroutine can be called.


I didn't follow this RFC much, but it appears that the current discussion is mostly about LLVM's built in coroutine support. I guess my concept could be considered an opposite to LLVM coroutines, since my concept could be implemented at a very high compiler level, possibly even procedural macros.

Feedback very welcome!

@mbrubeck

This comment has been minimized.

Show comment
Hide comment
@mbrubeck

mbrubeck Sep 28, 2017

Contributor

Experimental RFC #2033 has been accepted and implemented in nightly.

Contributor

mbrubeck commented Sep 28, 2017

Experimental RFC #2033 has been accepted and implemented in nightly.

@Centril Centril added the T-lang label Feb 23, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment