
RFC: stabilize `std::task` and `std::future::Future` #2592

Open · wants to merge 16 commits into base: master
@aturon
Member

aturon commented Nov 10, 2018

This RFC proposes to stabilize the library component for the first-class async/await syntax. In particular, it would stabilize:

  • All APIs of the std-level task system, i.e. std::task::*.
  • The core Future API, i.e. core::future::Future and std::future::Future.

It does not propose to stabilize any of the async/await syntax itself, which will be proposed in a separate step. It also does not cover stabilization of the Pin APIs, which has already been proposed elsewhere.

This is a revised and significantly slimmed down version of the earlier futures RFC, which was postponed until more experience was gained on nightly.

Rendered

RFC status

The following need to be addressed prior to stabilization:

  • Detailed experience reports
  • Improved API documentation
  • Finalizing module structure (possibly nesting within a single top-level module)
  • Fuller rationale/improvements around wakeup APIs
  • Establishing a clear plan of record for task locals

@aturon aturon added the T-libs label Nov 10, 2018

@aturon


Member

aturon commented Nov 10, 2018

cc @rust-lang/lang -- I haven't tagged this as T-lang, since the Lang Team already approved the async/await RFC and this is "just" about library APIs. But of course y'all should feel free to weigh in.

cc @Nemo157 @MajorBreakfast @tinaun @carllerche @seanmonstar @olix0r

@aturon


Member

aturon commented Nov 10, 2018

@cramertj can you comment on UnsafeWake and whether waiting to stabilize that piece looks problematic?

@aturon


Member

aturon commented Nov 10, 2018

cc @rust-lang/libs, please take a look!

@aturon


Member

aturon commented Nov 10, 2018

@cramertj I only briefly mentioned Fuchsia in the RFC, but it might be helpful for you/your team to leave some commentary here about your experience with the various iterations of the futures APIs.

@Ixrec


Contributor

Ixrec commented Nov 10, 2018

Since then, the syntax, the std APIs, and the futures 0.3 crate have all evolved in tandem as we've gained experience with the APIs. A major driver in this experience has been Google's Fuchsia project, which is using all of these features at large scale in an operating system setting.

Although this isn't strictly relevant to the technical merits of the proposed APIs, considering the sheer scope and history of what we're talking about adding to std it seems worth asking: Are there any blog posts discussing Fuchsia's experience in more detail? This is the only part of the historical context I was completely unaware of, and I couldn't find any place that talks about it.

EDIT: I swear I started typing this before aturon's last comment 😅

@carllerche


Member

carllerche commented Nov 10, 2018

Thanks for putting this together.

My experience with the proposed Future trait is that it makes sense as an infrastructure level detail. In a world where the vast majority of async code is written with async / await, the trait makes sense. However, async / await is not there yet. In my experimentation to port existing code to use async / await, I hit limitations pretty quickly. Specifically, as far as I could tell, it is not possible to use a transport properly without extra allocation (due to split).

If the proposed Future is expected to be implemented by hand to any significant degree, then moving to it is a net negative. The RFC mentions the ergonomic hit.

Because of this, my plan for Tokio will be to stick on futures 0.1 until async / await has reached a point where implementing Future by hand is no longer required for the user.

Also, most of Tokio would require the UnsafeWake trait, so even with the proposed stabilization, Tokio would not be able to migrate to it.

I understand the desire to drive progress forward. As I said, as far as I can tell, the proposed Future trait is good as an implementation detail of async / await. The ecosystem split will be unfortunate, but perhaps there is no other way.

Edit: I should clarify, Tokio will add support for async / await as it stabilizes, but it will not be considered the primary abstraction until it is able to handle all the cases.

@prasannavl prasannavl referenced this pull request Nov 10, 2018

Open

Future 0.3 #543

@aturon


Member

aturon commented Nov 10, 2018

@carllerche You and I have talked about this a bunch on other channels, so I'll be repeating myself, but I want to write a response here so that everyone else is on the same page as well.

There are indeed limitations with async/await today, due not so much to the feature itself as the lack of impl-trait-in-traits (or existential types) working sufficiently well (as well as, ultimately, GATs). They limit the ability to move foundational libraries to use async/await internally, and that's part of the reason we're not ready to stabilize the syntax itself yet. However, to be clear, none of these limitations connect to the Future or task APIs. (And, of course, we also have large-scale usage of these APIs and the current async/await mechanism in Fuchsia to draw from; the limitations largely apply to highly generic code.)

The 0.1/0.3 compatibility system, which allows for fine-grained/incremental migration, ends up doing a lot to lower the stakes. For example, it's already fairly painless to write code for hyper using async fn, and with a little bit more polish it could feel completely "first-class". So, it seems fine for some low-level libraries to stick with the 0.1 model until the above limitations are sufficiently lifted.

I think everything else you raise is discussed in the RFC as well, so I don't have more to add there!

@nrc


Member

nrc commented Nov 11, 2018

What’s the rationale for having both task and future modules? Since future only includes Future, it seems that having two modules doesn’t pull its weight. Are we expecting to move a bunch of future stuff into std in the future?

@Redrield


Redrield commented Nov 11, 2018

@nrc I'm not sure but perhaps the Iterator-like adapters talked about would be merged into that module once they make their way into std?

@ivandardi


ivandardi commented Nov 11, 2018

If Future is a trait then why isn't this example in the RFC

async fn read_frame(socket: &TcpStream) -> Result<Frame, io::Error> { ... }

written as

async fn read_frame(socket: &TcpStream) -> impl Result<Frame, io::Error> { ... }

?

@rpjohnst


rpjohnst commented Nov 11, 2018

A small bikeshed: Waker and LocalWaker might be better as SyncWaker and Waker, with the unmarked name being the local one, by analogy to Arc and Rc. This makes the poll method signature a bit more straightforward to read and aligns with existing naming conventions. I also think there might be a better, more noun-y name than Waker, maybe Task/task::Handle/etc., similar to 0.1?

I'm not sure I totally understand all the layers of &LocalWaker, NonNull<dyn UnsafeWake>, &Arc<T> where T: Wake, etc. and it seems like it might still be in flux, but exactly how many layers of indirection do we have here? Ideally there would be one (as if poll took Arc<dyn Wake> or equivalent), though I can sort-of see why there might need to be two (as if poll took &Arc<dyn Wake>), but it almost looks like there are three?

onto a single operating system thread.

To perform this cooperative scheduling we use a technique sometimes referred to
as a "trampoline". When a task would otherwise need to block waiting for some


@glaebhoerl

glaebhoerl Nov 11, 2018

Contributor

Is this the same "trampoline" concept which is used in the context of avoiding stack overflows for recursive calls?


@aturon

aturon Nov 11, 2018

Member

Yep!

@Nemo157


Contributor

Nemo157 commented Nov 11, 2018

@ivandardi that's related to the async/await syntax rather than the library code provided here. The design as I understand it is that async fn itself introduces the unnamable type and so entirely encompasses and hides the impl Future part of the transformed function signature. If you want to discuss further the async/await tracking issue would be the more appropriate location.

@aturon


Member

aturon commented Nov 11, 2018

@nrc

What’s the rationale for having both task and future modules? Since future only includes Future, it seems that having two modules doesn’t pull its weight. Are we expecting to move a bunch of future stuff into std in the future?

Yes, both modules are expected to grow substantially over time. The futures crate contains a similar module hierarchy with a much richer set of APIs. In addition, there will eventually be a stream module. Generally in std we tend to have a pretty flat set of top-level modules supporting various types, e.g. std::option and std::result.

@jethrogb jethrogb referenced this pull request Nov 11, 2018

Open

Tracking issue for async/await (RFC 2394) #50547

1 of 10 tasks complete
an API with greater flexibility for the cases where `Arc` is problematic.

In general async values are not coupled to any particular executor, so we use trait
objects to handle waking. These come in two forms: `Waker` for the general case, and


@jethrogb

jethrogb Nov 11, 2018

Contributor

These don't look like trait objects to me.


@aturon

aturon Nov 11, 2018

Member

They're trait objects internally.

Task execution always happens in the context of a `LocalWaker` that can be used to
wake the task up locally, or converted into a `Waker` that can be sent to other threads.

It's possible to construct a `Waker` using `From<Arc<dyn Wake>>`.


@jethrogb

jethrogb Nov 11, 2018

Contributor

Not right now, because dyn Wake is not Wake.


@aturon

aturon Nov 11, 2018

Member

Should be Arc<impl Wake> I suppose.

/// - [`Poll::Ready(val)`] with the result `val` of this future if it
/// finished successfully.
///
/// Once a future has finished, clients should not `poll` it again.


@jethrogb

jethrogb Nov 11, 2018

Contributor

No behavior is specified for when clients do do that. I think we should say something. For example, "implementors may panic".


@Nemo157

Nemo157 Nov 11, 2018

Contributor

I think we could even go a bit stronger than that, “implementors should panic, but clients may not rely on this”. All async fn futures guarantee this, and I believe so do the current futures 0.3 adaptors.

I think it would also be good to mention that calling poll again must not cause memory unsafety. The current mention that it can do anything at all makes it seem like it is allowed to have undefined behaviour, but since this is not an unsafe fn the implementer cannot rely on the client’s behaviour for memory safety purposes.

When a task returns `Poll::Ready`, the executor knows the task has completed and
can be dropped.

### Waking up


@jethrogb

jethrogb Nov 11, 2018

Contributor

It's unclear at first glance which of the code blocks starting with trait Wake/struct ExecutorInner/struct Waker are proposed for stabilization.

@brain0


brain0 commented Nov 11, 2018

(Probably none of you know me, yet I'd still like to offer my opinion, if that is appropriate.)

It is my understanding that the purpose of having an unstable API is that the ecosystem can experiment with it to ultimately avoid stabilizing a bad API. I don't see that this has happened here. Tokio has created a shim that essentially wraps an "std future" into a "0.1 future" with the only purpose of allowing async/await style futures. Apart from that, I haven't seen any experimentation with the std::future API. If tokio (as indicated above) is not even planning to use the new API instead of the old futures 0.1 API, then stabilizing it as-is will IMO be very bad for the ecosystem.

The situation for std::task is worse: From what I can see, it hasn't been used at all. The tokio shim merely provides a noop waker to satisfy std::future's poll signature, but that waker cannot be used and even panics when you try to. I'd like to see any implementation that actually uses std::task - I've been following TWIR all year and haven't found anything. I cannot see that there are comprehensive examples in the docs for std::task, or any reference implementations that show how the system is meant to be used as a whole.

My information is probably incomplete, so please tell me if I am missing anything.

As a side note, I started implementing an "as simple as possible" task executor based on the std APIs, just to understand them and play with them. I found the std::task stuff really complicated, and quickly realized that it still couldn't do everything I needed - most importantly, I needed to access some of the internal data in my Wake implementation, but this was not possible with LocalWaker. I would have to resort to storing information in thread-locals again, which defies the purpose of having the waker passed as an argument to poll.

@steveklabnik


Member

steveklabnik commented Nov 11, 2018

@brain0


brain0 commented Nov 11, 2018

Right, sorry, I must have overlooked that, it's even mentioned in the RFC. Is that stuff open source? I'd love to look at it.


@brain0


brain0 commented Nov 11, 2018

Someone on reddit found this: https://fuchsia.googlesource.com/garnet/+/master/public/rust/fuchsia-async/src/ (executor.rs is interesting, for example).

@aturon


Member

aturon commented Nov 11, 2018

@brain0 after the weekend, I expect that @cramertj (or others from the Fuchsia team) will write in with more extensive detail about their experiences.

The RFC also went to some length to lay out the historical context. These APIs have seen plenty of use: on top of Tokio/Hyper (through various shims) in e.g. web framework code, in embedded settings, and in custom operating systems (Fuchsia).

Could you spell out your concern re: the task system? It'd be helpful to keep discussion focused on specifics if possible.

@brain0


brain0 commented Nov 11, 2018

@aturon First things first: Last weekend, I decided to try out the std::task and std::future system by implementing the simplest task executor I could think of, then combine that with mio. Of course, the result would not be as feature-rich or performant as tokio, but it would demonstrate the new APIs.

There were lots of little details that felt "weird" about the task system:

  • When polling a future, you need a LocalWaker. From the documentation, it takes quite a while to figure out how to actually construct a LocalWaker using only safe code. The only way to do so is the local_waker_from_nonlocal function. I would have expected an inherent method on LocalWaker. But when you read the LocalWaker docs, you only find unsafe fn new(inner: NonNull<dyn UnsafeWake + 'static>). The whole API feels centered around the unsafe features and the safe features are "second class".

  • What we all love about Rust is how its type system ensures thread safety. But somehow, when creating a LocalWaker, you either need to call an unsafe function (local_waker) or construct a waker that does not take advantage of being a local waker (local_waker_from_nonlocal). I don't know if there is a better way, but this whole API feels un-rust-y.
    What could be done here is have two traits LocalWake and Wake (where LocalWake is not necessarily Send + Sync) and create a LocalWaker from Rc<dyn LocalWake>. This LocalWaker could then be transformed to a Waker using a method on LocalWake that returns Arc<dyn Wake>. Of course this would be more code and probably has other downsides and issues. Still, it would feel more like Rust, since the compiler verifies thread safety for us again. (Btw, fuchsia doesn't implement wake_local() either, is there any project that actually implements wake_local() to optimize local task wakeups compared to wake()?)

  • Lack of module-level documentation: Usually in Rust, the module-level documentation explains the "big picture" in quite some detail. The std::task documentation is "Types and Traits for working with asynchronous tasks." Only reading the API documentation (starting with LocalWaker, because that's what I needed for poll) didn't help me that much. It's clearly obvious to the people who designed the system, but not so for everyone else. I understand that std::task is new, but I imagine such documentation should exist before stabilization, not after. Due to the lack of documentation, I looked for projects that actually implemented this API and didn't find any (since I did not find the fuchsia executor). Now that I read some of the fuchsia code, I think I put the pieces together correctly.

  • Related to the first and third points above: I tried to create a waker using only safe code. As it turns out, the only way to do that is to put a Wake implementation into an Arc. Inside the Wake implementation, I need two things: A way of identifying the task that should be woken and some kind of reference to the executor itself. The latter has to be wrapped inside an Arc again, so you effectively get something like Arc<(Identifier, Arc<Executor>)>. This felt wrong at first, since it was atypical of Rust to force me to nest two Arcs. A small example in the docs would have helped here. Reading the fuchsia code earlier today showed me that I was not entirely wrong, since they're doing the same thing.

  • I looked into integrating std::task with the mio crate to do some basic networking to see if my (naive) executor would even work. In mio, to identify which Evented received an event, you get a Token (a newtype around a usize). Of course, I could maintain a map Token->Waker. However, since the mio::Poll is tied to my executor, I thought it would be useful to just extract some internal identifier from my Wake implementation and use that as Token. However, I cannot downcast the Waker's internals to its original type to get that information. (Maybe that problem is mio's fault - mio::Poll could be generic on the Token type, so I could simply use my Waker as a token.)

Sorry for the rather verbose reply, I hope it still helped you understand my concerns.

@Centril


Contributor

Centril commented Nov 29, 2018

I've added T-lang to this RFC since some of the traits are more or less #[lang_item = ...]s and because of how critical the API of Future etc. is for async fn.

@withoutboats withoutboats removed the T-lang label Nov 29, 2018

@withoutboats


Contributor

withoutboats commented Nov 29, 2018

@Centril I've removed the T-lang tag; the corresponding lang RFC is the (merged) #2394

@cramertj


Member

cramertj commented Dec 7, 2018

For folks who weren't aware, there's more conversation ongoing in aturon#15 discussing the precise API of Waker/LocalWaker/Wake/UnsafeWake.

@gnzlbg


Contributor

gnzlbg commented Dec 14, 2018

MPI asynchronous APIs look like this:

extern {
    fn mpi_op(request: *mut MPI_Request); 
}

And one uses them like this:

// Allocate a request and pin it:
let mut request: MPI_Request = MPI_REQUEST_NULL;
pin_mut!(request); 

// Schedule an asynchronous operation:
unsafe { mpi_op(&mut request) }

extern { fn mpi_test(request: *mut MPI_Request) -> bool; }
extern { fn mpi_wait(request: *mut MPI_Request); }

// poll:
if unsafe { mpi_test(&mut request) } { .. done .. }

.. do something ...

// poll
if unsafe { mpi_test(&mut request) } { .. done .. }

.. do something ..

// block
unsafe { mpi_wait(&mut request) }

That is, MPI notifies task completion by writing the task status to some memory in the user process. This can happen synchronously, e.g., in mpi_test MPI executes and updates the status. It can also happen asynchronously, e.g., MPI has a thread running in the background, or a DMA transfer writes to the request, etc.

What is the best way to map APIs like these to Futures?

I find mapping this API to Waker/LocalWaker weird.

I could implement the Futures such that they call Waker::wake() directly before returning Poll::Pending, but that can result in the scheduler re-scheduling the same task before trying to poll other tasks, which is bad. Even if that were acceptable, such an executor would just loop at 100% CPU trying to poll MPI futures continuously, so that isn't an option either.

I ended up completely ignoring the Waker. That is, I just return Poll::Pending, and in a normal executor, my tasks will never get repolled and never finish.

What I currently do is poll the futures manually to completion. I probably will end up creating my own executor that:

  • if I spawn it in a different thread, has a Waker (the executor itself) that triggers a re-poll of all pending tasks, not only those tasks that are ready.
  • if I use it inside my main thread (most likely), just has different poll methods that allow me to also poll pending but not ready tasks.

Either way, ignoring the Waker makes these futures unusable with any other executor. Is there a better way to do this? If not, I find it weird to have to deal with a Waker at all in Futures that have no use for it (I currently create a dummy waker just to comply with the current API).


The PoC experiment is here: https://github.com/gnzlbg/ampi

@Matthias247


Matthias247 commented Dec 14, 2018

@gnzlbg This should maybe be discussed elsewhere, since it's an application specific problem. However since it also sheds some light on what Futures are capable of and on what not, I will try to provide some insight here:

The short version: the MPI API you present doesn't really seem suitable for being wrapped in Futures in a good fashion. What is missing is the part where the library notifies you that an operation has finished (e.g. an event, completion queue, selector, etc.).

Really the only thing you can do is spin up a second thread which monitors MPI operations, calls mpi_test() on each outstanding operation at regular intervals, and notifies the original thread when one is ready. But that is, in the end, still polling, and it will either not work very well for high-performance scenarios or starve the rest of the application of CPU. It may also well be that those APIs are not thread-safe, and that calling mpi_test from a thread other than the one which created the operation isn't even allowed.
If MPI provided an API like fn mpi_wait_for_completed_requests(requests: *mut MPI_Request, requests_len: *mut usize); that dequeues completed requests, one could run this in a background thread.

Now this is part 1, but you might run into even more challenges: Rust Futures require synchronous cancellation of each operation. If the Future is dropped, the operation must be stopped, or at least there must be no negative side effects on the application. That again means you can't store the MPI data in the Future, because if the Future gets dropped, MPI would keep working on invalid data in the background, and there is no API to cancel or stop that. In general, MPI seems to fall into the category of IO-completion-based operations (in contrast to readiness-based ones), and Futures don't allow abstracting those in a zero-cost fashion. I've written a bit more about this here: rust-lang-nursery/futures-rs#1278

A solution for this can be to create some kind of MpiManager component that manages all outstanding operations. In order to start an operation, ownership of all data that is part of the request gets passed to it, so that the Future returned by the operation does not hold any state that is critical for memory safety. If the Future gets dropped, the operation might still continue to run in the background until it has fully finished. One has to prevent MpiManager from being destructed as long as any operations are outstanding. One can pool the memory/objects for pending operations in MpiManager, in order to at least reduce some of the required allocations.

Since this is all a lot of hassle, the best solution for integrating those APIs into async code might be to just execute the operations synchronously (with mpi_wait()) inside a threadpool.
Things could be improved by building a new MPI library, either directly in Rust based on Futures, or at least one which provides the ability to wait for multiple completed requests. I guess MPI has a specification, so it's possible to build a different implementation of it.

@gnzlbg


Contributor

gnzlbg commented Dec 14, 2018

The short version: the MPI API you present doesn't really seem suitable for being wrapped in Futures in a good fashion. What is missing is the part where the library notifies you that an operation has finished (e.g. an event, completion queue, selector, etc.).

Yes, this is certainly the feeling I was getting, which was resulting in some frustration.


It might also totally be possible that those APIs are not thread-safe,

Generally no, these APIs are not thread safe. One can request thread-safety on initialization, which adds some synchronization overhead, and one can query the "level" of thread-safety available for the current process. See MPI_Init_thread:

The valid values for the level of thread support are:

  • MPI_THREAD_SINGLE: Only one thread will execute.
  • MPI_THREAD_FUNNELED: The process may be multi-threaded, but only the main thread will make MPI calls (all MPI calls are funneled to the main thread).
  • MPI_THREAD_SERIALIZED: The process may be multi-threaded, and multiple threads may make MPI calls, but only one at a time: MPI calls are not made concurrently from two distinct threads (all MPI calls are serialized).
  • MPI_THREAD_MULTIPLE: Multiple threads may call MPI, with no restrictions.

If the Future is dropped the operation must be stopped, or at least there must be no negative side effects on the application.

There is an API for this: MPI_Cancel, I am not sure how well it works, but after it completes MPI should not access any of the resources on user code associated with that request anymore.


If MPI would provide an API like fn mpi_wait_for_completed_requests(requests: *mut MPI_Request, requests_len: *usize); that dequeues completed requests, one could run this in the background thread.

There is also an API for this: MPI_Waitall. There is also MPI_Waitany, etc.


Since this is all a lot of hassle, the best solution for integrating those APIs into async code might be to just execute the operations in a synchronous fashion (with mpi_wait()) inside a threadpool.

One of the main advantages of MPI is the ability to dispatch DMA or RMA operations and just let them happen while doing something else in the same single thread. If one can spawn multiple threads, then a thread pool with a couple of threads, where one thread schedules all MPI operations as quickly as possible and the other threads just block on them, might definitely be an alternative, but this is often not desired.

(EDIT: I expect this to be easy to build on top of a zero-cost API, but a zero-cost API cannot be built on top of this).


Things could be improved by building a new MPI library, either directly in Rust based on Futures, or at least one which provides the ability to wait for multiple completed requests. I guess MPI might have a specification, and it's possible to build a different implementation for it.

I don't think this is a realistic option. While most MPI implementations nowadays rely at least partially on libfabric, often the only way to use some vendor-specific network hardware efficiently in a large cluster is to use the MPI implementation of its vendor, which aren't necessarily open source (although typically a fork of some open source one). This is why MPI is a specification and not an implementation, software pretty much always needs to be portable across different MPI implementations to be able to use the available hardware =/

@tmandry


tmandry commented Dec 14, 2018

MPI_Waitany is always thread-safe, per the linked docs. You may need a custom executor, but in principle it seems like MPI absolutely can be used with futures.

At a high level: Have a thread that does nothing but wait on all outstanding MPI operations with MPI_Waitany. MPI_Waitany will return the completed operation, and you can use that to drive the corresponding (thread-safe) Waker.

Adding new operations to be waited on is one tricky part. You may need a special MPI operation that you can manually "complete", to wake the thread up again once you've sent it a new operation. Then the thread can update the list of ops it passes to MPI_Waitany, and go back to sleep.

@Matthias247


Matthias247 commented Dec 14, 2018

@tmandry Yes, after seeing those APIs, running MPI_Waitany in a background thread is indeed what I would have recommended. As you said, adding new operations is difficult if the thread is blocked on Waitany (it won't pick up anything new as long as no operation has finished). If it's ok that operations finish sequentially, and that at least one must finish before new ones get picked up, it would work right away. With select/epoll/etc. one would typically use a self-pipe or eventfd to wake up a selector so it can pick up new work. Maybe something like that is also possible here, by making an MPI request to the same process.

@gnzlbg


Contributor

gnzlbg commented Dec 14, 2018

At a high level: Have a thread that

MPI lets you drive asynchronous execution without doing any memory allocations in a single-threaded process. The problem I don't know how to solve is how to offer a nice async fn/Future API over its API without adding overhead.

If the added overhead of boxing Futures, spawning multiple threads, using Arc/Mutex, etc. is acceptable, then we are on the same page: there are many many ways to use MPI that would work.

@Ralith


Ralith commented Dec 14, 2018

You should be able to write an executor that calls MPI_Waitany whenever there is no work to do, in the same way that tokio's current_thread runtime calls epoll whenever there is no work to do.

@withoutboats


Contributor

withoutboats commented Dec 19, 2018

This RFC has been updated to reflect the proposed changes made by @Matthias247 in aturon#15

The implementation of an executor schedules the tasks it owns in a cooperative
fashion. It is up to the implementation of an executor whether one or more
operation system threads are used for this, as well as how many tasks can be


@shepmaster

shepmaster Dec 19, 2018

Member

"operating system"

@ErichDonGubler

ErichDonGubler Dec 19, 2018

s/threads. *executor*s/threads. *Executor*s

?

marker traits, while `LocalWaker` doesn't. This means a `Waker` can be sent to
another thread and stored there in order to wake up the associated task later on,
while a `LocalWaker` cannot be sent. Depending on the capabilities of the underlying
executor a `LocalWaker` can be converted into a `Waker`. Most executors in the

@shepmaster

shepmaster Dec 19, 2018

Member

insert comma: "executor, a LocalWaker"

@Matthias247

Matthias247 commented Dec 19, 2018

Thanks @withoutboats for merging it.

I will copy my summary of open discussion points from the PR to here:

Here is a summary of open discussion points that I gathered from this thread and the original stabilization PR:

  • Naming of Wakers:
    We have the following options:
    • Stay with Waker and LocalWaker
    • Move to SyncWaker and Waker
    • Something else (e.g. Handle, TaskHandle, etc)
      I think all are fine and fairly descriptive. We should just choose one. The Send prefix relates to the associated trait. The Local prefix is known from things like thread-local storage. If nobody is concerned about the churn of renaming things (@cramertj, @aturon, @Nemo157?) then SyncWaker and Waker might be closest to other naming conventions. If we go with these names, the methods on ArcWake might also need to be updated. Update: When I updated ArcWake it looked a bit weirder with the Send variant, since method names like wake_send() are less natural than wake_local().
  • Should LocalWakers be convertible into Wakers via conversion traits (e.g. From/TryFrom)?
    It's the idiomatic way to do conversions in Rust. However, the conversion can panic for some waker types, and panics on implicit conversions might not be desired. Only implementing TryFrom doesn't make sense: 99.9% of all applications wouldn't face any conversion errors, and adding .unwrap() everywhere just adds noise and is not better ergonomically than .into_waker().
  • Module organization:
    We currently have parts in core/std::task, others in core/std::future, and for the pure std parts also things in alloc.
    There was a strong request for a common top-level module (at least for the core parts), where documentation can be moved. async is the most obvious name, but as a keyword it is not allowed.
    My recommendation would be task (or tasks), since it contains the building blocks for creating and running lightweight tasks.
    The building blocks are Wakers, Futures (as intermediate results of running those tasks), and the types that are required by them (e.g. Poll and RawWaker). I think those should be all directly within task. If at a later point more async compatible types are added, those could live in submodules like task::channel.
    The reasoning for having Future outside of task was that it might be only one async primitive among others, with things like Stream being equivalent. However, I rather think that those should be redefined on top of Future (rust-lang-nursery/futures-rs#1365), since it's possible to do more kinds of things with Futures than with Streams.
  • Discussions around ArcWake:
    • Include it into std at all or not? If we include it, with the associated conversion functions to Wakers or not?
    • Should the methods to create LocalWaker and Waker be on the trait itself, freestanding methods, or on an extension trait? The latter two might have more discoverability issues, but also let the trait be more compact, and might leave more room for other conversion functions in the future if we find the currently defined ones not ideal.
    • The RFC currently proposes one into_local_waker() function, which directly creates a LocalWaker, where calling .wake() will call into its wake_local() method.
      However the equivalent conversion methods in futures-rs has 2 methods:
      • One that creates a LocalWaker, where calling .wake() will call wake() on the ArcWake. This method is safe.
      • Another one that creates a LocalWaker, where calling .wake() will call wake_local() on the ArcWake. This method is defined as unsafe. I guess the reasoning is that this kind of Waker could be constructed in any code path that has access to the ArcWake, which could be in any kind of thread, including ones that are not "local" to the associated executor. Strictly speaking this is more correct, even though it's unlikely that executor creators would use the functionality from non-executor threads.
        Having 2 methods raises naming questions (e.g. unsafe fn into_local_waker() -> LocalWaker and fn into_local_waker_with_nonlocal_wake() -> LocalWaker, or with the other naming fn into_waker_with_sync_wake() -> Waker and unsafe fn into_waker() -> Waker), and the associated discoverability and learnability questions.
        Things would definitely get easier if ArcWake just contained a sync wake() method, but that leaves some performance on the table (although executors could still implement custom ArcWake -> LocalWaker conversion functions if they need that performance).
        I have some suspicion that splitting ArcWake into two traits (one with wake() and another one that inherits from the first and adds wake_local()) might clean things up by at least only providing safe functions on ArcWake, but I haven't looked at it in detail.
    • futures-rs mostly doesn't directly create LocalWakers from ArcWake, but uses an intermediate structure called LocalWakerRef, which dereferences into a LocalWaker but is a bit more efficient because it avoids 2 atomic operations per task run. Since many executors that make use of ArcWake might want this behavior, it raises the question of whether it should also be included. On the negative side, however, this adds more hard-to-understand APIs (LocalWakerRef and up to two functions to create it, similar to the ones for creating a general LocalWaker from an ArcWake as discussed above).
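To make the From/TryFrom point above concrete, here is a sketch with stand-in types (simplified placeholders, not the real std::task types): a fallible conversion forces error handling onto every caller, even on executors where the conversion can never actually fail.

```rust
use std::convert::TryFrom;

// Simplified stand-ins for illustration only; not the real std::task types.
#[derive(Debug, PartialEq)]
pub struct Waker;

pub struct LocalWaker {
    // Whether the underlying executor supports cross-thread wakeups.
    pub sendable: bool,
}

#[derive(Debug, PartialEq)]
pub struct NotSendable;

// The TryFrom variant: callers must handle an error that, for most
// executors, can never actually occur.
impl TryFrom<LocalWaker> for Waker {
    type Error = NotSendable;

    fn try_from(lw: LocalWaker) -> Result<Self, Self::Error> {
        if lw.sendable { Ok(Waker) } else { Err(NotSendable) }
    }
}
```

With this design, code on a multithreaded executor ends up writing `Waker::try_from(local).unwrap()` even though the conversion always succeeds there; that `.unwrap()` noise is the ergonomic cost weighed against a panicking `From` impl or an explicit `.into_waker()` method.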
@Matthias247

Matthias247 commented Dec 19, 2018

And here was the latest idea for ArcWake - for the case that we include it in std. I guess I would still be fine with having it either in futures or even in a small crate that just provides helpers for building wakers (e.g. waker-utils, similar to pin-utils).

If we split the trait into a version which is safe (wake() is implemented in a fashion where it can be safely called from any thread), and one which is less safe but more optimized (something along the lines of wake_local(), which will be called from the thread that creates the initial waker), things seem to get a bit clearer.
We would get an API along the following lines:

pub trait ArcWake: Send + Sync {
    /// Wakes the task; safe to call from any thread.
    fn wake(arc_self: &Arc<Self>);
    fn into_waker(wake: Arc<Self>) -> LocalWaker where Self: Sized;
    fn into_waker_ref(wake: &Arc<Self>) -> LocalWakerRef<'_> where Self: Sized;
}

pub trait ArcLocalWake: ArcWake {
    /// Optimized wakeup; may only be called from the executor's own thread.
    unsafe fn wake_local(arc_self: &Arc<Self>);
    unsafe fn into_waker_with_local_opt(wake: Arc<Self>) -> LocalWaker where Self: Sized;
    unsafe fn into_waker_ref_with_local_opt(wake: &Arc<Self>) -> LocalWakerRef<'_> where Self: Sized;
}

Users that want to hack together an executor in the simplest possible fashion can use ArcWake and its associated methods, which is completely safe. Someone who wants to make use of an optimized wakeup implementation in the case where wake() is called before the LocalWaker is converted into a Waker can implement ArcLocalWake. The functions here are all unsafe for the same reasons the current equivalents in futures-rs are unsafe: if someone created a waker that way from a non-executor thread, passed it to a future, and that future called wake() (which leads to wake_local()), the requirement that wake_local() may only be called from the executor thread would be invalidated. I'm not 100% sure if all of those methods need to be unsafe, but that's maybe open for discussion (creating a waker that way actually isn't unsafe - passing it to futures in certain code paths might be).

I'm still not 100% sure whether the name LocalWakerRef conveys well what the type does without extensive documentation. It is a type that can be passed where a &LocalWaker is required, e.g. into poll, but which is cheaper to create from the Arc.
I thought about alternatives here and came up with LocalWakerBuilder, LocalWakerProxy, LocalWakerFactory, VirtualLocalWaker, LazyLocalWaker and WakerLocalPromise.

Out of those I found LazyLocalWaker or LazyArcLocalWaker or the existing [Arc]LocalWakerRef best.

defining `Waker`s is provided, which does not require implementing a `RawWaker`
and the associated vtable manually.

This convience method is based around the `ArcWake` trait. An implementor of

@OddCoincidence

OddCoincidence Dec 21, 2018

Suggested change
This convience method is based around the `ArcWake` trait. An implementor of
This convenience method is based around the `ArcWake` trait. An implementor of
these requirements are fulfilled.

Since many of the ownership semantics that are required here can easily be met
through a reference-counted `Waker` implementation, a convienence method for

@OddCoincidence

OddCoincidence Dec 21, 2018

Suggested change
through a reference-counted `Waker` implementation, a convienence method for
through a reference-counted `Waker` implementation, a convenience method for