
[WIP] Add the ability to spawn futures #679

Draft · wants to merge 1 commit into base: master
Conversation

@cuviper (Member) commented Aug 2, 2019

For Rust 1.36+ with `std::future::Future`, add a way to spawn tasks with
a returned `Future`. The task is immediately queued for the thread pool
to execute.
JobResult::Ok(x) => Poll::Ready(x),
JobResult::Panic(p) => {
drop(guard); // don't poison the lock
unwind::resume_unwinding(p);

@cuviper (Author, Member) commented Aug 2, 2019

This is usually how we propagate panics, but maybe the future should yield Result<T, ...> instead?

@stjepang (Contributor) commented Aug 4, 2019

I think it would be best to let Rayon's `panic_handler` handle the actual panic, but also panic here with something like `panic!("the spawned task has panicked")` rather than resuming with the original one.

If one were to retrieve the result of a spawned task without using futures, one would probably create a channel and send the result through it. Then, if the task panics, the sender side of the channel gets dropped, thus disconnecting it. If one then attempts to receive the result on the receiver side, the `recv()` call fails because the channel is disconnected, and unwrapping that error panics.

So that way spawn_future would closely match the behavior of spawn + using a channel to retrieve the result.
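A minimal sketch of that spawn-plus-channel pattern, assuming a panic handler is installed so a panicking task doesn't abort the pool, and with a placeholder compute() standing in for real work:

use rayon::ThreadPoolBuilder;
use std::sync::mpsc;

// Placeholder for real work that might panic.
fn compute() -> i32 {
    42
}

fn main() {
    // Install a panic handler so a panicking task is reported
    // instead of aborting the process.
    ThreadPoolBuilder::new()
        .panic_handler(|_payload| eprintln!("a spawned task panicked"))
        .build_global()
        .unwrap();

    let (tx, rx) = mpsc::channel();
    rayon::spawn(move || {
        // If compute() panics, `tx` is dropped without sending,
        // which disconnects the channel.
        let value = compute();
        let _ = tx.send(value);
    });

    // If the task panicked, recv() returns Err(RecvError);
    // unwrap() then raises a new panic on the calling side.
    let result = rx.recv().unwrap();
    println!("result = {}", result);
}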

Over the past few months I did a lot of exploration with panic propagation strategies in asynchronous contexts and talked to people about their use cases. In the end, I concluded that the best general way of handling panics is to pass them to the panic handler and raise a new panic whenever the result of a failed task is polled.

@cuviper (Author, Member) commented Aug 5, 2019

I'd be OK with that too. I was thinking of `Result` as an analogy from `std::thread::spawn` + `JoinHandle::join` to `rayon::spawn_future` + `Future::poll`.
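For reference, a small illustration of the std analogy (plain std code, not Rayon): join() hands a panic in the spawned thread back to the caller as an Err value, which is the shape a Result-yielding future would mirror.

use std::thread;

fn main() {
    let handle = thread::spawn(|| -> i32 { panic!("boom") });
    // join() returns Err(payload) if the thread panicked,
    // rather than unwinding through the caller.
    match handle.join() {
        Ok(value) => println!("got {}", value),
        Err(_payload) => println!("the spawned thread panicked"),
    }
}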

@cuviper (Author, Member) commented Aug 2, 2019

The CI failure is that my new inherent methods take precedence over the extension trait methods in rayon-futures' tests -- meh.

}
}

pub fn spawn_future<F, T>(func: F) -> impl Future<Output = T>

@nikomatsakis (Member) commented Aug 26, 2019

Hmm, I think I expected spawn_future to take a future as an argument, so that you would do something like:

let future = rayon::spawn_future(async move {
    ...
});

where the returned future could then be awaited from other futures. This would be somewhat analogous to the spawn function from async.rs.

@nikomatsakis (Member) commented Aug 26, 2019

One thing I was wondering is if async-task could be useful to us here -- I still haven't fully grokked that crate. :)

@cuviper (Author, Member) commented Aug 26, 2019

I'm no expert here, but I think the difference is whether we want rayon to be a full executor, or just a new source of asynchronous events, and I was thinking more of the latter.

My intention was that you could still use tokio, async-std, or whatever with all of their abstractions working with file/network IO and such, and Rayon would just add something like an abstract CPU-IO. Other executors are usually latency-oriented, but Rayon is throughput-oriented with its greedy task stealing.

Anyway, I just tried async-task a bit, and it looks fine for the simple case:

use std::future::Future;

pub fn spawn_future<F, T>(future: F) -> impl Future<Output = Option<T>>
where
    F: Future<Output = T> + Send + 'static,
    T: Send + 'static,
{
    // Each wake schedules a fresh Rayon job that polls the task again.
    let (task, handle) = async_task::spawn(future, |task| crate::spawn(|| task.run()), ());
    // Queue the first poll immediately.
    task.schedule();
    // The handle yields None if the task never runs to completion (e.g. it is cancelled).
    handle
}

It's OK for ThreadPool::spawn_future too, just tagged to a particular Registry. But we run into trouble with Scope, since async_task::spawn is all 'static. I don't think we can safely erase lifetimes here when we don't control the implementation behind it.

@nikomatsakis (Member) commented Aug 30, 2019

I don't think we can safely erase lifetimes here when we don't control the implementation behind it.

I think this was exactly the case where I introduced a bit of unsafety into the prior implementation.

@cuviper (Author, Member) commented Aug 30, 2019

Well yes, erasing lifetimes requires unsafe, but I meant that I'm not sure we can do that safely even in the "I know better than the compiler" sense. In the prior implementation, we kept complete control, which we wouldn't have under async-task.

@nikomatsakis (Member) commented Sep 4, 2019

OK, I see what you meant by this:

I don't think we can safely erase lifetimes here when we don't control the implementation behind it.

I'm not sure I totally agree, but we would definitely want to be very explicit about what we are assuming and to have async-task commit (in a semver sense) to preserving those invariants.

@cuviper (Author, Member) commented Aug 29, 2019

@alexcrichton, @fitzgen, maybe you have some perspective from the WASM side? Would either of these signatures be better or worse for integrating with wasm/js futures?

  • spawn_future(impl FnOnce) -> impl Future
  • spawn_future(impl Future) -> impl Future

I suspect the latter might be problematic, having to wait for unknown input futures from rayon threads, since we can't really unwind the entire thread out of wasm.

But is the simpler FnOnce -> Future approach feasible in that environment? e.g., will it work for a rayon thread to call a Waker that ultimately notifies the JavaScript side?

@fitzgen commented Aug 30, 2019

I believe that should work and there aren't any technical restrictions on our side (although Alex knows more about our executor in a multithreaded context).

My one concern, unrelated to wasm, is that using -> impl Future means we can't put the result in a struct without boxing it into a trait object. Is there a technical reason why it can't be a named type?
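To illustrate the concern (a hedged sketch; PendingWork is just an illustrative name, not a Rayon type): a struct field needs a nameable type, so the result of a -> impl Future function has to be boxed into a trait object before it can be stored.

use std::future::Future;
use std::pin::Pin;

struct PendingWork {
    // There is no way to name the opaque `impl Future` type here,
    // so the field becomes a boxed trait object.
    future: Pin<Box<dyn Future<Output = i32> + Send>>,
}

fn store(work: impl Future<Output = i32> + Send + 'static) -> PendingWork {
    PendingWork {
        future: Box::pin(work),
    }
}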

@nikomatsakis (Member) commented Aug 30, 2019

@cuviper

Definitely rayon is "throughput optimized" and may not be the best choice for all your futures. But where I thought the impl Future -> impl Future signature could be useful is in creating a DAG of interdependent computation nodes. Under this setup, one could e.g. have a matrix of tasks where each task is able to .await other tasks. (Although I guess it'd be a bit of a pain to set it up -- interesting problem.)
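A hedged sketch of that DAG idea, assuming the proposed rayon::spawn_future(impl Future) -> impl Future signature (which does not exist yet): each node is a spawned future, and a dependent node simply awaits the handles of the nodes it depends on.

// Assumes a hypothetical rayon::spawn_future(impl Future) -> impl Future.
fn expensive(n: i32) -> i32 {
    n * 10 // placeholder for real CPU-bound work
}

async fn run_dag() -> i32 {
    // Two independent leaf nodes.
    let a = rayon::spawn_future(async { expensive(1) });
    let b = rayon::spawn_future(async { expensive(2) });
    // A node that depends on both leaves: it awaits their handles.
    let sum = rayon::spawn_future(async move { a.await + b.await });
    sum.await
}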

In any case, I also don't think it causes a big problem if you block on I/O events in rayon. The future simply wouldn't be in our thread-pools until the I/O event woke it. It's not like it would block in the normal sense. (You might not want to combine that with rayon::scope, I suppose.)

@cuviper (Author, Member) commented Aug 30, 2019

@fitzgen

My one concern, unrelated to wasm, is that using -> impl Future means we can't put the result in a struct without boxing it into a trait object. Is there a technical reason why it can't be a named type?

Just that it's a smaller API surface to consider in this PR if we don't name it. But the type is simple enough that it shouldn't be a problem to return the concrete type.

@nikomatsakis

In any case, I also don't think it causes a big problem if you block on I/O events in rayon. The future simply wouldn't be in our thread-pools until the I/O event woke it.

Yeah, so this is a case where I'm not certain how these things would work. I guess if the async work leading up to our part isn't ready, we'd just return NotReady too and let some outer executor deal with it? AIUI in async-task, we'd just end up queuing an underlying spawn for each step of the future's progress.

I brought up the WASM case because I thought there were some big caveats about notifying WASM threads about javascript futures. But maybe if the actual task suspension still looks like it happens outside of the threadpool, and we're respawning to get back in, it may work OK? Not sure.

We should probably play with some real examples before we commit to anything.

@alexcrichton commented Sep 3, 2019

As to the effect on wasm, I think the main question here is what extra APIs rayon needs from the host. For example, in our demos we have a Pool structure that only supports a run method to run a closure on the pool; we hook that up through rayon's ThreadPoolBuilder, which lets us schedule rayon's work on a thread pool we manage ourselves.

If impl Future is the argument here, then rayon would probably need to add another API along the lines of "please schedule this future's work to happen on the rayon thread pool", or something like that? If not, then I don't think this really impacts wasm all that much! If you've got a demo of the API, I can poke around the code and see if anything looks like it'd break on wasm (or try to run the demo on wasm!)

@nikomatsakis (Member) commented Sep 5, 2019

An update:

So we talked at some length with @alexcrichton and @fitzgen on Discord and came to the conclusion that "WASM interop is not a major issue here".

We left the meeting with the conclusion that I would spend a bit of time looking into what it would take to implement a spawn_future(impl Future) -> impl Future approach. I wanted to give a brief update on that.

(Side note: I know this will shock absolutely no one, but I think this is worth moving to an RFC. There are a number of smaller details concerning the APIs worth talking over and documenting.)

The set of APIs I think we want to support are something like this:

  • spawn a future into the global scope -- spawn_future(impl Future + 'static) -> impl Future
  • spawn a future into a Rayon scope -- spawn_future(impl Future + 'scope) -> impl Future

(And probably that impl Future in the return type is really going to be RayonFuture.)

Anyway, I dug into what an implementation would take. I think if we were to reimplement everything ourselves, the primary thing we need to create is a Waker. The idea would roughly be like this (a sketch of the resulting state machine follows the list):

  • When you spawn a future, we create an Arc to represent that job
    • This Arc implements the Rayon Job trait, so it can be enqueued in our thread-pools
    • This Arc also serves as the Waker
    • This Arc also serves as the resulting future
    • If launched inside of a Rayon scope, this Arc also starts out holding a ref-count on that scope, which prevents the scope from completing until the future has fully executed. This is what allows it to hide the 'scope lifetime.
  • The Arc has an internal state machine that looks something like this:
    • Waiting(F) -- initial state. We have the future but are not enqueued in the Rayon task pool.
    • Enqueued(F) -- enqueued in the Rayon task pool.
    • Executing -- currently executing the future
    • PendingWake -- a wake occurred while we were executing
    • Complete(F::Output) -- the future completed
  • There are three ways to interact with this Arc:
    • As a waker: when the "wake" method is called, we can be in any state. We transition as follows:
      • If Waiting, go to Enqueued state and enqueue ourselves into a Rayon thread-pool.
      • Otherwise, ignore.
    • As a Rayon job: when the rayon job executes, we must be in the enqueued state.
      • Move to Executing state and execute future.
      • If the result is "NotReady", move to the Waiting state, unless we were moved to PendingWake, in which case we can immediately re-execute (or re-enqueue ourselves)
      • If the result is Ready, move to the Complete state and store the result
    • As a future, we can be polled:
      • If not in the complete state, we return NotReady, and stash the waker
      • Otherwise we return the result (taking it out of the state)
      • Upon entering the complete state, we wake the waker (this actually occurs in the "job" code)

I started writing this code, but it turns out that the async-task crate basically implements all of this logic already, so it really makes sense I think to build on that.

The simplest integration (at the 'static level, and not "peak efficiency") just looks like this:

use std::future::Future;

pub fn spawn_future<R>(future: impl Future<Output = R> + Send + 'static) -> impl Future<Output = R>
where
    R: Send + 'static,
{
    let (task, handle) = async_task::spawn(
        future,
        // Each wake schedules a rayon_core::spawn job that polls the task again.
        move |task| rayon_core::spawn(move || task.run()),
        (),
    );
    // Queue the first poll immediately.
    task.schedule();
    // The handle yields Option<R>; unwrap assumes the task ran to completion.
    async move { handle.await.unwrap() }
}

I saw "not peak efficiency" because each call to rayon_core::spawn is going to create a new allocation, but the async_task::spawn already created one behind the scenes.

I discussed this with @stjepang -- if async_task::spawn offered an API for converting from an async_task::Task to some pointer (analogous to Arc::into_raw), then we could do something like this:

  • convert the task to a raw pointer *const TaskRaw or whatever
  • implement rayon::Job for TaskRaw (Job is a private trait, so we can do this without the outside world knowing about it, but it does have to be in rayon-core presently)
  • enqueue the task directly onto the rayon thread-pool; when invoked, it'll use from_raw to convert the raw pointer back and then invoke run as above

To handle the 'scope API, we need to ensure that async-task meets some basic sanity requirements. But if we are doing this development within rayon-core, we don't need to expose any new APIs publicly. We would basically follow the approach I suggested above, I think -- we would grab a ref count and then augment the future we are spawning to decrement the ref count only when it has completed its work.

I didn't get so far as to prototype this, in part because I was doing this work in a separate crate. I realize now that I could prototype this better if I did the work directly in rayon-core (though we should discuss if this is a feature we would want in rayon-core, as it would up our minimum rust version -- perhaps gated by a cargo feature?).

@cuviper (Author, Member) commented Sep 5, 2019

(though we should discuss if this is a feature we would want in rayon-core, as it would up our minimum rust version -- perhaps gated by a cargo feature?)

I think it should be in core. It doesn't have to affect the general minimum Rust version if we gate the new functionality, whether by a cargo feature or just autocfg detection, as I did here for Future.

@jClaireCodesStuff commented Sep 18, 2019

I'm about halfway through a straightforward port of the existing rayon_futures to futures 0.3, without any thought given to changing the API design beyond removing Future::wait.

The library builds and feels approximately correct but I haven't ported the tests yet, thus about halfway. I think the most interesting part of that work has been how much simpler ArcWake is than the previous Notify stuff.

I don't have too many opinions about what the API should be. I do think that dropping a Future should be considered a perfectly normal way to cancel it. That likely means we need to take a bit more care to propagate panics in a useful way.
