pre-RFC: rayon::yield_now #548

Closed
cuviper opened this issue Mar 1, 2018 · 4 comments · Fixed by #1026

Comments

@cuviper
Member

cuviper commented Mar 1, 2018

If you're in a situation where you need to wait for some external resource, but don't want to block, then calling rayon::yield_now could let the current thread search for other work in the pool. If nothing was found, or if you're not actually in a threadpool at the moment, it can just call std::thread::yield_now.
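
A rough sketch of that fallback behaviour, using only the standard library. `ToyPool` and its queue are hypothetical stand-ins invented for illustration; they are not rayon's internals, and this is not the proposed implementation, just the "run something if available, otherwise yield to the OS" shape described above.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

// Hypothetical job type and pool, for illustration only.
type Job = Box<dyn FnOnce() + Send>;

struct ToyPool {
    queue: Mutex<VecDeque<Job>>,
}

impl ToyPool {
    fn new() -> Self {
        ToyPool { queue: Mutex::new(VecDeque::new()) }
    }

    fn push(&self, job: Job) {
        self.queue.lock().unwrap().push_back(job);
    }

    /// Run one pending job if there is one; otherwise fall back to
    /// `std::thread::yield_now`, as the proposal describes.
    fn yield_now(&self) {
        // Pop in its own statement so the lock is released before the job runs.
        let job = self.queue.lock().unwrap().pop_front();
        match job {
            Some(job) => job(),
            None => std::thread::yield_now(),
        }
    }
}

/// Queue three jobs, yield five times, and report how many jobs ran.
fn run_demo() -> usize {
    let pool = ToyPool::new();
    let counter = Arc::new(AtomicUsize::new(0));
    for _ in 0..3 {
        let c = Arc::clone(&counter);
        pool.push(Box::new(move || {
            c.fetch_add(1, Ordering::SeqCst);
        }));
    }
    // The last two calls find nothing queued and only yield the thread.
    for _ in 0..5 {
        pool.yield_now();
    }
    counter.load(Ordering::SeqCst)
}

fn main() {
    println!("jobs executed: {}", run_demo());
}
```

Note that the caller cannot tell from this signature whether a job ran or the thread merely yielded.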

bors bot added a commit that referenced this issue Jun 6, 2018
550: add bridge from Iterator to ParallelIterator r=cuviper a=QuietMisdreavus

Half of #46

This started getting reviewed in QuietMisdreavus/polyester#6, but i decided to move my work to Rayon proper.

This PR adds a new trait, `AsParallel`, an implementation on `Iterator + Send`, and an iterator adapter `IterParallel` that implements `ParallelIterator` with a similar "cache items as you go" methodology as Polyester. I introduced a new trait because `ParallelIterator` was implemented on `Range`, which is itself an `Iterator`.

The basic idea is that you would start with a quick sequential `Iterator`, call `.as_parallel()` on it, and be able to use `ParallelIterator` adapters after that point, to do more expensive processing in multiple threads.

The design of `IterParallel` is like this:

* `IterParallel` defers background work to `IterParallelProducer`, which implements `UnindexedProducer`.
* `IterParallelProducer` will split as many times as there are threads in the current pool. (I've been told that #492 is a better way to organize this, but until that's in, this is how i wrote it. `>_>`)
* When folding items, `IterParallelProducer` keeps a `Stealer` from `crossbeam-deque` (added as a dependency, but using the same version as `rayon-core`) to access a deque of items that have already been loaded from the iterator.
* If the `Stealer` is empty, a worker will attempt to lock the Mutex to access the source `Iterator` and the `Deque`.
  * If the Mutex is already locked, it will call `yield_now`. The implementation in polyester used a `synchronoise::SignalEvent` but i've been told that worker threads should not block. In lieu of #548, a regular spin-loop was chosen instead.
  * If the Mutex is available, the worker will load a batch of items from the iterator (currently number of threads × number of threads × 2) before releasing the Mutex and continuing.
  * (If the Mutex is poisoned, the worker will just... stop. Is there a recommended approach here? `>_>`)
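
The steps above can be sketched roughly as follows. This is not the code from the PR: it assumes a plain `Mutex<VecDeque<_>>` in place of the `crossbeam-deque` `Stealer`, scoped `std` threads in place of rayon workers, and `std::thread::yield_now` for the spin. `Shared`, `worker`, and `run_demo` are names made up for the example.

```rust
use std::collections::VecDeque;
use std::iter::Fuse;
use std::sync::{Mutex, TryLockError};
use std::thread;

/// Hypothetical shared state: the source iterator behind a Mutex, plus a
/// buffer of already-loaded items (a stand-in for the crossbeam deque).
struct Shared<I: Iterator> {
    source: Mutex<Fuse<I>>,
    buffer: Mutex<VecDeque<I::Item>>,
}

fn worker<I: Iterator>(shared: &Shared<I>, batch: usize) -> Vec<I::Item> {
    let mut taken = Vec::new();
    loop {
        // 1. Prefer an item that some worker has already loaded.
        let next = shared.buffer.lock().unwrap().pop_front();
        if let Some(item) = next {
            taken.push(item);
            continue;
        }
        // 2. Buffer was empty: try to lock the source without blocking.
        match shared.source.try_lock() {
            Ok(mut source) => {
                let mut buffer = shared.buffer.lock().unwrap();
                let before = buffer.len();
                buffer.extend(source.by_ref().take(batch));
                if buffer.len() == before {
                    // Nothing loaded: the source is exhausted, so stop.
                    return taken;
                }
            }
            // Another worker is loading items; spin politely.
            Err(TryLockError::WouldBlock) => thread::yield_now(),
            // Poisoned: just stop, as in the description above.
            Err(TryLockError::Poisoned(_)) => return taken,
        }
    }
}

/// Spread 0..100 over four workers and return everything they saw, sorted.
fn run_demo() -> Vec<u32> {
    let threads = 4;
    let batch = threads * threads * 2;
    let shared = Shared {
        source: Mutex::new((0..100u32).fuse()),
        buffer: Mutex::new(VecDeque::new()),
    };
    let mut all: Vec<u32> = thread::scope(|s| {
        let mut handles = Vec::new();
        for _ in 0..threads {
            handles.push(s.spawn(|| worker(&shared, batch)));
        }
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    });
    all.sort_unstable();
    all
}

fn main() {
    println!("items processed: {}", run_demo().len());
}
```

Which worker ends up with which items is not deterministic, but each item is handed out exactly once.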

This design is effectively a first brush, has [the same caveats as polyester](https://docs.rs/polyester/0.1.0/polyester/trait.Polyester.html#implementation-note), probably needs some extra features in rayon-core, and needs some higher-level docs before i'm willing to let it go. However, i'm putting it here because it was not in the right place when i talked to @cuviper about it last time.

Co-authored-by: QuietMisdreavus <grey@quietmisdreavus.net>
Co-authored-by: Niko Matsakis <niko@alum.mit.edu>
@fredpointzero

Hi, do you have any update on this? It would be really useful!

@cuviper
Member Author

cuviper commented May 1, 2019

I implemented this yesterday, and it was straightforward enough: master...cuviper:yield_now

I'm now wondering if it makes sense for the caller to be "blind" to what happened (work executed or just yielded), or whether we should even call thread::yield_now() at all.

A different design could be poll() -> bool returning true if any work was executed, and if not the caller can choose to follow up with thread::yield_now(), or sleep with exponential backoff, etc.
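
To make the alternative concrete, here is a minimal single-threaded sketch. `ToyQueue` and `drain_with_backoff` are hypothetical names, and the queue is only a stand-in for a worker's pending jobs. The point is that `poll` reports whether it executed anything and leaves the fallback policy (here, sleeping with exponential backoff) to the caller.

```rust
use std::collections::VecDeque;
use std::thread;
use std::time::Duration;

type Job = Box<dyn FnOnce()>;

/// Hypothetical stand-in for a worker's pending jobs.
struct ToyQueue {
    jobs: VecDeque<Job>,
}

impl ToyQueue {
    /// Runs at most one job; returns true if a job was executed.
    fn poll(&mut self) -> bool {
        match self.jobs.pop_front() {
            Some(job) => {
                job();
                true
            }
            None => false,
        }
    }
}

/// One possible caller-side policy: keep polling, and sleep with
/// exponential backoff whenever `poll` found nothing. Gives up after
/// `idle_limit` consecutive idle polls and returns the number of jobs run.
fn drain_with_backoff(queue: &mut ToyQueue, idle_limit: u32) -> u32 {
    let mut executed = 0;
    let mut idle = 0;
    while idle < idle_limit {
        if queue.poll() {
            executed += 1;
            idle = 0; // work was found, so reset the backoff
        } else {
            thread::sleep(Duration::from_micros(1u64 << idle));
            idle += 1;
        }
    }
    executed
}

/// Queue four trivial jobs and drain them with an idle limit of three.
fn run_demo() -> u32 {
    let mut queue = ToyQueue { jobs: VecDeque::new() };
    for i in 0..4 {
        queue.jobs.push_back(Box::new(move || {
            let _ = i * 2; // placeholder work
        }));
    }
    drain_with_backoff(&mut queue, 3)
}

fn main() {
    println!("jobs executed: {}", run_demo());
}
```

A caller that prefers the original behaviour could call `thread::yield_now()` in the `else` branch instead of sleeping.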

@fredpointzero

I think polling makes more sense, because my actual use case is:

I know that I need condition A to be true, and that it will be set by another thread in the thread pool. So I want to pause this job for a short time by executing another job, to avoid a deadlock.

Knowing whether a job was executed can be important to detect that deadlock situation.

So I think that a poll design is better suited.

I don't know whether there is a use case where you want more insight, such as: was a local job executed, was a job stolen, or was there nothing to do? That information could also be returned by the poll.
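
For illustration, a richer result could look like the sketch below. Everything here is hypothetical (`PollOutcome`, `ToyWorker`, `wait_for`); it is a single-threaded toy, not an API that rayon exposes. It also shows how the "wait for condition A" case could use an idle result as a hint that nothing is left to make progress.

```rust
use std::cell::Cell;
use std::collections::VecDeque;
use std::rc::Rc;

type Job = Box<dyn FnOnce()>;

/// Hypothetical result type carrying more detail than a bool.
#[derive(Debug)]
enum PollOutcome {
    RanLocal,  // a job from this worker's own deque was executed
    RanStolen, // a job taken from elsewhere was executed
    Idle,      // nothing to do
}

/// Toy worker: `local` models its own deque, `shared` models jobs that
/// would have to be stolen from somewhere else.
struct ToyWorker {
    local: VecDeque<Job>,
    shared: VecDeque<Job>,
}

impl ToyWorker {
    fn poll(&mut self) -> PollOutcome {
        if let Some(job) = self.local.pop_back() {
            job();
            PollOutcome::RanLocal
        } else if let Some(job) = self.shared.pop_front() {
            job();
            PollOutcome::RanStolen
        } else {
            PollOutcome::Idle
        }
    }
}

/// Wait for `flag` while running other jobs. Returns Ok(jobs run) once the
/// flag is set, or Err(jobs run) if the worker went idle first.
fn wait_for(worker: &mut ToyWorker, flag: &Cell<bool>) -> Result<u32, u32> {
    let mut ran = 0;
    while !flag.get() {
        match worker.poll() {
            PollOutcome::RanLocal | PollOutcome::RanStolen => ran += 1,
            // In this toy, idle with the flag unset means no job can set it.
            PollOutcome::Idle => return Err(ran),
        }
    }
    Ok(ran)
}

fn run_demo() -> (Result<u32, u32>, Result<u32, u32>) {
    let flag = Rc::new(Cell::new(false));
    let mut worker = ToyWorker { local: VecDeque::new(), shared: VecDeque::new() };
    worker.local.push_back(Box::new(|| {}));
    let setter = Rc::clone(&flag);
    worker.shared.push_back(Box::new(move || setter.set(true)));
    // First wait: an unrelated local job runs, then the job that sets the flag.
    let first = wait_for(&mut worker, &flag);
    // Second wait: the flag is cleared and no job is left to set it again.
    flag.set(false);
    let second = wait_for(&mut worker, &flag);
    (first, second)
}

fn main() {
    println!("{:?}", run_demo());
}
```

In a real pool an idle result is only a hint, since another thread may still be running the job that sets the condition.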

@cuviper
Member Author

cuviper commented Feb 27, 2023

@fredpointzero (or anyone else watching), I would appreciate feedback on #1026, thanks!

@bors bors bot closed this as completed in #1026 Mar 2, 2023