Scheduler work stealing. #3095

Closed
bblum opened this Issue Aug 2, 2012 · 24 comments

Comments

@bblum
Contributor

bblum commented Aug 2, 2012

Currently: tasks are scheduled on sched_loops in round-robin fashion and are tied to their sched_loop for their entire lifetime. This can lead to surprisingly poor performance with irregular task setups.

It does not seem too hard to implement a 'steal' mechanism by which idle sched_loops can reach over to overloaded ones to balance some of the load. My impression is that the classic presentation involves idle processors busy-looping around the steal mechanism until they pick up something to do; I've even seen a talk on a purportedly very clever mechanism that was totally lockless. But we just can't eat up all CPUs when the system isn't fully loaded.

One thing I haven't quite grasped is how to do this without thrashing some global synchronisation whenever we decide to steal. If I'm not mistaken, the whole point of work stealing is that each processor's runqueue not be in a cacheline that multiple processors want... but I think that if we want to suspend idle CPUs, we need a global semaphore-like setup that gets poked every time a task (a) spawns, (b) blocks, (c) wakes, or (d) exits. And, well, this is basically when scheduler runqueues get accessed anyway.

Currently we take a global (per-scheduler) lock on (a) and (d) already, but not (b) or (c). Is there an option for work stealing that puts idle CPUs to sleep without hitting global state during the extra lifecycle events?
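
To make the locking structure concrete, here is a rough sketch of the pieces being discussed (the real runtime is C++; the field layout below is an illustrative stand-in, not the actual rust_scheduler / rust_sched_loop code):

use std::sync::Mutex;

type TaskId = usize;

// Per-CPU event loop: its runqueue lock is all that block (b) and wake (c)
// need to touch today.
struct SchedLoop {
    lock: Mutex<Vec<TaskId>>, // per-CPU runqueue
}

// The scheduler that owns all the loops: its lock is what spawn (a) and
// exit (d) already hit, and what a naive sleep/wake protocol would also hit.
struct Scheduler {
    lock: Mutex<()>,          // global (per-scheduler) lock
    loops: Vec<SchedLoop>,    // new tasks are assigned round-robin
}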

@ghost ghost assigned bblum Aug 2, 2012

@brson

Contributor

brson commented Aug 3, 2012

I don't have any answers right now, but I'm going to drop this link here: http://supertech.csail.mit.edu/papers.html. These are all the Cilk papers, many of them about work stealing.

@nikomatsakis

Contributor

nikomatsakis commented Aug 3, 2012

One thing: not having to solve these kinds of annoying---but solved---implementation problems is the reason I thought it made sense to consider basing our implementation on a well-tested scheduler, like TBB. I still basically think we ought to try prototyping that.

That said, there are existing implementations that do solve these kinds of problems, and to the extent allowed by law we can also peek at what they do. I remember I did a few things when I was implementing my own work-stealing scheduler a while back, but I don't remember what worked well—I think I was never fully satisfied with this particular aspect of my design, as there was a potential race in which a thread might go to sleep just as new work was being posted and hence be idle. Of course that thread would wake if there were yet more work posted, or if a certain timeout elapsed, so in practice this wasn't a big deal. But I imagine there are more elegant solutions out there.

@graydon

Contributor

graydon commented Aug 3, 2012

Hmm. So on IRC I said I'd like you to clarify the bug in terms of sched vs. thread, and I'll repeat that request here in more detail.

Here is my (possibly wrong) understanding of the current system:

By default there is one sched. It's multi-threaded and the threads all share a pool of tasks. That may involve too much locking, but it already means that most tasks auto-balance across idle cores.

If a user makes multiple scheds, it's generally because they want to pin a task to a thread, or to a smaller set of threads. That's the whole reason the API for multiple scheds exists. So in these cases, stealing tasks between scheds is incorrect, by definition.

If I'm right about those assumptions, then I don't really see what this bug means, except maybe taking fewer locks in the multi-threaded sched case. Am I misinformed about how the current system works?

@nikomatsakis

Contributor

nikomatsakis commented Aug 3, 2012

I had assumed that this referred to stealing within the main scheduler? It seems like we ought to have each scheduler map to a pool of threads (which possibly has size 1). Tasks may move around within this pool however we like but do not cross pools.

Also—there is lots of interesting work on scheduling for better locality. In particular for Servo, Leo Meyerovich's work at Berkeley on parallelizing layout uses a variation on work stealing in which they first do classic work stealing, but record which processor ended up processing which nodes in the tree. In later phases they just replay this. Or something like that. He claimed it was essential for good performance, as classical work stealing incurred too much overhead. So I'm sure we'll be tweaking this as we go to give users more control!

@bblum

Contributor

bblum commented Aug 3, 2012

@brson Ooh, useful. I haven't looked at the Cilk papers yet, but the "worst page-replacement policy" paper (well, the abstract) is amazing.

@nikomatsakis I am still not sure what the nature of the problem is. If the answer is "Well yeah, you just suck it up and bounce on a global lock for every lifecycle operation", that wouldn't take much effort at all to implement with what we've got. If, on the other hand, there are clever heuristics that achieve good balancing while avoiding the global lock, that wheel would be a lot worse to reinvent.

@graydon No, threads within a sched do not share a pool of tasks; each thread has its own pool. The benefit we'd lose by changing to what you described is that the task block/wakeup paths currently only need a per-CPU lock (sched_loop->lock), rather than the global lock needed for spawn and exit (scheduler->lock).

Basically, what you described is what we want; the question is how to implement it right.

@nikomatsakis Hmm. Tracking nodes in trees gets a lot closer to data-aware scheduling, which is beyond the scope of the runtime scheduler. It might be easy to make the runtime expose a "cpu affinity" option, and have a data-aware library make use of that.

@nikomatsakis

Contributor

nikomatsakis commented Aug 3, 2012

@bblum I don't know what the best solutions are either. Clearly busy-looping trying to steal isn't it. I think what I had before was pretty straightforward: basically a shared boolean flag indicating whether there were any idle threads, combined with a linked list of idle threads. After they failed to steal, threads would acquire a lock, atomically add themselves to the list, and set the flag. When producing work, you would check the flag and, if it was true, wake up an idle thread and deliver the work item directly to it rather than pushing it onto your queue. However, I don't claim this was particularly smart; as I said, I was never fully satisfied with it. I don't know what sort of protocols are used in other systems; it feels like you can do better, though!
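
A minimal sketch of that flag-plus-idle-list protocol, using modern std primitives; the names here (IdleList, post_work, go_idle, the per-worker inboxes) are hypothetical illustrations, not the original scheduler's code:

use std::collections::VecDeque;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Condvar, Mutex};

type Task = Box<dyn FnOnce() + Send>;

struct IdleList {
    any_idle: AtomicBool,    // fast-path check for producers
    idle: Mutex<Vec<usize>>, // ids of workers currently parked
    wakeup: Condvar,
}

struct Shared {
    idle_list: IdleList,
    inboxes: Vec<Mutex<VecDeque<Task>>>, // per-worker direct-delivery slots
}

impl Shared {
    /// Producer side: if anyone is idle, hand the task straight to them;
    /// otherwise push onto our own queue (the common, uncontended case).
    fn post_work(&self, me: usize, task: Task) {
        if self.idle_list.any_idle.load(Ordering::Acquire) {
            let mut idle = self.idle_list.idle.lock().unwrap();
            if let Some(worker) = idle.pop() {
                if idle.is_empty() {
                    self.idle_list.any_idle.store(false, Ordering::Release);
                }
                drop(idle);
                self.inboxes[worker].lock().unwrap().push_back(task);
                self.idle_list.wakeup.notify_all();
                return;
            }
        }
        self.inboxes[me].lock().unwrap().push_back(task);
    }

    /// Consumer side: after a failed steal, register as idle and block until
    /// a producer delivers something to our inbox.
    fn go_idle(&self, me: usize) -> Task {
        let mut idle = self.idle_list.idle.lock().unwrap();
        idle.push(me);
        self.idle_list.any_idle.store(true, Ordering::Release);
        loop {
            if let Some(task) = self.inboxes[me].lock().unwrap().pop_front() {
                return task;
            }
            idle = self.idle_list.wakeup.wait(idle).unwrap();
        }
    }
}

Note that this sketch still has the race described above: a producer can read the flag just before a worker sets it and push onto its own queue, leaving that worker parked until more work arrives.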

@bblum

Contributor

bblum commented Aug 3, 2012

@nikomatsakis "when you had before" - what system was that for?

it sounds like that's analogous to the "global lock" approach i was mentioning - producing work is task-wakeup (always), and becoming idle is task-block-if-it-was-your-last-task. it sounds like becoming idle was the complicated path in your system, which is in keeping with the point of work-stealing that busy cpus not be burdened with it.

i think there is probably not a way to avoid global locking in the wakeup case. i suspect tasks blocking can bypass the global lock if it wasn't the last task on that cpu.

i will also note that the cpu affinity i mentioned above is a good reason to keep one-runqueue-per-cpu instead of a global pool of tasks.

@nikomatsakis

Contributor

nikomatsakis commented Aug 3, 2012

@bblum This was a scheduler I did as part of my PhD work. Ultimately the system relied on a global lock; the flag was used to avoid touching the lock in the common case where lots of tasks are being produced and the system is generally operating at a good level of throughput (in which case, no threads are idle).

@nikomatsakis

Contributor

nikomatsakis commented Aug 3, 2012

Oh, @bblum, regarding the data-specific stuff: I am envisioning that there will be custom schedulers tailored to specific strategies, perhaps even schedulers implemented just for one application.

@bblum

Contributor

bblum commented Aug 14, 2012

I won't have time to do this justice before my internship is over.

@brson

Contributor

brson commented Aug 14, 2012

That's ok! The work you did on tasks was awesome.

@bblum

Contributor

bblum commented Aug 15, 2012

One important implementation detail: in rust_sched_loop, the currently running task stays on the runqueue. If it gets taken off, e.g. in order to steal it, that may introduce a problem where it won't get killed by kill_all if the runtime is failing.

This has to do with the fix for #2671 - sched_loops now have a field called killed, which is set when the runtime fails. If this races with stealing (i.e., the stolen task misses the kill signal), the sched_loop would need to check its killed flag before starting to run the stolen task.
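
An illustrative pseudocode sketch of that check (the real rust_sched_loop is C++, and SchedLoop, Task, and accept_stolen below are made-up stand-ins, not runtime APIs):

// Hypothetical sketch: what the thief's sched_loop might do with a freshly
// stolen task so that a racing kill_all isn't missed.
struct Task { killed: bool }

struct SchedLoop {
    killed: bool,        // set by kill_all when the runtime is failing
    run_queue: Vec<Task>,
}

impl SchedLoop {
    /// Called with this loop's lock held, after the steal has completed.
    fn accept_stolen(&mut self, mut task: Task) {
        if self.killed {
            // The steal raced with kill_all: the task left the victim's queue
            // before the kill was delivered, so deliver it here instead.
            task.killed = true;
        }
        self.run_queue.push(task);
    }
}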

@bblum bblum referenced this issue Sep 11, 2012

Closed

task sleep #3455

@jesse99 jesse99 referenced this issue Sep 25, 2012

Closed

task thrashing #3587

@brson

Contributor

brson commented Dec 13, 2012

A student has expressed interest in this project: Mozilla-Student-Projects/Projects-Tracker#31

@graydon

Contributor

graydon commented Apr 25, 2013

nominating for backwards-compatible milestone

@brson

Contributor

brson commented May 11, 2013

I'm going to outline the straw-man design I have in mind for work stealing to go along with the scheduler redesign in #4419. As the OP pointed out, the trickiest part is dealing with waking up schedulers to steal, and the literature doesn't give any guidance on this. There are lots of ways to go about it, but the one I'm going to present is simple and fits well with the language and scheduler design.

The primary consideration is that as much synchronization work as possible is offloaded from the active schedulers to those that are seeking work. The original work-stealing paper seems to imply this is done by making the schedulers that are out of work 'busy-steal', continually trying to steal from others until there is work available. This imposes no overhead on active schedulers but isn't practical - a power-efficient solution needs to be event-driven. Under this design the local, working scheduler pays a minimum of two atomic operations every time it defers work to the work queue: one to enqueue the task, and one more to check whether there is a sleeping scheduler to wake up.

There is a shared data structure, WorkList:

struct WorkList {
    sleeper: AtomicOption<SchedHandle>,
    work_queues: [(WorkQueue, WorkQueueClaim), .. MAX_SCHEDULERS]
}

AtomicOption is a pointer that can be cmpxchgd. SchedHandle is the type through which schedulers send messages to each other. WorkQueue is a work stealing queue. sleeper is a handle to a quiescent, sleeping scheduler that previously decided there was no work available. work_queues is a fixed-size vector of WorkQueue; as there are always MAX_SCHEDULERS WorkQueues these should have very little or no memory overhead when unused. WorkQueueClaim is a pointer-sized integer used for assigning WorkQueues to Schedulers.

WorkList is shared between schedulers as an UnsafeAtomicRcBox<WorkList>.

Access to work_queues is always random, to distribute contention between all the queues in the system.

When a stealing scheduler can't find work, it makes a SchedHandle available to be woken up with later. Sleeping schedulers form a linked list and are responsible for waking each other up in sequence when new work becomes available. Note that, because of async I/O, a scheduler that goes to sleep may be woken in other ways. We need to be sure that I/O-bound schedulers don't repeatedly spam other schedulers to indicate that they are sleeping.

Scheduler creation

  1. select an index between [0, MAX_SCHEDULERS),
  2. cmpxchg the value of the WorkQueueClaim at that index from 0 to 1
    2a) if the cmpxchg succeeded then the scheduler takes a handle to it, records the index and creates the scheduler
    2b) if the cmpxchg fails then increment the index and try 2 again
  3. if all indexes are occupied then scheduler creation fails
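
A minimal sketch of this claiming step, assuming the WorkQueueClaim words are plain atomic integers (modern std atomics are used for illustration; none of these names are existing runtime code):

use std::sync::atomic::{AtomicUsize, Ordering};

const MAX_SCHEDULERS: usize = 64; // illustrative constant

struct WorkQueueClaims {
    claims: [AtomicUsize; MAX_SCHEDULERS], // 0 = free, 1 = claimed
}

impl WorkQueueClaims {
    /// Scan every slot starting from a randomly chosen index; return the
    /// claimed index, or None if all MAX_SCHEDULERS slots are taken.
    fn claim_slot(&self, start: usize) -> Option<usize> {
        for offset in 0..MAX_SCHEDULERS {
            let i = (start + offset) % MAX_SCHEDULERS;
            // Step 2: cmpxchg the claim word from 0 to 1.
            if self.claims[i]
                .compare_exchange(0, 1, Ordering::AcqRel, Ordering::Relaxed)
                .is_ok()
            {
                return Some(i); // step 2a: this scheduler owns queue i
            }
        }
        None // step 3: scheduler creation fails
    }
}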

Work stealing

When a scheduler looks into its local work queue and there is nothing to do, it starts the following stealing process (sketched in code after the list):

  1. check the local waiting_for_work flag. If it is already true, then we've been through this process before: there is no work available and we are waiting for the WorkAvailable message, so go to sleep and wait. This can happen when the scheduler is woken to do I/O or to respond to other messages.
  2. otherwise, select an index between [0, MAX_SCHEDULERS),
  3. attempt to pop from the queue at that index
    3a) if success then schedule the task
    3b) if the steal fails then increment the index and try 3 again
  4. if all indexes fail then go to sleep (next section)
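
In code, the scan might look roughly like this (a sketch only: the deque internals are elided, and WorkQueue::steal is an assumed interface, not an existing API):

use std::marker::PhantomData;

enum Steal<T> { Success(T), Empty }

// Placeholder for a real work-stealing deque; internals elided.
struct WorkQueue<T> { _marker: PhantomData<T> }
impl<T> WorkQueue<T> {
    fn steal(&self) -> Steal<T> { Steal::Empty }
}

struct LocalState { waiting_for_work: bool }

/// Steps 1-4 of the stealing process above.
fn find_work<T>(local: &LocalState, queues: &[WorkQueue<T>], start: usize) -> Option<T> {
    // Step 1: already registered as a sleeper, so just wait for WorkAvailable.
    if local.waiting_for_work {
        return None;
    }
    // Steps 2-3: try every queue, beginning at a random index.
    for offset in 0..queues.len() {
        let i = (start + offset) % queues.len();
        if let Steal::Success(task) = queues[i].steal() {
            return Some(task); // step 3a: schedule it
        }
    }
    None // step 4: nothing anywhere, go to sleep (next section)
}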

Sleeping

  1. set the local waiting_for_work flag to true. This will stay set until we receive the WorkAvailable message from the SchedHandle. This is to prevent the situation where I/O (or other events) wakes up the scheduler, which then goes back to sleep and creates a new SchedHandle.
  2. create a new SchedHandle to be woken up with via the WorkAvailable message.
  3. atomically swap the SchedHandle with WorkList.sleeper and store the previous value in the local next_sleeper variable.
  4. wait to be woken up.

Waking

When a scheduler pushes a task onto its queue, it may need to send a signal to jumpstart the stealing process (the whole sleep/wake handshake is sketched in code after this list).

  1. after a working scheduler pushes onto its queue, it atomically swaps 0 with the WorkList.sleeper SchedHandle.
  2. if sleeper is non-null then it sends the WorkAvailable message and destroys the SchedHandle, then goes back to work.
  3. in the remote scheduler, upon receipt of WorkAvailable, the scheduler sets waiting_for_work to 0 and looks at its next_sleeper variable to see if there are further schedulers to wake up. If so, it sends the WorkAvailable message and sets next_sleeper to 0. The process continues until all schedulers are awake.
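
Here is a rough sketch of the sleep/wake handshake from the two sections above. It models AtomicOption with a Mutex<Option<...>> for clarity (the design intends a lock-free swap), and SchedHandle / WorkAvailable are stand-in types, not the runtime's actual messaging API:

use std::sync::mpsc::{Receiver, Sender};
use std::sync::Mutex;

enum Msg { WorkAvailable }

struct SchedHandle { chan: Sender<Msg> }

struct WorkList { sleeper: Mutex<Option<SchedHandle>> }

struct Scheduler {
    waiting_for_work: bool,
    next_sleeper: Option<SchedHandle>,
    chan: Sender<Msg>,
    port: Receiver<Msg>,
}

impl Scheduler {
    /// Sleeping, steps 1-4: register as the sleeper and block.
    fn sleep(&mut self, list: &WorkList) {
        self.waiting_for_work = true;                                      // step 1
        let handle = SchedHandle { chan: self.chan.clone() };              // step 2
        self.next_sleeper = list.sleeper.lock().unwrap().replace(handle);  // step 3
        let Msg::WorkAvailable = self.port.recv().unwrap();                // step 4
        self.on_work_available();
    }

    /// Waking, step 3: clear the flag and pass the wakeup down the chain.
    fn on_work_available(&mut self) {
        self.waiting_for_work = false;
        if let Some(next) = self.next_sleeper.take() {
            let _ = next.chan.send(Msg::WorkAvailable);
        }
    }
}

/// Waking, steps 1-2: a working scheduler pokes the current sleeper (if any)
/// after pushing work onto its queue.
fn notify_after_push(list: &WorkList) {
    if let Some(sleeper) = list.sleeper.lock().unwrap().take() {
        let _ = sleeper.chan.send(Msg::WorkAvailable);
    }
}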

Issues

  • Inlining all these atomically operated words into this data structure will create very heavy cache contention. That is mostly suffered by stealers, though, so it shouldn't be so bad for workers.
  • This allows stealers to go to sleep even while there is available work to be stolen. A stealer could go through the list, find nothing, then sleep, but in the meantime another scheduler could push work onto its queue - neither party notices this situation and the work doesn't get stolen. I don't think this affects correctness - it just means that you miss some parallelism.
  • For systems under light load this will end up with every push onto the work queue requiring the local scheduler to wake a remote scheduler, and that's not so cheap here. Maybe this can be mitigated in some way by keeping track of the number of schedulers that are trying to steal.
@Aatch

Contributor

Aatch commented May 12, 2013

I've been working on the work-stealing deque. I have an initial implementation here: https://github.com/Aatch/rust/work-steal that seems to work (at least as a regular queue) and I'm going to see what I can do in terms of implementing the rest of the work-stealing stuff.
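
For reference, the interface such a deque exposes is roughly the following; this is not the code in that branch, just a simplified stand-in with a Mutex-wrapped VecDeque in place of the lock-free Chase-Lev array. The owner pushes and pops at one end; thieves steal from the other.

use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

// Simplified stand-in for a work-stealing deque: same behavior, none of the
// lock-freedom that makes the real thing interesting.
struct Deque<T>(Arc<Mutex<VecDeque<T>>>);

impl<T> Deque<T> {
    fn new() -> Self { Deque(Arc::new(Mutex::new(VecDeque::new()))) }
    fn stealer(&self) -> Self { Deque(self.0.clone()) }

    fn push(&self, t: T) { self.0.lock().unwrap().push_back(t) }        // owner end
    fn pop(&self) -> Option<T> { self.0.lock().unwrap().pop_back() }    // owner end (LIFO)
    fn steal(&self) -> Option<T> { self.0.lock().unwrap().pop_front() } // thief end (FIFO)
}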

@brson

Contributor

brson commented May 24, 2013

Some critiques of work-stealing as applied to the Java fork/join library: http://www.coopsoft.com/ar/CalamityArticle.html#steal

@brson

Contributor

brson commented Jun 1, 2013

@toddaaro

Contributor

toddaaro commented Jun 10, 2013

"BWS: Balanced Work Stealing for Time-Sharing Multicores"
http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-12-1.pdf

The focus is on increasing performance/fairness in the case of concurrent work-stealing "jobs", and they use a few kernel extensions to get a solid speedup. This isn't very applicable to Rust's use case, but the comments they make on fairness and concurrency issues could be informative.

@toddaaro

Contributor

toddaaro commented Jun 11, 2013

"Three Layer Cake for Shared-Memory Programming"
http://www.upcrc.illinois.edu/workshops/paraplop10/papers/paraplop10_submission_8.pdf

This paper discusses the composition of message-passing, fork-join, and SIMD parallelism. Extremely relevant to the Rust runtime design.

@huonw

Member

huonw commented Jul 30, 2013

Visiting for triage; I don't know enough about the new scheduler to be able to make a sensible comment on this at all.

@cpeterso

Contributor

cpeterso commented Aug 13, 2013

"Non-blocking steal-half work queues"
http://www.cs.bgu.ac.il/~hendlerd/papers/p280-hendler.pdf

This paper presents StealHalf, a new generalization of the Arora et al. algorithm that allows workers, instead of stealing one, to steal up to half of the items in a given work queue at a time. Workers don't need to steal as often and they are more likely to find a non-empty queue when they randomly pick one because items are distributed among more queues.

@alexcrichton

Member

alexcrichton commented Dec 17, 2013

I'm going to close this for now. There's some awesome material in this thread, but for now work stealing is implemented in the M:N schedulers using Chase-Lev deques. I think that we've been pretty happy with this, and we don't appear to be rushing to completely redesign it any time soon.

There may still be bugs to flesh out in work-stealing, or just fine-tuning in general, but for now I think that the major action-items on this issue are taken care of. I'm fine reopening if others disagree though!
