
Change default weight of parallel iterators to assume expensive ops #49

Closed
nikomatsakis opened this issue Feb 7, 2016 · 11 comments

@nikomatsakis
Member

Currently, parallel iterators assume cheap operations. I am thinking that should be changed to assume expensive operations (i.e., fine-grained parallel splits), and have people opt in to the current behavior by manually adjusting the weights or calling weight_min. My reasoning is:

  1. The worst case behavior today is that you don't see any parallel speedup, which sucks. I see a lot of questions about this.
  2. The worst case behavior with the new default would be that you see less speedup than you expected. If we do a good job optimizing, this cost should be fairly low.
  3. It seems to be what I want more often.

Thoughts?
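For illustration, the trade-off above can be sketched in plain Rust. This is not rayon's implementation; it is a hand-rolled divide-and-conquer sum where a `threshold` parameter plays the role of the sequential threshold under discussion. A low threshold means fine-grained splits (good when per-element work is expensive); a high threshold means coarse splits (good when it is cheap).

```rust
// Illustrative sketch only (hypothetical names, not rayon's API):
// recursive divide-and-conquer with a sequential cutoff.
fn par_sum(data: &[u64], threshold: usize) -> u64 {
    if data.len() <= threshold {
        // Below the threshold, fall back to sequential execution.
        return data.iter().sum();
    }
    let mid = data.len() / 2;
    let (left, right) = data.split_at(mid);
    // Run the two halves concurrently on scoped threads. Rayon would
    // instead push one half onto a work-stealing deque in a thread pool.
    std::thread::scope(|s| {
        let l = s.spawn(|| par_sum(left, threshold));
        let r = par_sum(right, threshold);
        l.join().unwrap() + r
    })
}

fn main() {
    let data: Vec<u64> = (1..=1000).collect();
    // 1 + 2 + ... + 1000 = 500_500, regardless of how we split.
    assert_eq!(par_sum(&data, 64), 500_500);
    println!("sum = {}", par_sum(&data, 64));
}
```

The default value of `threshold` is exactly what this issue is about: assuming cheap ops means a large default threshold, assuming expensive ops means a small one.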

@dirvine

dirvine commented Feb 7, 2016

Intel et al.'s parallel_for etc. do not seem to assume cheap operations, and in C++ at least they force a refactor of some code to allow parallelism. I wonder if there is a tension between being completely automatic and being specific. I see this lib and think: wow, imagine if it were just a drop-in for iter() or similar; then I think a bit and realise folk really need to design their algorithms to take advantage.

I know this does not directly answer the question, but I feel it is linked. In refactoring for parallelism, code often gets cleaner; it forces/allows the programmer to reason. This is my quandary at the moment, to be honest. I love the idea of automating parallelism, but I have found in the past that being told by the compiler I need to refactor can be handy.

In essence I feel opt-in to be the way forward here, possibly later giving hints at optimisation of algorithms for parallelism at compile time. Hope that helps a little, anyway.

@nikomatsakis
Member Author

@dirvine thanks for the input. A few thoughts:

  1. I'm not trying to make anything completely automatic -- you have to choose to write par_iter. But I do want to make it easy.
  2. Rust's type system has the nice benefit of basically encouraging you (though not forcing you) to write parallel-safe code from the get-go, so I do hope that only minimal refactoring is needed.
  3. Depending on your loop and what it does, though, you may still need to do some manual refactoring. Basically, the more you rely on the iterator adapters, versus writing ad-hoc code in your for loop body, the better off you are (though some adapters, like fold, are not very parallelizable -- hopefully you can make do with reduce).
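The fold-vs-reduce point can be shown in a small std-only sketch (no rayon dependency). A sequential fold threads one accumulator through every element in order, while reduce only needs an identity and an associative combine, so partial results from independently processed chunks can be merged in any grouping:

```rust
fn main() {
    let data: Vec<u32> = (1..=100).collect();

    // Sequential fold: a single running accumulator, inherently ordered.
    let seq: u32 = data.iter().fold(0, |acc, &x| acc + x);

    // The "parallel shape" of reduce: fold each chunk independently,
    // then combine the per-chunk partial sums. Because `+` is
    // associative with identity 0, chunking does not change the answer.
    let par_shape: u32 = data
        .chunks(10)
        .map(|chunk| chunk.iter().fold(0, |acc, &x| acc + x))
        .fold(0, |acc, partial| acc + partial);

    assert_eq!(seq, par_shape); // both are 5050
    println!("{} == {}", seq, par_shape);
}
```

A fold whose accumulator step is not associative (e.g. one that depends on strict left-to-right order) has no such decomposition, which is why it resists parallelization.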

@nikomatsakis nikomatsakis added this to the 1.0 milestone May 16, 2016
@nikomatsakis
Member Author

Hmm. I have been experimenting with this in a branch. One interesting result: when I ported the nbody demo, the par-reduce variant (which can generate quite a lot of inexpensive tasks...) ran ridiculously slowly until I raised the sequential threshold. This isn't really surprising, I guess -- the defaults are very wrong for this case -- but it did, of course, point out the danger of changing our weights.

@nikomatsakis
Member Author

I guess if we did more work on making task spawning cheap (work that would be very profitable in any case), that might help here. (For that matter, par-reduce is still always slower than the more coarse-grained version.)

@nikomatsakis
Member Author

The branch (for the record) is no-more-weight.

@nikomatsakis
Member Author

Definitely significant progress here with @cuviper's https://github.com/nikomatsakis/rayon/pull/106. I still think we want to remove the existing weight stuff before 1.0 -- and maybe add back with some other APIs.

@edre
Contributor

edre commented Oct 21, 2016

Maybe rayon could sample how long leaf nodes take to run and dynamically adjust? Of course some elements may require much more processing than others, but starting with fine grained splitting and dynamically increasing splits may get the best of both worlds.

@nikomatsakis
Member Author

@edre we already do this, effectively, via the mechanism of work stealing as well as the adaptive changes. What we are talking about is tuning that mechanism.

@nikomatsakis
Member Author

In particular, I think the current mechanism should work pretty well except when task costs are both highly variable and the bigger tasks are clumped together.
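To make the adaptive mechanism concrete, here is a rough, self-contained sketch of the splitting heuristic being discussed (hypothetical names like `Splitter` and `try_split`; not rayon's actual code): start with a split budget proportional to the thread count, halve it on each split, and replenish it whenever a piece of work is observed to be stolen, so that clumps of expensive work keep getting subdivided on demand.

```rust
// Illustrative sketch of demand-driven adaptive splitting.
struct Splitter {
    num_threads: usize,
    splits: usize, // remaining split budget
}

impl Splitter {
    fn new(num_threads: usize) -> Self {
        let n = num_threads.max(1);
        // Start with a budget of roughly 2 splits per thread.
        Splitter { num_threads: n, splits: n * 2 }
    }

    // Decide whether to split this job further. `stolen` means another
    // worker took this job off our deque, i.e. there is idle capacity.
    fn try_split(&mut self, stolen: bool) -> bool {
        if stolen {
            // Demand observed: reset the budget so we keep subdividing.
            self.splits = self.num_threads * 2;
        }
        if self.splits > 0 {
            self.splits /= 2;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut s = Splitter::new(4); // budget starts at 8
    assert!(s.try_split(false)); // 8 -> 4
    assert!(s.try_split(false)); // 4 -> 2
    assert!(s.try_split(false)); // 2 -> 1
    assert!(s.try_split(false)); // 1 -> 0
    assert!(!s.try_split(false)); // budget exhausted: stop splitting
    assert!(s.try_split(true)); // a steal replenishes the budget
    println!("ok");
}
```

With this shape, uniform cheap work stops splitting quickly, while a clump of expensive work keeps triggering steals and therefore keeps splitting -- which is exactly the variable-and-clumped case the comment above identifies as the hard one.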

@nikomatsakis
Member Author

Now that @cuviper added dynamic balancing, I think this is basically all done. Or done enough. Closing in favor of #111.
