Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance of par_bridge #685

Closed
gkudelis opened this issue Aug 28, 2019 · 4 comments
Closed

Performance of par_bridge #685

gkudelis opened this issue Aug 28, 2019 · 4 comments

Comments

@gkudelis
Copy link

Hello!

I'm trying to take an algorithm and parallelize a part of it. The first part generates combinations of items, I don't see how to parallelize it and it takes up the bulk of computation time. These combinations get fed into the next bit where I generate permutations of each combination, calculate some properties of these permutations and filter based on the properties. I then want to do something that pass the filter.

The second part is trivial to parallelize - I can take every combination from the first part and analyze it separately. I've tried the naive approach and the overall performance drops. Here's what I'm using:

combinations
    .par_bridge()
    .flat_map(|combination| permutations::permutations(&combination))
    .map(|permutation| {
        let phrase = permutation.join(" ");
        let has_property = has_property(phrase);
        (phrase, has_property)
    })
    .filter(|(_, has_property)| has_property)
    .for_each(|(phrase, _)| println!("{:?}", phrase));

I was expecting that one thread would be doing the work of computing items of the combinations iterator and the others in the pool would be consuming the items and processing them. I can see that's not the case from profiling the binary and the performance (rate of finding matching phrases) actually drops, but I'm not sure how to proceed from there. Am I using par_bridge the way it's intended? Is there a way to achieve a performance boost for this sort of problem using Rayon?

Thank you for your help!

@cuviper
Copy link
Member

cuviper commented Aug 29, 2019

The first part generates combinations of items, I don't see how to parallelize it and it takes up the bulk of computation time.

This is where you may run up against Amdahl's law -- if the parallelized work is a small part of the runtime, then there's not much room for overall improvement even in the ideal case. In the non-ideal case, rayon does add some overhead, and par_bridge even more so.

If your real code is just evaluated for side effects, like this for_each(print) rather than folding or collecting in some way, then you might have better luck replacing par_bridge with spawns. Something like this:

rayon::scope(|scope| {
    combinations.for_each(|combination| {
        scope.spawn(move |_| {
            permutations::permutations(&combination)
                .into_par_iter()
                .map(|permutation| {
                    let phrase = permutation.join(" ");
                    let has_property = has_property(phrase);
                    (phrase, has_property)
                })
                .filter(|(_, has_property)| has_property)
                .for_each(|(phrase, _)| println!("{:?}", phrase));
        });
    });
});

You might not even want the inner parallel iterator, and just let the rough spawns provide enough parallelism to go around.

@gkudelis
Copy link
Author

Oh, that's a good point, thank you! I've looked into it more and yeah, in my case there isn't much to gain in terms of performance. Thank you for the suggestion of using rayon::scope, I'm sure I'll end up using it eventually!

You mentioned side effects and this got me thinking - what if I wanted to feed the results of a ParallelIterator into some (presumably library) function that takes an Iterator? Is there a way to do this without collecting into a collection first? I'm a little confused about how to correctly interface non-parallel code to be fair, hope this is the right place to ask about this!

@cuviper
Copy link
Member

cuviper commented Aug 30, 2019

The general question of converting to an iterator is issue #210, and there are a few ideas therein. If you just want to produce items in any order that they are processed, a channel is probably easiest, where you give the receiver to the library that needs an iterator.

You might also take a look at rayon-adaptive (#616, repo). They have a very different parallel iterator design, still built on rayon-core to share the threadpool, and IIRC some of that made conversions to/from Iterator more feasible.

@gkudelis
Copy link
Author

Awesome, I'll check those out. Thank you very much for the help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants