
for_each_mut/sync adapter #55

Open
flying-sheep opened this issue Feb 17, 2016 · 15 comments
@flying-sheep

hi, i’d like to put things coming out of a ParallelIterator into a HashMap.

currently i don’t see a way to do it, so i’d either like to see a for_each_mut that takes a FnMut or something else.

@cuviper
Member

cuviper commented Feb 17, 2016 via email

@flying-sheep
Author

> You could put the map under a mutex

ah, thanks!

@lilianmoraru
Contributor

@flying-sheep I personally would not do this because of performance issues.

@nikomatsakis
Member

Collecting is likely very fast; a mutex maybe not, but it depends on how many items there are and the relative cost of inserting them vs. creating them. You could also achieve the async pattern by making a channel and having the parallel iterator send messages to it (and starting a receiver thread too). I actually do intend to add some kind of combinators or helpers like this, but am not sure just how they should look.

@flying-sheep
Author

ok, then i’ll reopen this.

i don’t really like the idea of having to collect things just to iterate over them again and build a map from them 😐

@flying-sheep flying-sheep reopened this Feb 18, 2016
@tikue

tikue commented Feb 26, 2016

I think the right solution is to just pass the buck to a ConcurrentHashMap. I don't know of any good implementations as of now, though.

@GyrosOfWar

I've got a somewhat related problem:

I'm using Rayon to count duplicate files within a directory. To keep a progress bar updated, I would like to send the current progress of the operation to a different thread that displays the progress bar (since updating the progress bar from within the thread that does the work is very slow). I've reduced the example to the following code:
https://gist.github.com/GyrosOfWar/059b70688904c6b5b433

The issue I'm having is that I can't share the Sender in the for_each closure, since it's not Sync. The usual fix would be to clone the Sender, but in this case that doesn't seem to help. I'm pretty sure the problem is just me not understanding Rust's concurrency quite right, but I can't figure it out, and searching isn't helping.

@nikomatsakis
Member

@GyrosOfWar interesting. So this is actually just a bit tricky. (As an aside, I wonder if there is a good reason that the channel is not Sync, I should check.) What you have to do is to clone the sender once for each directory, so something like:

let senders: Vec<_> = nums.iter().map(|_| sender.clone()).collect();
nums.into_par_iter().zip(senders).for_each(|(i, sender)| {
    sender.send(i).unwrap();
});

but this seems sub-optimal. If nothing else, it might be nice to have a kind of "repeater" iterator for cloning things that are not Sync, only as needed. In other words, it would work something like

nums.into_par_iter().zip(repeat(sender)).for_each(|(num, sender_ref)| ...)

where sender_ref has type &Sender. The idea would be that it takes ownership of the sender and clones it for each thread split, but then lends it out for sequential processing (so as to do the minimal number of clones possible). That ought to be fairly easy to whip up.

Of course the hard part is to come up with a good name: it seems like there are a number of similar "adapters" one might want for distinct ownership semantics (e.g., one that clones eagerly, since cloning is cheap, etc)

@tikue

tikue commented Mar 28, 2016

Channels lazily become Sync internally when cloned, as an optimization for the case when they don't need to be. There was a recently rejected RFC about adding a Sync channel constructor. The problem the libs team saw with exposing a Sync Sender as its own type is that all existing code would be incompatible with it, and it would be utterly confusing next to the existing, inaptly-named SyncSender, where "sync" means a completely different thing (a synchronous, bounded channel).

@cuviper
Member

cuviper commented Mar 28, 2016

> .zip(repeat(sender))

Hmm, zip requires an IndexedParallelIterator, but repeat doesn't know the length. It could lie and claim usize::MAX, but I think it might be better to make zip + repeat its own special method, kind of like how enumerate is nearly zip + a range.

@nikomatsakis
Member

Of course, it'd probably be easier to just not use the channel, and instead use one of the queues available from crossbeam, e.g. https://github.com/aturon/crossbeam/blob/master/src/sync/ms_queue.rs.

@GyrosOfWar

Thanks for the help! I'm using that lock-free queue plus a scoped thread from the crossbeam crate now and it seems to work well. Haven't tested it a lot yet, though. For anyone who's interested, the code is here (specifically, the find_duplicates method in src/dupfinder.rs)

@nikomatsakis
Member

@GyrosOfWar I guess you mean this code? It seems a bit surprising, though, because I think the spawned thread will be joined at the end of the scope, and hence I don't think that thread is really doing anything.

If however you move the into_par_iter code up inside the scope closure, that would probably work.

FWIW this WIP API https://github.com/nikomatsakis/rayon/pull/71 would also help, but obviously that's not landed yet.

@nikomatsakis
Member

PR #131 adds for_each_atomic, which basically fulfills the original feature request.

@nikomatsakis
Member

Sorry, I should have referred to https://github.com/nikomatsakis/rayon/pull/124, which specifically adds ability to collect into a hashmap.
