Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an Iterator::scan equivalent #329

Open
cuviper opened this issue May 1, 2017 · 13 comments
Open

Add an Iterator::scan equivalent #329

cuviper opened this issue May 1, 2017 · 13 comments

Comments

@cuviper
Copy link
Member

cuviper commented May 1, 2017

fn scan<St, B, F>(self, initial_state: St, f: F) -> Scan<Self, St, F>
    where F: FnMut(&mut St, Self::Item) -> Option<B>
{ ... }

@nikomatsakis's notes from the wiki:

  • this is generally possible (it will be a bit different; more like reduce), but it is non-trivial
  • it requires >1 pass and hence produces an intermediate collection
  • basically you would do a parallel fold operation, performing a sequential scan for each segment
    • the result of your scan is a VecTree<Vec> of same total length as your input iterator
    • this gives you partial sums (to consider one concrete application) at each point
  • you can then traverse the tree (sequentially) to compute, for each vector, the total sum of preceding vectors
  • you can then re-traverse the tree (in parallel) and add this total sum to each element of the vector (in place)
  • that last step can itself drive another consumer, so this can still be a parallel adapter
    • if we are clever, it can even be an exact one, though it will require a nice VecTree implementation
@martinhath
Copy link

I'd like to try this out, but I'm having a hard time understanding the notes from the wiki.

basically you would do a parallel fold operation, performing a sequential scan for each segment

Ok, so the iterator will be split up to a point, and then we scan each part individually (more or less by using std::iter::iterator::scan), but in parallel? If so, how do we handle the state, as it is &mut? Or do we assume the state is Copy, and copy it for each sequential scan such that all segments starts with a fresh state? The alternative (the one I see, anyways) is to just do everything sequentially, but what do we gain by that?

the result of your scan is a VecTree of same total length as your input iterator

I'm not sure what this means. VecTree? Is this just the "split tree", in which the nodes are the segments as Vecs?

The last three points are also unclear, but I don't want to flat out take a guess at what is meant, as it might be clear if I can understand the above 😄

@cuviper
Copy link
Member Author

cuviper commented Jun 5, 2017

I also asked @nikomatsakis to clarify, because I don't really get it either. 😄

I can answer about VecTree: that's a hypothetical tree-of-vectors that we had thought about for general purpose collect -- basically folding into a vector locally, then reducing those into a tree that maintains structure. But we realized that the structure doesn't actually matter, only the order, so today the collect implementations that need it just use LinkedList<Vec<T>>.

@nikomatsakis
Copy link
Member

Sorry for not responding. I'll try to figure out what I meant and write it up. ;)

@scottmcm
Copy link

Just dropped by to +1 this since I was looking to compute a prefix sum with rayon.

@wagnerf42
Copy link
Contributor

hi, I've done a LOT of prefix sum implementations.
you can find a complex one here : http://www-id.imag.fr/Laboratoire/Membres/Wagner_Frederic/beyond-fold-work.html and another one here : https://github.com/wagnerf42/rayon-adaptive/blob/new_api/examples/cut_prefix.rs with many other prefixes in the same folder.

I also have a "successors" implementation in my crate.

@Qqwy
Copy link
Contributor

Qqwy commented Oct 21, 2019

@wagnerf42 There is a 'TODO: wip' in the code you linked to. care to elaborate?

Possibly related/useful is this information about doing this kind of stuff on the GPU: https://people.eecs.berkeley.edu/~driscoll/cs267/papers/gpugems3_ch39.html. If I understand it correctly, it explains that even the first traversal/ creation of the tree (that the first post in this issue refers to) could be done in parallel, with every higher level of the tree being worked on by half of the threads of the lower level.

@zgbkdlm
Copy link

zgbkdlm commented Nov 20, 2021

I has been 5 years. ^^

@Qqwy
Copy link
Contributor

Qqwy commented Nov 21, 2021

I has been 5 years. ^^

It has, but it is never too late to add a cool and useful feature 😉

@zgbkdlm
Copy link

zgbkdlm commented Nov 21, 2021

I has been 5 years. ^^

It has, but it is never too late to add a cool and useful feature 😉

Agree! It would be excited to see scan implemented in Rayon!

@arieluy
Copy link

arieluy commented Mar 2, 2023

I'm interested in working on this. I've tried to write some code to test out the ideas here and in #393. I think a LinkedList might be sufficient, rather than adding the overhead of a tree.

@wagnerf42
Copy link
Contributor

could you give some details on how you want to do it ? there are several ways, with different pros and cons

@arieluy
Copy link

arieluy commented Mar 2, 2023

Generally following the way proposed in the first comment of this issue, but without the VecTree (also discussed here):

  • a parallel fold that performs sequential scan, accumulating into a Vec<T>
  • reduce these into a LinkedList<Vec<T>>
  • traverse sequentially to compute the offset for each segment
  • add the offsets into each vec in parallel

The first step would use a consumer, then some sequential work, and the last step would create a producer (it can be indexed with a little extra work).

@arieluy
Copy link

arieluy commented Jan 27, 2024

I heard from someone who wanted to use my above PR in their project, so I've released it as a separate crate: https://crates.io/crates/rayon-scan
I hope this will be helpful to anyone else in this thread who is looking for this feature!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants