Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallel creation of random numbers #71

Open
dschrempf opened this issue Sep 18, 2019 · 3 comments
Open

Parallel creation of random numbers #71

dschrempf opened this issue Sep 18, 2019 · 3 comments

Comments

@dschrempf
Copy link

Hello, many times I use mwc-random to perform a calculation repeatedly, and then I collect the results in form of a list (or vector). Is there a way to parallelize this computation?

For example:

f :: PrimMonad m => Int -> Gen (PrimState m) -> m [Int]
f n g = replicateM n $ uniform g

I can parallelize this computation by splitting the generator, for example like so

splitGen :: PrimMonad m => Int -> Gen (PrimState m) -> m [Gen (PrimState m)]
splitGen n gen
  | n <= 0    = return []
  | otherwise = do
      seeds :: [V.Vector Word32] <- replicateM n $ uniformVector gen 256
      mapM initialize seeds

(One needs to activate scoped type variables for this to work). But then, in order to use these generators in parallel, I can only use mapConcurrently from the async library at the moment which requires the IO monad, something I do not really want.

Is there an easier possibility to perform these computations in a given number of parallel threads, each dragging along their private generator? I was, for example, thinking of parListChunk from Control.Parallel.Strategies, or similar techniques.

Thank you!

PS: I also found https://stackoverflow.com/a/16250010, but I do not really understand what's going on there.

@Shimuuar
Copy link
Collaborator

No good way that I know. Main problem is mwc-random is not well suited for parallelization. It has huge mutable state. Something like PCG or counting generator would work much better in this case.

But since computation morally pure I think it's possible to hide IO and forking behind unsafePerformIO or unsafeSTtoIOfromprimitive` I however not sure how well forking will interact with unsafety

Another concern. Number of thread should be determined beforehands since setting number of workers equal to number of cores will produce different result for different setting of RTS

@dschrempf
Copy link
Author

dschrempf commented Sep 23, 2019 via email

@lehins
Copy link
Contributor

lehins commented Feb 23, 2020

Main problem is mwc-random is not well suited for parallelization. It has huge mutable state.

This is true, but it doesn't mean you can't use separate mutable states per core. This means that there is a perfectly fine approach to parallelization here, in fact it parallelizes pretty well, which I can't say the same about PCG, see the benchmarks

Here is a function with a concrete example using mwc-random that can generate an array (which can be converted to a Vector in O(1) time and memory) of random numbers in parallel with randomArraWS

Here is another a bit lower level example that doesn't depend on massiv: withSchedulerWS and generates a list of random numbers in parallel, but that will likely be slower that generating an array.

See my SO answer for more detail on this topic.

computation morally pure I think it's possible to hide IO and forking behind unsafePerformIO

This is only possible with unsplittable stateful generators only if you do deterministic scheduling, which is not the case in above solutions, so generation will have to be done in IO, but considering we are talking about generation of random numbers I don't think it poses a problem for most scenarios.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants