Mac/Linux with multiprocessing, all workers are seeded the same random state #14729
Without diving too deeply into your code, I wonder if you have seen the new (as of 1.17) random.BitGenerator API? In particular, you might be interested in the work done to ensure parallel processes get "independent" streams. Please let us know if we could improve the documentation to make it clearer, and whether it helps solve your problem.
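For reference, the 1.17 API mentioned above can hand each worker its own independent stream via SeedSequence.spawn; a minimal sketch (the seed and worker count are arbitrary):

```python
import numpy as np
from numpy.random import SeedSequence, default_rng

ss = SeedSequence(12345)
# spawn() derives child sequences that produce statistically
# independent streams -- one per worker process
child_seeds = ss.spawn(4)
streams = [default_rng(s) for s in child_seeds]

samples = [rng.random(3) for rng in streams]
# the streams do not coincide with each other
print(all(not np.array_equal(samples[0], s) for s in samples[1:]))  # True
```

Each child SeedSequence can be passed to a worker process, which constructs its own Generator from it, so no RNG state is shared across workers.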
Thanks for the information; I was not aware of the new API. I'll give it a try and get back to you soon.
Something I've noticed right away: if I use multiprocessing.set_start_method('spawn') (see https://docs.python.org/3.7/library/multiprocessing.html), the workers behave differently; 'spawn' is also the default start method on Windows, where the bug does not occur. Now let me take a look at your new API.
@mattip I think I'll stop here for now. I will investigate the new RNG API later, if I decide to make changes to my project. For now, though, I'm done - the master branch now does everything I need. Thank you.
Closing. Thanks for the update. Hopefully you will try the new API.
Reproducing code example:
Full code is here, I will leave this branch untouched so you can see the behavior I'm talking about:
https://github.com/FlorinAndrei/nsphere/tree/numpy-mp
On Mac or Linux, edit xpu_workers.py and comment out the rseed lines; the bug will then be triggered. You can tell the bug has been triggered because there are very few dots in the Monte Carlo simulation graph in the Jupyter notebook. There should be 100 dots there, but due to the bug there are far fewer, and the whole population is far less random, which affects the app as a whole.
What's really going on:
I create a pool of workers with:
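The snippet referred to here did not survive extraction; a hypothetical reconstruction of the pool setup (the worker body and sizes are placeholders) might look like:

```python
import multiprocessing as mp

def worker(chunk):
    # placeholder for the real per-worker Monte Carlo computation
    return chunk * 2

# explicit fork context: fork is the default on Mac (Python <= 3.7) and
# Linux, and it is what copies the parent's RNG state into every child
ctx = mp.get_context("fork")
with ctx.Pool(processes=4) as pool:
    work_out = pool.map(worker, range(8))
print(work_out)  # [0, 2, 4, 6, 8, 10, 12, 14]
```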
And within the worker I have something like this:
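The worker body is likewise missing; based on the surrounding description (the pts name comes from the next paragraph; the array shape is an assumption), it presumably resembled:

```python
import numpy as np

def worker(n_points):
    # module-level np.random is the trap here: under fork, every worker
    # inherits the same copied global state and draws the same values
    pts = np.random.random_sample((n_points, 3))
    return pts

print(worker(5).shape)  # (5, 3)
```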
Parts of the pts array are returned as samples from all workers to the master process and are collated in the work_out matrix. Each worker is supposed to draw its own random samples, and of course the expectation is that each sample is different. https://dilbert.com/strip/2001-10-25
On Windows this works great.
On Mac and Linux, all pts arrays are generated with the exact same "random" content. The samples from workers are all identical. Within each sample the content looks random enough (just an eyeball estimate) but all samples coincide perfectly with each other.
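The mechanism can be demonstrated without fork at all: copying a RandomState's state (which is effectively what fork() does to the parent's global RandomState) makes two generators emit identical draws:

```python
import numpy as np

parent = np.random.RandomState(42)
state = parent.get_state()       # fork() copies this into every child

child_a = np.random.RandomState()
child_b = np.random.RandomState()
child_a.set_state(state)
child_b.set_state(state)

a = child_a.random_sample(5)
b = child_b.random_sample(5)
print(np.array_equal(a, b))      # True: identical "random" samples
```

Each sequence looks random on its own, which is why the bug is easy to miss until the samples are compared against each other.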
It's a very frustrating bug: the cause is hard to figure out, and it makes the code misbehave in weird ways.
I have to do this in each worker to get rid of the bug:
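The actual rseed lines are not shown; a typical version of this per-worker workaround (a sketch, not the author's exact code) mixes the PID and the current time into a fresh seed:

```python
import os
import time
import numpy as np

# hypothetical reconstruction of the "rseed" workaround: reseed the
# global RandomState inside each worker so the fork-copied state is
# discarded; the seed must fit in 32 bits for np.random.seed
rseed = (os.getpid() * int(time.time())) % (2**32)
np.random.seed(rseed)
print(0 <= rseed < 2**32)  # True
```

The SeedSequence.spawn approach suggested earlier in the thread is the cleaner fix, since PID/time-based seeds give no statistical independence guarantees.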
Numpy/Python version information: