
Mac/Linux with multiprocessing, all workers are seeded the same random state #14729

Closed
FlorinAndrei opened this issue Oct 16, 2019 · 5 comments

@FlorinAndrei

Reproducing code example:

The full code is here; I will leave this branch untouched so you can see the behavior I'm describing:

https://github.com/FlorinAndrei/nsphere/tree/numpy-mp

On Mac or Linux, edit xpu_workers.py and comment out the rseed lines to trigger the bug.

You can tell the bug has been triggered because very few dots appear in the Monte Carlo simulation graph in the Jupyter notebook. There are supposed to be 100 dots there, but because of the bug far fewer appear, and the whole population is far less random, which affects the application as a whole.


What's really going on:

I create a pool of workers with:

import multiprocessing
from multiprocessing import Pool

p = Pool(processes=num_p)
arglist = [(points, d, num_p, sysmem, gpumem, pointloops)] * num_p
work_out = p.map(make_dots, arglist)

And within the worker I have something like this:

pts = np.random.random_sample((points, d)) - 0.5   # (points, d) array, uniform in [-0.5, 0.5) per coordinate

Parts of the pts array are returned as samples from all workers to the master process, and are collated in the work_out matrix. Each worker is supposed to make random samples - and of course the expectation is that each sample is different. https://dilbert.com/strip/2001-10-25

On Windows this works great.

On Mac and Linux, all pts arrays are generated with the exact same "random" content. The samples from workers are all identical. Within each sample the content looks random enough (just an eyeball estimate) but all samples coincide perfectly with each other.
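
Here is a minimal standalone sketch (not my project's code, just illustrative) that reproduces the symptom. On Linux, and on macOS under Python 3.7, multiprocessing defaults to the fork start method, so every worker inherits a copy of the parent's global NumPy RNG state, which was seeded once at import:

import numpy as np
from multiprocessing import Pool

def sample(_):
    # same kind of call as in make_dots, reduced to 3 values for readability
    return np.random.random_sample(3)

if __name__ == "__main__":
    with Pool(processes=4) as p:
        results = p.map(sample, range(4))
    for r in results:
        print(r)   # under fork, all four rows come out identical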

It's a very frustrating bug: the cause is hard to pin down, and it makes the code misbehave in weird ways.

I have to do this in each worker to get rid of the bug:

import random

rseed = random.randint(0, 2**32 - 1)   # randint is inclusive, and NumPy's legacy seed must be < 2**32
xp.random.seed(rseed)                  # xp is the array-module alias (NumPy or CuPy) used in the workers
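
A variation on that workaround (illustrative sketch, not my project's code): reseed once per worker process through a Pool initializer, so forked workers diverge without reseeding inside every task:

import os
import numpy as np
from multiprocessing import Pool

def _reseed_worker():
    # runs once in each worker; draw a fresh 32-bit seed from OS entropy
    np.random.seed(int.from_bytes(os.urandom(4), "little"))

def make_dots_stub(n):   # stand-in for the real worker function
    return np.random.random_sample(n)

if __name__ == "__main__":
    with Pool(processes=4, initializer=_reseed_worker) as p:
        print(p.map(make_dots_stub, [3] * 4))   # rows now differ across workers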

Numpy/Python version information:

1.16.4 3.7.4 (default, Jul  9 2019, 18:13:23) 
[Clang 10.0.1 (clang-1001.0.46.4)]
@mattip
Member

mattip commented Oct 16, 2019

Without diving too deeply into your code, I wonder if you have seen the new (as of 1.17) random.BitGenerator API? In particular, you might be interested in the work done to ensure parallel processes get "independent" streams. Please let us know if we could improve the documentation to make it clearer, and whether it helps solve your problem.
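
A minimal sketch of that approach (illustrative only; requires NumPy >= 1.17, and make_dots_stub is a stand-in for your worker): spawn independent child seeds with SeedSequence and give each worker its own Generator:

from numpy.random import SeedSequence, default_rng
from multiprocessing import Pool

def make_dots_stub(seed_seq):
    rng = default_rng(seed_seq)        # each worker gets its own independent stream
    return rng.random((3, 2)) - 0.5

if __name__ == "__main__":
    num_p = 4
    child_seeds = SeedSequence(12345).spawn(num_p)   # statistically independent children
    with Pool(processes=num_p) as p:
        work_out = p.map(make_dots_stub, child_seeds)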

@FlorinAndrei
Author

Thanks for the information; I was not aware of the new API. I'll give it a try and get back to you soon.

@FlorinAndrei
Author

FlorinAndrei commented Oct 17, 2019

Something I noticed right away: if I use the spawn start method for the multiprocessing pool, the random sequence issue disappears; the random number generator produces a different sequence in each worker, as it should.

multiprocessing.set_start_method('spawn')

spawn is the default on Windows, which is probably why the RNG works fine there. On Unix-like OSes, the default method is fork. If I force spawn regardless of the OS, the RNG behaves correctly.

https://docs.python.org/3.7/library/multiprocessing.html
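
For reference, a minimal sketch of forcing spawn (illustrative; make_dots_stub is a stand-in for my real worker). set_start_method has to be called once, under the __main__ guard, before the pool is created:

import multiprocessing
import numpy as np

def make_dots_stub(n):
    # spawned workers re-import NumPy, so each gets its own freshly seeded global RNG state
    return np.random.random_sample(n)

if __name__ == "__main__":
    multiprocessing.set_start_method("spawn")
    with multiprocessing.Pool(processes=4) as p:
        print(p.map(make_dots_stub, [3] * 4))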

Now let me take a look at your new API.

@FlorinAndrei
Author

FlorinAndrei commented Oct 18, 2019

@mattip I think I'll stop here for now. set_start_method('spawn') works fine for me and ensures that each worker gets a different RNG sequence.

I will investigate the new RNG API later, if I decide to make changes to my project. For now, though, I'm done - the master branch now does everything I need.

Thank you.

@mattip
Member

mattip commented Nov 4, 2019

Closing. Thanks for the update. Hopefully you will try the new API.

@mattip mattip closed this as completed Nov 4, 2019