Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: replace GIL of random module with a per state lock #4609

Merged
merged 1 commit into from May 4, 2014

Conversation

juliantaylor
Copy link
Contributor

The random module currently relies on the GIL for the state
synchronization which hampers threading performance.
Instead add a lock to the RandomState object and take it for all
operations calling into randomkit while releasing the GIL.
This allows parallizing random number generation using multiple states
or asynchronous generation in a worker thread.

Note that with a large number of threads the standard mersenne twister
used may exhibit overlap if the number of parallel streams is large
compared to the size of the state space, though due to the limited
scalability of Python in regards to threads this is likely not a big
issue.

@juliantaylor
Copy link
Contributor Author

work in progress posted early for comments, please no style review
probably not all that can be covered is covered yet by the changeset.
currently only tested superficially (parallel st.randn with 4 threads and 4 states gives the same result as serial)

@rkern your the expert on random, do you see any issues with the general approach?

@@ -55,65 +55,66 @@ cdef extern from "randomkit.h":
void rk_seed(unsigned long seed, rk_state *state)
rk_error rk_randomseed(rk_state *state)
unsigned long rk_random(rk_state *state)
long rk_long(rk_state *state)
long rk_long(rk_state *state) nogil
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

relies on nogil in a declaration only indicates the gil can be released, it does not release it when it is called.
cython documentation says it behaves that way.

@rkern
Copy link
Member

rkern commented Apr 11, 2014

I have no particular objection, but I do wonder about the use case. How much time in your program is actually being spent inside these routines? What kind of speedup are you getting? What is the performance overhead for single-threaded use? Acquiring Python Locks can be expensive.

@juliantaylor
Copy link
Contributor Author

I haven't got an application where it is really relevant, but requiring fast generation of random numbers is not uncommon. We could ping the list and ask.
the random number generation is one of the few places in numpy where threading without blocking actually is actually worthwhile, e.g. generation normal random numbers requires more CPU than memory bandwidth so its profits reasonably linear.

acquiring an uncontended lock is not very expensive:

In [4]: %timeit l.acquire(); l.release()
1000000 loops, best of 3: 262 ns per loop

@juliantaylor
Copy link
Contributor Author

added gil release to some more operations and added a basic test, should be ready for review

The random module currently relies on the GIL for the state
synchronization which hampers threading performance.
Instead add a lock to the RandomState object and take it for all
operations calling into randomkit while releasing the GIL.
This allows parallizing random number generation using multiple states
or asynchronous generation in a worker thread.

Note that with a large number of threads the standard mersenne twister
used may exhibit overlap if the number of parallel streams is large
compared to the size of the state space, though due to the limited
scalability of Python in regards to threads this is likely not a big
issue.
@charris
Copy link
Member

charris commented May 4, 2014

Note that with a large number of threads the standard mersenne twister
used may exhibit overlap if the number of parallel streams is large
compared to the size of the state space.

What do you mean by overlap here? Is it true that the random numbers returned in the multi-threaded case are no longer deterministic, but depend on the order that the threads are run? Might be worth mentioning if so.

@rkern
Copy link
Member

rkern commented May 4, 2014

The Mersenne Twister has one non-trivial orbit. The state just picks which point along that orbit the RandomState is on at any given time. The more parallel streams you have, the more likely one will eventually get to a state that another had been on. The streams of pseudorandom numbers will not be independent since one stream will be generating the same exact sequence that another had generated. Non-determinism isn't the issue here.

@rkern
Copy link
Member

rkern commented May 4, 2014

But this is just a property of the Mersenne Twister in general. It has nothing to do with this change.

@charris
Copy link
Member

charris commented May 4, 2014

OK. LGTM and there are no standing objections, so in it goes. Thanks Julian.

charris added a commit that referenced this pull request May 4, 2014
ENH: replace GIL of random module with a per state lock
@charris charris merged commit 34fc091 into numpy:master May 4, 2014
@charris
Copy link
Member

charris commented May 4, 2014

Just to be clear, if each thread has it's own random streams with their own state, this allows the GIL to be released and the lock serializes access to the mersenne twistor among the streams?

@rkern
Copy link
Member

rkern commented May 4, 2014

If you have questions, it's best to ask before merging. ;-) Each RandomState has its own lock, on the off-chance that one happens to be shared between threads (e.g. the global one behind the numpy.random aliases). If each thread has its own RandomState, nothing will ever contend for each Lock.

@juliantaylor
Copy link
Contributor Author

the lock is only required if multiple threads use the same state, if two threads use two states there should be no contention.
Sharing a state between multiple threads requires the lock but the stream would be nondetermistic in which thread gets which numbers and have serial performance. Its the same as if the GIL is used, so this is not so useful.

The intention of the state lock is to allow producing random numbers from multiple states in parallel or to produce them in an asynchronous worker thread that does not block the rest of the program.

@charris
Copy link
Member

charris commented May 4, 2014

@rkern Sometimes, I gamble ;) But I do have a fair amount of trust in Julian and yourself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants