ENH: replace GIL of random module with a per state lock #4609
Work in progress, posted early for comments; please no style review. @rkern you're the expert on random, do you see any issues with the general approach?
@@ -55,65 +55,66 @@ cdef extern from "randomkit.h":
     void rk_seed(unsigned long seed, rk_state *state)
     rk_error rk_randomseed(rk_state *state)
     unsigned long rk_random(rk_state *state)
-    long rk_long(rk_state *state)
+    long rk_long(rk_state *state) nogil
This relies on `nogil` in a declaration only indicating that the GIL can be released; it does not release the GIL when the function is called.
The Cython documentation says it behaves that way.
I have no particular objection, but I do wonder about the use case. How much time in your program is actually being spent inside these routines? What kind of speedup are you getting? What is the performance overhead for single-threaded use? Acquiring Python |
I haven't got an application where it is really relevant, but requiring fast generation of random numbers is not uncommon. We could ping the list and ask. Acquiring an uncontended lock is not very expensive.
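One way to check the cost of an uncontended lock is a quick `timeit` comparison. The sketch below is only an illustration: it assumes a plain Python `threading.Lock` as a stand-in for the per-state lock, and the helper names `locked_noop` and `plain_noop` are hypothetical. Absolute numbers will vary by machine and interpreter version.

```python
import threading
import timeit

lock = threading.Lock()  # stand-in for a per-state lock (assumption)


def locked_noop():
    # Acquire and release a lock that no other thread is competing for.
    with lock:
        pass


def plain_noop():
    # Baseline: the same call overhead without any locking.
    pass


n = 200_000
t_locked = timeit.timeit(locked_noop, number=n)
t_plain = timeit.timeit(plain_noop, number=n)

# Rough per-call overhead of one uncontended acquire/release pair.
overhead_ns = (t_locked - t_plain) / n * 1e9
print(f"approx. overhead per acquire/release: {overhead_ns:.0f} ns")
```

The difference between the two timings is a rough estimate of what single-threaded callers would pay per locked call.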
Added GIL release to some more operations and added a basic test; should be ready for review.
The random module currently relies on the GIL for state synchronization, which hampers threading performance. Instead, add a lock to the RandomState object and take it for all operations that call into randomkit while releasing the GIL. This allows parallelizing random number generation using multiple states, or asynchronous generation in a worker thread. Note that with a large number of threads the standard Mersenne Twister used may exhibit overlap if the number of parallel streams is large compared to the size of the state space, though due to the limited scalability of Python with regard to threads this is likely not a big issue.
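A minimal sketch of the "multiple states" use case described above, assuming one `RandomState` per task so that no two threads ever share a state. The `draw` helper and the seed values are hypothetical, and whether a given method actually runs with the GIL released depends on the NumPy version in use.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def draw(seed, size=200_000):
    # Each task owns its own RandomState, so its lock is never contended.
    rs = np.random.RandomState(seed)
    return rs.standard_normal(size)


seeds = [1, 2, 3, 4]  # arbitrary illustrative seeds

# Generate one stream per thread; pool.map returns results in seed order.
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(draw, seeds))

# The same seeds drawn one after another in the main thread.
sequential = [draw(s) for s in seeds]

# Independent states give the same streams regardless of thread scheduling.
same = all(np.array_equal(a, b) for a, b in zip(threaded, sequential))
print(same)
```

Because each state is touched by only one thread, the output of each stream does not depend on the order in which the threads run.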
What do you mean by overlap here? Is it true that the random numbers returned in the multi-threaded case are no longer deterministic, but depend on the order that the threads are run? Might be worth mentioning if so.
The Mersenne Twister has one non-trivial orbit. The state just picks which point along that orbit the |
But this is just a property of the Mersenne Twister in general. It has nothing to do with this change.
OK. LGTM and there are no standing objections, so in it goes. Thanks Julian.
Just to be clear, if each thread has its own random streams with their own state, this allows the GIL to be released and the lock serializes access to the Mersenne Twister among the streams?
If you have questions, it's best to ask before merging. ;-) Each |
The lock is only required if multiple threads use the same state; if two threads use two states there should be no contention. The intention of the state lock is to allow producing random numbers from multiple states in parallel, or to produce them in an asynchronous worker thread that does not block the rest of the program.
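The asynchronous worker case could look roughly like the sketch below. It assumes a single `RandomState` used only by the worker thread, with batches handed to the main thread through a `queue.Queue`; the `producer` function, batch sizes, and sentinel convention are hypothetical choices for illustration.

```python
import queue
import threading

import numpy as np


def producer(rs, out, n_batches, batch_size):
    # Generate batches in the background and hand them to the consumer.
    for _ in range(n_batches):
        out.put(rs.random_sample(batch_size))
    out.put(None)  # sentinel: no more batches


rs = np.random.RandomState(0)  # used by the worker thread only
out = queue.Queue(maxsize=2)   # bounded, so the worker cannot run far ahead
worker = threading.Thread(target=producer, args=(rs, out, 3, 100_000))
worker.start()

# The main thread is free to do other work here and collects batches
# whenever it needs them.
batches = []
while True:
    batch = out.get()
    if batch is None:
        break
    batches.append(batch)
worker.join()

total = sum(len(b) for b in batches)
print(total)
```

Since only the worker touches `rs`, the lock is never contended and the sequence of batches is the same on every run with the same seed.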
@rkern Sometimes, I gamble ;) But I do have a fair amount of trust in Julian and yourself. |