Compile numba code with nogil=True #159
Conversation
This helps when parallelizing around sparse.

Generic example:

```python
In [1]: import numba

In [2]: import numpy as np

In [3]: x = np.random.random(100000000)

In [4]: @numba.jit(nopython=True)
   ...: def f(x):
   ...:     total = 0
   ...:     for i in range(len(x)):
   ...:         total += x[i]
   ...:     return total
   ...:

In [5]: f(x)  # run once to compile
Out[5]: 50001234.14227918

In [6]: %load_ext ptime

In [7]: %ptime -n 4 f(x)
Total serial time:   0.52 s
Total parallel time: 0.50 s
For a 1.05X speedup across 4 threads

In [8]: @numba.jit(nopython=True, nogil=True)
   ...: def f(x):
   ...:     total = 0
   ...:     for i in range(len(x)):
   ...:         total += x[i]
   ...:     return total
   ...:

In [9]: f(x)  # run once to compile
Out[9]: 50001234.14227918

In [10]: %ptime -n 4 f(x)
Total serial time:   0.52 s
Total parallel time: 0.14 s
For a 3.63X speedup across 4 threads
```
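For context, here is a minimal sketch of the kind of comparison `%ptime -n 4` performs: the same call issued n times serially, then n times concurrently from a thread pool. The helper name `ptime_like` is hypothetical (this is not the actual `ptime` extension); the point is that with `nogil=True` the jitted function releases the GIL while it runs, so the threaded calls can actually overlap.

```python
# Hypothetical sketch of a %ptime-style measurement, assuming a plain
# ThreadPoolExecutor. Not the actual ptime extension.
import time
from concurrent.futures import ThreadPoolExecutor

def ptime_like(func, arg, n=4):
    # Run the call n times back to back on one thread.
    start = time.perf_counter()
    for _ in range(n):
        func(arg)
    serial = time.perf_counter() - start

    # Run the same n calls at once from n threads.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n) as pool:
        list(pool.map(func, [arg] * n))
    parallel = time.perf_counter() - start

    print(f"Total serial time:   {serial:.2f} s")
    print(f"Total parallel time: {parallel:.2f} s")
    print(f"For a {serial / parallel:.2f}X speedup across {n} threads")
```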
Codecov Report

```
@@           Coverage Diff           @@
##           master     #159   +/-   ##
=======================================
  Coverage   96.92%   96.92%
=======================================
  Files          11       11
  Lines        1205     1205
=======================================
  Hits         1168     1168
  Misses         37       37
```
Is there any risk of incorrect results when doing this, assuming no race conditions?
That's probably more a question for you than it is for me. Do any of these functions rely on global state or mutate their inputs? If they were called concurrently in multiple threads, would that be a problem?
Usually the answer is "no", but we may be doing in-place modifications for performance's sake.
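To make the hazard concrete, here is a hypothetical pair of kernels (the names and bodies are invented for illustration, not taken from sparse): the first mutates its argument, so two threads handed the same array could race once the GIL is released; the second writes to a fresh output and is safe to call concurrently on shared input.

```python
# Hypothetical illustration of the distinction discussed above;
# neither function is from sparse itself.
import numba
import numpy as np

@numba.jit(nopython=True, nogil=True)
def scale_inplace(x, factor):
    # Mutates its input: two threads passed the same array can race.
    for i in range(len(x)):
        x[i] *= factor

@numba.jit(nopython=True, nogil=True)
def scale(x, factor):
    # Writes to a fresh output: safe to call concurrently on shared input.
    out = np.empty_like(x)
    for i in range(len(x)):
        out[i] = x[i] * factor
    return out
```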
Previously these functions were run with a giant lock (the GIL) around them, so that no other Python code could run at the same time. Now we are removing that lock; a sketch of what that enables follows.
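With the lock gone, a caller (for example, a threaded scheduler working chunk by chunk) can run these kernels on disjoint pieces of one array at the same time. A minimal sketch, assuming the jitted `f` from the session above:

```python
# Minimal sketch: with nogil=True, threads can reduce disjoint chunks of
# one array in parallel. Assumes the nogil-jitted f defined earlier.
from concurrent.futures import ThreadPoolExecutor
import numpy as np

x = np.random.random(100_000_000)
chunks = np.array_split(x, 4)  # four non-overlapping views

with ThreadPoolExecutor(max_workers=4) as pool:
    total = sum(pool.map(f, chunks))  # each thread sums its own chunk
```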
No, they don't do that. :-) Thanks for this, I wondered why the
This gets a +1. Thanks for this.
There are very likely other reasons than just this one. This is something that will likely improve over time as it gets some use.
Thanks for the review @hameerabbasi. Merging!