performance improvements to np.full and np.ones #7682
Many thanks for the PR! I'll take a look at this on Wednesday when I'm back at work.
It will probably be relevant to take the following into account when reviewing this issue:
@rishi-kulkarni do you have any benchmarks to illustrate the performance gains here?
Yep, here it is. This is on the current master branch:

import numpy as np
from numba import jit

@jit
def nb_full(shape, fill_value, dtype):
    # Allocate uninitialised memory, then fill it through the flat iterator
    ans = np.empty(shape, dtype)
    fl = ans.flat
    for idx, v in enumerate(fl):
        fl[idx] = fill_value
    return ans

@jit
def f4(shape):
    return nb_full(shape, np.nan, np.float64)

@jit
def f1(shape):
    return np.full(shape, np.nan, np.float64)

shape = (128000, 1000)
# Call each function once so that compilation is excluded from the timings
f1(shape)
f4(shape)
%timeit np.full(shape, np.nan, np.float64)
%timeit f1(shape)
%timeit f4(shape)

# output
116 ms ± 4.79 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) # numpy
206 ms ± 1.28 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) # current implementation
110 ms ± 2.27 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) # array.flat implementation

On this branch:

116 ms ± 5.48 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) # numpy
114 ms ± 7.35 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) # proposed implementation
114 ms ± 2.79 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) # array.flat implementation
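For reference, the slowdown relative to NumPy implied by these numbers can be computed directly. This is only arithmetic on the reported mean timings (standard deviations are ignored), not a new measurement.

```python
# Mean timings in ms, copied from the benchmark output above
numpy_full = 116.0   # np.full called from plain NumPy
master_full = 206.0  # np.full inside @jit on master
branch_full = 114.0  # np.full inside @jit on this branch

# A ratio above 1 means the Numba version is slower than NumPy
master_ratio = master_full / numpy_full
branch_ratio = branch_full / numpy_full

print(f"master: {master_ratio:.2f}x NumPy")  # master: 1.78x NumPy
print(f"branch: {branch_ratio:.2f}x NumPy")  # branch: 0.98x NumPy
```

So on master the jitted np.full takes roughly 1.8 times as long as NumPy, while on this branch the difference is well within the reported run-to-run spread.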
This looks good to me. Whilst fixing #7753 might also fix this issue, this change doesn't appear to add significant complexity and fixes #7665 right now, so I think it would be OK to merge.
Note that we still have a big difference between Numba and NumPy for zeros, because NumPy uses calloc:

import numpy as np
from numba import jit

@jit
def f0(shape):
    return np.zeros(shape, dtype=np.float64)

shape = (128000, 1000)
f0(shape)  # compile before timing
%timeit f0(shape)
%timeit np.zeros(shape, dtype=np.float64)

produces:

401 ms ± 119 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) # Numba
8.33 µs ± 40.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) # NumPy

This gap is unaffected by this branch.
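Part of why the NumPy figure is so small is that, with many allocators (glibc on Linux, for example), a large calloc request is served by fresh mmap'd pages that the kernel zero-fills lazily on first touch, so most of the zeroing cost is deferred rather than avoided. A rough sketch of a fairer comparison times the allocation and the first write separately; the helper name time_alloc_and_touch is hypothetical, and the numbers depend heavily on the platform and allocator.

```python
import time

import numpy as np


def time_alloc_and_touch(alloc, shape):
    """Return (seconds spent allocating, seconds spent writing every element once)."""
    t0 = time.perf_counter()
    arr = alloc(shape, dtype=np.float64)
    t1 = time.perf_counter()
    arr.fill(1.0)  # the first write forces every page to be backed by real memory
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1


if __name__ == "__main__":
    shape = (1280, 1000)  # about 10 MB of float64
    alloc_s, touch_s = time_alloc_and_touch(np.zeros, shape)
    print(f"np.zeros: alloc {alloc_s * 1e3:.3f} ms, first touch {touch_s * 1e3:.3f} ms")
```

On a lazily zeroing allocator you would expect the allocation time to be tiny and the first-touch time to dominate, which is the cost an eagerly zeroing implementation pays up front; treat this as an experiment to run, not a result.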
Thanks for the patch and fixes.
Closes #7665. In Numba, np.full and np.ones fill the array using the slow np.ndindex iterator, which makes both functions about twice as slow as their NumPy counterparts. Iterating over the flat array instead makes them as fast as the NumPy versions.
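The difference between the two loop styles can be sketched as follows. This is an illustrative sketch, not the code in this PR: full_ndindex and full_flat are hypothetical helper names, the dtype is fixed to float64 for simplicity, and when Numba is not installed the sketch falls back to plain Python, in which case it only shows that both loops produce the same array, not the speed difference.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # no Numba available: run the same loops as plain Python
    def njit(func):
        return func


@njit
def full_ndindex(shape, fill_value):
    # One index tuple per element, with one entry per axis
    arr = np.empty(shape, np.float64)
    for idx in np.ndindex(shape):
        arr[idx] = fill_value
    return arr


@njit
def full_flat(shape, fill_value):
    # A single linear index over the flat view of the array
    arr = np.empty(shape, np.float64)
    flat = arr.flat
    for i in range(arr.size):
        flat[i] = fill_value
    return arr


if __name__ == "__main__":
    shape = (3, 4)
    a = full_ndindex(shape, np.nan)
    b = full_flat(shape, np.nan)
    print(a.shape, np.isnan(a).all(), np.isnan(b).all())  # (3, 4) True True
```

The flat loop advances one integer per element, while the ndindex loop has to update and check a per-axis index tuple and compute an address from it, which is a plausible source of the extra time seen in the master benchmark above.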