performance improvements to np.full and np.ones #7682

Merged
merged 2 commits into numba:master on Jan 21, 2022

Conversation

rishi-kulkarni
Contributor

Closes #7665. Numba's implementations of np.full and np.ones iterate with np.ndindex, which is slow and makes both functions about twice as slow as their NumPy counterparts. Iterating over the flat array instead makes them as fast as the NumPy versions.
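To make the two iteration strategies concrete, here is a minimal sketch in plain NumPy (no Numba, so it says nothing about compiled speed). The function names `full_ndindex` and `full_flat` are hypothetical and only illustrate the idea; they are not the code in this PR.

```python
import numpy as np


def full_ndindex(shape, fill_value, dtype):
    # Hypothetical helper: visit every element by its N-dimensional index tuple.
    out = np.empty(shape, dtype)
    for idx in np.ndindex(out.shape):
        out[idx] = fill_value
    return out


def full_flat(shape, fill_value, dtype):
    # Hypothetical helper: visit every element through a single 1-D position.
    out = np.empty(shape, dtype)
    flat = out.flat
    for i in range(out.size):
        flat[i] = fill_value
    return out


a = full_ndindex((3, 4), 7.5, np.float64)
b = full_flat((3, 4), 7.5, np.float64)
print(np.array_equal(a, b))
```

Both helpers produce the same array; the difference is only in how the loop addresses each element, which is what the change targets inside Numba.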

@gmarkall
Member

Many thanks for the PR! I'll take a look at this on Wednesday when I'm back at work.

@esc
Member

esc commented Jan 19, 2022

The following issue is probably relevant when reviewing this:

#7753

@esc
Member

esc commented Jan 19, 2022

@rishi-kulkarni do you have any benchmarks to illustrate the performance gains here?

@rishi-kulkarni
Contributor Author

rishi-kulkarni commented Jan 19, 2022

Yep, here it is. This is on current master branch:

import numpy as np
from numba import jit

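@jit
def nb_full(shape, fill_value, dtype):
    # Fill through the flat iterator instead of np.ndindex.
    ans = np.empty(shape, dtype)
    fl = ans.flat
    for idx, v in enumerate(fl):
        fl[idx] = fill_value
    return ans

@jit
def f4(shape):
    # Calls the array.flat implementation above.
    return nb_full(shape, np.nan, np.float64)

@jit
def f1(shape):
    # Calls Numba's built-in np.full implementation.
    return np.full(shape, np.nan, np.float64)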

shape = (128000, 1000)

f1(shape)
f4(shape)

%timeit np.full(shape, np.nan, np.float64)

%timeit f1(shape)

%timeit f4(shape)


# output
116 ms ± 4.79 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) # numpy
206 ms ± 1.28 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) # current implementation
110 ms ± 2.27 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) # array.flat implementation 

On this branch:

116 ms ± 5.48 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) # numpy
114 ms ± 7.35 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) # proposed implementation
114 ms ± 2.79 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) # array.flat implementation 

gmarkall
gmarkall previously approved these changes Jan 20, 2022
Member

@gmarkall gmarkall left a comment


This looks good to me. Whilst fixing #7753 might also fix this issue, this change doesn't appear to add significant complexity and fixes #7665 right now, so I think it would be OK to merge.

Note that we still have a big difference between Numba and NumPy for zeros, because NumPy uses calloc:

import numpy as np
from numba import jit


@jit
def f0(shape):
    return np.zeros(shape, dtype=np.float64)


shape = (128000, 1000)

f0(shape)

%timeit f0(shape)

%timeit np.zeros(shape, dtype=np.float64)

produces:

401 ms ± 119 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
8.33 µs ± 40.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

(and is unaffected by this branch).
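The gap described above can be explored outside Numba with a rough sketch in plain NumPy that compares `np.zeros` against allocating with `np.empty` and then writing zeros explicitly. The smaller shape and the helper names `zeros_direct` and `zeros_by_fill` are assumptions made for illustration; the timings depend on the operating system and allocator, so no particular numbers are implied.

```python
import timeit

import numpy as np

# Assumed smaller shape (about 32 MB) so the sketch runs quickly.
shape = (4000, 1000)


def zeros_direct():
    # Lets NumPy request already-zeroed memory from the allocator.
    return np.zeros(shape, dtype=np.float64)


def zeros_by_fill():
    # Allocates uninitialised memory, then writes a zero to every element.
    out = np.empty(shape, dtype=np.float64)
    out.fill(0.0)
    return out


# Take the best of a few repeats to reduce noise.
t_direct = min(timeit.repeat(zeros_direct, number=10, repeat=3))
t_fill = min(timeit.repeat(zeros_by_fill, number=10, repeat=3))
print(f"np.zeros:        {t_direct / 10 * 1e3:.3f} ms per call")
print(f"np.empty + fill: {t_fill / 10 * 1e3:.3f} ms per call")
```

An explicit fill has to touch every element, which is the work a zeroed allocation can often defer or avoid.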

Contributor

@stuartarchibald stuartarchibald left a comment


Thanks for the patch and fixes.

@stuartarchibald added the "5 - Ready to merge" label (Review and testing done, is ready to merge) and removed the "3 - Ready for Review" label on Jan 21, 2022
@sklam sklam merged commit 14b170a into numba:master Jan 21, 2022
Labels
5 - Ready to merge: Review and testing done, is ready to merge
Effort - short: Short size effort needed
Development

Successfully merging this pull request may close these issues.

Poor np.full performace with jit
5 participants