performance improvements to np.full and np.ones #7682

rishi-kulkarni · 2021-12-21T22:31:38Z

Closes #7665. np.full and np.ones use the slow np.ndindex iterator, which makes both functions about twice as slow as their NumPy counterparts. Iterating over the flat array instead makes them as fast as the NumPy versions.
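
The two iteration strategies described above can be sketched in plain NumPy, without Numba. This is an illustrative sketch only: the function names `full_ndindex` and `full_flat` are hypothetical and are not the names used in numba/np/arrayobj.py. Under Numba, both functions would be compiled with `@jit`, as in the benchmark later in this thread.

```python
import numpy as np


def full_ndindex(shape, fill_value, dtype):
    # Hypothetical stand-in for the slower approach: visit every element
    # through an N-dimensional index tuple produced by np.ndindex.
    arr = np.empty(shape, dtype)
    for idx in np.ndindex(arr.shape):
        arr[idx] = fill_value
    return arr


def full_flat(shape, fill_value, dtype):
    # Hypothetical stand-in for the proposed approach: walk the array's
    # flat view with a single integer index.
    arr = np.empty(shape, dtype)
    flat = arr.flat
    for i in range(len(flat)):
        flat[i] = fill_value
    return arr


a = full_ndindex((3, 4), 7.0, np.float64)
b = full_flat((3, 4), 7.0, np.float64)
```

Both functions produce the same array. The flat version needs only one integer counter per element instead of building an index tuple.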

gmarkall · 2021-12-22T23:54:39Z

Many thanks for the PR! I'll take a look at this on Wednesday when I'm back at work.

esc · 2022-01-19T13:56:26Z

The following is probably relevant when reviewing this PR:

#7753

esc · 2022-01-19T13:57:25Z

@rishi-kulkarni do you have any benchmarks to illustrate the performance gains here?

rishi-kulkarni · 2022-01-19T15:21:08Z

Yep, here it is. This is on the current master branch:

import numpy as np
from numba import jit

@jit
def nb_full(shape, fill_value, dtype):
    # Fill through the flat view instead of np.ndindex.
    ans = np.empty(shape, dtype)
    fl = ans.flat
    for idx, v in enumerate(fl):
        fl[idx] = fill_value
    return ans

@jit
def f4(shape):
    # array.flat implementation
    return nb_full(shape, np.nan, np.float64)

@jit
def f1(shape):
    return np.full(shape, np.nan, np.float64)

shape = (128000, 1000)

f1(shape)
f4(shape)

%timeit np.full(shape, np.nan, np.float64)

%timeit f1(shape)

%timeit f4(shape)


# output
116 ms ± 4.79 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) # numpy
206 ms ± 1.28 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) # current implementation
110 ms ± 2.27 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) # array.flat implementation

On this branch:

116 ms ± 5.48 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) # numpy
114 ms ± 7.35 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) # proposed implementation
114 ms ± 2.79 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) # array.flat implementation

gmarkall

This looks good to me. Whilst fixing #7753 might also fix this issue, this change doesn't appear to add significant complexity and fixes #7665 right now, so I think it would be OK to merge.

Note that we still have a big difference between Numba and NumPy for zeros, because NumPy uses calloc:

import numpy as np
from numba import jit


@jit
def f0(shape):
    return np.zeros(shape, dtype=np.float64)


shape = (128000, 1000)

f0(shape)

%timeit f0(shape)

%timeit np.zeros(shape, dtype=np.float64)

produces:

401 ms ± 119 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
8.33 µs ± 40.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

(and is unaffected by this branch).
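
The effect mentioned above can be explored in NumPy alone with a rough sketch like the following. It compares `np.zeros` against allocating with `np.empty` and then writing zeros explicitly. The helper `best_of` and the smaller shape are assumptions made for illustration, and the timings depend on the allocator and operating system, so no particular ratio is claimed here.

```python
import time

import numpy as np

shape = (2000, 1000)  # smaller than the shape used in the thread, to keep memory use modest


def best_of(fn, repeats=5):
    # Hypothetical helper: return the best wall-clock time over a few runs.
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def zeros_numpy():
    # NumPy's own zero allocation (the comment above attributes its speed to calloc).
    return np.zeros(shape, dtype=np.float64)


def zeros_explicit():
    # Allocate uninitialised memory, then write a zero to every element.
    arr = np.empty(shape, dtype=np.float64)
    arr.fill(0.0)
    return arr


t_numpy = best_of(zeros_numpy)
t_explicit = best_of(zeros_explicit)
print(f"np.zeros:        {t_numpy * 1e3:.3f} ms")
print(f"np.empty + fill: {t_explicit * 1e3:.3f} ms")
```

The two functions return identical arrays; only the way the memory is zeroed differs.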

numba/np/arrayobj.py

stuartarchibald

Thanks for the patch and fixes.

performance improvements to np.full and np.ones

e1dc133

rishi-kulkarni requested review from esc, sklam and stuartarchibald as code owners December 21, 2021 22:31

gmarkall added the "3 - Ready for Review" and "Effort - short" labels Dec 22, 2021

gmarkall added this to the Numba 0.56 RC milestone Dec 22, 2021

gmarkall previously approved these changes Jan 20, 2022

stuartarchibald reviewed Jan 20, 2022

use range(len()) instead of enumerate

540c910

rishi-kulkarni dismissed gmarkall’s stale review via 540c910 January 20, 2022 14:38

rishi-kulkarni requested a review from gmarkall January 20, 2022 19:11

stuartarchibald approved these changes Jan 21, 2022

stuartarchibald added the "5 - Ready to merge" label and removed the "3 - Ready for Review" label Jan 21, 2022

sklam merged commit 14b170a into numba:master Jan 21, 2022
