Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Poor np.median performance #2458

Closed
RutgerK opened this issue Jul 12, 2017 · 0 comments · Fixed by #2470
Closed

Poor np.median performance #2458

RutgerK opened this issue Jul 12, 2017 · 0 comments · Fixed by #2470
Assignees
Milestone

Comments

@RutgerK
Copy link

RutgerK commented Jul 12, 2017

For certain arrays the np.median function slows down dramatically. The slowdown occurs when all (or several) values in the array are identical, which hints at some sorting issue.

See also some discussion on the mailing list, Antoine Pitrou has some great comments on the cause.
https://groups.google.com/a/continuum.io/forum/#!topic/numba-users/ENbEof1CmVU

Consider the following example which calculates the median repeatedly. A pure Numpy version and an equivalent Numba njit function:

import numba
import numpy as np

def numpy_median(x):    
    for i in range(x.size):
        np.median(x)
        
@numba.njit
def numba_median(x):    
    for i in range(x.size):
        np.median(x)

Two arrays, one with all identical values, and another with random data:

n = 3000
x_equal = np.zeros(n) 
x_random = np.random.randn(n)
assert x_equal.dtype == x_random.dtype # both are f64

The timings on my machine (Win 10, Py3.6 and Numba 0.34):

np_eq = %timeit -o numpy_median(x_equal)
298 ms ± 37.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

np_ra = %timeit -o numpy_median(x_random)
288 ms ± 24.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

nb_eq = %timeit -o numba_median(x_equal)
32.4 s ± 787 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

nb_ra = %timeit -o numba_median(x_random)
93.3 ms ± 8.03 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

So for random data Numba is a few times faster, but for identical data, Numba is over 300x times slower. I get similar results with Numba 0.33, 0.32 and 0.30.

Here is a notebook containing the code above, and a custom median implementation using np.sort, which doesn't suffer from the same performance hit.
https://gist.github.com/RutgerK/57297c37cd6a62349e789bc23c8f04ec

edit:
Btw i encountered this while working on an implementation of "repeated medians regression". I don't mind adding this to the examples in this repo if its considered a nice addition. I'm not sure if it really adds something in the sense of explaining how Numba works.
https://gist.github.com/RutgerK/66b3edc665c0897ea4e777e54ec6a985

@seibert seibert added this to the Numba 0.35 RC milestone Jul 17, 2017
stuartarchibald added a commit to stuartarchibald/numba that referenced this issue Jul 17, 2017
This fixes the reported poor performance in `np.median`, it
copies in the rest of the algorithm for partitioning from the
Numba quicksort implementation.
stuartarchibald added a commit to stuartarchibald/numba that referenced this issue Jul 17, 2017
This fixes the reported poor performance in `np.median`, it
copies in the rest of the algorithm for partitioning from the
Numba quicksort implementation.
stuartarchibald added a commit to stuartarchibald/numba that referenced this issue Jul 17, 2017
This fixes the reported poor performance in `np.median`, it
copies in the rest of the algorithm for partitioning from the
Numba quicksort implementation.
stuartarchibald added a commit to stuartarchibald/numba that referenced this issue Jul 17, 2017
This fixes the reported poor performance in `np.median`, it
copies in the rest of the algorithm for partitioning from the
Numba quicksort implementation.

Fixes numba#2458
seibert added a commit that referenced this issue Jul 17, 2017
ninegua pushed a commit to IntelLabs/numba that referenced this issue Aug 8, 2017
This fixes the reported poor performance in `np.median`, it
copies in the rest of the algorithm for partitioning from the
Numba quicksort implementation.

Fixes numba#2458
bmerry added a commit to ska-sa/katsdpdockerbase that referenced this issue Mar 20, 2018
This fixes poor performance of the 2D flagger in katsdpsigproc, possibly
related to numba/numba#2458.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants