New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Poor np.median performance #2458
Milestone
Comments
stuartarchibald
added a commit
to stuartarchibald/numba
that referenced
this issue
Jul 17, 2017
This fixes the reported poor performance in `np.median`, it copies in the rest of the algorithm for partitioning from the Numba quicksort implementation.
stuartarchibald
added a commit
to stuartarchibald/numba
that referenced
this issue
Jul 17, 2017
This fixes the reported poor performance in `np.median`, it copies in the rest of the algorithm for partitioning from the Numba quicksort implementation.
stuartarchibald
added a commit
to stuartarchibald/numba
that referenced
this issue
Jul 17, 2017
This fixes the reported poor performance in `np.median`, it copies in the rest of the algorithm for partitioning from the Numba quicksort implementation.
stuartarchibald
added a commit
to stuartarchibald/numba
that referenced
this issue
Jul 17, 2017
This fixes the reported poor performance in `np.median`, it copies in the rest of the algorithm for partitioning from the Numba quicksort implementation. Fixes numba#2458
seibert
added a commit
that referenced
this issue
Jul 17, 2017
Fixes #2458, poor performance in np.median
ninegua
pushed a commit
to IntelLabs/numba
that referenced
this issue
Aug 8, 2017
This fixes the reported poor performance in `np.median`, it copies in the rest of the algorithm for partitioning from the Numba quicksort implementation. Fixes numba#2458
bmerry
added a commit
to ska-sa/katsdpdockerbase
that referenced
this issue
Mar 20, 2018
This fixes poor performance of the 2D flagger in katsdpsigproc, possibly related to numba/numba#2458.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
For certain arrays the
np.median
function slows down dramatically. The slowdown occurs when all (or several) values in the array are identical, which hints at some sorting issue.See also some discussion on the mailing list, Antoine Pitrou has some great comments on the cause.
https://groups.google.com/a/continuum.io/forum/#!topic/numba-users/ENbEof1CmVU
Consider the following example which calculates the median repeatedly. A pure Numpy version and an equivalent Numba njit function:
Two arrays, one with all identical values, and another with random data:
The timings on my machine (Win 10, Py3.6 and Numba 0.34):
np_eq = %timeit -o numpy_median(x_equal)
298 ms ± 37.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
np_ra = %timeit -o numpy_median(x_random)
288 ms ± 24.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
nb_eq = %timeit -o numba_median(x_equal)
32.4 s ± 787 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
nb_ra = %timeit -o numba_median(x_random)
93.3 ms ± 8.03 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
So for random data Numba is a few times faster, but for identical data, Numba is over 300x times slower. I get similar results with Numba 0.33, 0.32 and 0.30.
Here is a notebook containing the code above, and a custom median implementation using
np.sort
, which doesn't suffer from the same performance hit.https://gist.github.com/RutgerK/57297c37cd6a62349e789bc23c8f04ec
edit:
Btw i encountered this while working on an implementation of "repeated medians regression". I don't mind adding this to the examples in this repo if its considered a nice addition. I'm not sure if it really adds something in the sense of explaining how Numba works.
https://gist.github.com/RutgerK/66b3edc665c0897ea4e777e54ec6a985
The text was updated successfully, but these errors were encountered: