numpy 1.8 median faster than bottleneck #74

Closed
juliantaylor opened this Issue Sep 30, 2013 · 4 comments

@juliantaylor juliantaylor commented Sep 30, 2013

numpy 1.8 now implements median as a selection too, and it is quite a bit faster than bottleneck's version while having better worst-case complexity.
It is implemented as iterative quickselect with a fallback to median of medians of 5 on lack of progress (introselect).
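For readers unfamiliar with introselect, here is a minimal Python sketch of the idea: quickselect with a budget of tolerated bad partitions, falling back to a deterministic median-of-medians pivot once the budget is exhausted. This is only illustrative; names and structure are not taken from the actual numpy C implementation.

```python
def median_of_medians(a):
    # Deterministic pivot: median of the medians of groups of 5.
    # Guarantees the pivot discards a constant fraction of elements.
    groups = [sorted(a[i:i + 5]) for i in range(0, len(a), 5)]
    medians = [g[len(g) // 2] for g in groups]
    if len(medians) <= 5:
        return sorted(medians)[len(medians) // 2]
    return median_of_medians(medians)

def introselect(a, k, budget=None):
    """Return the k-th smallest element (0-indexed) of sequence a."""
    a = list(a)
    if budget is None:
        # Allow roughly 2*log2(n) quickselect rounds before falling back.
        budget = 2 * len(a).bit_length()
    while True:
        if len(a) == 1:
            return a[0]
        # Cheap pivot while progress is acceptable, guaranteed-progress
        # median-of-medians pivot once the budget runs out.
        pivot = a[len(a) // 2] if budget > 0 else median_of_medians(a)
        budget -= 1
        lt = [x for x in a if x < pivot]
        eq = [x for x in a if x == pivot]
        if k < len(lt):
            a = lt
        elif k < len(lt) + len(eq):
            return pivot
        else:
            k -= len(lt) + len(eq)
            a = [x for x in a if x > pivot]
```

For odd n, the median is then just `introselect(a, n // 2)`; the fallback is what bounds the worst case at O(n) instead of quickselect's O(n^2).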

benchmark with bn and numpy git head/1.8.x (both compiled with gcc 4.7 -O2 on amd64)

In [1]: import numpy as np;
In [2]: import bottleneck as bn
In [46]: np.__version__
Out[46]: '1.8.0.dev-3e0f71a'
In [2]: bn.__version__
Out[2]: '0.8.0dev'

In [3]: d = np.arange(40000000.)
In [4]: print d.dtype
float64
In [5]: np.random.seed(546)
In [6]: np.random.shuffle(d.ravel())
In [7]: d2 = d.copy()

# one loop to not reuse overwritten data
In [8]: %timeit -n 1  np.median(d, overwrite_input=True)
1 loops, best of 3: 127 ms per loop

In [9]: %timeit -n 1  bn.median(d2)
1 loops, best of 3: 781 ms per loop
# caching effect?
In [10]: %timeit -n 1  np.median(d, overwrite_input=True)
1 loops, best of 3: 122 ms per loop

(reproduced with multiple seeds, so it's not just a lucky pivot)

Possibly bottleneck could fall back to numpy (if numpy >= 1.8) in the cases where numpy is faster, e.g. arrays larger than a couple hundred elements with a contiguous axis, unless the array is big enough that the cost of copying is irrelevant.
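Such a fallback could look roughly like the dispatch below. The `size_cutoff` value is a made-up placeholder, not a benchmarked threshold, and `median` here is a hypothetical wrapper, not an existing bottleneck function:

```python
import numpy as np

try:
    import bottleneck as bn
except ImportError:  # bottleneck not installed
    bn = None

def median(a, size_cutoff=1000):
    """Hypothetical dispatch between bn.median and np.median."""
    a = np.asanyarray(a)
    if bn is not None and a.size < size_cutoff:
        # Small arrays: bottleneck's lower call overhead tends to win.
        return bn.median(a)
    # Large arrays: the selection-based np.median (numpy >= 1.8) wins.
    return np.median(a)
```

The real cutoff, and whether contiguity should factor in, would need to be measured per platform.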

Note that bottleneck's default benchmark is unfair, as np.median copies the input by default while bn.median does not; for 1000x1000 data this is very relevant.
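A fair comparison gives both sides the same copy semantics. The numpy-only snippet below shows what the `overwrite_input` flag controls (the array size is arbitrary):

```python
import numpy as np

np.random.seed(546)
d = np.random.rand(100000)
ref = d.copy()

# Default: np.median copies internally, so the input survives untouched.
m_copy = np.median(d)
assert np.array_equal(d, ref)

# overwrite_input=True skips that copy; the array may be reordered
# (partially sorted) afterwards, so only use it on scratch data.
m_inplace = np.median(d.copy(), overwrite_input=True)
assert m_copy == m_inplace
```

So to benchmark the selection kernels themselves, either pass `overwrite_input=True` to np.median, or charge bn.median for an explicit input copy.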

@kwgoodman kwgoodman (Owner) commented Oct 1, 2013

Thanks for reporting. I did not know that the overhaul of np.median was finished (or, for that matter, started).

Bottleneck does not change the input array:

In [1]: a = np.random.rand(5)
In [2]: a
Out[2]: array([ 0.60412969,  0.90867329,  0.13375729,  0.8580905 ,  0.87165752])
In [3]: bn.median(a)
Out[3]: 0.85809050231714357
In [4]: a
Out[4]: array([ 0.60412969,  0.90867329,  0.13375729,  0.8580905 ,  0.87165752])

I'm sure numpy will still be faster but can you update the timings with overwrite_input=False?


@juliantaylor juliantaylor commented Oct 1, 2013

Hm, I must have made a mistake when I tried that; now I have also found the copy in the code.
Sorry for the false claim.
If numpy copies, the difference is rather insignificant:

In [5]: %timeit -n 3  np.median(d, overwrite_input=False)
3 loops, best of 3: 615 ms per loop

In [6]: %timeit -n 3  bn.median(d)
3 loops, best of 3: 724 ms per loop

numpy mostly beats bottleneck, but it depends on how good the pivots are (numpy and bn use different strategies); on average the gain is around 10%. And for small arrays (< 10000 elements) bn is usually faster.
Probably not worth any extra logic in bottleneck; you can close the issue if you like.


@kwgoodman kwgoodman (Owner) commented Oct 1, 2013

Are you the one who implemented the new numpy median? Do you think there's a chance that the new median algo would run even faster in bottleneck?


@juliantaylor juliantaylor commented Oct 1, 2013

Yes, I implemented it. The algorithm is not that different. The minimum search bottleneck uses for an even number of elements is not much slower than the iterative partitioning. The main advantage of the iterative approach shows when you select more than one kth (e.g. percentile, kth-order statistics).
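The two even-n strategies can be sketched in Python on top of `ndarray.partition` (illustrative only; the real kernels are C and cython, and the function names here are made up):

```python
import numpy as np

def median_min_search(a):
    # bottleneck-style sketch: one partition at k-1 puts that element in
    # its sorted position, then a linear minimum search over the upper
    # part finds the k-th smallest.
    a = np.asarray(a, dtype=float).copy()
    k = a.size // 2
    a.partition(k - 1)
    return 0.5 * (a[k - 1] + a[k:].min())

def median_two_kth(a):
    # numpy-style sketch: partition at both middle indices in one call;
    # numpy >= 1.8 accepts a sequence of kth values.
    a = np.asarray(a, dtype=float).copy()
    k = a.size // 2
    a.partition([k - 1, k])
    return 0.5 * (a[k - 1] + a[k])
```

Both do O(n) work for the median; the multi-kth form generalizes directly to percentiles and other order statistics.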

From what I can tell, the difference comes from the partition part of quickselect optimizing much better in numpy than in the cython code. It would be interesting to see why that happens (judging from the assembler it might be a wrong __builtin_expect(), as it's jumping all over the place).


@kwgoodman kwgoodman closed this Jan 22, 2014
