numpy 1.8 median faster than bottleneck #74
numpy 1.8 now implements median as a selection too, and it is quite a bit faster than bottleneck's version while having a better worst-case complexity.
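A minimal sketch of what selection-based median looks like using the `np.partition` API added in numpy 1.8 (the helper name here is hypothetical; numpy's internal implementation uses introselect in C, this just illustrates the idea of selecting the middle order statistic instead of sorting):

```python
import numpy as np

def median_via_partition(a):
    # Selection instead of sorting: O(n) average instead of O(n log n).
    a = np.asarray(a, dtype=float).ravel().copy()
    n = a.size
    k = n // 2
    if n % 2:
        # odd length: place the k-th order statistic and read it off
        return np.partition(a, k)[k]
    # even length: need both middle order statistics in one pass
    part = np.partition(a, [k - 1, k])
    return 0.5 * (part[k - 1] + part[k])
```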
benchmark with bn and numpy git head/1.8.x (both compiled with gcc 4.7 -O2 on amd64)
(reproduced with multiple seeds, so it's not just a lucky pivot)
possibly bottleneck could fall back to numpy when numpy >= 1.8 is available, in the cases where it is faster: e.g. arrays larger than a couple hundred elements with a contiguous axis, unless the array is big enough that the cost of copying is irrelevant.
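A rough sketch of what such a fallback could look like; the size threshold and dispatch logic are illustrative assumptions, not anything bottleneck actually ships:

```python
import numpy as np

try:
    import bottleneck as bn
except ImportError:
    bn = None

def median(a, axis=None):
    # Hypothetical dispatch: use bottleneck for small arrays where it
    # tends to win, numpy's selection-based median (>= 1.8) otherwise.
    # The 10000-element cutoff is a guess taken from the discussion.
    a = np.asarray(a)
    if bn is not None and hasattr(np, "partition") and a.size >= 10000:
        return np.median(a, axis=axis)
    if bn is not None:
        return bn.median(a, axis=axis)
    return np.median(a, axis=axis)
```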
note that bottleneck's default benchmark is unfair, as np.median copies the input by default while bn.median does not; for 1000x1000 data this is very relevant.
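For a fair comparison one can pass `overwrite_input=True` to `np.median`, which skips the defensive copy (at the cost of scrambling the input), matching bn.median's in-place behavior; a sketch:

```python
import numpy as np

a = np.random.rand(1000, 1000)

# Default: np.median copies the input first, so `a` is left untouched.
m_copy = np.median(a)

# overwrite_input=True partitions the array in place -- no internal
# copy, but the passed array's element order is destroyed afterwards.
scratch = a.copy()
m_inplace = np.median(scratch, overwrite_input=True)
```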
Thanks for reporting. I did not know that the overhaul of np.median was finished (or, for that matter, started).
Bottleneck does not change the input array:
I'm sure numpy will still be faster but can you update the timings with
hm, I must have made a mistake when I tried that; now I also found the copy in the code...
numpy mostly beats bottleneck, but it depends on how good the pivots are (numpy and bn use different strategies); on average it's around 10%. And for small arrays (< 10000 elements) bn is usually faster.
yes, I implemented it. The algorithm is not that different. The minimum search that bottleneck uses for even-length arrays is not much slower than the iterative partitioning. The main advantage of the iterative approach comes in when you select more than one kth (e.g. percentile, kth-order statistics).
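The multiple-kth case can be seen directly in the public API: `np.partition` accepts a sequence of kth values, so one selection pass places several order statistics at once instead of running a full selection per quantile. A small sketch (the indices are just illustrative quartile positions):

```python
import numpy as np

a = np.random.rand(10001)

# One call places all three order statistics at their sorted positions;
# everything smaller sits to the left of each kth, larger to the right.
kth = [2500, 5000, 7500]
part = np.partition(a, kth)
q25, q50, q75 = part[2500], part[5000], part[7500]
```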
From what I can tell, the difference comes from the partition part of quickselect optimizing much better in numpy than in the Cython code. It would be interesting to see why that happens (judging from the assembler it might be a wrong __builtin_expect(), as the code jumps all over the place).