New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Requiring numba? #8
Comments
What kind of speedups do you see from using numba? If those are significant requiring it could make sense (I'd prefer that to including C++ code). |
In general, the speed-ups can be order of magnitute or more, even if the code already used numpy. It is hard to say without trying, but I would expect similar speed ups for this library. So maybe the best course of action is to write some benchmarks with the current code, and then rewrite some of it to use numba, and compare. I can do that. |
We can then decide whether numba is optional or required, ok? |
I tried numba on jackknife, which seemed most promising for acceleration. The speedup is about 10x on my machine, depending strongly on the number of samples in the original array. The speedup is probably also fairly platform-dependent. https://github.com/HDembinski/essays/blob/master/resample%20and%20numba.ipynb |
Based on this evidence, I think moving to numba is benefinical. Perhaps the code can be made even faster. The jackknife in particular could be much faster in the future, as soon as numba starts to support Python callables (currently it does not). Already today, numba can call ufuncs that offer a ctypes or cffi interface. In this case, we would not need to generate N arrays with N-1 elements at once, but only one N-1 array and reuse the allocated memory N times. |
I just updated the numba example with a better implementation. |
Since you are ok with making breaking changes, I think the version of jackknife that returns the Nx(N-1) array should be removed. For N > 10000 this becomes dangerous as it allocates huge amount of memory. Algorithms that have a O(N^2) memory requirement on input with arbitrary N can easily bring down computers. |
Thanks for working on this, that does look like a nice speedup. I think I'm fine with removing the N x (N - 1) output option. Currently the jackknife is only implemented internally with a non-None statistical function so things should still work with that feature removed. It also feels a bit cleaner to have the shape of the output not depend on the input (this is also a question I have about the behavior of the bootstrap function, but it's not as problematic since you can control the number of bootstrap replications). |
I agree. I think we should always require, for jacknife or bootstrap, that the user passes the statistical function, to avoid holding all the resampled data in memory at once. For the bootstrap it is less of a problem, agreed. I am concerned about the balanced bootstrap. I think the implementation requires one to hold all the N x M values in memory at once. |
Yes, I think you are right. Off the top of my head I'm wondering if it might be reasonable to, instead of copying a permutation of all the data repeated M times, do the same for the integer positions and then use some kind of sliding window to calculate the statistical function. Or maybe that could be done in a lazy way and only "realize" the positions during the calculation? |
I have to look into this algorithm again, but I think one has to store at least the full NxM index array at once, where N is the number of samples and M the number of replicas. If the data values are doubles, one could safe a bit by using a smaller integer to hold the indices, say int32. |
I am closing this, since I added numba :) |
Makes sense, we can open specific issues for other applications of numba in the future |
The speed of the functions
jackknife
andbootstrap
could be greatly enhanced withnumba
. While it is possible to have two implements for each function, one requiringnumba
and one just requiringnumpy
, I think it would make sense to requirenumba
. It used to be a pain to install, but nowadays one can justpip install
it and of course it is also available from anaconda.I know
numba
well enough to work on this. Another option (but less preferred), would be to write the implementations in C++ and wrap the code to Python with pybind11. I can do that as well, but then the package is not pure Python anymore and has to be compiled for every target platform. This is a huge added burden, and the C++ code may not run faster thannumba
(I did some tests a while ago, wherenumba
beat my C++ implemenation).The text was updated successfully, but these errors were encountered: