Adds keyword argument to choose faster convolution method #5608
Conversation
This looks interesting, potentially large speedups.
They seem to pass on TravisCI without any adjustment, which is slightly surprising. Do you have the test failures? Also, did you check how much the relative precision changes for a range of input arrays?
I found that the relative precision difference was small (on the order of 1e-15):

```python
import numpy as np
from scipy.signal import convolve

np.random.seed(42)
pwr = 16
x = np.random.rand(2**pwr)
h = np.random.rand((2**pwr) // 10)

y_fft = convolve(x, h, method='fft')
y_direct = convolve(x, h, method='direct')

error = abs(y_fft - y_direct) / y_direct
print("error std. dev = {:0.4e}".format(np.sqrt(error.var())))
print("error mean = {:0.4e}".format(error.mean()))
print("error max = {:0.4e}".format(error.max()))
print("error median = {:0.4e}".format(np.median(error)))

# prints the following output:
# error std. dev = 1.3594e-14
# error mean = 6.9658e-16
# error max = 3.0790e-12
# error median = 2.8107e-16
```

The tests failed to build on my machine for an unrelated issue; I ran into some trouble with gcc and Homebrew.
```python
# convolution method is faster (discussed in scikit-image PR #1792)
direct_time = np.prod(volume.shape + kernel.shape)
fft_time = np.sum([n*np.log(n) for n in volume.shape + kernel.shape])
time_ratio = 40.032 * fft_time / direct_time
```
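As a self-contained sketch of how this cost heuristic behaves (the function name and the example shapes here are my own illustration, not part of the PR; the constant is the empirically fitted value from the snippet above):

```python
import numpy as np

BIG_O_CONSTANT = 40.032  # empirical fit from the scikit-image timings

def estimate_faster_method(volume_shape, kernel_shape):
    """Compare the estimated direct cost (product of all dimensions)
    against the estimated FFT cost (sum of n*log(n) terms)."""
    shapes = tuple(volume_shape) + tuple(kernel_shape)
    direct_time = np.prod(shapes)
    fft_time = sum(n * np.log(n) for n in shapes)
    return 'fft' if BIG_O_CONSTANT * fft_time < direct_time else 'direct'

# Large image and kernel: FFT wins; tiny 1D inputs: direct wins.
print(estimate_faster_method((512, 512), (32, 32)))  # -> fft
print(estimate_faster_method((10,), (3,)))           # -> direct
```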
It may be a good idea to make 40.032 a named constant, in case the code is ever parted from the comment in a future refactor...
Also, this may be a premature micro-optimization, but if instead of `time_ratio` you computed `time_diff = 40.032 * fft_time - direct_time`, you could later use `time_diff < 0` as your condition, replacing an always-expensive division with a cheap subtraction, without hurting readability much, if at all.
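To illustrate the equivalence (a minimal sketch; the cost values below are made up purely for demonstration), the two conditions agree whenever `direct_time` is positive:

```python
# Hypothetical cost estimates, for demonstration only.
fft_time = 6.6e3
direct_time = 2.7e8
CONSTANT = 40.032

ratio_check = CONSTANT * fft_time / direct_time < 1  # uses a division
diff_check = CONSTANT * fft_time - direct_time < 0   # subtraction only

# Both forms make the same decision as long as direct_time > 0.
assert ratio_check == diff_check
```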
Or I could just do a direct comparison: `fft_time < direct_time`! Also, the constant has been named.
The code LGTM, but I'll let someone more current on SciPy matters take the decision to merge it.
```diff
@@ -416,6 +416,18 @@ def convolve(in1, in2, mode='full'):
     ``same``
        The output is the same size as `in1`, centered
        with respect to the 'full' output.
+    method : str {'auto', 'fft', 'direct'}, optional
```
Should this default to `'direct'` to not change the current behavior?
+1 for keeping old default
I have changed the default to `'direct'`.
@scottsievert do you need others to run a benchmark script to tune your algorithm? I have a 6-core desktop running Linux (self-built libs) and a 2-core laptop running Windows (Anaconda) that I could use if you want more data.
And have you looked at all at whether the use of …
Actually @scottsievert do you have a link to the code used to generate …?
I updated the gist to include code to generate … I only did limited testing in the skimage PR (I only ran two tests). I just wanted a very rough rule that could detect when speedups of 10x or greater were present.
Did you also look at larger images? In my domain the scientifically relevant image sizes are typically (1-2k) x (1-2k).
Here's the image from a speed test we did again: The speed gains only increase for larger images and larger kernels. fftconvolve is faster for large images and large kernels -- even for a 200x200 image and 25x25 kernel, fftconvolve took roughly 2^-4 to 2^-5 as long. We didn't test larger images because …
We went down a similar route on trackpy and came to the opposite conclusion (see soft-matter/trackpy#219 and links); @danielballan and @caspervdw can comment in more detail than I can. I think the difference is that we were only using separable kernels, so direct 1D convolution still won.
@tacaswell I think the above plot is saying that direct 1D convolution does win over FFT-based, since it's a plot of …
FWIW: http://scipy-cookbook.readthedocs.org/items/ApplyFIRFilter.html The plot is a few years old. I haven't run that script in quite a while, so I don't know if it would look the same now.
Hmm, so maybe we need some additional testing, tuning, and/or triage in the 1D case.
I think the default method should be `'auto'`. The existing docstring only says that the function computes the convolution. Mathematically, both the direct method and the FFT method are equivalent, so we don't really have a problem with backwards compatibility. Of course, the results will not be exactly the same, but scipy does not promise bit-level reproducibility between versions. `'auto'` is what we want in the long term; I think we can safely do that now.
Fair enough
Modified FIR script is here: https://gist.github.com/WarrenWeckesser/e2f4de588967ea01226e I dropped … Here's the result when the script is run on a Macbook Pro (OS X 10.9.5), using Anaconda's packages (scipy 0.16.1, numpy 1.10.2):
Hm, interesting. Looks like these changes should be merged upstream to numpy (after changing the constant). If bit-level reproducibility is not required, I believe the default for …
If it all matches with relative precision of 1e-15 as you said above, that's OK. Bit-level reproducibility is anyway not guaranteed - it depends on OS, compilers, CPU, etc.
I've done more testing. I've found that the 1D kernel estimation tends to be inaccurate. For 1D, I had to modify the big O constant by a factor of 26, and even then it tends to break down with small inputs. I think it'd be best to add an if statement to catch these cases; maybe if …
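For anyone who wants to probe the 1D case themselves, here is a rough timing sketch (the sizes are arbitrary choices of mine, and the `method` keyword is the one this PR adds):

```python
import timeit

import numpy as np
from scipy.signal import convolve

np.random.seed(0)
x = np.random.rand(2**14)
h = np.random.rand(2**7)

# Time each method a few times and report the totals.
for method in ('direct', 'fft'):
    t = timeit.timeit(lambda: convolve(x, h, method=method), number=20)
    print("{:6s}: {:.4f} s".format(method, t))

# Sanity check: both methods agree to high relative precision.
assert np.allclose(convolve(x, h, method='direct'),
                   convolve(x, h, method='fft'))
```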
I have made changes to the big O constant for the 1D case. What else should be done before this PR can be merged?
It should have tests to prove that it works identically for all possible combinations of settings, dimensions, dtypes, etc. Ideally the keyword would also be added to …
Test added for the keyword argument. Agreed, ideally they would... but that would require elaborate testing, plus correlate/etc. is implemented in C.
Well, correlation and convolution are basically the same thing:
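For real-valued 1D inputs, that relationship can be checked directly (a quick illustration of the point, not code from the PR):

```python
import numpy as np
from scipy.signal import convolve, correlate

np.random.seed(0)
x = np.random.rand(100)
h = np.random.rand(10)

# Correlation is convolution with a time-reversed (and, for complex
# inputs, conjugated) second argument.
assert np.allclose(correlate(x, h), convolve(x, h[::-1]))
```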
@rgommers I've resolved those issues and the tests on 64-bit systems pass. Can you verify that the checks pass on 32-bit systems?
That fixed it, thanks @stsievert |
Now that @rgommers is happy, I'll merge later today since it looks like all remaining comments have been addressed |
Thanks @stsievert! |
In at last :) Thanks @stsievert, @endolith and everyone else who helped.
Note that some of the benchmarks/signal.py benchmark results got worse
--- there's probably a tradeoff here, or the decision criteria could be
fine-tuned:
https://pv.github.io/scipy-bench/index.html#regressions?sort=1&dir=desc
@pv I would guess that most of the tests are small-N, and would benefit from the small-N shortcuts that we didn't put in …
As mentioned in #2651, this PR adds the keyword argument `method` to `scipy.signal.convolve` to choose the convolution method. `method` can take the values `'auto'`, `'fft'`, or `'direct'`.

In scikit-image PR 1792 we chose the faster convolution method, either the direct method or fftconvolve. We merged those changes in, and I am now merging them upstream.
Timing comparison
In the plot below, the ratio `fft_time / direct_time` is plotted with a log scaling (more detail can be found in the scikit-image PR; these plots are lifted from there).

In the next plot, the fit values for this ratio are shown. If this ratio is less than 1, this PR chooses to use fftconvolve.
The complete code and data to generate these plots are in a gist. These timings were performed on a Mid-2012 Macbook Air.
Tests
Tests on my machine failed to build. However, my own test above passes.
Documentation
This PR includes documentation of this keyword argument in the docstring. I tried to follow the form of the `mode` keyword argument.