Adds keyword argument to choose faster convolution method #5608
Conversation
This looks interesting, potentially large speedups.
They seem to pass on TravisCI without any adjustment, which is slightly surprising. Do you have the test failures? Also, did you check how much the relative precision changes for a range of input arrays?
I found that the relative precision difference was small (on the order of 1e-15):

```python
import numpy as np
from scipy.signal import convolve

np.random.seed(42)
pwr = 16
x = np.random.rand(2**pwr)
h = np.random.rand((2**pwr) // 10)

y_fft = convolve(x, h, method='fft')
y_direct = convolve(x, h, method='direct')

error = abs(y_fft - y_direct) / y_direct
print("error std. dev = {:0.4e}".format(np.sqrt(error.var())))
print("error mean = {:0.4e}".format(error.mean()))
print("error max = {:0.4e}".format(error.max()))
print("error median = {:0.4e}".format(np.median(error)))

# prints the following output:
# error std. dev = 1.3594e-14
# error mean = 6.9658e-16
# error max = 3.0790e-12
# error median = 2.8107e-16
```

The tests failed to build on my machine for an unrelated issue; I ran into some trouble with gcc and Homebrew.
```python
# convolution method is faster (discussed in scikit-image PR #1792)
direct_time = np.prod(volume.shape + kernel.shape)
fft_time = np.sum([n*np.log(n) for n in volume.shape + kernel.shape])
time_ratio = 40.032 * fft_time / direct_time
```
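As a self-contained sketch of how this cost heuristic behaves (the function name and the example shapes here are my own illustration, not part of the PR; the constant is the empirically fitted value from the snippet above):

```python
import numpy as np

BIG_O_CONSTANT = 40.032  # empirical fit from the scikit-image timings

def estimate_faster_method(volume_shape, kernel_shape):
    """Compare the estimated direct cost (product of all dimensions)
    against the estimated FFT cost (sum of n*log(n) terms)."""
    shapes = tuple(volume_shape) + tuple(kernel_shape)
    direct_time = np.prod(shapes)
    fft_time = sum(n * np.log(n) for n in shapes)
    return 'fft' if BIG_O_CONSTANT * fft_time < direct_time else 'direct'

# Large image and kernel: FFT wins; tiny 1D inputs: direct wins.
print(estimate_faster_method((512, 512), (32, 32)))  # -> fft
print(estimate_faster_method((10,), (3,)))           # -> direct
```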
It may be a good idea to make 40.032 a named constant, in case the code is ever parted from the comment in a future refactor...
Also, this may be a premature micro-optimization, but if instead of `time_ratio` you computed `time_diff = 40.032 * fft_time - direct_time`, you could later use `time_diff < 0` as your condition, replacing an always-expensive division with a cheap subtraction, without hurting readability much, if at all.
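To illustrate the equivalence (a minimal sketch; the cost values below are made up purely for demonstration), the two conditions agree whenever `direct_time` is positive:

```python
# Hypothetical cost estimates, for demonstration only.
fft_time = 6.6e3
direct_time = 2.7e8
CONSTANT = 40.032

ratio_check = CONSTANT * fft_time / direct_time < 1  # uses a division
diff_check = CONSTANT * fft_time - direct_time < 0   # subtraction only

# Both forms make the same decision as long as direct_time > 0.
assert ratio_check == diff_check
```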
Or I could just do a direct comparison: `fft_time < direct_time`! Also, the constant has been named.
The code LGTM, but I'll let someone more current on SciPy matters take the decision to merge it.
```diff
@@ -416,6 +416,18 @@ def convolve(in1, in2, mode='full'):
     ``same``
        The output is the same size as `in1`, centered
        with respect to the 'full' output.
+    method : str {'auto', 'fft', 'direct'}, optional
```
Should this default to `'direct'` to not change the current behavior?
+1 for keeping old default
I have changed the default to `'direct'`.
@scottsievert do you need others to run a benchmark script to tune your algorithm? I have a 6-core desktop running Linux (self-built libs) and a 2-core laptop running Windows (Anaconda) that I could use if you want more data.
And have you looked at all at whether the use of …
Actually @scottsievert do you have a link to the code used to generate …?
I updated the gist to include code to generate … I only did limited testing in the skimage PR (I only ran two tests). I just wanted a very rough rule that could detect when speedups of 10x or greater were present.
Did you also look at larger images? In my domain the scientifically relevant image sizes are typically (1-2k) x (1-2k).
Here's the image from a speed test we did again: The speed gains only increase for larger images and larger kernels. fftconvolve is faster for large images and large kernels -- even for a 200x200 image and 25x25 kernel, fftconvolve took roughly 2^-4 to 2^-5 as long. We didn't test larger images because …
We went down a similar route on trackpy and came to the opposite conclusion (see soft-matter/trackpy#219 and links); @danielballan and @caspervdw can comment in more detail than I can. I think the difference is that we were only using separable kernels, so direct 1D convolution still won.
@tacaswell I think the above plot is saying that direct 1D convolution does win over FFT-based, since it's a plot of …
FWIW: http://scipy-cookbook.readthedocs.org/items/ApplyFIRFilter.html The plot is a few years old. I haven't run that script in quite a while, so I don't know if it would look the same now.
Hmm, so maybe we need some additional testing, tuning, and/or triage in the 1D case.
I think the default method should be `'auto'`. The existing docstring only says that the function computes the convolution. Mathematically, both the direct method and the FFT method are equivalent, so we don't really have a problem with backwards compatibility. Of course, the results will not be exactly the same, but scipy does not promise bit-level reproducibility between versions. `'auto'` is what we want in the long term; I think we can safely do that now.
Fair enough
Modified FIR script is here: https://gist.github.com/WarrenWeckesser/e2f4de588967ea01226e I dropped … Here's the result when the script is run on a Macbook Pro (OS X 10.9.5), using Anaconda's packages (scipy 0.16.1, numpy 1.10.2):
Hm, interesting. Looks like these changes should be merged upstream to numpy (after changing the constant). If bit-level reproducibility is not required, I believe the default for …
If it all matches with relative precision of 1e-15 as you said above, that's OK. Bit-level reproducibility is anyway not guaranteed - it depends on OS, compilers, CPU, etc.
I've done more testing. I've found that the 1D kernel estimation tends to be inaccurate. For 1D, I had to modify the big O constant by a factor of 26, and even then it tends to break down with small inputs. I think it'd be best to add an if statement to catch these cases; maybe if …
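For anyone who wants to probe the 1D case themselves, here is a rough timing sketch (the sizes are arbitrary choices of mine, and the `method` keyword is the one this PR adds):

```python
import timeit

import numpy as np
from scipy.signal import convolve

np.random.seed(0)
x = np.random.rand(2**14)
h = np.random.rand(2**7)

# Time each method a few times and report the totals.
for method in ('direct', 'fft'):
    t = timeit.timeit(lambda: convolve(x, h, method=method), number=20)
    print("{:6s}: {:.4f} s".format(method, t))

# Sanity check: both methods agree to high relative precision.
assert np.allclose(convolve(x, h, method='direct'),
                   convolve(x, h, method='fft'))
```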
I have made changes to the big O constant for the 1D case. What else should be done before this PR can be merged?
It should have tests to prove that it works identically for all possible combinations of settings, dimensions, dtypes, etc. Ideally the keyword would also be added to …
Test added for the keyword argument. Agreed, ideally they would... but that would require elaborate testing, plus correlate/etc. is implemented in C.
Well, correlation and convolution are basically the same thing:
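For real-valued 1D inputs, that relationship can be checked directly (a quick illustration of the point, not code from the PR):

```python
import numpy as np
from scipy.signal import convolve, correlate

np.random.seed(0)
x = np.random.rand(100)
h = np.random.rand(10)

# Correlation is convolution with a time-reversed (and, for complex
# inputs, conjugated) second argument.
assert np.allclose(correlate(x, h), convolve(x, h[::-1]))
```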
@rgommers I've resolved those issues and the tests on 64-bit systems pass. Can you verify that the checks pass on 32-bit systems?
That fixed it, thanks @stsievert |
Now that @rgommers is happy, I'll merge later today since it looks like all remaining comments have been addressed |
Thanks @stsievert! |
In at last :) Thanks @stsievert, @endolith and everyone else who helped.
Note that some of the benchmarks/signal.py benchmark results got worse
--- there's probably a tradeoff here, or the decision criteria could be
fine-tuned:
https://pv.github.io/scipy-bench/index.html#regressions?sort=1&dir=desc
@pv I would guess that most of the tests are small-N, and would benefit from the small-N shortcuts that we didn't put in …
As mentioned in #2651, this PR adds the keyword argument `method` to `scipy.signal.convolve` to choose the convolution method. `method` can take the values `'auto'`, `'fft'`, or `'direct'`.

In scikit-image PR 1792 we chose the faster convolution method, either the direct method or fftconvolve. We merged those changes in, and I am now merging them upstream.
Timing comparison
In the plot below, the ratio `fft_time / direct_time` is plotted with a log scaling (more detail can be found in the scikit-image PR; these plots are lifted from there).

In the next plot, the fit values for this ratio are shown. If this ratio is less than 1, this PR chooses to use fftconvolve.
The complete code and data to generate these plots are in a gist. These timings were performed on a Mid-2012 Macbook Air.
Tests
Tests on my machine failed to build. However, my own test above passes.
Documentation
This PR includes documentation of this keyword argument in the docstring. I tried to follow the form of the `mode` keyword argument.