Uses fftconvolve instead of convolve2d for speedups #1792
Conversation
Convolution using FFT is not necessarily faster than ordinary convolution. It depends on the relative sizes of the inputs: both must be large for FFT to be faster. If one input is ~1x1 compared to the other, then ordinary convolution has a time complexity of ~O(n), whereas the FFT's time complexity is O(n log n). For optimal speed, both methods should be considered.
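The trade-off described here is easy to check directly. A minimal sketch (the sizes are illustrative and the timings will vary by machine):

```python
import time

import numpy as np
from scipy.signal import convolve2d, fftconvolve

rng = np.random.default_rng(0)
image = rng.standard_normal((128, 128))
kernel = rng.standard_normal((16, 16))

# Time the direct method
t0 = time.perf_counter()
direct = convolve2d(image, kernel, mode='same')
t_direct = time.perf_counter() - t0

# Time the FFT method
t0 = time.perf_counter()
via_fft = fftconvolve(image, kernel, mode='same')
t_fft = time.perf_counter() - t0

# The two methods agree up to floating-point error
assert np.allclose(direct, via_fft)
print(f"direct: {t_direct:.4f}s  fft: {t_fft:.4f}s")
```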
Do you know of a good heuristic to determine when to switch?
This is from the OpenCV documentation on filter2D:
They do not seem to provide any clear reasoning, though. Based on how FFT works, I think it should mostly depend on the size ratio between the inputs.
Would you like to play around and come up with a sensible heuristic? I
don't think it needs to be perfect.
Padding is required before applying the Fourier transform when doing
convolutions, so you have more data to work with. But then you also
have a very efficient algorithm, probably implemented well in a
low-level language. It may also be worth double checking whether
fftconvolve intelligently pads to sizes for which the FFT can be
executed rapidly (so called smooth numbers, see
https://en.wikipedia.org/wiki/Cooley%E2%80%93Tukey_FFT_algorithm).
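For reference, SciPy exposes a helper for exactly this kind of padding. A small sketch (the example length 509 is arbitrary):

```python
from scipy.fft import next_fast_len

# 509 is prime, so an FFT of exactly this length is comparatively slow;
# next_fast_len rounds up to a nearby composite of small primes
# (a "smooth" number), which the FFT can execute rapidly.
n = 509
fast_n = next_fast_len(n)
print(n, "->", fast_n)  # -> 512
assert fast_n >= n
```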
It looks like fftconvolve does the required zero padding (source). It pads to the next power of 2 larger. True, convolve2d can be faster depending on kernel size. In my tests with a 500x500 image and a 4x4 kernel, I found fftconvolve and convolve2d to be (roughly) equally fast.
We should definitely improve fftconvolve as a start. There is no need to
pad to a power of two when instead we can pad to a nearby smooth number.
I believe the previously linked line no. 348 in that same file is the required internal padding for any convolution calculation. I think 4x4 against 500x500 is somewhat of a minimal case for deconvolution. Real problems will usually involve larger PSFs or images. If that's where fftconvolve starts to pull ahead, IMO we should just use fftconvolve.
I would agree with @JDWarner -- I think 4x4 convolution is fast enough for both fftconvolve and convolve2d... the real slowdown arises when larger PSFs are used. I discovered this bug while convolving a 512x512 image with a 512x512 PSF.
That sounds reasonable—let's go ahead then.
I've looked at the test output, and the error doesn't seem related to what I committed. Before we merge, let me test the timing out more. I'm still not convinced what I said earlier is true... the timing depends on the kernel size for convolve2d, not the image size.
@scottsievert @stefanv I was going to leave this in your capable hands but given the latest discussion here's what I would want to see before merging / adding logic to decide which method to use: an image of log2(time_fft / time_direct) for varying sizes of image (rows) and kernel (columns), starting at (3, 3) for both. Then display with the red/blue diverging colormap. That should give us a good idea about the performance characteristics in different scenarios.
Extra kudos to satisfying @jni's highly specific requirements :)
Brilliant! That was quick. I'd add "and kernels bigger than 3". Even though the advantage is small for large images, it can accumulate when repeated often, and a kernel size of (3, 3) is extremely common. Here's what I would read from this chart:

```python
if max(kernel_size) == 3 or image.size < 2000:
    use_direct_convolve()
else:
    use_fft_convolve()
```

for suitable function definitions. =) But, others may disagree. At least now we can disagree with data. =D Thanks @scottsievert!
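A runnable version of this rule of thumb, for concreteness (the helper name `choose_convolver` and the exact thresholds are illustrative, not from the PR):

```python
import numpy as np

def choose_convolver(image, kernel):
    # Rule of thumb read off the chart above: direct convolution wins
    # for 3x3 kernels or small images; otherwise prefer the FFT method.
    if max(kernel.shape) == 3 or image.size < 2000:
        return "direct"
    return "fft"

print(choose_convolver(np.zeros((32, 32)), np.zeros((3, 3))))      # direct
print(choose_convolver(np.zeros((512, 512)), np.zeros((64, 64))))  # fft
```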
I just did it naturally using pandas/seaborn! Maybe thank @jni for having easy requirements?
Interestingly, from the visualization provided by @scottsievert, it looks like 261 is a turning point. The next step in image size after 630 should see that a kernel of size 7 is faster with direct than fft convolution (eyeballing it). I expected such an effect but hadn't noticed it in my plots, though (I haven't been as elegant with mine, so I've kept them in my "basement").
When running once, it doesn't matter as much when N is small. I think I'd make a decision rule to use fftconvolve if k >= 7. But that said, we could fit a function that takes both parameters. I can use scipy.optimize.curve_fit to find a more precise rule.
I've used scipy.optimize.curve_fit to find a more precise rule, from the data in this plot: The plot below tries to predict the ratio; the decision would be to check whether the predicted ratio is above or below 1 and choose fftconvolve/convolve2d accordingly. This plot accurately predicts all the ratios (from the plot above) except for three points. At worst, these misclassifications are off by a factor of roughly 3 (but not 2^10). This decision boundary was obtained by using curve_fit to find solutions of the form

```python
t_fft = 2 * (N * np.log(N) + k * np.log(k))
t_direct = k**2 * N**2
ratio = c * t_fft / t_direct  # curve_fit says c ≈ 71.468 (on my machine)
```

I implemented this change, and used
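The curve_fit step can be sketched on synthetic data: below, ratios are generated from the same model with a known constant, so the fit should recover it exactly (in the real procedure, the inputs would be the measured timing ratios instead):

```python
import numpy as np
from scipy.optimize import curve_fit

def ratio_model(X, c):
    # Model form from the comment above: FFT cost over direct cost
    N, k = X
    t_fft = 2 * (N * np.log(N) + k * np.log(k))
    t_direct = k**2 * N**2
    return c * t_fft / t_direct

rng = np.random.default_rng(0)
N = rng.integers(8, 1024, size=50).astype(float)
k = rng.integers(3, 64, size=50).astype(float)
true_c = 71.468
ratios = ratio_model((N, k), true_c)  # noiseless synthetic "measurements"

(c_fit,), _ = curve_fit(ratio_model, (N, k), ratios, p0=[1.0])
print(c_fit)  # recovers ~71.468 on this noiseless data
```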
@scottsievert Will you please upload the code you used to a gist? Thanks for the detailed analysis! Of course, this may vary from machine to machine, but a rough rule of thumb is good enough for our purposes. Nathaniel Smith mentioned that we may want to push this upstream to SciPy as well into a unified
Here's a gist of the code I used to generate the plot. Pushing this upstream would make sense -- the user only wants to convolve and shouldn't care about the implementation details (i.e., whether it's implemented with fft or a direct method)... I like the keyword argument.
```python
def fft_time(m, n, k, l):
    return m*np.log(m) + n*np.log(n) + k*np.log(k) + l*np.log(l)

time_ratio = 71.468 * fft_time(*image.shape, *psf.shape)
```
This line seems to be responsible for the Travis failures; I don't understand why...
```python
time_ratio = 71.468 * fft_time(*tuple(list(image.shape) + list(psf.shape)))
```

solves the issue. I'm not sure you're allowed to unpack two tuples in a list of arguments (indeed, `f(*a, *b)` only became valid syntax in Python 3.5, via PEP 448).
@emmanuelle You don't need to recast as tuple, and, even more, you don't need to cast as list!

```python
>>> (5, 9) + (5, 8)
(5, 9, 5, 8)
```

So `fft_time(*(image.shape + psf.shape))` should work.
However, I would change all this quite dramatically, since it takes almost no effort to make this function work on nD images instead of just 2D:

```python
def direct_time(im, psf):
    return np.product(im.shape + psf.shape)

def fft_time(im, psf):
    return sum(k*np.log(k) for k in im.shape + psf.shape)

def time_ratio(im, psf):
    # constant factor obtained by curve fitting, see
    # https://github.com/scikit-image/scikit-image/pull/1792
    return 71.468 * fft_time(im, psf) / direct_time(im, psf)

convolve_function = fftconvolve if time_ratio(im, psf) < 1 else convolve
```

By using the `convolve` function and the more general function definitions, we get nD functionality essentially for free! The only downside is that the constant factors might change now. =\
@jni thanks for the improvement. I think I had seen this syntax (casting tuples to lists to add them, then back to tuple) somewhere else in skimage's code and have replicated the pattern since. I'll have a quick look to see if we can discard other clumsy castings of the same type. For example, here:
https://github.com/scikit-image/scikit-image/blob/master/skimage/util/shape.py#L260
@emmanuelle fascinating! I just double-checked that the tuple addition syntax works in 2.6 so I think this is indeed totally unnecessary, unless I'm missing something!
Changes are implemented; hopefully the builds pass. I'll also be forking scipy to add
@scottsievert Would you like to tackle nD support (as detailed above)? It's ok if not, I can raise an issue to track it. Other than that I think this is good to go.
My thoughts were to add a parameter to scipy.signal.convolve as mentioned in the scipy issue. I haven't developed this yet, but I was thinking of adding something like:

```python
def convolve(..., method='fft'):
    if method == 'fft':
        return fftconvolve(...)
    # ...
```
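For context, such a keyword did eventually land upstream: modern SciPy's `scipy.signal.convolve` accepts `method='auto' | 'direct' | 'fft'`. A usage sketch:

```python
import numpy as np
from scipy.signal import convolve

rng = np.random.default_rng(0)
image = rng.standard_normal((64, 64))
psf = rng.standard_normal((8, 8))

# Both methods compute the same result; 'auto' chooses between them
# with a heuristic much like the one discussed in this thread.
out_direct = convolve(image, psf, mode='same', method='direct')
out_fft = convolve(image, psf, mode='same', method='fft')
assert np.allclose(out_direct, out_fft)
```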
@scottsievert that's a separate issue.
The function will now work with both 2D and 3D images.
```python
def fft_time(m, n, k, l):
    return m*np.log(m) + n*np.log(n) + k*np.log(k) + l*np.log(l)

time_ratio = 71.468 * fft_time(*(image.shape + psf.shape))
```
Can we add some explanation as a comment here for future readers of the code? This looks pretty magic without knowing about this whole discussion. Thanks.
Added in 3fd4709
Ah, I see. Thanks for the clarification. I'll develop that sometime shortly, probably tomorrow.
@scottsievert brilliant, thank you!! =)
Commit makes the following changes:

**3D timing**

I reran the timing tests for 3 dimensions and found that fftconvolve is faster in all cases except when convolving a 3x3 kernel with a 3x3 image. The plot below shows only the ratio; it doesn't reflect how fast each data point is in absolute terms.

**Constant change**

I also changed the constant: with these new tests I found the best prediction accuracy (in the least-squares sense) with the constant equal to 40.032. I have updated the gist to reflect these changes.
```python
# see whether the fourier transform convolution method or the direct
# convolution method is faster (discussed in scikit-image PR #1792)
time_ratio = 40.032 * fft_time(image.shape, psf.shape))
```
Looks like you made a typo here — extra parenthesis!
I still think we should switch the method in the size-3 image vs size-3 kernel... kidding! =D Thanks for the great work, @scottsievert! Just one typo to fix then we can get this in!
Typo fixed!
I've confirmed that the rebase fixes all the Travis tests (minus the 2.7 amg issue) so I'm merging. Thank you @scottsievert!
Uses fftconvolve instead of convolve2d for speedups
This pull request uses `scipy.signal.fftconvolve` instead of `scipy.signal.convolve2d` in `skimage/restoration/deconvolution.py`. These are equivalent functions, as noted in this StackOverflow answer, but `fftconvolve` is much faster (quoting the documentation page for `fftconvolve`). This function change gives equivalent results (up to machine epsilon), as indicated by the test below.

When I use `N = 1e3`, I see speedups of roughly 6x.
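A sketch reconstructing the kind of equivalence check the description refers to (the exact test isn't shown above; a smaller N than 1e3 is used here to keep it quick, and the shapes are illustrative):

```python
import numpy as np
from scipy.signal import convolve2d, fftconvolve

rng = np.random.default_rng(0)
N = 200
image = rng.standard_normal((N, N))
psf = rng.standard_normal((15, 15))

a = convolve2d(image, psf, mode='same')
b = fftconvolve(image, psf, mode='same')

# Agreement up to machine epsilon (scaled by the magnitudes involved)
assert np.allclose(a, b)
print(np.abs(a - b).max())
```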