
Uses fftconvolve instead of convolve2d for speedups #1792

Merged
merged 6 commits into from
Dec 14, 2015

Conversation

stsievert
Contributor

This pull request uses scipy.signal.fftconvolve instead of scipy.signal.convolve2d in skimage/restoration/deconvolution.py. These functions are equivalent, as noted in this StackOverflow answer, but fftconvolve is much faster. Quoting the documentation for fftconvolve:

This is generally much faster than convolve for large arrays (n > ~500)

This function change gives equivalent results (up to machine epsilon), as indicated by the test below:

from scipy.signal import convolve2d, fftconvolve
import numpy as np

N, m = 1000, 10
np.random.seed(42)
x = np.random.randn(N, N)
h = np.ones((m, m)) / m**2

y1 = convolve2d(x, h, mode='same')
y2 = fftconvolve(x, h, mode='same')

assert np.allclose(y1, y2)

print("{:0.4e}".format(np.max(np.abs(y1 - y2)))) # prints 1.7876e-15

When I use N = 1e3, I see speedups of roughly 6x.
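The rough 6x figure is machine-dependent. A minimal, self-contained timing sketch (not from the PR; N is reduced here so the direct method finishes quickly) would look like:

```python
# Hypothetical timing harness (not the PR's benchmark): compares the two
# implementations on the same kind of inputs as the equivalence test above.
import timeit

import numpy as np
from scipy.signal import convolve2d, fftconvolve

N, m = 300, 10  # smaller N than the PR's 1000 so direct convolution is quick
np.random.seed(42)
x = np.random.randn(N, N)
h = np.ones((m, m)) / m**2

t_direct = timeit.timeit(lambda: convolve2d(x, h, mode='same'), number=3)
t_fft = timeit.timeit(lambda: fftconvolve(x, h, mode='same'), number=3)
print("direct: {:.3f}s  fft: {:.3f}s  speedup: {:.1f}x".format(
    t_direct, t_fft, t_direct / t_fft))
```

The exact speedup varies with BLAS/FFT builds and CPU, so treat the printed ratio as indicative only.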

@MartinSavc
Contributor

Convolution using FFT is not necessarily faster than direct convolution; it depends on the relative sizes of the inputs. Both inputs must be large for FFT to be faster. If one input is ~1x1 compared to the other, then direct convolution has a time complexity of ~O(n), whereas the FFT's time complexity is O(n log n). For optimal speed, both methods should be considered.
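This trade-off can be illustrated with back-of-the-envelope operation counts (these are cost models with constants ignored, not measured timings; the functions and the factor of 3 FFTs are illustrative assumptions):

```python
# Rough cost models (not measurements) showing why a tiny kernel favors
# direct convolution while a large one favors the FFT method.
import math

def direct_ops(n, k):
    # direct 2D convolution: one multiply-add per image pixel per kernel tap
    return (n * k) ** 2

def fft_ops(n, k):
    # FFT-based: roughly three 2D FFTs on the padded (n + k - 1)^2 grid
    p = n + k - 1
    return 3 * p * p * math.log2(p)

for k in (3, 11, 101):
    n = 512
    print("k={:3d}  direct/fft cost ratio: {:.2f}".format(
        k, direct_ops(n, k) / fft_ops(n, k)))
```

For a 512x512 image, the model predicts direct convolution is cheaper for a 3x3 kernel and dramatically more expensive for a 101x101 kernel, matching the argument above.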

@stefanv
Member

stefanv commented Dec 1, 2015 via email

@MartinSavc
Contributor

This is from the OpenCV documentation on filter2D:

The function uses the DFT-based algorithm in case of sufficiently large kernels (~11 x 11 or larger) and the direct algorithm (that uses the engine retrieved by createLinearFilter() ) for small kernels.

They do not seem to provide any clear reasoning though. Based on how FFT works, I think it should mostly be dependent on the size ratio between the inputs.

@stefanv
Member

stefanv commented Dec 1, 2015 via email

@stsievert
Contributor Author

It looks like fftconvolve does the required zero-padding (source); it pads to the next larger power of 2.

True, convolve2d can be faster depending on the kernel size. In my tests with a 500x500 image and a 4x4 kernel, I found fftconvolve and convolve2d to be roughly equally fast.

@stefanv
Member

stefanv commented Dec 1, 2015 via email

@JDWarner
Contributor

JDWarner commented Dec 1, 2015

scipy.signal.fftconvolve is doing about as well as it can. It doesn't pad to the next power of 2; it pads to the next composite of small primes with a hidden function _next_regular here: https://github.com/scipy/scipy/blob/v0.16.1/scipy/signal/signaltools.py#L206 which I assume is what Stefan means by smooth numbers.
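(The _next_regular helper was later exposed publicly; in SciPy 1.4+ the same idea is available as scipy.fft.next_fast_len. A quick illustration, assuming a SciPy new enough to have the scipy.fft namespace:)

```python
# Illustration of padding to "regular"/smooth sizes (composites of small
# primes) rather than powers of two. scipy.fft.next_fast_len (SciPy 1.4+)
# is the public descendant of the _next_regular helper discussed above.
from scipy.fft import next_fast_len

print(next_fast_len(1000))  # 1000 = 2**3 * 5**3 is already a fast size
print(next_fast_len(509))   # 509 is prime, so a larger smooth size is chosen
```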

I believe the previously linked line no. 348 in that same file is the required internal padding for any convolution calculation.

I think 4x4 against 500x500 is somewhat of a minimal case for deconvolution. Real problems will usually involve larger PSFs or images. If that's where fftconvolve starts to pull ahead, IMO we should just use fftconvolve.

@stsievert
Contributor Author

I would agree with @JDWarner -- I think 4x4 convolution is fast enough with both fftconvolve and convolve2d... the real slowdown arises when larger PSFs are used. I discovered this slowdown while convolving a 512x512 image with a 512x512 PSF.

@stefanv
Member

stefanv commented Dec 2, 2015 via email

@stsievert
Contributor Author

I've looked at the test output, and the error doesn't seem related to what I committed.

Before we merge, let me test the timing more thoroughly. I'm still not convinced what I said earlier is true... for convolve2d, the timing depends on the kernel size, not the image size.

@jni
Member

jni commented Dec 7, 2015

@scottsievert @stefanv I was going to leave this in your capable hands but given the latest discussion here's what I would want to see before merging / adding logic to decide which method to use: an image of log2(time_fft / time_direct) for varying sizes of image (rows) and kernel (columns), starting at (3, 3) for both. Then display with the red/blue diverging colormap. That should give us a good idea about the performance characteristics in different scenarios.

@stsievert
Contributor Author

[screenshot: heatmap of log2(time_fft / time_direct) for varying image and kernel sizes]

Looks like for all images larger than 44x44, fftconvolve is faster.

@stefanv
Member

stefanv commented Dec 7, 2015 via email

@jni
Member

jni commented Dec 7, 2015

Brilliant! That was quick. I'd add "and kernels bigger than 3". Even though the advantage is small for large images, it can accumulate when repeated often, and a kernel size of (3, 3) is extremely common.

Here's what I would read from this chart:

if max(kernel_size) == 3 or image.size < 2000:
    use_direct_convolve()
else:
    use_fft_convolve()

for suitable function definitions. =)

But, others may disagree. At least now we can disagree with data. =D Thanks @scottsievert!

@stsievert
Contributor Author

I just did it naturally using pandas/seaborn! Maybe thank @jni for having easy requirements?

@MartinSavc
Contributor

Interestingly, from the visualization provided by @scottsievert, it looks like 261 is a turning point. The next step in image size after 630 should show a kernel of size 7 being faster with direct than FFT convolution (eyeballing it). I expected such an effect but hadn't noticed it in my own plots (mine weren't as elegant, so I've kept them in my "basement").

@stsievert
Contributor Author

When running once, it doesn't matter as much when N is small. I think I'd make the decision rule "use fftconvolve if k >= 7". That said, we want a rule that takes both parameters into account; I can use scipy.optimize.curve_fit to find a more precise one.

@stsievert
Contributor Author

I've used scipy.optimize.curve_fit to find a more precise rule, from the data in this plot:

[plot: measured timing ratios for varying image and kernel sizes]

The plot below shows the predicted ratio; the decision rule checks whether the predicted ratio is above or below 1 and chooses fftconvolve or convolve2d accordingly. It accurately predicts all the ratios from the plot above except for three points. At worst, these misclassifications are off by a factor of roughly 3 (not 2^10).

[plot: predicted timing ratio and decision boundary]

This decision boundary was obtained by using curve_fit to find solutions of the form

t_fft = 2*(N*np.log(N) + k*np.log(k))
t_direct = k**2 * N**2
ratio = c * t_fft / t_direct # curve_fit says c \approx 71.468 (on my machine)

I implemented this change, using m*n and assuming an M x N image (and likewise for the kernel). This change has not been tested; I ran into issues and thought I'd run it past you folks first.
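A minimal sketch of how curve_fit can recover that single constant, using synthetic data in place of the PR's real timing measurements (the synthetic c = 70 and the (N, k) grid below are illustrative assumptions, not the PR's data; the 71.468 value is machine-specific):

```python
# Sketch (synthetic data, not the PR's measurements) of fitting the single
# constant c in ratio = c * t_fft / t_direct with scipy.optimize.curve_fit.
import numpy as np
from scipy.optimize import curve_fit

def model(X, c):
    N, k = X
    t_fft = 2 * (N * np.log(N) + k * np.log(k))
    t_direct = k**2 * N**2
    return c * t_fft / t_direct

# synthetic (N, k) grid and fake "measured" ratios generated with c = 70
N = np.array([64, 128, 256, 512, 64, 128, 256, 512], dtype=float)
k = np.array([3, 3, 7, 7, 15, 15, 31, 31], dtype=float)
measured = model((N, k), 70.0) * (1 + 0.01 * np.random.RandomState(0).randn(8))

(c_fit,), _ = curve_fit(model, (N, k), measured, p0=[50.0])
print(c_fit)  # should recover roughly 70 from the synthetic data
```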

@stefanv
Member

stefanv commented Dec 11, 2015

@scottsievert Will you please upload the code you used to a gist? Thanks for the detailed analysis! Of course, this may vary from machine to machine, but a rough rule of thumb is good enough for our purposes. Nathaniel Smith mentioned that we may want to push this upstream to SciPy as well, into a unified convolve function (I guess that would also mean looking into the accuracy of the various implementations).

@stsievert
Contributor Author

stsievert commented Dec 11, 2015

Here's a gist of the code I used to generate the plot.

Pushing this upstream would make sense -- the user only wants to convolve and shouldn't care about the implementation details (i.e., whether it's implemented with fft or a direct method)... I like the keyword argument.

def fft_time(m, n, k, l):
    return m*np.log(m) + n*np.log(n) + k*np.log(k) + l*np.log(l)

time_ratio = 71.468 * fft_time(*image.shape, *psf.shape)
Member

This line seems to be responsible for the Travis failures; I don't understand why...

Member

time_ratio = 71.468 * fft_time(*tuple(list(image.shape) + list(psf.shape)))

solves the issue. I'm not sure you're allowed to unpack two tuples in a list of arguments.

Member

@emmanuelle You don't need to recast as a tuple; in fact, you don't need to cast to list at all!

>>> (5, 9) + (5, 8)
(5, 9, 5, 8)

So fft_time(*(image.shape + psf.shape)) should work.

Member

However, I would change all this quite dramatically, since it takes almost no effort to make this function work on nD images instead of just 2D:

from scipy.signal import convolve, fftconvolve
import numpy as np

def direct_time(im, psf):
    return np.product(im.shape + psf.shape)

def fft_time(im, psf):
    return sum(k*np.log(k) for k in im.shape + psf.shape)

def time_ratio(im, psf):
    # constant factor obtained by curve fitting, see
    # https://github.com/scikit-image/scikit-image/pull/1792
    return 71.468 * fft_time(im, psf) / direct_time(im, psf)

convolve_function = fftconvolve if time_ratio(im, psf) < 1 else convolve

By using the convolve function and the more general function definitions, we get nD functionality essentially for free!

The only downside is that the constant factors might change now. =\

Member

@jni thanks for the improvement. I think I had seen this syntax (casting tuples to list to add them, then back to tuple) somewhere else in skimage's code and have replicated the pattern since. I'll have a quick look to see whether we can discard other clumsy castings of the same type.

For example here
https://github.com/scikit-image/scikit-image/blob/master/skimage/util/shape.py#L260

Member

@emmanuelle fascinating! I just double-checked that the tuple addition syntax works in 2.6 so I think this is indeed totally unnecessary, unless I'm missing something!

@stsievert
Contributor Author

Changes are implemented; hopefully the builds pass. I'll also be forking scipy to add convolve(..., method='fft').

@jni
Member

jni commented Dec 13, 2015

@scottsievert Would you like to tackle nD support (as detailed above)? It's ok if not, I can raise an issue to track it. Other than that I think this is good to go.

@stsievert
Contributor Author

My thought was to add a parameter to scipy.signal.convolve, as mentioned in the scipy issue. I haven't developed this yet, but I was thinking of adding something like:

def convolve(..., method='fft'):
    if method == 'fft':
        return fftconvolve(...)
    # ...

@jni
Member

jni commented Dec 13, 2015

@scottsievert that's a separate issue. fftconvolve and convolve both support nD images, so you can keep the same logic as now, but by changing these three things:

  • use convolve instead of convolve2d
  • change the timing functions to be nD (as above)
  • fiddle with the time ratio constant

the function will then work with both 2D and 3D images.

def fft_time(m, n, k, l):
    return m*np.log(m) + n*np.log(n) + k*np.log(k) + l*np.log(l)

time_ratio = 71.468 * fft_time(*(image.shape + psf.shape))
Member

Can we add some explanation as a comment here for future readers of the code? This looks pretty magical without knowing about this whole discussion. Thanks.

Contributor Author

Added in 3fd4709

@stsievert
Contributor Author

Ah, I see. Thanks for the clarification. I'll develop that sometime shortly, probably tomorrow.

@jni
Member

jni commented Dec 13, 2015

@scottsievert brilliant, thank you!! =)

@soupault soupault added the ⏩ type: Enhancement Improve existing features label Dec 13, 2015
@stsievert
Contributor Author

The commit makes the following changes:

  • If the image is 3D or greater, it automatically selects fftconvolve (see plot below).
  • Cleans up and adds comments (a longer one in the source; a short one in the docstring noting that this works for N dimensions).
  • Changes the constant to fit the data I collected today. The constant only changed by a factor of 1.75 and now gives a better fit (see second plot).

3D timing

I reran the timing tests in 3 dimensions and found that fftconvolve is faster in all cases except when convolving a 3x3 kernel with a 3x3 image. The plot below shows only the ratio; it doesn't reflect how fast each data point is in absolute terms.

[plot: 3D timing ratio of fftconvolve vs. direct convolution]

Constant change

I also changed the constant: with these new tests I found the best prediction accuracy (in the least squares sense) with the constant equal to 40.032.

[plot: predicted vs. measured ratios with the new constant]

I have updated the gist to reflect these changes.
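As a sanity check of the nD claim, a minimal 3D comparison (not the PR's benchmark; it assumes SciPy >= 0.19, where convolve grew the method keyword that came out of this discussion) might look like:

```python
# Minimal 3D sanity check (illustrative, not the PR's benchmark):
# scipy.signal.convolve and fftconvolve both accept nD arrays and should
# agree to machine precision. method='direct' (SciPy >= 0.19) forces the
# direct algorithm for comparison.
import numpy as np
from scipy.signal import convolve, fftconvolve

rng = np.random.RandomState(0)
x = rng.randn(32, 32, 32)
h = np.ones((5, 5, 5)) / 5**3

y1 = convolve(x, h, mode='same', method='direct')
y2 = fftconvolve(x, h, mode='same')

assert np.allclose(y1, y2)
```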


# see whether the fourier transform convolution method or the direct
# convolution method is faster (discussed in scikit-image PR #1792)
time_ratio = 40.032 * fft_time(image.shape, psf.shape))
Member

Looks like you made a typo here — extra parenthesis!

@jni
Member

jni commented Dec 13, 2015

I still think we should switch the method in the size-3 image vs size-3 kernel...

kidding! =D Thanks for the great work, @scottsievert! Just one typo to fix then we can get this in!

@stsievert
Contributor Author

Typo fixed!

@jni
Member

jni commented Dec 14, 2015

I've confirmed that the rebase fixes all the Travis tests (minus the 2.7 amg issue) so I'm merging. Thank you @scottsievert!

jni added a commit that referenced this pull request Dec 14, 2015
Uses fftconvolve instead of convolve2d for speedups
@jni jni merged commit 975d1a4 into scikit-image:master Dec 14, 2015
@jni jni mentioned this pull request Dec 1, 2018