Uses fftconvolve instead of convolve2d for speedups #1792
Conversation
Convolution using FFT is not necessarily faster than ordinary convolution. It depends on the relative sizes of the inputs: both must be large for FFT to be faster. If one input is ~1x1 compared to the other, then ordinary convolution has a time complexity of ~O(n), whereas the FFT's time complexity is O(n log n). For optimal speed, both methods should be considered.
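The trade-off described here is easy to check directly. A minimal sketch (the sizes are illustrative and the timings will vary by machine):

```python
import time

import numpy as np
from scipy.signal import convolve2d, fftconvolve

rng = np.random.default_rng(0)
image = rng.standard_normal((128, 128))
kernel = rng.standard_normal((16, 16))

# Time the direct method
t0 = time.perf_counter()
direct = convolve2d(image, kernel, mode='same')
t_direct = time.perf_counter() - t0

# Time the FFT method
t0 = time.perf_counter()
via_fft = fftconvolve(image, kernel, mode='same')
t_fft = time.perf_counter() - t0

# The two methods agree up to floating-point error
assert np.allclose(direct, via_fft)
print(f"direct: {t_direct:.4f}s  fft: {t_fft:.4f}s")
```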
Do you know of a good heuristic to determine when to switch?
This is from the OpenCV documentation on filter2D:
They do not seem to provide any clear reasoning, though. Based on how FFT works, I think it should mostly depend on the size ratio between the inputs.
Would you like to play around and come up with a sensible heuristic? I
don't think it needs to be perfect.
Padding is required before applying the Fourier transform when doing
convolutions, so you have more data to work with. But then you also
have a very efficient algorithm, probably implemented well in a
low-level language. It may also be worth double checking whether
fftconvolve intelligently pads to sizes for which the FFT can be
executed rapidly (so called smooth numbers, see
https://en.wikipedia.org/wiki/Cooley%E2%80%93Tukey_FFT_algorithm).
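For reference, SciPy exposes a helper for exactly this kind of padding. A small sketch (the example length 509 is arbitrary):

```python
from scipy.fft import next_fast_len

# 509 is prime, so an FFT of exactly this length is comparatively slow;
# next_fast_len rounds up to a nearby composite of small primes
# (a "smooth" number), which the FFT can execute rapidly.
n = 509
fast_n = next_fast_len(n)
print(n, "->", fast_n)  # -> 512
assert fast_n >= n
```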
It looks like fftconvolve does the required zero padding (source). It pads to the next power of 2 larger. True, convolve2d can be faster depending on kernel size. In my tests with a 500x500 image and a 4x4 kernel, I found fftconvolve and convolve2d to be (roughly) equally fast.
We should definitely improve fftconvolve as a start. There is no need to
pad to a power of two when instead we can pad to a nearby smooth number.
I believe the previously linked line no. 348 in that same file is the required internal padding for any convolution calculation. I think 4x4 against 500x500 is somewhat of a minimal case for deconvolution. Real problems will usually involve larger PSFs or images. If that's where fftconvolve starts to pull ahead, IMO we should just use fftconvolve.
I would agree with @JDWarner -- I think 4x4 convolution is fast enough for both fftconvolve and convolve2d... the real slowdown arises when larger PSFs are used. I discovered this bug while convolving a 512x512 image with a 512x512 PSF.
That sounds reasonable—let's go ahead then.
I've looked at the test output, and the error doesn't seem related to what I committed. Before we merge, let me test the timing out more. I'm still not convinced what I said earlier is true... the timing depends on the kernel size for convolve2d, not the image size.
@scottsievert @stefanv I was going to leave this in your capable hands but given the latest discussion here's what I would want to see before merging / adding logic to decide which method to use: an image of log2(time_fft / time_direct) for varying sizes of image (rows) and kernel (columns), starting at (3, 3) for both. Then display with the red/blue diverging colormap. That should give us a good idea about the performance characteristics in different scenarios.
Extra kudos to satisfying @jni's highly specific requirements :)
Brilliant! That was quick. I'd add "and kernels bigger than 3". Even though the advantage is small for large images, it can accumulate when repeated often, and a kernel size of (3, 3) is extremely common. Here's what I would read from this chart:

```python
if max(kernel_size) == 3 or image.size < 2000:
    use_direct_convolve()
else:
    use_fft_convolve()
```

for suitable function definitions. =) But, others may disagree. At least now we can disagree with data. =D Thanks @scottsievert!
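A runnable version of this rule of thumb, for concreteness (the helper name `choose_convolver` and the exact thresholds are illustrative, not from the PR):

```python
import numpy as np

def choose_convolver(image, kernel):
    # Rule of thumb read off the chart above: direct convolution wins
    # for 3x3 kernels or small images; otherwise prefer the FFT method.
    if max(kernel.shape) == 3 or image.size < 2000:
        return "direct"
    return "fft"

print(choose_convolver(np.zeros((32, 32)), np.zeros((3, 3))))      # direct
print(choose_convolver(np.zeros((512, 512)), np.zeros((64, 64))))  # fft
```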
I just did it naturally using pandas/seaborn! Maybe thank @jni for having easy requirements?
Interestingly, from the visualization provided by @scottsievert, it looks like 261 is a turning point. The next step in image size after 630 should see that a kernel of size 7 is faster with direct than fft convolution (eyeballing it). I expected such an effect but hadn't noticed it in my plots, though (I haven't been as elegant with mine, so I've kept them in my "basement").
When running once, it doesn't matter as much when N is small. I think I'd make a decision rule to use fftconvolve if k >= 7. But that said, we could fit a function that takes both parameters. I can use scipy.optimize.curve_fit to find a more precise rule.
I've used scipy.optimize.curve_fit to find a more precise rule, from the data in this plot: The plot below tries to predict the ratio; the decision would be to check whether the predicted ratio is above or below 1 and choose fftconvolve/convolve2d accordingly. This plot accurately predicts all the ratios (from the plot above) except for three points. At worst, these misclassifications are off by a factor of roughly 3 (but not 2^10). This decision boundary was obtained by using curve_fit to find solutions of the form

```python
t_fft = 2 * (N * np.log(N) + k * np.log(k))
t_direct = k**2 * N**2
ratio = c * t_fft / t_direct  # curve_fit says c ≈ 71.468 (on my machine)
```

I implemented this change, and used
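The curve_fit step can be sketched on synthetic data: below, ratios are generated from the same model with a known constant, so the fit should recover it exactly (in the real procedure, the inputs would be the measured timing ratios instead):

```python
import numpy as np
from scipy.optimize import curve_fit

def ratio_model(X, c):
    # Model form from the comment above: FFT cost over direct cost
    N, k = X
    t_fft = 2 * (N * np.log(N) + k * np.log(k))
    t_direct = k**2 * N**2
    return c * t_fft / t_direct

rng = np.random.default_rng(0)
N = rng.integers(8, 1024, size=50).astype(float)
k = rng.integers(3, 64, size=50).astype(float)
true_c = 71.468
ratios = ratio_model((N, k), true_c)  # noiseless synthetic "measurements"

(c_fit,), _ = curve_fit(ratio_model, (N, k), ratios, p0=[1.0])
print(c_fit)  # recovers ~71.468 on this noiseless data
```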
@scottsievert Will you please upload the code you used to a gist? Thanks for the detailed analysis! Of course, this may vary from machine to machine, but a rough rule of thumb is good enough for our purposes. Nathaniel Smith mentioned that we may want to push this upstream to SciPy as well into a unified
Here's a gist of the code I used to generate the plot. Pushing this upstream would make sense -- the user only wants to convolve and shouldn't care about the implementation details (i.e., whether it's implemented with fft or a direct method)... I like the keyword argument.
```python
def fft_time(m, n, k, l):
    return m*np.log(m) + n*np.log(n) + k*np.log(k) + l*np.log(l)

time_ratio = 71.468 * fft_time(*image.shape, *psf.shape)
```
This line seems to be responsible for the Travis failures; I don't understand why...
```python
time_ratio = 71.468 * fft_time(*tuple(list(image.shape) + list(psf.shape)))
```

solves the issue. I'm not sure you're allowed to unpack two tuples in a list of arguments (indeed, `f(*a, *b)` only became valid syntax in Python 3.5, via PEP 448).
@emmanuelle You don't need to recast as tuple, and, even more, you don't need to cast as list!

```python
>>> (5, 9) + (5, 8)
(5, 9, 5, 8)
```

So `fft_time(*(image.shape + psf.shape))` should work.
However, I would change all this quite dramatically, since it takes almost no effort to make this function work on nD images instead of just 2D:

```python
def direct_time(im, psf):
    return np.product(im.shape + psf.shape)

def fft_time(im, psf):
    return sum(k*np.log(k) for k in im.shape + psf.shape)

def time_ratio(im, psf):
    # constant factor obtained by curve fitting, see
    # https://github.com/scikit-image/scikit-image/pull/1792
    return 71.468 * fft_time(im, psf) / direct_time(im, psf)

convolve_function = fftconvolve if time_ratio(im, psf) < 1 else convolve
```

By using the `convolve` function and the more general function definitions, we get nD functionality essentially for free! The only downside is that the constant factors might change now. =\
@jni thanks for the improvement. I think I had seen this syntax (casting tuples to lists to add them, then back to tuple) somewhere else in skimage's code and have replicated the pattern since. I'll have a quick look to see if we can discard other clumsy castings of the same type. For example, here:
https://github.com/scikit-image/scikit-image/blob/master/skimage/util/shape.py#L260
@emmanuelle fascinating! I just double-checked that the tuple addition syntax works in 2.6 so I think this is indeed totally unnecessary, unless I'm missing something!
Changes are implemented; hopefully the builds pass. I'll also be forking scipy to add
@scottsievert Would you like to tackle nD support (as detailed above)? It's ok if not, I can raise an issue to track it. Other than that I think this is good to go.
My thoughts were to add a parameter to scipy.signal.convolve as mentioned in the scipy issue. I haven't developed this yet, but I was thinking of adding something like:

```python
def convolve(..., method='fft'):
    if method == 'fft':
        return fftconvolve(...)
    # ...
```
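For context, such a keyword did eventually land upstream: modern SciPy's `scipy.signal.convolve` accepts `method='auto' | 'direct' | 'fft'`. A usage sketch:

```python
import numpy as np
from scipy.signal import convolve

rng = np.random.default_rng(0)
image = rng.standard_normal((64, 64))
psf = rng.standard_normal((8, 8))

# Both methods compute the same result; 'auto' chooses between them
# with a heuristic much like the one discussed in this thread.
out_direct = convolve(image, psf, mode='same', method='direct')
out_fft = convolve(image, psf, mode='same', method='fft')
assert np.allclose(out_direct, out_fft)
```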
@scottsievert that's a separate issue.
The function will now work with both 2D and 3D images.
```python
def fft_time(m, n, k, l):
    return m*np.log(m) + n*np.log(n) + k*np.log(k) + l*np.log(l)

time_ratio = 71.468 * fft_time(*(image.shape + psf.shape))
```
Can we add some explanation as a comment here for future readers of the code? This looks pretty magic without knowing about this whole discussion. Thanks.
Added in 3fd4709
Ah, I see. Thanks for the clarification. I'll develop that sometime shortly, probably tomorrow.
@scottsievert brilliant, thank you!! =)
Commit makes the following changes:

**3D timing**

I reran the timing tests for 3 dimensions and found that fftconvolve is faster in all cases except when convolving a 3x3 kernel with a 3x3 image. The plot below shows only the ratio; it doesn't reflect how fast each data point is in absolute terms.

**Constant change**

I also changed the constant: with these new tests I found the best prediction accuracy (in the least-squares sense) with the constant equal to 40.032. I have updated the gist to reflect these changes.
```python
# see whether the fourier transform convolution method or the direct
# convolution method is faster (discussed in scikit-image PR #1792)
time_ratio = 40.032 * fft_time(image.shape, psf.shape))
```
Looks like you made a typo here — extra parenthesis!
I still think we should switch the method in the size-3 image vs size-3 kernel... kidding! =D Thanks for the great work, @scottsievert! Just one typo to fix then we can get this in!
Typo fixed!
I've confirmed that the rebase fixes all the Travis tests (minus the 2.7 amg issue) so I'm merging. Thank you @scottsievert!
Uses fftconvolve instead of convolve2d for speedups
This pull request uses `scipy.signal.fftconvolve` instead of `scipy.signal.convolve2d` in `skimage/restoration/deconvolution.py`. These are equivalent functions, as noted in this StackOverflow answer, but `fftconvolve` is much faster (quoting the documentation page for `fftconvolve`). This function change gives equivalent results (up to machine epsilon), as indicated by the test below.

When I use `N = 1e3`, I see speedups of roughly 6x.
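A sketch reconstructing the kind of equivalence check the description refers to (the exact test isn't shown above; a smaller N than 1e3 is used here to keep it quick, and the shapes are illustrative):

```python
import numpy as np
from scipy.signal import convolve2d, fftconvolve

rng = np.random.default_rng(0)
N = 200
image = rng.standard_normal((N, N))
psf = rng.standard_normal((15, 15))

a = convolve2d(image, psf, mode='same')
b = fftconvolve(image, psf, mode='same')

# Agreement up to machine epsilon (scaled by the magnitudes involved)
assert np.allclose(a, b)
print(np.abs(a - b).max())
```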