
Adds keyword argument to choose faster convolution method #5608

Merged
merged 13 commits into scipy:master from stsievert:convolve-method on Jul 20, 2016

Conversation

stsievert
Contributor

As mentioned in #2651, this PR adds the keyword argument method to scipy.signal.convolve to choose the convolution method. method can take the values 'auto', 'fft' or 'direct'.

In scikit-image PR 1792 we chose the faster convolution method, either direct convolution or fftconvolve. Those changes were merged there, and this PR brings them upstream.
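
For reference, here is a minimal usage sketch of the keyword argument this PR adds (the array sizes are chosen arbitrarily):

import numpy as np
from scipy.signal import convolve

x = np.random.rand(10000)   # 1D signal
h = np.random.rand(100)     # 1D kernel

y_auto = convolve(x, h, method='auto')      # let scipy estimate the faster method
y_fft = convolve(x, h, method='fft')        # force FFT-based convolution (fftconvolve)
y_direct = convolve(x, h, method='direct')  # force direct summation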

Timing comparison

In the plot below, the ratio fft_time / direct_time is plotted on a log scale. These plots are lifted from the scikit-image PR, where more detail can be found.

[plot: log-scaled fft_time / direct_time ratio]

In the plot below, the fitted values of this ratio are shown. If this ratio is less than 1, this PR chooses fftconvolve.

[plot: fitted values of the time ratio]

The complete code and data to generate these plots are in a gist. These timings were performed on a mid-2012 MacBook Air.

Tests

Tests on my machine failed to build. However, my own test passes:

import numpy as np
from scipy.signal import convolve

for pwr in [4, 5, 6, 7, 8, 9, 10, 11, 12]:
    for _ in range(int(1e3)):
        x = np.random.rand(2**pwr)
        h = np.random.rand((2**pwr)//10)

        y1 = convolve(x, h, method='fft')
        y2 = convolve(x, h, method='direct')

        assert np.allclose(y1, y2), "convolve should give equivalent answers"

Documentation

This PR includes documentation of this keyword argument in the docstring. I tried to follow the form of the mode keyword argument.

@rgommers added the enhancement and scipy.signal labels on Dec 15, 2015
@rgommers
Member

This looks interesting, potentially large speedups.

Tests on my machine failed to build.

They seem to pass on TravisCI without any adjustment, which is slightly surprising. Do you have the test failures? Also, did you check how much the relative precision changes for a range of input arrays?

@stsievert
Contributor Author

I found that the relative precision difference was small (on the order of 1e-15).

import numpy as np
from scipy.signal import convolve

np.random.seed(42)
pwr = 16
x = np.random.rand(2**pwr)
h = np.random.rand((2**pwr)//10)
y_fft = convolve(x, h, method='fft')
y_direct = convolve(x, h, method='direct')

error = abs(y_fft - y_direct) / y_direct

print("error std. dev = {:0.4e}".format(np.sqrt(error.var())))
print("error mean = {:0.4e}".format(error.mean()))
print("error max = {:0.4e}".format(error.max()))
print("error median = {:0.4e}".format(np.median(error)))

# prints the following output:
# error std. dev = 1.3594e-14
# error mean = 6.9658e-16
# error max = 3.0790e-12
# error median = 2.8107e-16

The tests failed to build on my machine for an unrelated reason; I ran into an issue with gcc and Homebrew.

# convolution method is faster (discussed in scikit-image PR #1792)
direct_time = np.prod(volume.shape + kernel.shape)
fft_time = np.sum([n*np.log(n) for n in volume.shape + kernel.shape])
time_ratio = 40.032 * fft_time / direct_time
Member

It may be a good idea to make 40.032 a named constant, in case the code is ever parted from the comment in a future refactor...

Member

Also, this may be a premature micro-optimization, but if instead of time_ratio you computed

time_diff = 40.032 * fft_time - direct_time

you could later use time_diff < 0 as your condition, replacing an always expensive division by a cheap subtraction, and without hurting readability much, if at all.

Contributor Author

Or I could just do a direct comparison! fft_time < direct_time! Also, the constant has been named.
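
For illustration, here is a minimal sketch of the heuristic as discussed in this thread, with the constant named and the division replaced by a direct comparison; the names FFT_TIME_CONSTANT and _fft_is_faster are placeholders, not necessarily what was merged:

import numpy as np

FFT_TIME_CONSTANT = 40.032  # empirical constant from the timing fits above

def _fft_is_faster(volume, kernel):
    # Estimate whether FFT-based convolution beats direct convolution
    # by comparing big-O cost estimates directly (no division needed).
    shapes = volume.shape + kernel.shape
    direct_time = np.prod(shapes)
    fft_time = FFT_TIME_CONSTANT * np.sum([n * np.log(n) for n in shapes])
    return fft_time < direct_time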

@jaimefrio
Member

The code LGTM, but I'll let someone more current on SciPy matters make the decision to merge it.

@@ -416,6 +416,18 @@ def convolve(in1, in2, mode='full'):
``same``
The output is the same size as `in1`, centered
with respect to the 'full' output.
method : str {'auto', 'fft', 'direct'}, optional
Contributor

Should this default to 'direct' to not change the current behavior?

Member

+1 for keeping old default

Contributor Author

I have changed the default to 'direct'.

@larsoner
Member

@scottsievert do you need others to run a benchmark script to tune your algorithm? I have a 6-core desktop running Linux (self-built libs) and a 2-core laptop running Windows (Anaconda) that I could use if you want more data.

@larsoner
Member

And have you looked at all at whether the use of MKL affects it? I suppose it might be a bit too fine-grained for the optimization scheme you're going for, though.

@larsoner
Member

Actually @scottsievert do you have a link to the code used to generate times.p? I can only seem to find the code for doing the plot in the linked issue and gist, not the code to actually generate the times. I'd like to extend it to (and test) 1D as well as 2D/3D, and validate the existing results on my system(s).

@stsievert
Contributor Author

I updated the gist to include code to generate times.p (and it also includes a link to the Jupyter notebook). I tested the mentioned times on a Macbook Air with 8GB memory and Anaconda installed (without MKL).

I only did limited testing in the skimage PR (I only ran two tests). I just wanted a very rough rule that could detect when speedups of 10x or greater were present.

@tacaswell
Contributor

Did you also look at larger images? In my domain the scientifically relevant image sizes are typically (1-2k) x (1-2k).

@stsievert
Contributor Author

Here's the image from the speed test again:

The speed gains only grow with larger images and larger kernels. fftconvolve is faster for large images and large kernels -- even for a 200x200 image and a 25x25 kernel, fftconvolve took roughly 2^-4 to 2^-5 as long. We didn't test larger images because the direct convolve took too long.
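
A rough way to reproduce this kind of comparison with the new keyword argument (a sketch only; absolute timings will differ by machine, and the sizes mirror the 200x200 / 25x25 case mentioned above):

import timeit
import numpy as np
from scipy.signal import convolve

image = np.random.rand(200, 200)
kernel = np.random.rand(25, 25)

t_direct = timeit.timeit(lambda: convolve(image, kernel, method='direct'), number=3)
t_fft = timeit.timeit(lambda: convolve(image, kernel, method='fft'), number=3)
print("direct: {:.3f} s   fft: {:.3f} s   speedup: {:.1f}x".format(
    t_direct, t_fft, t_direct / t_fft))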

@larsoner
Member

I tweaked the script and ran it on my MKL system, and the decision boundary looks good (the shortest 1D case is a 3-sample signal, which is why I suspect it's off -- but it shouldn't really matter which version you use there, I guess):

[three benchmark plots]

@tacaswell
Contributor

We went down a similar route on trackpy and came to the opposite conclusion (see soft-matter/trackpy#219 and its links; @danielballan and @caspervdw can comment in more detail than I can). I think the difference is that we were only using separable kernels, so direct 1D convolution still won.

@larsoner
Member

@tacaswell I think the above plot is saying that direct 1D convolution does win over FFT-based, since it's a plot of t_fft / t_direct (and above unity for most of the instances)...?

@WarrenWeckesser
Member

FWIW: http://scipy-cookbook.readthedocs.org/items/ApplyFIRFilter.html

The plot is a few years old. I haven't run that script in quite a while, so I don't know if it would look the same now.

@larsoner
Member

Hmm, so maybe we need some additional testing, tuning, and/or triage in the 1D case.

@WarrenWeckesser
Member

I think the default method should be 'auto'. The existing docstring only says that the function computes the convolution. Mathematically, both the direct method and the FFT method are equivalent, so we don't really have a problem with backwards compatibility. Of course, the results will not be exactly the same, but scipy does not promise bit-level reproducibility between versions. 'auto' is what we want in the long term; I think we can safely do that now.

@larsoner
Member

larsoner commented Dec 18, 2015 via email

@WarrenWeckesser
Member

Modified FIR script is here: https://gist.github.com/WarrenWeckesser/e2f4de588967ea01226e

I dropped lfilter from the script. Thanks to @ewmoore, lfilter is now smart about handling the case where its a argument is a scalar, and its results are pretty much the same as those of np.convolve.
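
As a small illustration of that equivalence (a sketch, not the cookbook script itself): with a scalar denominator, lfilter is a plain FIR filter, so its output matches the first len(x) samples of the full convolution.

import numpy as np
from scipy.signal import lfilter

np.random.seed(0)
x = np.random.rand(1000)   # signal
h = np.random.rand(32)     # FIR taps

y_lfilter = lfilter(h, 1.0, x)            # scalar denominator -> pure FIR filtering
y_convolve = np.convolve(x, h)[:len(x)]   # full convolution truncated to len(x)

assert np.allclose(y_lfilter, y_convolve)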

Here's the result when the script is run on a Macbook Pro (OS X 10.9.5), using Anaconda's packages (scipy 0.16.1, numpy 1.10.2):

[benchmark plot]

@stsievert
Contributor Author

Hm, interesting. Looks like these changes should be merged upstream to numpy (after changing the constant).

If bit-level reproducibility is not required, I believe the default for method should be auto, which I agree would be desired long-term.

@rgommers
Member

If bit-level reproducibility is not required, I believe the default for method should be auto, which I agree would be desired long-term.

If it all matches with relative precision of 1e-15 as you said above, that's OK. Bit-level reproducibility is anyway not guaranteed - depends on OS, compilers, CPU, etc.

@stsievert
Contributor Author

I've done more testing. I've found that the 1D kernel estimation tends to be inaccurate. For 1D, I had to modify the big-O constant by a factor of 26, and even then it tends to break down with small inputs. I think it'd be best to add an if statement to catch these cases; maybe if kernel.ndim == 1 and kernel.size*signal.size < 5e3? (A sketch of such a guard follows the plots below.) I believe these small 1D cases are off because of additional overhead in the FFT function.

[timing plots for the 1D case]
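
A sketch of the guard proposed above, folded into a method chooser (the 5e3 cutoff and the names are illustrative only; the 1D big-O constant was retuned separately as noted):

import numpy as np

BIG_O_CONSTANT = 40.032   # fit from the 2D/3D timings; the 1D fit used a retuned value

def _choose_method(signal, kernel):
    # Small 1D problems always go direct: FFT overhead swamps the asymptotic estimate.
    if kernel.ndim == 1 and kernel.size * signal.size < 5e3:
        return 'direct'
    shapes = signal.shape + kernel.shape
    direct_time = np.prod(shapes)
    fft_time = BIG_O_CONSTANT * np.sum([n * np.log(n) for n in shapes])
    return 'fft' if fft_time < direct_time else 'direct'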

@stsievert
Contributor Author

I have made changes to the big O constant for the 1D case. What else should be done before this PR can be merged?

@endolith
Member

endolith commented Feb 9, 2016

It should have tests to prove that it works identically for all possible combinations of settings, dimensions, dtypes, etc.
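
A loop-based sketch of the kind of equivalence test being asked for (illustrative only; the dtype/dimension/mode grid below is an assumption, not the test that was actually added):

import numpy as np
from numpy.testing import assert_allclose
from scipy.signal import convolve

rng = np.random.RandomState(1234)
for ndim in (1, 2):
    for dtype in (np.float32, np.float64):
        for mode in ('full', 'same', 'valid'):
            a = rng.rand(*([20] * ndim)).astype(dtype)
            b = rng.rand(*([5] * ndim)).astype(dtype)
            y_direct = convolve(a, b, mode=mode, method='direct')
            y_fft = convolve(a, b, mode=mode, method='fft')
            assert_allclose(y_direct, y_fft, rtol=1e-5)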

Ideally the keyword would also be added to correlate, correlate2d and convolve2d.

@stsievert
Contributor Author

A test has been added for the keyword argument method. I based it on the other tests, which don't test every case and only write down one case.

Agreed, ideally they would... but that would require elaborate testing, and correlate etc. are implemented in C.

@endolith
Member

endolith commented Feb 9, 2016

Well, correlation and convolution are basically the same thing: correlate(a, b) == fftconvolve(a, b[::-1]). convolve is actually implemented using correlate, no?
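
A quick check of that identity for real 1D inputs (for complex inputs, correlate also conjugates its second argument, so the relation needs a conj):

import numpy as np
from scipy.signal import correlate, fftconvolve

np.random.seed(0)
a = np.random.rand(100)
b = np.random.rand(20)

# Cross-correlation of real arrays equals convolution with the reversed kernel.
assert np.allclose(correlate(a, b), fftconvolve(a, b[::-1]))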

@stsievert
Contributor Author

@rgommers I've resolved those issues and the tests pass on 64-bit systems. Can you verify that the checks pass on 32-bit systems?

@rgommers
Member

That fixed it, thanks @stsievert

@larsoner
Member

Now that @rgommers is happy, I'll merge later today since it looks like all remaining comments have been addressed

@larsoner merged commit 2f36b17 into scipy:master on Jul 20, 2016
@larsoner
Member

Thanks @stsievert!

@rgommers
Member

In at last :) Thanks @stsievert, @endolith and everyone else who helped.

@ev-br added this to the 0.19.0 milestone on Jul 20, 2016
@pv
Member

pv commented Jul 21, 2016 via email

@endolith
Member

@pv I would guess that most of the tests are small N, and would benefit from the small N shortcuts that we didn't put in _fftconv_faster yet.

@larsoner mentioned this pull request on Sep 6, 2016
stsievert pushed a commit to stsievert/scipy that referenced this pull request Nov 29, 2016
stsievert pushed a commit to stsievert/scipy that referenced this pull request Nov 30, 2016
stsievert pushed a commit to stsievert/scipy that referenced this pull request Nov 30, 2016
rgommers added a commit that referenced this pull request Nov 30, 2016
Linkid pushed a commit to Linkid/scipy that referenced this pull request Feb 9, 2017
@stsievert deleted the convolve-method branch on May 3, 2021