Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP

Loading…

ENH: Speed up fftconvolve by using 5-smooth length to pad FFT #3144

Merged
merged 2 commits into from

4 participants

@endolith

Fix for #2146

In [4]: in1 = rand(2**14+1) + 1j*rand(2**14+1)

In [5]: timeit fftconvolve(in1, in1)
100 loops, best of 3: 19.1 ms per loop

In [6]: timeit fftconvolvefast(in1, in1)
100 loops, best of 3: 3.75 ms per loop

In [10]: assert_array_almost_equal(fftconvolve(in1, in1), fftconvolvefast(in1, in1))

In [11]: 19.1/3.75
5.093333333333334
endolith added some commits
@endolith endolith MAINT: Use "shape" for variables derived from array.shape (not array.…
…size)
01a9389
@endolith endolith ENH: Speed up fftconvolve by using 5-smooth length to pad FFT
instead of padding to the next power of 2.  FFTPACK can handle multiples and powers of 3 and 5 just as efficiently as powers of 2.
e86fe48
@coveralls

Coverage Status

Coverage remained the same when pulling e86fe48 on endolith:smooth_fft into a0a79a8 on scipy:master.

@pv
Owner
pv commented

Could you add test cases: (i) some for the _next_regular function, and (ii) some for fftconvolve with nasty sizes.

@endolith

Ok. For _next_regular I have a really overkill test that tests every integer up to 51200000 and compares to http://oeis.org/A051037/b051037.txt and then a bunch of really big integers too, but I'll try to think of just a few special cases.

For fftconvolve, how to test it though? It should produce identical results as the original, just faster.

@argriffing
Collaborator

For fftconvolve, how to test it though?
It should produce identical results as the original, just faster.

If it is a regression test, then could it be reasonable to keep the old implementation but just rename it and make it not public, and only use it for regression testing? This would not test the speedup though.

@endolith

Well it's just a change in implementation, so the existing tests should cover it, right? They exist to show that any change in implementation doesn't break things that are known to work? test_random_data already compares the output offftconvolve with the direct convolve.

@argriffing
Collaborator

Oh that makes sense. I don't know if this is set up for scipy, but I've heard that vbench can be helpful for keeping tabs on speeds of functions across versions of libraries, like this for numpy. There are also some bench*.py speed benchmarking modules analogous to test*.py unit testing modules in scipy, although the benchmarking ones are much less comprehensive and possibly are not supported by as much infrastructure as the unit test modules. I guess that some have already been written for fftpack here.

@endolith

Could maybe do a test where it does a timeit measurement of different input sizes (power of 2, (power of 2)+1, nearby prime number) and makes sure the times are not enormously different? Then the test would be consistent across different hardware?

(power of 2)+1 is the biggest speed difference compared to the previous implementation, because it only pads up to the next regular number instead of almost doubling it to the next power of 2:

fftconvolvereg vs fftconvolve real compute11a 2 7 3 2013-12-09 20-51 cropped

@josef-pkt josef-pkt referenced this pull request in statsmodels/statsmodels
Merged

REF: ensure O(N log N) when using fft for acf #1196

@pv
Owner
pv commented
@endolith

Yeah there are tests for fftconvolve at https://github.com/scipy/scipy/blob/master/scipy/signal/tests/test_signaltools.py#L155 but I'll see if they can be expanded

@endolith

@cournape has some similar functions in https://github.com/cournape/numpy/blob/bluestein/numpy/fft/helper.py. Maybe next_regular should be a public function in numpy.fft.helper, too?

...but then this optimization would depend on a certain numpy version, so no...

@pv pv merged commit e86fe48 into scipy:master

1 check passed

Details default The Travis CI build passed
@endolith

Sorry, I meant to add some tests. It looks like the one you added just tests that the result is >= to the original number, and regular, but not that it's the next regular? This is what I was going to submit:

def test_next_regular():
    hams = {
        1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 8, 8: 8, 14: 15, 15: 15,
        16: 16, 17: 18, 1021: 1024, 1536: 1536, 51200000: 51200000, 
        510183360: 510183360, 510183360+1: 512000000, 511000000: 512000000,
        854296875: 854296875, 854296875+1: 859963392, 
        196608000000: 196608000000, 196608000000+1: 196830000000,
        8789062500000: 8789062500000, 8789062500000+1: 8796093022208,
        206391214080000: 206391214080000, 206391214080000+1: 206624260800000,
        470184984576000: 470184984576000, 470184984576000+1: 470715894135000,
        7222041363087360: 7222041363087360, 
        7222041363087360+1: 7230196133913600,

        # power of 5    5**23
        11920928955078125: 11920928955078125,
        11920928955078125-1: 11920928955078125,

        # power of 3    3**34
        16677181699666569: 16677181699666569,
        16677181699666569-1: 16677181699666569,

        # power of 2   2**54
        18014398509481984: 18014398509481984,
        18014398509481984-1: 18014398509481984,

        # above this, int(ceil(n)) == int(ceil(n+1))
        19200000000000000: 19200000000000000,
        19200000000000000+1: 19221679687500000,

        288230376151711744:   288230376151711744,
        288230376151711744+1: 288325195312500000,
        288325195312500000-1: 288325195312500000,
        288325195312500000:   288325195312500000,
        288325195312500000+1: 288555831593533440,

        # power of 3    3**83
        3990838394187339929534246675572349035227-1:
            3990838394187339929534246675572349035227,
        3990838394187339929534246675572349035227:
            3990838394187339929534246675572349035227,

        # power of 2     2**135
        43556142965880123323311949751266331066368-1:
            43556142965880123323311949751266331066368,
        43556142965880123323311949751266331066368:
            43556142965880123323311949751266331066368,

        # power of 5      5**57
        6938893903907228377647697925567626953125-1:
            6938893903907228377647697925567626953125,
        6938893903907228377647697925567626953125:
            6938893903907228377647697925567626953125,

        # http://www.drdobbs.com/228700538
        # 2**96 * 3**1 * 5**13
        290142196707511001929482240000000000000-1:
            290142196707511001929482240000000000000,
        290142196707511001929482240000000000000:
            290142196707511001929482240000000000000,
        290142196707511001929482240000000000000+1:
            290237644800000000000000000000000000000,

        # 2**36 * 3**69 * 5**7
        4479571262811807241115438439905203543080960000000-1:
            4479571262811807241115438439905203543080960000000,
        4479571262811807241115438439905203543080960000000:
            4479571262811807241115438439905203543080960000000,
        4479571262811807241115438439905203543080960000000+1:
            4480327901140333639941336854183943340032000000000,

        # 2**37 * 3**44 * 5**42
        30774090693237851027531250000000000000000000000000000000000000-1:
            30774090693237851027531250000000000000000000000000000000000000,
        30774090693237851027531250000000000000000000000000000000000000:
            30774090693237851027531250000000000000000000000000000000000000,
        30774090693237851027531250000000000000000000000000000000000000+1:
            30778180617309082445871527002041377406962596539492679680000000,
        }

    for x, y in hams.items():
        assert_equal(_next_regular(x), y)
@endolith endolith deleted the endolith:smooth_fft branch
@pv
Owner
pv commented

Thanks, added in e31a4ae

@rmcgibbo rmcgibbo referenced this pull request in rmcgibbo/pyhmc
Merged

there is a hidden function in scipy for this #5

@endolith

That mailing list post said the optimal size is 7 smooth, but from what I could find in fftpack docs and in testing, it seemed to be 5 smooth sizes. Can anyone confirm?

@argriffing
Collaborator

@endolith I'm not sure but I could imagine this could be a distinction among the underlying fft implementations. For example fftpack vs. fftw. I think the pdf mentioned on the mailing list referred to fftw, whereas scipy includes fftpack but not fftw for licensing reasons.

@endolith

well FFTW's optimal size is clearly stated:

FFTW is best at handling sizes of the form 2a 3b 5c 7d 11e 13f, where e+f is either 0 or 1, and the other exponents are arbitrary.

FFTPACK I'm not 100% sure about. This commit says "Composite numbers of 2, 3, and 5 are accepted, as FFTPACK has those", but the actual FFTPACK documentation just says:

n the length of the array x to be transformed. the method is most efficient when n is a product of small primes.

So I'm not sure which small primes. When I tried testing multiples of 5 vs 7, it seemed that it was not optimized for 7, but I don't remember how I tested exactly.

@argriffing
Collaborator

That mailing list post said the optimal size is 7 smooth

Does it actually say this?

@endolith

I meant this post:

The performance of fftpack depends very strongly on the array size -- sizes
that are powers of two are good, but also powers of three, five and seven,
or numbers whose only prime factors are from (2,3,5,7).

@argriffing
Collaborator

I don't think the post makes claims about {2, 3, 5, 7} vs. {2, 3, 5} or {2, 3, 5, 7, 11} with respect to fftpack, so I'd believe your tests!

@endolith

@argriffing Fortran is gibberish to me, but I looked more carefully through https://github.com/scipy/scipy/blob/v0.14.0/scipy/fftpack/src/ and also found the java translation on netlib which actually has comments, and it looks like it's optimized for multiples of {4, 2, 3, 5}.

jfftpack comments:

passf2: Complex FFT's forward/backward processing of factor 2;
passf3: Complex FFT's forward/backward processing of factor 3;
passf4: Complex FFT's forward/backward processing of factor 4;
passf5: Complex FFT's forward/backward processing of factor 5;
passfg: Complex FFT's forward/backward processing of general factor;

in order of passf4 -> passf2 -> passf3 -> passf5 -> passfg

SciPy's fftpack likewise has PASSB4 -> PASSB2 -> PASSB3 -> PASSB5 -> PASSB

so I think that settles it. Gnu Scientific Library has a GPL version of fftpack which includes factors up to 7, but of course we can't use it. http://git.savannah.gnu.org/cgit/gsl.git/tree/fft

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Commits on Dec 9, 2013
  1. @endolith
Commits on Dec 12, 2013
  1. @endolith

    ENH: Speed up fftconvolve by using 5-smooth length to pad FFT

    endolith authored
    instead of padding to the next power of 2.  FFTPACK can handle multiples and powers of 3 and 5 just as efficiently as powers of 2.
This page is out of date. Refresh to see the latest.
Showing with 58 additions and 8 deletions.
  1. +58 −8 scipy/signal/signaltools.py
View
66 scipy/signal/signaltools.py
@@ -118,7 +118,7 @@ def correlate(in1, in2, mode='full'):
except KeyError:
raise ValueError("Acceptable mode flags are 'valid',"
" 'same', or 'full'.")
-
+
if rank(in1) == rank(in2) == 0:
return in1 * in2
elif not in1.ndim == in2.ndim:
@@ -157,6 +157,56 @@ def _centered(arr, newsize):
return arr[tuple(myslice)]
+def _next_regular(target):
+ """
+ Find the next regular number greater than or equal to target.
+ Regular numbers are composites of the prime factors 2, 3, and 5.
+ Also known as 5-smooth numbers or Hamming numbers, these are the optimal
+ size for inputs to FFTPACK.
+
+ Target must be a positive integer.
+ """
+ if target <= 6:
+ return target
+
+ # Quickly check if it's already a power of 2
+ if not (target & (target-1)):
+ return target
+
+ match = float('inf') # Anything found will be smaller
+ p5 = 1
+ while p5 < target:
+ p35 = p5
+ while p35 < target:
+ # Ceiling integer division, avoiding conversion to float
+ # (quotient = ceil(target / p35))
+ quotient = -(-target // p35)
+
+ # Quickly find next power of 2 >= quotient
+ try:
+ p2 = 2**((quotient - 1).bit_length())
+ except AttributeError:
+ # Fallback for Python <2.7
+ p2 = 2**(len(bin(quotient - 1)) - 2)
+
+ N = p2 * p35
+ if N == target:
+ return N
+ elif N < match:
+ match = N
+ p35 *= 3
+ if p35 == target:
+ return p35
+ if p35 < match:
+ match = p35
+ p5 *= 5
+ if p5 == target:
+ return p5
+ if p5 < match:
+ match = p5
+ return match
+
+
def fftconvolve(in1, in2, mode="full"):
"""Convolve two N-dimensional arrays using FFT.
@@ -209,20 +259,20 @@ def fftconvolve(in1, in2, mode="full"):
s2 = array(in2.shape)
complex_result = (np.issubdtype(in1.dtype, np.complex) or
np.issubdtype(in2.dtype, np.complex))
- size = s1 + s2 - 1
+ shape = s1 + s2 - 1
if mode == "valid":
_check_valid_mode_shapes(s1, s2)
- # Always use 2**n-sized FFT
- fsize = 2 ** np.ceil(np.log2(size)).astype(int)
- fslice = tuple([slice(0, int(sz)) for sz in size])
+ # Speed up FFT by padding to optimal size for FFTPACK
+ fshape = [_next_regular(int(d)) for d in shape]
+ fslice = tuple([slice(0, int(sz)) for sz in shape])
if not complex_result:
- ret = irfftn(rfftn(in1, fsize) *
- rfftn(in2, fsize), fsize)[fslice].copy()
+ ret = irfftn(rfftn(in1, fshape) *
+ rfftn(in2, fshape), fshape)[fslice].copy()
ret = ret.real
else:
- ret = ifftn(fftn(in1, fsize) * fftn(in2, fsize))[fslice].copy()
+ ret = ifftn(fftn(in1, fshape) * fftn(in2, fshape))[fslice].copy()
if mode == "full":
return ret
Something went wrong with that request. Please try again.