ENH: Speed up fftconvolve by using 5-smooth length to pad FFT #3144

Merged
merged 2 commits into from Feb 2, 2014

Projects

None yet

4 participants

@endolith
Contributor

Fix for #2146

In [4]: in1 = rand(2**14+1) + 1j*rand(2**14+1)

In [5]: timeit fftconvolve(in1, in1)
100 loops, best of 3: 19.1 ms per loop

In [6]: timeit fftconvolvefast(in1, in1)
100 loops, best of 3: 3.75 ms per loop

In [10]: assert_array_almost_equal(fftconvolve(in1, in1), fftconvolvefast(in1, in1))

In [11]: 19.1/3.75
5.093333333333334
endolith added some commits Nov 9, 2013
@endolith endolith MAINT: Use "shape" for variables derived from array.shape (not array.…
…size)
01a9389
@endolith endolith ENH: Speed up fftconvolve by using 5-smooth length to pad FFT
instead of padding to the next power of 2.  FFTPACK can handle multiples and powers of 3 and 5 just as efficiently as powers of 2.
e86fe48
@coveralls

Coverage Status

Coverage remained the same when pulling e86fe48 on endolith:smooth_fft into a0a79a8 on scipy:master.

@pv
Member
pv commented Dec 12, 2013

Could you add test cases: (i) some for the _next_regular function, and (ii) some for fftconvolve with nasty sizes.

@endolith
Contributor

Ok. For _next_regular I have a really overkill test that tests every integer up to 51200000 and compares to http://oeis.org/A051037/b051037.txt and then a bunch of really big integers too, but I'll try to think of just a few special cases.

For fftconvolve, how to test it though? It should produce identical results as the original, just faster.

@argriffing
Contributor

For fftconvolve, how to test it though?
It should produce identical results as the original, just faster.

If it is a regression test, then could it be reasonable to keep the old implementation but just rename it and make it not public, and only use it for regression testing? This would not test the speedup though.

@endolith
Contributor

Well it's just a change in implementation, so the existing tests should cover it, right? They exist to show that any change in implementation doesn't break things that are known to work? test_random_data already compares the output offftconvolve with the direct convolve.

@argriffing
Contributor

Oh that makes sense. I don't know if this is set up for scipy, but I've heard that vbench can be helpful for keeping tabs on speeds of functions across versions of libraries, like this for numpy. There are also some bench*.py speed benchmarking modules analogous to test*.py unit testing modules in scipy, although the benchmarking ones are much less comprehensive and possibly are not supported by as much infrastructure as the unit test modules. I guess that some have already been written for fftpack here.

@endolith
Contributor

Could maybe do a test where it does a timeit measurement of different input sizes (power of 2, (power of 2)+1, nearby prime number) and makes sure the times are not enormously different? Then the test would be consistent across different hardware?

(power of 2)+1 is the biggest speed difference compared to the previous implementation, because it only pads up to the next regular number instead of almost doubling it to the next power of 2:

fftconvolvereg vs fftconvolve real compute11a 2 7 3 2013-12-09 20-51 cropped

@josef-pkt josef-pkt referenced this pull request in statsmodels/statsmodels Dec 12, 2013
Merged

REF: ensure O(N log N) when using fft for acf #1196

@pv
Member
pv commented Dec 12, 2013

I don't think testing for speed is very important here, just add a few checking for correctness.
Re: existing tests - were there any? I didnt look...

@endolith
Contributor

Yeah there are tests for fftconvolve at https://github.com/scipy/scipy/blob/master/scipy/signal/tests/test_signaltools.py#L155 but I'll see if they can be expanded

@endolith
Contributor

@cournape has some similar functions in https://github.com/cournape/numpy/blob/bluestein/numpy/fft/helper.py. Maybe next_regular should be a public function in numpy.fft.helper, too?

...but then this optimization would depend on a certain numpy version, so no...

@pv pv merged commit e86fe48 into scipy:master Feb 2, 2014

1 check passed

default The Travis CI build passed
Details
@endolith
Contributor
endolith commented Feb 2, 2014

Sorry, I meant to add some tests. It looks like the one you added just tests that the result is >= to the original number, and regular, but not that it's the next regular? This is what I was going to submit:

def test_next_regular():
    hams = {
        1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 8, 8: 8, 14: 15, 15: 15,
        16: 16, 17: 18, 1021: 1024, 1536: 1536, 51200000: 51200000, 
        510183360: 510183360, 510183360+1: 512000000, 511000000: 512000000,
        854296875: 854296875, 854296875+1: 859963392, 
        196608000000: 196608000000, 196608000000+1: 196830000000,
        8789062500000: 8789062500000, 8789062500000+1: 8796093022208,
        206391214080000: 206391214080000, 206391214080000+1: 206624260800000,
        470184984576000: 470184984576000, 470184984576000+1: 470715894135000,
        7222041363087360: 7222041363087360, 
        7222041363087360+1: 7230196133913600,

        # power of 5    5**23
        11920928955078125: 11920928955078125,
        11920928955078125-1: 11920928955078125,

        # power of 3    3**34
        16677181699666569: 16677181699666569,
        16677181699666569-1: 16677181699666569,

        # power of 2   2**54
        18014398509481984: 18014398509481984,
        18014398509481984-1: 18014398509481984,

        # above this, int(ceil(n)) == int(ceil(n+1))
        19200000000000000: 19200000000000000,
        19200000000000000+1: 19221679687500000,

        288230376151711744:   288230376151711744,
        288230376151711744+1: 288325195312500000,
        288325195312500000-1: 288325195312500000,
        288325195312500000:   288325195312500000,
        288325195312500000+1: 288555831593533440,

        # power of 3    3**83
        3990838394187339929534246675572349035227-1:
            3990838394187339929534246675572349035227,
        3990838394187339929534246675572349035227:
            3990838394187339929534246675572349035227,

        # power of 2     2**135
        43556142965880123323311949751266331066368-1:
            43556142965880123323311949751266331066368,
        43556142965880123323311949751266331066368:
            43556142965880123323311949751266331066368,

        # power of 5      5**57
        6938893903907228377647697925567626953125-1:
            6938893903907228377647697925567626953125,
        6938893903907228377647697925567626953125:
            6938893903907228377647697925567626953125,

        # http://www.drdobbs.com/228700538
        # 2**96 * 3**1 * 5**13
        290142196707511001929482240000000000000-1:
            290142196707511001929482240000000000000,
        290142196707511001929482240000000000000:
            290142196707511001929482240000000000000,
        290142196707511001929482240000000000000+1:
            290237644800000000000000000000000000000,

        # 2**36 * 3**69 * 5**7
        4479571262811807241115438439905203543080960000000-1:
            4479571262811807241115438439905203543080960000000,
        4479571262811807241115438439905203543080960000000:
            4479571262811807241115438439905203543080960000000,
        4479571262811807241115438439905203543080960000000+1:
            4480327901140333639941336854183943340032000000000,

        # 2**37 * 3**44 * 5**42
        30774090693237851027531250000000000000000000000000000000000000-1:
            30774090693237851027531250000000000000000000000000000000000000,
        30774090693237851027531250000000000000000000000000000000000000:
            30774090693237851027531250000000000000000000000000000000000000,
        30774090693237851027531250000000000000000000000000000000000000+1:
            30778180617309082445871527002041377406962596539492679680000000,
        }

    for x, y in hams.items():
        assert_equal(_next_regular(x), y)
@endolith endolith deleted the endolith:smooth_fft branch Feb 2, 2014
@pv
Member
pv commented Feb 2, 2014

Thanks, added in e31a4ae

@rmcgibbo rmcgibbo referenced this pull request in rmcgibbo/pyhmc Dec 24, 2014
Merged

there is a hidden function in scipy for this #5

@endolith
Contributor

That mailing list post said the optimal size is 7 smooth, but from what I could find in fftpack docs and in testing, it seemed to be 5 smooth sizes. Can anyone confirm?

@argriffing
Contributor

@endolith I'm not sure but I could imagine this could be a distinction among the underlying fft implementations. For example fftpack vs. fftw. I think the pdf mentioned on the mailing list referred to fftw, whereas scipy includes fftpack but not fftw for licensing reasons.

@endolith
Contributor

well FFTW's optimal size is clearly stated:

FFTW is best at handling sizes of the form 2a 3b 5c 7d 11e 13f, where e+f is either 0 or 1, and the other exponents are arbitrary.

FFTPACK I'm not 100% sure about. This commit says "Composite numbers of 2, 3, and 5 are accepted, as FFTPACK has those", but the actual FFTPACK documentation just says:

n the length of the array x to be transformed. the method is most efficient when n is a product of small primes.

So I'm not sure which small primes. When I tried testing multiples of 5 vs 7, it seemed that it was not optimized for 7, but I don't remember how I tested exactly.

@argriffing
Contributor

That mailing list post said the optimal size is 7 smooth

Does it actually say this?

@endolith
Contributor

I meant this post:

The performance of fftpack depends very strongly on the array size -- sizes
that are powers of two are good, but also powers of three, five and seven,
or numbers whose only prime factors are from (2,3,5,7).

@argriffing
Contributor

I don't think the post makes claims about {2, 3, 5, 7} vs. {2, 3, 5} or {2, 3, 5, 7, 11} with respect to fftpack, so I'd believe your tests!

@endolith
Contributor

@argriffing Fortran is gibberish to me, but I looked more carefully through https://github.com/scipy/scipy/blob/v0.14.0/scipy/fftpack/src/ and also found the java translation on netlib which actually has comments, and it looks like it's optimized for multiples of {4, 2, 3, 5}.

jfftpack comments:

passf2: Complex FFT's forward/backward processing of factor 2;
passf3: Complex FFT's forward/backward processing of factor 3;
passf4: Complex FFT's forward/backward processing of factor 4;
passf5: Complex FFT's forward/backward processing of factor 5;
passfg: Complex FFT's forward/backward processing of general factor;

in order of passf4 -> passf2 -> passf3 -> passf5 -> passfg

SciPy's fftpack likewise has PASSB4 -> PASSB2 -> PASSB3 -> PASSB5 -> PASSB

so I think that settles it. Gnu Scientific Library has a GPL version of fftpack which includes factors up to 7, but of course we can't use it. http://git.savannah.gnu.org/cgit/gsl.git/tree/fft

@kousu kousu added a commit to kousu/waveform-analyzer that referenced this pull request Mar 24, 2016
@kousu kousu Use scipy's built-in next-optimal-fft-length function instead of my r…
…ound2()

Depending on the BLAS you have there are myriad optimal sizes. [For example](scipy/scipy#3144 (comment))
the optimal length under one implementation is any of 2^a 3^b 5^c 7^d 11^e 13^f,
where e+f is either 0 or 1, and the other exponents are arbitrary. It's just a
matter of how the implementation chooses to carve up subproblems. Halves---making
powers of 2 fast---is traditional, but not the only choice.
b543468
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment