REF: ensure O(N log N) when using fft for acf #1196

Merged
merged 2 commits into from Apr 3, 2014

Projects

None yet

6 participants

@kain88-de
Contributor

The numpy implementation will fall back to a O(N**2) version of the fft
in cases where it cannot factorize the array length, which can be a
increase in runtime of several order of magnitude. If we always ensure
to fill up the array with zeros that it's total length is a power of 2
the runtime will be guaranteed to be O(N log N).

@coveralls

Coverage Status

Coverage remained the same when pulling 6d79897 on kain88-de:fast_fft_acf into 9d4b1f8 on statsmodels:master.

@josef-pkt
Member

Thank you, Sorry about the slow reply. I didn't have time yet to think about this.
We also use fft in a similar way in other places, so we might want to have a general solution for this.

In scipy there is currently a PR to make a more fine grained padding that doesn't increase memory as much.
scipy/scipy#3144

maybe we should also switch to using scipy's fft if it is faster.

@argriffing

In scipy there is currently a PR to make a more fine grained padding that doesn't increase memory as much.

For what it's worth, that PR is for fftconvolve and I don't think it would affect fft until bluestein's algorithm is implemented for scipy.

@josef-pkt
Member

I didn't look at the details but our code should have the same behavior as fftconvolve.
We are doing essentially the same as an fftconvolve.

@argriffing

OK. Just to be clear I don't think you will get the benefit of the new scipy PR unless you use the scipy fftconvolve function, whereas I think the statsmodels acf uses fft which will not be improved by the new scipy PR. I'm only pointing this out because I made a comment on endolith's PR that confused fft vs. fftconvolve and he corrected me. I'm not positive this applies to your statsmodels case but I just thought I would mention it.

@endolith

This is the one for the general fft case: scipy/scipy#1476 but I don't think that applies here, either. This looks like an autocorrelation function implemented using the FFT? So it's basically the same algorithm as scipy.fftconvolve (autocorrelation = convolution of the input with itself reversed), and could use the same speedups:

  1. is to detect real inputs and use numpy.rfft instead of numpy.fft for a 2x speedup for real input
  2. is to zero-pad to the next 5-smooth number, which numpy's fftpack is optimized for, so it will be maybe 5x faster in some cases than rounding up to a more distant power of 2, which is what I'm doing in scipy/scipy#3144
@josef-pkt
Member

Yes we have autocorrelation, autocovariance, and cross-covariance implemented either using np.correlate or fft.
(plus a periodogram that is missing smoothing with a kernel window.)

@endolith did you really mean numpy's fftpack or scipy's?
Is there much difference between numpy's and scipy's fft? Skipper and I raised the issue recently but we never looked into this.

Here it's just the 3 or 4 functions, but there is more fft code including an adjusted copy of scipy's fftconvolve in the sandbox, but I haven't looked at this in several years.

If we improve this, then I would essentially copy the new scipy function, next_regular, because we stay compatible with older scipy versions.
Changing the shape, we still have to check whether our normalization/bias correction is correct, #1188

Also I don't remember whether some of these functions were supposed to be vectorized and calculate the autocorrelation for each column or row in 2d.

@endolith

did you really mean numpy's fftpack or scipy's?

I think they both use the same fortran fftpack library, but they have different wrappers for it. scipy.fftpack.fft is a little better than numpy.fft.fft because it automatically uses rfft in some cases and allows you to overwrite the input array with the output array: http://docs.scipy.org/doc/scipy-dev/reference/generated/scipy.fftpack.fft.html otherwise I think they're identical.

scipy's rfft is weirdly "unpacked" though. numpy.rfft is easier to work with: scipy/scipy#2487

@kain88-de
Contributor

using next_regular instead of my fix seems good to me. I didn't check if there would be any unit tests will you merge #1188 soon? That would make it easy to see if I break something

@josef-pkt
Member

#1188 is one of my next PR's to be merged, within a day.
The next_regular function could go into statsmodels.compatnp so we can easily drop it when our minimum scipy version has it.

@endolith

I'll add a test for _next_regular soon in scipy/scipy#3144. Probably should wait until it gets accepted before you use it in statsmodels; I might have missed something.

@kain88-de
Contributor

Yeah I will do that. But I would have added tests anyway.

kain88-de added some commits Nov 20, 2013
@kain88-de kain88-de REF: ensure O(N log N) when using fft for acf
The numpy implementation will fall back to a O(N**2) version of the fft
in cases where it cannot factorize the array length, which can be a
increase in runtime of several order of magnitude. This uses the
_next_regular function from scipy to ensure that we always use array
lengths for which fftpack is optimized.
b18aa96
@kain88-de kain88-de TST: add test for _next_regular
also copy the test for _next_regular from scipy
d1a4103
@kain88-de
Contributor

I replaced my solution with the _next_regular function from scipy and also added their unit-test.
Is there anything else I should fix before this can be merged?

@josef-pkt josef-pkt added the PR label Feb 19, 2014
@jseabold
Member
jseabold commented Apr 3, 2014

This looks good to me. Will merge shortly.

@jseabold jseabold merged commit c70a709 into statsmodels:master Apr 3, 2014

1 check passed

default The Travis CI build passed
Details
@kain88-de kain88-de deleted the kain88-de:fast_fft_acf branch Apr 3, 2014
@endolith
endolith commented Apr 3, 2014

The comment "ensure that we always use a power of 2" is no longer correct, since it will also use powers of 3, etc. :)

@josef-pkt
Member

this causes a test failure on python 2.6 travis version in #1325 test run

@jseabold
Member
jseabold commented Apr 3, 2014

Yeah, I submitted a patch. I'll also correct the docstring.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment