Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REF: ensure O(N log N) when using fft for acf #1196

Merged
merged 2 commits into from Apr 3, 2014

Conversation

Projects
None yet
6 participants
@kain88-de
Copy link
Contributor

commented Nov 20, 2013

The numpy implementation will fall back to a O(N**2) version of the fft
in cases where it cannot factorize the array length, which can be a
increase in runtime of several order of magnitude. If we always ensure
to fill up the array with zeros that it's total length is a power of 2
the runtime will be guaranteed to be O(N log N).

@coveralls

This comment has been minimized.

Copy link

commented Nov 20, 2013

Coverage Status

Coverage remained the same when pulling 6d79897 on kain88-de:fast_fft_acf into 9d4b1f8 on statsmodels:master.

@josef-pkt

This comment has been minimized.

Copy link
Member

commented Dec 12, 2013

Thank you, Sorry about the slow reply. I didn't have time yet to think about this.
We also use fft in a similar way in other places, so we might want to have a general solution for this.

In scipy there is currently a PR to make a more fine grained padding that doesn't increase memory as much.
scipy/scipy#3144

maybe we should also switch to using scipy's fft if it is faster.

@argriffing

This comment has been minimized.

Copy link

commented Dec 12, 2013

In scipy there is currently a PR to make a more fine grained padding that doesn't increase memory as much.

For what it's worth, that PR is for fftconvolve and I don't think it would affect fft until bluestein's algorithm is implemented for scipy.

@josef-pkt

This comment has been minimized.

Copy link
Member

commented Dec 12, 2013

I didn't look at the details but our code should have the same behavior as fftconvolve.
We are doing essentially the same as an fftconvolve.

@argriffing

This comment has been minimized.

Copy link

commented Dec 12, 2013

OK. Just to be clear I don't think you will get the benefit of the new scipy PR unless you use the scipy fftconvolve function, whereas I think the statsmodels acf uses fft which will not be improved by the new scipy PR. I'm only pointing this out because I made a comment on endolith's PR that confused fft vs. fftconvolve and he corrected me. I'm not positive this applies to your statsmodels case but I just thought I would mention it.

@endolith

This comment has been minimized.

Copy link

commented Dec 13, 2013

This is the one for the general fft case: scipy/scipy#1476 but I don't think that applies here, either. This looks like an autocorrelation function implemented using the FFT? So it's basically the same algorithm as scipy.fftconvolve (autocorrelation = convolution of the input with itself reversed), and could use the same speedups:

  1. is to detect real inputs and use numpy.rfft instead of numpy.fft for a 2x speedup for real input
  2. is to zero-pad to the next 5-smooth number, which numpy's fftpack is optimized for, so it will be maybe 5x faster in some cases than rounding up to a more distant power of 2, which is what I'm doing in scipy/scipy#3144
@josef-pkt

This comment has been minimized.

Copy link
Member

commented Dec 13, 2013

Yes we have autocorrelation, autocovariance, and cross-covariance implemented either using np.correlate or fft.
(plus a periodogram that is missing smoothing with a kernel window.)

@endolith did you really mean numpy's fftpack or scipy's?
Is there much difference between numpy's and scipy's fft? Skipper and I raised the issue recently but we never looked into this.

Here it's just the 3 or 4 functions, but there is more fft code including an adjusted copy of scipy's fftconvolve in the sandbox, but I haven't looked at this in several years.

If we improve this, then I would essentially copy the new scipy function, next_regular, because we stay compatible with older scipy versions.
Changing the shape, we still have to check whether our normalization/bias correction is correct, #1188

Also I don't remember whether some of these functions were supposed to be vectorized and calculate the autocorrelation for each column or row in 2d.

@endolith

This comment has been minimized.

Copy link

commented Dec 13, 2013

did you really mean numpy's fftpack or scipy's?

I think they both use the same fortran fftpack library, but they have different wrappers for it. scipy.fftpack.fft is a little better than numpy.fft.fft because it automatically uses rfft in some cases and allows you to overwrite the input array with the output array: http://docs.scipy.org/doc/scipy-dev/reference/generated/scipy.fftpack.fft.html otherwise I think they're identical.

scipy's rfft is weirdly "unpacked" though. numpy.rfft is easier to work with: scipy/scipy#2487

@kain88-de

This comment has been minimized.

Copy link
Contributor Author

commented Dec 15, 2013

using next_regular instead of my fix seems good to me. I didn't check if there would be any unit tests will you merge #1188 soon? That would make it easy to see if I break something

@josef-pkt

This comment has been minimized.

Copy link
Member

commented Dec 15, 2013

#1188 is one of my next PR's to be merged, within a day.
The next_regular function could go into statsmodels.compatnp so we can easily drop it when our minimum scipy version has it.

@endolith

This comment has been minimized.

Copy link

commented Dec 16, 2013

I'll add a test for _next_regular soon in scipy/scipy#3144. Probably should wait until it gets accepted before you use it in statsmodels; I might have missed something.

@kain88-de

This comment has been minimized.

Copy link
Contributor Author

commented Dec 17, 2013

Yeah I will do that. But I would have added tests anyway.

kain88-de added some commits Nov 20, 2013

REF: ensure O(N log N) when using fft for acf
The numpy implementation will fall back to a O(N**2) version of the fft
in cases where it cannot factorize the array length, which can be a
increase in runtime of several order of magnitude. This uses the
_next_regular function from scipy to ensure that we always use array
lengths for which fftpack is optimized.
TST: add test for _next_regular
also copy the test for _next_regular from scipy
@kain88-de

This comment has been minimized.

Copy link
Contributor Author

commented Feb 9, 2014

I replaced my solution with the _next_regular function from scipy and also added their unit-test.
Is there anything else I should fix before this can be merged?

@josef-pkt josef-pkt added the PR label Feb 19, 2014

@jseabold

This comment has been minimized.

Copy link
Member

commented Apr 3, 2014

This looks good to me. Will merge shortly.

jseabold added a commit that referenced this pull request Apr 3, 2014

Merge pull request #1196 from kain88-de/fast_fft_acf
ENH: ensure O(N log N) when using fft for acf

@jseabold jseabold merged commit c70a709 into statsmodels:master Apr 3, 2014

1 check passed

default The Travis CI build passed
Details

@kain88-de kain88-de deleted the kain88-de:fast_fft_acf branch Apr 3, 2014

@endolith

This comment has been minimized.

Copy link

commented Apr 3, 2014

The comment "ensure that we always use a power of 2" is no longer correct, since it will also use powers of 3, etc. :)

@josef-pkt

This comment has been minimized.

Copy link
Member

commented Apr 3, 2014

this causes a test failure on python 2.6 travis version in #1325 test run

@jseabold

This comment has been minimized.

Copy link
Member

commented Apr 3, 2014

Yeah, I submitted a patch. I'll also correct the docstring.

PierreBdR pushed a commit to PierreBdR/statsmodels that referenced this pull request Sep 2, 2014

Merge pull request statsmodels#1196 from kain88-de/fast_fft_acf
ENH: ensure O(N log N) when using fft for acf
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.