acf / pacf do not work on pandas objects #322

Closed
jseabold opened this Issue Jun 20, 2012 · 8 comments

@jseabold
Member

Something I noticed writing some examples.

@jseabold
Member

Update. They work fine on pandas objects. It's only when you have missing data that they break. E.g.,

import statsmodels.api as sm
from statsmodels.datasets.macrodata import load_pandas
cpi = load_pandas().data["cpi"]
sm.tsa.pacf(cpi.diff())  # diff() leaves a leading NaN

Not really sure what to do here yet. Thoughts?
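
One caller-side workaround is to drop the missing values before calling the function. The sketch below uses NumPy only and mimics what `Series.diff()` produces (a leading NaN); the names `levels`, `diffed`, and `clean` are illustrative and not part of statsmodels. It is not a proposal for how the library itself should handle missing data.

```python
import numpy as np

# Made-up values standing in for a series such as cpi.
levels = np.array([1.0, 1.5, 2.5, 2.0, 3.0, 4.5])

# Mimic pandas' Series.diff(): the first element has no predecessor, so it is NaN.
diffed = np.concatenate(([np.nan], np.diff(levels)))

# Any statistic that starts from the sample mean is contaminated by the NaN.
mean_is_nan = np.isnan(diffed.mean())

# Dropping the NaN first gives a clean 1-D array that can be passed on.
clean = diffed[~np.isnan(diffed)]
```

This is only safe when the missing values sit at the start or end of the series. Dropping NaNs from the middle would change the spacing between observations and therefore the meaning of each lag.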

@jseabold
Member

Spoke too soon. ACF doesn't work:

import pandas
import statsmodels.api as sm

dta = sm.datasets.sunspots.load_pandas().data
dta.index = pandas.Index(sm.tsa.datetools.dates_from_range('1700', '2008'))
del dta["YEAR"]
sm.tsa.acf(dta['SUNACTIVITY'])
# AssertionError: Index length did not match values
sm.tsa.acf(dta)  # needs a .values.squeeze()
# ValueError: object too deep for desired array
@tshauck tshauck added a commit to tshauck/statsmodels that referenced this issue Sep 29, 2012
@tshauck tshauck Addresses #322: when passed a pandas Series acf will return a numpy array
9e0dc9f
@tshauck tshauck added a commit to tshauck/statsmodels that referenced this issue Sep 29, 2012
@tshauck tshauck Actually address issue #322 with passing tests 624f01a
@jseabold jseabold added a commit that closed this issue Nov 13, 2012
@jseabold jseabold Merge branch 'fix-322'. Closes #322 and #486.
* fix-322:
  STY: Whitespace cleanup
  Explicity convert x to numpy array to allow pandas.Series to be passed
  Passing test for acovf with pandas Series, clean acovf import
  Added test to confirm it works with pandas Series
  Actually address issue #322 with passing tests
  Addresses #322: when passed a pandas Series acf will return a numpy array
a3141e5
@jseabold jseabold closed this in a3141e5 Nov 13, 2012
@jseabold
Member

There's actually another issue here. The "object too deep for desired array" error happens because a DataFrame is always 2d, even with a single column. So we need not only an asarray but also a squeeze, and probably a dims check for a better error message.
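
A minimal sketch of that combination (asarray, squeeze, then a dims check), assuming a hypothetical helper named `as_1d_array`. This is not the code that was merged into statsmodels; it only illustrates the idea, and it uses a plain 2-D NumPy array to stand in for a one-column DataFrame.

```python
import numpy as np


def as_1d_array(x, name="x"):
    """Hypothetical helper: coerce array-like input to a 1-D float array.

    A one-column 2-D input (the shape a one-column DataFrame converts to)
    is squeezed down to 1-D. Anything that is still not 1-D afterwards
    raises a ValueError that reports the offending shape.
    """
    arr = np.asarray(x, dtype=float)
    original_shape = arr.shape
    if arr.ndim > 1:
        arr = np.squeeze(arr)
    if arr.ndim != 1:
        raise ValueError(
            f"{name} must be 1-D or a single column, got shape {original_shape}"
        )
    return arr


# A (5, 1) array plays the role of a one-column DataFrame here.
column = np.arange(5.0).reshape(-1, 1)
flat = as_1d_array(column)
```

With this in place, a two-column input fails early with a message that names the shape, instead of surfacing as "object too deep for desired array" from deeper in the computation.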

@jseabold jseabold reopened this Nov 13, 2012
@jseabold jseabold closed this in bcdb025 Nov 13, 2012
@josef-pkt josef-pkt reopened this May 30, 2013
@josef-pkt
Member

The asarray was only added to acovf. acf calls acovf if fft=False, but acf has its own calculation if fft=True.

Suggested fix: add the same asarray and dim change to acf directly.
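
To illustrate why the coercion has to happen before the two code paths split, here is a self-contained sketch. The function `acf_sketch` is hypothetical and is not the statsmodels implementation; it only shows that once the input is forced to 1-D up front, an FFT-based and a direct autocorrelation agree for a single-column 2-D input.

```python
import numpy as np


def acf_sketch(x, nlags=40, fft=True):
    """Hypothetical ACF: coerce to 1-D first, then pick a computation path."""
    x = np.squeeze(np.asarray(x, dtype=float))
    if x.ndim != 1:
        raise ValueError("x must be 1-D or a single column")
    n = x.shape[0]
    xd = x - x.mean()
    if fft:
        # Zero-pad to at least 2n - 1 so the circular correlation
        # computed via the FFT equals the linear one.
        nfft = 2 ** int(np.ceil(np.log2(2 * n - 1)))
        freq = np.fft.fft(xd, n=nfft)
        acov = np.fft.ifft(freq * np.conjugate(freq)).real[:n]
    else:
        # Direct computation: index n - 1 of the full correlation is lag 0.
        acov = np.correlate(xd, xd, mode="full")[n - 1:]
    return acov[: nlags + 1] / acov[0]


rng = np.random.default_rng(0)
column = rng.standard_normal((50, 1))  # stands in for a one-column DataFrame
acf_fft = acf_sketch(column, nlags=10, fft=True)
acf_direct = acf_sketch(column, nlags=10, fft=False)
```

Both calls return an array of length `nlags + 1`, and the two results match to floating-point precision.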

@tshauck
Contributor
tshauck commented May 31, 2013

Alright, I'll try to look into the issue this weekend.


@josef-pkt
Member

This issue never mentions or links to pull request #486.

@PierreBdR PierreBdR pushed a commit to PierreBdR/statsmodels that referenced this issue Sep 2, 2014
@tshauck @jseabold tshauck + jseabold Addresses #322: when passed a pandas Series acf will return a numpy array
67aa6ec
@PierreBdR PierreBdR pushed a commit to PierreBdR/statsmodels that referenced this issue Sep 2, 2014
@tshauck @jseabold tshauck + jseabold Actually address issue #322 with passing tests 1a6d40c
@PierreBdR PierreBdR pushed a commit to PierreBdR/statsmodels that referenced this issue Sep 2, 2014
@jseabold jseabold Merge branch 'fix-322'. Closes #322 and #486.
* fix-322:
  STY: Whitespace cleanup
  Explicity convert x to numpy array to allow pandas.Series to be passed
  Passing test for acovf with pandas Series, clean acovf import
  Added test to confirm it works with pandas Series
  Actually address issue #322 with passing tests
  Addresses #322: when passed a pandas Series acf will return a numpy array
421e827
@PierreBdR PierreBdR pushed a commit to PierreBdR/statsmodels that referenced this issue Sep 2, 2014
@jseabold jseabold ENH: Check dims for acovf. Closes #322. 0cd366f
@bashtage
Contributor

@josef-pkt This appears to work in 0.5.0. Maybe it's ready to be closed, or is it just missing a test?

@josef-pkt
Member

It's still broken.

Using the sunspot data from Skipper's example above:

>>> dta
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 309 entries, 1700-12-31 00:00:00 to 2008-12-31 00:00:00
Data columns (total 1 columns):
SUNACTIVITY    309  non-null values
dtypes: float64(1)
>>> sm.tsa.acf(dta, fft=False)
array([ 1.        ,  0.82020129,  0.45126849,  0.03957655, -0.27579196,
       -0.42523943, -0.37659509, -0.15737391,  0.15820254,  0.47309753,
        0.65898002,  0.65029082,  0.45666254,  0.16179329, -0.12205105,
       -0.3161808 , -0.37471125, -0.30605753, -0.1348069 ,  0.09158727,
        0.2975632 ,  0.4207074 ,  0.41183954,  0.27020758,  0.04496208,
       -0.17428715, -0.33045026, -0.37287834, -0.28555061, -0.11794414,
        0.08293231,  0.24897507,  0.32752101,  0.28335919,  0.1375272 ,
       -0.05526386, -0.22973205, -0.31338879, -0.29355684, -0.17897285,
       -0.01769038])
>>> sm.tsa.acf(dta, fft=True)
array([[  1.00000000e+00,              nan,   1.00000000e+00, ...,
          1.00000000e+00,   1.00000000e+00,              nan],
       [  7.49831457e-01,              nan,  -1.00000000e+00, ...,
         -1.00000000e+00,  -1.00000000e+00,              nan],
       [  5.68819900e-01,              nan,   5.00000000e-01, ...,
          5.00000000e-01,   5.00000000e-01,              nan],
       ..., 
       [  1.21401450e+01,              nan,   0.00000000e+00, ...,
          0.00000000e+00,   0.00000000e+00,              nan],
       [  8.49950450e+00,              nan,   7.09642837e-17, ...,
         -4.37578782e-17,   7.09642837e-17,              nan],
       [  1.74907666e+00,              nan,  -8.87053546e-18, ...,
          5.46973478e-18,  -8.87053546e-18,              nan]])
>>> sm.tsa.acf(dta, fft=True).shape
(41, 625)

>>> sm.tsa.acf(dta.values.squeeze(), fft=True)
array([ 1.        ,  0.82020129,  0.45126849,  0.03957655, -0.27579196,
       -0.42523943, -0.37659509, -0.15737391,  0.15820254,  0.47309753,
        0.65898002,  0.65029082,  0.45666254,  0.16179329, -0.12205105,
       -0.3161808 , -0.37471125, -0.30605753, -0.1348069 ,  0.09158727,
        0.2975632 ,  0.4207074 ,  0.41183954,  0.27020758,  0.04496208,
       -0.17428715, -0.33045026, -0.37287834, -0.28555061, -0.11794414,
        0.08293231,  0.24897507,  0.32752101,  0.28335919,  0.1375272 ,
       -0.05526386, -0.22973205, -0.31338879, -0.29355684, -0.17897285,
       -0.01769038])
@jseabold jseabold added a commit to jseabold/statsmodels that referenced this issue Sep 26, 2014
@jseabold jseabold BUG: 2d 1 columns -> 1d. Closes #322. cd4b582
@jseabold jseabold closed this in #2012 Sep 26, 2014
@yarikoptic yarikoptic added a commit to yarikoptic/statsmodels that referenced this issue Oct 23, 2014
@yarikoptic yarikoptic Merge commit 'v0.5.0-1491-g850e0e4' into debian-experimental
* commit 'v0.5.0-1491-g850e0e4': (178 commits)
  DOC: Fix versions to match other docs.
  REF/ENH: Use clip pattern. Use it for resid_dev in Poisson.
  STY: Pep-8
  ENH: More numerically stable inv. nbinom.
  STY: Pep-8
  ENH: More numerically stable version of invlogit.
  TST: Test invlogit stability.
  BUG: Fix prediction for ARIMA d > 1. Closes #1562.
  TST: Test predict for ARIMA with d > 1
  TST: Test forecast with ARIMA d > 1.
  BUG: Fix ARIMA.forecast for d > 1.
  ENH: Cleanup unintegrate. Add unintegrate_levels
  STY: Cleanup imports
  ENH: Better error message on object dtype. Closes #880
  TST: Test dtype object error
  TST: Test DataFrame ACF with FFT.
  BUG: 2d 1 columns -> 1d. Closes #322.
  TST: Silence convergence warnings in tests.
  ENH: Do not warn on intermediate results convergence.
  TST: Silence test warnings.
  ...
7e3fe95