Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

Address issue #322 that acovf doesn't work w/ pandas Series #486

Closed
wants to merge 5 commits into
from

Conversation

Projects
None yet
3 participants
Contributor

tshauck commented Sep 29, 2012

A workaround for the issue that when you pass a series to np.correlate it returns half of the series leaving the other half as %ss.

Uses the fact that np.correlate returns a symmetric array. So reversing it doesn't effect the situation when a numpy array is passed, but it properly aligns the series for the index call coming later.

Owner

josef-pkt commented Sep 29, 2012

do you have a test case with a pandas series that we can add?
so we don't break it again the next time this is refactored.

Contributor

tshauck commented Sep 30, 2012

I've added a simple test that tests equality between a Series and it's numpy representation. Obviously the test file now requires pandas.Series and acovf.

Owner

josef-pkt commented Sep 30, 2012

using pandas in tests is fine since it's a requirement of statsmodels.

However, after looking at this for a while, I think, the correct solution might be np.asarray at the beginning.

the index doesn't make any sense for the autocorrelation function

>>> idx = list('abcdefghij')
>>> idx
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
>>> s2 = pa.Series(np.arange(1, 11), index=idx)
>>> s2
a     1
b     2
c     3
d     4
e     5
f     6
g     7
h     8
i     9
j    10
>>> np.correlate(s2,s2, 'full')
a     10
b     29
c     56
d     90
e    130
f    175
g    224
h    276
i    330
j    385
>>> np.correlate(np.asarray(s2),s2, 'full')
array([ 10,  29,  56,  90, 130, 175, 224, 276, 330, 385, 330, 276, 224,
       175, 130,  90,  56,  29,  10])

more generally, since np.correlate with a pandas.Series doesn't do the same thing as on an ndarray, this will be a fragile strategy. how does pandas truncate?

with np.asarray we would just return an np.array with the autocorrelations.

see also pydata/pandas#1991

a more systematic check on which functions work and which don't work with a pandas datastructure would be very useful.

Contributor

tshauck commented Sep 30, 2012

I think you're right with regards to calling np.asarray at the beginning - [::-1] is doing that conversion implicitly and reversing, where really conversion is all that's needed, and it'd be better to do it explicitly.

@jseabold jseabold added a commit that referenced this pull request Nov 13, 2012

@jseabold jseabold Merge branch 'fix-322'. Closes #322 and #486.
* fix-322:
  STY: Whitespace cleanup
  Explicity convert x to numpy array to allow pandas.Series to be passed
  Passing test for acovf with pandas Series, clean acovf import
  Added test to confirm it works with pandas Series
  Actually address issue #322 with passing tests
  Addresses #322: when passed a pandas Series acf will return a numpy array
a3141e5
Owner

jseabold commented Nov 13, 2012

Thanks. Merged.

@jseabold jseabold closed this Nov 13, 2012

@PierreBdR PierreBdR pushed a commit to PierreBdR/statsmodels that referenced this pull request Sep 2, 2014

@jseabold jseabold Merge branch 'fix-322'. Closes #322 and #486.
* fix-322:
  STY: Whitespace cleanup
  Explicity convert x to numpy array to allow pandas.Series to be passed
  Passing test for acovf with pandas Series, clean acovf import
  Added test to confirm it works with pandas Series
  Actually address issue #322 with passing tests
  Addresses #322: when passed a pandas Series acf will return a numpy array
421e827
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment