BUG: asfreq / pct_change strange behavior #7292

Closed
shura-v opened this Issue May 31, 2014 · 8 comments

shura-v commented May 31, 2014

In the first case, is this a bug (those NaNs at the end) or a feature? I just don't see the reason behind this behavior:

In[5]: s = Series(range(10), date_range('2014', periods=10, freq='H'))
In[6]: s
Out[6]: 
2014-05-31 00:00:00    0
2014-05-31 01:00:00    1
2014-05-31 02:00:00    2
2014-05-31 03:00:00    3
2014-05-31 04:00:00    4
2014-05-31 05:00:00    5
2014-05-31 06:00:00    6
2014-05-31 07:00:00    7
2014-05-31 08:00:00    8
2014-05-31 09:00:00    9
Freq: H, dtype: int64
In[7]: s.pct_change(periods=1, freq='5H')
Out[7]: 
2014-05-31 00:00:00         NaN
2014-05-31 01:00:00         NaN
2014-05-31 02:00:00         NaN
2014-05-31 03:00:00         NaN
2014-05-31 04:00:00         NaN
2014-05-31 05:00:00         inf
2014-05-31 06:00:00    5.000000
2014-05-31 07:00:00    2.500000
2014-05-31 08:00:00    1.666667
2014-05-31 09:00:00    1.250000
2014-05-31 10:00:00         NaN
2014-05-31 11:00:00         NaN
2014-05-31 12:00:00         NaN
2014-05-31 13:00:00         NaN
2014-05-31 14:00:00         NaN
dtype: float64

But this seems OK:

In[8]: s.pct_change(periods=5)
Out[8]: 
2014-05-31 00:00:00         NaN
2014-05-31 01:00:00         NaN
2014-05-31 02:00:00         NaN
2014-05-31 03:00:00         NaN
2014-05-31 04:00:00         NaN
2014-05-31 05:00:00         inf
2014-05-31 06:00:00    5.000000
2014-05-31 07:00:00    2.500000
2014-05-31 08:00:00    1.666667
2014-05-31 09:00:00    1.250000
Freq: H, dtype: float64

hayd added the Frequency label May 31, 2014

jreback commented Jun 1, 2014

So pct_change is essentially s divided by its 5H shift, minus 1 (slightly more complicated, as it also handles various fill methods). So this, while I agree it looks a bit odd, seems correct. That said, I could also see an argument that it should reindex to the original series:

In [6]: s
Out[6]: 
2014-06-01 00:00:00    0
2014-06-01 01:00:00    1
2014-06-01 02:00:00    2
2014-06-01 03:00:00    3
2014-06-01 04:00:00    4
2014-06-01 05:00:00    5
2014-06-01 06:00:00    6
2014-06-01 07:00:00    7
2014-06-01 08:00:00    8
2014-06-01 09:00:00    9
Freq: H, dtype: int64

In [7]: s.shift(freq='5H')
Out[7]: 
2014-06-01 05:00:00    0
2014-06-01 06:00:00    1
2014-06-01 07:00:00    2
2014-06-01 08:00:00    3
2014-06-01 09:00:00    4
2014-06-01 10:00:00    5
2014-06-01 11:00:00    6
2014-06-01 12:00:00    7
2014-06-01 13:00:00    8
2014-06-01 14:00:00    9
dtype: int64
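A minimal sketch of the alignment described above, assuming a recent pandas: dividing s by its frequency-shifted copy lines the two series up by timestamp, so the result covers the union of both indexes (00:00 through 14:00) and is NaN wherever only one side has a value. pd.offsets.Hour stands in for the '5H' string alias, which newer pandas releases spell '5h'; the start date is written out to match the output above.

```python
import pandas as pd

# Ten hourly values, 00:00 through 09:00, as in the report.
idx = pd.date_range("2014-05-31", periods=10, freq=pd.offsets.Hour())
s = pd.Series(range(10), index=idx)

# shift(freq=...) keeps the values and moves every timestamp 5 hours later,
# so the shifted copy is indexed 05:00 through 14:00.
shifted = s.shift(freq=pd.offsets.Hour(5))

# Division aligns on labels, giving the union 00:00 through 14:00 (15 rows):
# 00:00-04:00 exist only in s and 10:00-14:00 only in shifted, so both are NaN.
ratio = s.div(shifted)
print(ratio)
```

The inf at 05:00 is 5 divided by the shifted 0 from 00:00; pct_change then subtracts 1, which leaves it at inf.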

Proposed

In [10]: s.div(s.shift(freq='5H')).reindex_like(s)
Out[10]: 
2014-06-01 00:00:00         NaN
2014-06-01 01:00:00         NaN
2014-06-01 02:00:00         NaN
2014-06-01 03:00:00         NaN
2014-06-01 04:00:00         NaN
2014-06-01 05:00:00         inf
2014-06-01 06:00:00    6.000000
2014-06-01 07:00:00    3.500000
2014-06-01 08:00:00    2.666667
2014-06-01 09:00:00    2.250000
Freq: H, dtype: float64
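The proposal can be wrapped in a small helper. This is a hypothetical sketch (pct_change_like is an illustrative name, not a pandas function), assuming a sorted DatetimeIndex without duplicate labels. It subtracts 1 so the numbers line up with pct_change; the Proposed output above is the plain ratio, which is why it shows 6.0 where pct_change shows 5.0. Unlike pct_change, it applies no fill method first.

```python
import pandas as pd


def pct_change_like(s: pd.Series, offset: pd.DateOffset) -> pd.Series:
    """Hypothetical helper: change relative to the value `offset` earlier,
    reported on the original index of `s` only."""
    # Ratio against the frequency-shifted copy; its index is the union of both.
    ratio = s.div(s.shift(freq=offset))
    # Keep only the original timestamps, dropping the trailing NaN rows.
    return ratio.reindex_like(s) - 1


idx = pd.date_range("2014-05-31", periods=10, freq=pd.offsets.Hour())
s = pd.Series(range(10), index=idx)

by_freq = pct_change_like(s, pd.offsets.Hour(5))
# On a gap-free hourly series, a 5-hour offset equals a 5-row shift.
by_periods = s / s.shift(5) - 1
print(by_freq.equals(by_periods))
```

On a series with missing hours the two would differ: the offset version compares against the value exactly 5 hours earlier (NaN if that timestamp is absent), while the row-based shift compares against whatever sits 5 rows back.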

jreback commented Jun 1, 2014

Want to do a PR to fix this?

jreback added the Bug label Jun 1, 2014

jreback added this to the 0.14.1 milestone Jun 1, 2014

jreback commented Jun 10, 2014

@shura-v how's this coming?

jreback commented Jun 22, 2014

jreback modified the milestones: 0.15.0, 0.14.1 Jun 22, 2014

jreback changed the title from pct_change strange behavior to BUG: asfreq / pct_change strange behavior Jun 26, 2014

jreback modified the milestones: 0.15.0, 0.15.1 Jul 6, 2014

shura-v commented Jul 15, 2014

I made a pull request on June 2nd. Here is my commit: https://github.com/shura-v/pandas/commit/eb250b6cead505bad3cf67e838f975b6148cdeae

jreback commented Jul 15, 2014

Needs some tests.

jreback modified the milestones: 0.15.1, 0.15.0 Sep 9, 2014

jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015

minggli commented Jan 25, 2018

Hi, I will attempt to work on this issue and report back.

minggli commented Jan 26, 2018

@jreback

Since pandas.core.generic.NDFrame.pct_change essentially calls the s.shift method to work out the percent change, I've looked into how the different outputs arise from different choices of parameters.

s.shift(periods=5, freq=None) uses the underlying _data block manager to shift values without touching the index.

s.shift(freq='5H') calls s.tshift, which shifts the index instead. Hence, when pct_change divides the unshifted frame by the shifted one, the result has a longer index when shifting by frequency than when shifting by periods, because in the latter case the index remains unchanged.
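The contrast between the two shift modes can be checked directly. A minimal sketch, assuming a recent pandas and using only the public shift method (tshift, mentioned above, has since been deprecated, so it is not called here); pd.offsets.Hour(5) stands in for '5H':

```python
import pandas as pd

idx = pd.date_range("2014-05-31", periods=10, freq=pd.offsets.Hour())
s = pd.Series(range(10), index=idx)

# periods only: values slide 5 rows down, the index is untouched,
# and the first 5 slots become NaN.
by_periods = s.shift(periods=5)

# freq given: values stay in place and the index moves 5 hours later.
by_freq = s.shift(freq=pd.offsets.Hour(5))

print(by_periods.index.equals(s.index))  # same labels as s
print(by_freq.index[0], by_freq.index[-1])
```

Because by_freq carries labels that s lacks, any arithmetic between the two widens the index, while by_periods lines up with s row for row.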

So that explains the difference observed in the earlier discussion. Raising a PR; will report back.

jreback modified the milestones: Next Major Release, 0.23.0 Jan 27, 2018

jreback added a commit that referenced this issue Jan 31, 2018
