Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rolling_max on a TimeSeries with freq='D' returns incorrect results #6297

Closed
sleibman opened this issue Feb 7, 2014 · 2 comments · Fixed by #6845
Closed

rolling_max on a TimeSeries with freq='D' returns incorrect results #6297

sleibman opened this issue Feb 7, 2014 · 2 comments · Fixed by #6845
Labels
API Design Bug Datetime Datetime data dtype Frequency DateOffsets Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Milestone

Comments

@sleibman
Copy link

sleibman commented Feb 7, 2014

rolling_max on a TimeSeries with freq='D' appears to actually compute the rolling mean, and not the rolling max.

In [118]: import pandas

In [119]: indices = [datetime.datetime(1975, 1, i, 12, 0) for i in range(1, 6)]

In [120]: indices.append(datetime.datetime(1975, 1, 3, 6, 0))  # So that we can have 2 datapoints on one of the days

In [121]: series = pandas.Series(range(1, 7), index=indices)

In [122]: series = series.map(lambda x: float(x))  # Use floats instead of ints as values

In [123]: series = series.sort_index()  # Sort chronologically

In [124]: expected_result = pandas.Series([1.0, 2.0, 6.0, 4.0, 5.0], index=[datetime.datetime(1975, 1, i, 12, 0) for i in range(1, 6)])

In [125]: actual_result = pandas.rolling_max(series, window=1, freq='D')

In [126]: assert((actual_result==expected_result).all())
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-126-cc436c4798a7> in <module>()
----> 1 assert((actual_result==expected_result).all())

AssertionError: 

In [127]: expected_result
Out[127]: 
1975-01-01 12:00:00    1
1975-01-02 12:00:00    2
1975-01-03 12:00:00    6
1975-01-04 12:00:00    4
1975-01-05 12:00:00    5
dtype: float64

In [128]: actual_result
Out[128]: 
1975-01-01    1.0
1975-01-02    2.0
1975-01-03    4.5
1975-01-04    4.0
1975-01-05    5.0
Freq: D, dtype: float64

With a window of size 2 days, it still looks like the rolling mean:

In [130]: pandas.rolling_max(series, window=2, freq='D')
Out[130]: 
1975-01-01    NaN
1975-01-02    2.0
1975-01-03    4.5
1975-01-04    4.5
1975-01-05    5.0
Freq: D, dtype: float64
@jreback
Copy link
Contributor

jreback commented Feb 7, 2014

I would call this a bug, but somewhat subtle, and one could argue that this is an API issue actually.

Since you specified a frequency, the data needs to be put into that frequency by resampling. The default resample is 'mean'. One could argue that that is correct here, but its not a good default.

So for the time being, you can simply resample yourself before passing to rolling_max

I could see an optional argument being passed to the rolling_* functions that specifies how to handle upsampling (so would default to max for rolling_max and so on).

Makes sense?

This would be a nice straightforward PR actually, interested?

In [14]: series.resample('D')
Out[14]: 
1975-01-01    1.0
1975-01-02    2.0
1975-01-03    4.5
1975-01-04    4.0
1975-01-05    5.0
Freq: D, dtype: float64

In [15]: series.resample('D',how='max')
Out[15]: 
1975-01-01    1
1975-01-02    2
1975-01-03    6
1975-01-04    4
1975-01-05    5
Freq: D, dtype: float64

In [16]: pandas.rolling_max(series.resample('D',how='max'),window=1,freq='D')
Out[16]: 
1975-01-01    1
1975-01-02    2
1975-01-03    6
1975-01-04    4
1975-01-05    5
Freq: D, dtype: float64

@jreback jreback added this to the 0.14.0 milestone Feb 8, 2014
@jreback
Copy link
Contributor

jreback commented Feb 15, 2014

@sleibman interested in doing a PR for this?

jsexauer added a commit to jsexauer/pandas that referenced this issue Apr 8, 2014
jsexauer added a commit to jsexauer/pandas that referenced this issue Apr 22, 2014
jreback added a commit that referenced this issue Apr 22, 2014
BUG/ENH: Add how kwarg to rolling_* functions [fix #6297]
jeffreystarr pushed a commit to jeffreystarr/pandas that referenced this issue Apr 28, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Bug Datetime Datetime data dtype Frequency DateOffsets Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants