rolling_mean with freq='D' returns all NaNs when there is exactly 1 data point per day #5955

Closed
sleibman opened this issue Jan 15, 2014 · 2 comments

@sleibman

commented Jan 15, 2014

related to #3020

$ python
Python 2.7.4 (default, Apr 23 2013, 12:22:04) 
[GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import datetime
>>> import pandas as pd
>>> pd.__version__
'0.12.0'
>>> indices = [datetime.datetime(1975, 1, i, 12, 0) for i in range(1, 6)]
>>> series = pd.Series(range(1, 6), index=indices)
>>> series = series.map(lambda x: float(x))  # range() returns ints, so force to float
>>> series = series.sort_index()  # already sorted, but just to be clear
>>> series  # here's what our input series looks like
1975-01-01 12:00:00    1
1975-01-02 12:00:00    2
1975-01-03 12:00:00    3
1975-01-04 12:00:00    4
1975-01-05 12:00:00    5
dtype: float64
>>> pd.rolling_mean(series, window=2, freq='D')  # these results will be wrong
1975-01-01   NaN
1975-01-02   NaN
1975-01-03   NaN
1975-01-04   NaN
1975-01-05   NaN
Freq: D, dtype: float64
>>> better_series = series.append(pd.Series([3.0], index=[datetime.datetime(1975, 1, 3, 6, 0)]))
>>> better_series = better_series.sort_index()
>>> better_series  # here's a revised input with more than one datapoint on one of the days
1975-01-01 12:00:00    1
1975-01-02 12:00:00    2
1975-01-03 06:00:00    3
1975-01-03 12:00:00    3
1975-01-04 12:00:00    4
1975-01-05 12:00:00    5
dtype: float64
>>> pd.rolling_mean(better_series, window=2, freq='D')  # These results will be correct and are what I expected above
1975-01-01    NaN
1975-01-02    1.5
1975-01-03    2.5
1975-01-04    3.5
1975-01-05    4.5
Freq: D, dtype: float64
@jorisvandenbossche

Member

commented Jan 15, 2014

This is an issue with resample itself (the freq keyword triggers a resample). With master:

In [22]: series.resample('D')
Out[22]:
1975-01-01   NaN
1975-01-02   NaN
1975-01-03   NaN
1975-01-04   NaN
1975-01-05   NaN
Freq: D, dtype: float64
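
For anyone reading this on a newer pandas, here is a minimal sketch of doing the two steps (resample, then rolling mean) explicitly. It assumes the method-style API (`Series.resample(...).mean()` and `Series.rolling(...)`) rather than `pd.rolling_mean`, and the explicit `.mean()` aggregation is my own choice for illustration, not something stated in this thread.

```python
import datetime
import pandas as pd

# Same input as in the report: exactly one data point per day, at 12:00.
indices = [datetime.datetime(1975, 1, i, 12, 0) for i in range(1, 6)]
series = pd.Series([float(x) for x in range(1, 6)], index=indices)

# Step 1: bring the series onto daily bins with an explicit aggregation
# (assumption: the mean is the aggregation wanted per day).
daily = series.resample('D').mean()

# Step 2: take the rolling mean over the regular daily series.
result = daily.rolling(window=2).mean()
print(result)
```

With the aggregation spelled out, the first day should be NaN (the window is not yet full) and the remaining days should hold the two-day means that the original report expected.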
@jreback

Contributor

commented Mar 29, 2014

This was upsampling the series when it is effectively a pass-through (the freq of the series is the same as the resample freq). It worked fine if `how` was specified, or if there was at least 1 value to resample (e.g. you had at least 2 values on 1 particular date). This case tried to upsample, which essentially reindexes.
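
To see why a reindex produces all NaNs here, a rough sketch follows. It uses a plain `Series.reindex` against a midnight-based daily index as a stand-in for the upsampling path; that is an assumption made for illustration and not the actual resample internals. It is written against a newer pandas API than the 0.12.0 used in the report.

```python
import datetime
import pandas as pd

# Same input as in the report: one data point per day, stamped at 12:00.
indices = [datetime.datetime(1975, 1, i, 12, 0) for i in range(1, 6)]
series = pd.Series([float(x) for x in range(1, 6)], index=indices)

# Daily labels fall on midnight, not on 12:00.
daily_index = pd.date_range('1975-01-01', periods=5, freq='D')

# Reindexing looks up exact labels. None of the 12:00 timestamps
# match a midnight label, so every value comes back as NaN.
reindexed = series.reindex(daily_index)

# Aggregating each day's values into its bin keeps the data instead.
aggregated = series.resample('D').mean()

print(reindexed)
print(aggregated)
```

The first result mirrors the all-NaN output in the report, while the second keeps one value per day, which is the pass-through behaviour described above.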
