feature request: rolling with datetime range #13327

Closed
randomgambit opened this Issue May 30, 2016 · 9 comments

randomgambit commented May 30, 2016

Hello everyone,

I would like to propose a very useful addition to rolling: the ability to compute any statistic over a specific time range.

My understanding is that rolling(window=5).mean() computes, say, the mean over the last five observations.

Instead, it would be very useful to specify something like `rolling(window=5, type_windows='time_range').mean()` to get the rolling mean over the last 5 days.

So if your data starts on January 1 and the next data point is on Feb 2nd, then the rolling mean for the Feb 2nd point is NA because there was no data on Jan 29, 30, 31, Feb 1, Feb 2.

I believe this would be very useful for data such as trading data, where the data points are usually not equidistant in time. Still, you want to compute rolling metrics over the same time delta.

What do you think?

Thanks!

Contributor

jreback commented May 30, 2016

you can simply resample first to get your desired results (specifying freq to .rolling does this now as well)
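
A rough sketch of the resample-first approach, assuming a made-up series `s` with irregular dates (the values and `min_periods=1` are illustrative assumptions, not taken from the thread). Resampling to daily frequency inserts NaN for the missing days, so a count-based window of 3 rows then spans 3 calendar days:

```python
import pandas as pd

# Hypothetical irregular series: no observations on Jan 3 and Jan 4.
s = pd.Series(
    [1.0, 2.0, 4.0],
    index=pd.to_datetime(["2016-01-01", "2016-01-02", "2016-01-05"]),
)

# Resample to one row per calendar day; empty days become NaN.
daily = s.resample("D").mean()

# A 3-row window on the daily series now covers 3 calendar days.
# min_periods=1 (an assumption) keeps a value when at least one
# observation falls inside the window.
rolled = daily.rolling(window=3, min_periods=1).mean()
print(rolled)
```

The cost of this approach is that the result has one row per calendar day rather than one row per original observation.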

randomgambit commented May 30, 2016

I see, thanks.

But do you know what's wrong here, then?

import pandas as pd

df = pd.DataFrame({'time': ['2015/01/01', '2015/02/01', '2016/02/02'],
                   'myvar': [2, 2, 2],
                   'group': ['jeff', 'olaf', 'jeff']})

df['time'] = pd.to_datetime(df.time)
df.set_index('time', inplace=True)

# raises the ValueError shown below
df.groupby('group').apply(lambda x: x.rolling(window=2, freq='D').count())

ValueError: could not convert string to float: jeff.
I am using pandas 0.18.0

Contributor

jreback commented May 30, 2016

At the moment you need to specify the column to work on, because non-numeric columns don't work very well (e.g. the grouper in this case).

# < 0.18.1
In [14]: df.groupby('group').myvar.apply(lambda x: x.rolling(2, freq='D').count())

# 0.18.1
In [15]: df.groupby('group').myvar.rolling(2, freq='D').count()
Out[15]: 
group  time      
jeff   2015-01-01    1.0
       2015-01-02    1.0
       2015-01-03    0.0
       2015-01-04    0.0
       2015-01-05    0.0
                    ... 
       2016-01-30    0.0
       2016-01-31    0.0
       2016-02-01    0.0
       2016-02-02    1.0
olaf   2015-02-01    1.0
Name: myvar, dtype: float64

so this is a dupe of #12537
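
For readers on a recent pandas, where `rolling` no longer accepts `freq`, the same idea of selecting the numeric column before rolling might look like the sketch below. It reuses the `df` from the question and, as an assumption on my part, swaps in an offset window (`'2D'`) for `window=2, freq='D'`, so the result has one row per original observation rather than one per calendar day:

```python
import pandas as pd

df = pd.DataFrame({'time': pd.to_datetime(['2015/01/01', '2015/02/01', '2016/02/02']),
                   'myvar': [2, 2, 2],
                   'group': ['jeff', 'olaf', 'jeff']}).set_index('time')

# Select the numeric column first so the string-valued grouper
# is never handed to the rolling computation.
counts = df.groupby('group')['myvar'].rolling('2D').count()
print(counts)
```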

randomgambit commented May 30, 2016

Thanks, Jeff. That's interesting, and I would say it's a rather subtle bug, because I would not think of the grouper as a regular data column that gets processed by what follows groupby (count() in this case).

Contributor

chrisaycock commented Jun 10, 2016

It's probably a whole different issue, but would it ever be possible to specify the column for .rolling()? That is, instead of using the DataFrame's index, let the user explicitly list a column:

df.rolling('5s', col='time')

Definitely don't let my request hold up this code change. We can worry about that later; getting windows by timestamp is far more important at this stage.
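
For reference, later pandas releases expose this through an `on` parameter of `rolling` rather than the `col` spelling suggested above. A minimal sketch, with a made-up frame and column names:

```python
import pandas as pd

# Hypothetical tick data; the timestamps live in a column, not the index.
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2016-06-10 09:00:00",
        "2016-06-10 09:00:02",
        "2016-06-10 09:00:10",
    ]),
    "price": [1.0, 2.0, 3.0],
})

# Roll a 5-second window keyed on the 'time' column.
result = df.rolling("5s", on="time").sum()
print(result)
```

The `on` column must be datetime-like and sorted, just as the index would have to be.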

Contributor

jreback commented Jun 10, 2016

that's actually very easy

Contributor

chrisaycock commented Jun 10, 2016

Oh, if you're up for it, then can we add that to this feature request? It would be really helpful.

Contributor

jreback commented Jun 10, 2016

yes, will put it on the list

BenjaminHabert commented Jun 18, 2016

Very excited about this new feature! Thanks!

jreback added a commit to jreback/pandas that referenced this issue Jun 25, 2016

ENH: add time-window capability to .rolling
xref #13327
CLN: pep for cython, xref #12995

jreback added a commit to jreback/pandas that referenced this issue Jul 1, 2016

jreback added a commit to jreback/pandas that referenced this issue Jul 14, 2016

ENH: add time-window capability to .rolling
xref #13327
CLN: pep for cython, xref #12995

jreback added a commit that referenced this issue Jul 20, 2016

ENH: add time-window capability to .rolling
xref #13327
closes #936

Author: Jeff Reback <jeff@reback.net>

Closes #13513 from jreback/rolling and squashes the following commits:

d8f3d73 [Jeff Reback] ENH: add time-window capability to .rolling
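
To illustrate what a time-window `rolling` does for the original request, here is a small sketch using an offset string as the window. The series and its values are invented for illustration; `min_periods=2` is one assumed way to get NA for isolated points, as the first post asked for:

```python
import pandas as pd

# Hypothetical irregular series with a month-long gap before the last point.
idx = pd.to_datetime(["2016-01-01", "2016-01-02", "2016-01-03", "2016-02-02"])
prices = pd.Series([10.0, 12.0, 14.0, 20.0], index=idx)

# Time-based window: mean over the 5 days ending at each observation.
by_time = prices.rolling("5D").mean()

# Count-based window: mean over the last 3 observations, however far apart.
by_count = prices.rolling(window=3).mean()

# Require at least 2 observations inside the 5-day window,
# so isolated points come back as NaN.
by_time_strict = prices.rolling("5D", min_periods=2).mean()

print(pd.DataFrame({"5D": by_time, "3 obs": by_count, "5D, min 2": by_time_strict}))
```

The count-based window mixes the February point with January data, while the time-based window only sees observations that actually fall within the 5-day span.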