ENH: Preserve Index in Pandas Groupby Rolling on Datetime #35551

nrcjea001 · 2020-08-04T17:37:24Z

Is your feature request related to a problem?

A problem arises when rolling on datetime column where dates are the same. The Groupby Rolling function does not preserve the original index and so when dates are the same within the Group, it is impossible to know which index value it pertains to from the original dataframe.

Describe the solution you'd like

Ideally we wish to preserve the original index so that it can be merged back onto the original dataframe.

Current Code

# Your code here, if applicable
df = pd.DataFrame({"column1": range(6), 
                   "column2": range(6), 
                   'group': 3*['A','B'], 
                   'date':pd.date_range("20190101", periods=6)})
df.loc[:,'date']=df.loc[0,'date']

df.groupby('group').rolling('1D',on='date')['column1'].sum()

## Output
group  date      
A      2019-01-01    0.0
       2019-01-01    2.0
       2019-01-01    6.0
B      2019-01-01    1.0
       2019-01-01    4.0
       2019-01-01    9.0
Name: column1, dtype: float64

Describe alternatives you've considered

Current workaround as follows, but considerably slower

df.groupby('group').apply(lambda x: x.rolling('1D',on='date',closed='left')['column1'].sum())

## Output
group   
A      0    NaN
       2    0.0
       4    2.0
B      1    NaN
       3    1.0
       5    4.0
Name: column1, dtype: float64

The text was updated successfully, but these errors were encountered:

TomAugspurger · 2020-09-04T15:58:25Z

@nrcjea001 what's your suggested fix here? And is your expected output right? Why isn't there a date level in the index?

nrcjea001 · 2020-09-06T08:26:25Z

No bug fix here. Rather a suggestion. In situations where the index is not datetime, and we rolling on a datetime column, it would be nice to preserve the index and the date column when performing a rolling groupby.

A workaround is to use a groupby apply with rolling function, but this is considerably slower.

May I suggest that the rolling groupby preserves the actual dataframe index, when rolling on a datetime column rather than datetime index.

trianta2 · 2022-02-09T22:44:37Z

I'm running into a similar issue. I'm doing a groupby(category, as_index=False).rolling(freq, on=timestamp) but instead of preserving the original RangeIndex, the returned dataframe has an index with category values.

Not what I expected, given that I explicitly said as_index=False.

blademwang11 · 2022-08-16T19:53:29Z

Is your feature request related to a problem?

A problem arises when rolling on datetime column where dates are the same. The Groupby Rolling function does not preserve the original index and so when dates are the same within the Group, it is impossible to know which index value it pertains to from the original dataframe.

Describe the solution you'd like

Ideally we wish to preserve the original index so that it can be merged back onto the original dataframe.

Current Code
# Your code here, if applicable
df = pd.DataFrame({"column1": range(6), 
                   "column2": range(6), 
                   'group': 3*['A','B'], 
                   'date':pd.date_range("20190101", periods=6)})
df.loc[:,'date']=df.loc[0,'date']

df.groupby('group').rolling('1D',on='date')['column1'].sum()

## Output
group  date      
A      2019-01-01    0.0
       2019-01-01    2.0
       2019-01-01    6.0
B      2019-01-01    1.0
       2019-01-01    4.0
       2019-01-01    9.0
Name: column1, dtype: float64
Describe alternatives you've considered

Current workaround as follows, but considerably slower
df.groupby('group').apply(lambda x: x.rolling('1D',on='date',closed='left')['column1'].sum())

## Output
group   
A      0    NaN
       2    0.0
       4    2.0
B      1    NaN
       3    1.0
       5    4.0
Name: column1, dtype: float64

You actually don't need the index preserved to get what you need. You can add whatever columns you need in the egg function and apply a function keep the last row. From your example, the solution would be:

df1 = df.groupby('group').rolling('1D',on='date').agg({'column1': sum, 'column2': lambda row: row[-1]})

I do agree there should be a better way to keep the linking of the original data frame to the new data frame from window aggregation. This solution also won't work if the current row is not within the window range. Maybe a special aggregation function to preserve the current row in the window?

nrcjea001 added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 4, 2020

TomAugspurger added Needs Info Clarification about behavior needed to assess issue and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Sep 4, 2020

mroeschke added Window rolling, ewma, expanding and removed Needs Info Clarification about behavior needed to assess issue labels Mar 26, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: Preserve Index in Pandas Groupby Rolling on Datetime #35551

ENH: Preserve Index in Pandas Groupby Rolling on Datetime #35551

nrcjea001 commented Aug 4, 2020

TomAugspurger commented Sep 4, 2020

nrcjea001 commented Sep 6, 2020

trianta2 commented Feb 9, 2022

blademwang11 commented Aug 16, 2022

Is your feature request related to a problem?

Describe the solution you'd like

Current Code

Describe alternatives you've considered

ENH: Preserve Index in Pandas Groupby Rolling on Datetime #35551

ENH: Preserve Index in Pandas Groupby Rolling on Datetime #35551

Comments

nrcjea001 commented Aug 4, 2020

Is your feature request related to a problem?

Describe the solution you'd like

Current Code

Describe alternatives you've considered

TomAugspurger commented Sep 4, 2020

nrcjea001 commented Sep 6, 2020

trianta2 commented Feb 9, 2022

blademwang11 commented Aug 16, 2022

Is your feature request related to a problem?

Describe the solution you'd like

Current Code

Describe alternatives you've considered