Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Preserve Index in Pandas Groupby Rolling on Datetime #35551

Open
nrcjea001 opened this issue Aug 4, 2020 · 4 comments
Open

ENH: Preserve Index in Pandas Groupby Rolling on Datetime #35551

nrcjea001 opened this issue Aug 4, 2020 · 4 comments
Labels
Enhancement Window rolling, ewma, expanding

Comments

@nrcjea001
Copy link

Is your feature request related to a problem?

A problem arises when rolling on datetime column where dates are the same. The Groupby Rolling function does not preserve the original index and so when dates are the same within the Group, it is impossible to know which index value it pertains to from the original dataframe.

Describe the solution you'd like

Ideally we wish to preserve the original index so that it can be merged back onto the original dataframe.

Current Code

# Your code here, if applicable
df = pd.DataFrame({"column1": range(6), 
                   "column2": range(6), 
                   'group': 3*['A','B'], 
                   'date':pd.date_range("20190101", periods=6)})
df.loc[:,'date']=df.loc[0,'date']

df.groupby('group').rolling('1D',on='date')['column1'].sum()

## Output
group  date      
A      2019-01-01    0.0
       2019-01-01    2.0
       2019-01-01    6.0
B      2019-01-01    1.0
       2019-01-01    4.0
       2019-01-01    9.0
Name: column1, dtype: float64

Describe alternatives you've considered

Current workaround as follows, but considerably slower

df.groupby('group').apply(lambda x: x.rolling('1D',on='date',closed='left')['column1'].sum())

## Output
group   
A      0    NaN
       2    0.0
       4    2.0
B      1    NaN
       3    1.0
       5    4.0
Name: column1, dtype: float64
@nrcjea001 nrcjea001 added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 4, 2020
@TomAugspurger
Copy link
Contributor

@nrcjea001 what's your suggested fix here? And is your expected output right? Why isn't there a date level in the index?

@TomAugspurger TomAugspurger added Needs Info Clarification about behavior needed to assess issue and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Sep 4, 2020
@nrcjea001
Copy link
Author

No bug fix here. Rather a suggestion. In situations where the index is not datetime, and we rolling on a datetime column, it would be nice to preserve the index and the date column when performing a rolling groupby.

A workaround is to use a groupby apply with rolling function, but this is considerably slower.

May I suggest that the rolling groupby preserves the actual dataframe index, when rolling on a datetime column rather than datetime index.

@mroeschke mroeschke added Window rolling, ewma, expanding and removed Needs Info Clarification about behavior needed to assess issue labels Mar 26, 2021
@trianta2
Copy link

trianta2 commented Feb 9, 2022

I'm running into a similar issue. I'm doing a groupby(category, as_index=False).rolling(freq, on=timestamp) but instead of preserving the original RangeIndex, the returned dataframe has an index with category values.

Not what I expected, given that I explicitly said as_index=False.

@blademwang11
Copy link

Is your feature request related to a problem?

A problem arises when rolling on datetime column where dates are the same. The Groupby Rolling function does not preserve the original index and so when dates are the same within the Group, it is impossible to know which index value it pertains to from the original dataframe.

Describe the solution you'd like

Ideally we wish to preserve the original index so that it can be merged back onto the original dataframe.

Current Code

# Your code here, if applicable
df = pd.DataFrame({"column1": range(6), 
                   "column2": range(6), 
                   'group': 3*['A','B'], 
                   'date':pd.date_range("20190101", periods=6)})
df.loc[:,'date']=df.loc[0,'date']

df.groupby('group').rolling('1D',on='date')['column1'].sum()

## Output
group  date      
A      2019-01-01    0.0
       2019-01-01    2.0
       2019-01-01    6.0
B      2019-01-01    1.0
       2019-01-01    4.0
       2019-01-01    9.0
Name: column1, dtype: float64

Describe alternatives you've considered

Current workaround as follows, but considerably slower

df.groupby('group').apply(lambda x: x.rolling('1D',on='date',closed='left')['column1'].sum())

## Output
group   
A      0    NaN
       2    0.0
       4    2.0
B      1    NaN
       3    1.0
       5    4.0
Name: column1, dtype: float64

You actually don't need the index preserved to get what you need. You can add whatever columns you need in the egg function and apply a function keep the last row. From your example, the solution would be:

df1 = df.groupby('group').rolling('1D',on='date').agg({'column1': sum, 'column2': lambda row: row[-1]})

I do agree there should be a better way to keep the linking of the original data frame to the new data frame from window aggregation. This solution also won't work if the current row is not within the window range. Maybe a special aggregation function to preserve the current row in the window?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Window rolling, ewma, expanding
Projects
None yet
Development

No branches or pull requests

5 participants