Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Rolling with steps #43973

Closed
wants to merge 16 commits into from
Closed

Conversation

rosagold
Copy link
Contributor

@rosagold rosagold commented Oct 11, 2021

@rosagold rosagold changed the title Rolling w steps Rolling with steps Oct 11, 2021
@jreback jreback added the Window rolling, ewma, expanding label Oct 12, 2021
@rosagold rosagold marked this pull request as ready for review October 14, 2021 00:56
@rosagold
Copy link
Contributor Author

rosagold commented Oct 14, 2021

moved question here

@rosagold rosagold changed the title Rolling with steps ENH: Rolling with steps Oct 14, 2021
@rosagold rosagold marked this pull request as draft October 14, 2021 13:52
@mroeschke
Copy link
Member

Personally, I am still not 100% enthusiastic about including this directly in pandas due to:

  1. Stepping being functionally equivalent to indexing/masking like in this example Rolling window with step size #15354 (comment), albeit I recognize this is more cumbersome when expressing over a DatetimeIndex
  2. Breaks the API output consistency of windowing operations as described Rolling window with step size #15354 (comment)
  3. step not being well defined/ generalizable in other windowing operations:
    3a. expanding operations still requires all observations up to that point
    3b. ewm operations still requires all observations up to that point, unless step would affect the ewm calculation directly?
    3c. groupby followed by rolling I guess would have step apply to each group.

If this would be pursued, I would suggest having this only apply for pure rolling operations and implemented as a mask on the output of a regular rolling operation

@rosagold
Copy link
Contributor Author

rosagold commented Oct 15, 2021

Personally, I am still not 100% enthusiastic about including this directly in pandas [...]

I'm also not totally convinced, that this is a must have feature, just that i'm currently deeply familiar with the rolling because of #43997 and #43927, so i thought i could tackle this before i focus on something else. But if this has little change to get in pandas, i find other things to do, with no hard feelings :D


  1. Stepping being functionally equivalent to indexing/masking like in this example Rolling window with step size Rolling window with step size #15354 (comment), albeit I recognize this is more cumbersome when expressing over a DatetimeIndex

Yeah, i agree, but as i understand is the main struggle the users precious calculation time. Because any non-standard/non-vectorized function (most functions passed to apply) may calculate lots of unnecessary windows, regardless if they are needed or not.


  1. Breaks the API output consistency of windowing operations as described Rolling window with step size #15354 (comment)

Because i'm also not a fan of breaking the API output consistency i choose an approach which always return same dimensions as the input data.

Instead of manipulating the total number of generated data-chunks (and breaking the api), i would simply set the stepped-over data-chunks to zero length. This by just using the window_bounds which are cheap to calculate. So in the end the (expensive) calculation will most likely not take place (except of a very bad implementation of the function to apply) and the dimensions are kept.


  1. step not being well defined/ generalizable in other windowing operations:
    3a. expanding operations still requires all observations up to that point
    3b. ewm operations still requires all observations up to that point, unless step would affect the ewm calculation directly?
    3c. groupby followed by rolling I guess would have step apply to each group.

If this would be pursued, I would suggest having this only apply for pure rolling operations and implemented as a mask on the output of a regular rolling operation

I totally agree, and would just bring it in where it makes sense, namely pure rolling..

@mroeschke
Copy link
Member

Yeah I think this feature needs a little more discussion / buy-in from the core team before an implementation is pursued, but great work so far!

i would simply set the stepped-over data-chunks to zero length

This an interesting approach which sounds reasonable. To maintain the same dimensions of the input data, what would the value of the stepped over chunks? np.nan?

P.S. If you are interested on working on other windowing issues I would suggest #26958, #37535, #43405, #43579

@rosagold
Copy link
Contributor Author

This an interesting approach which sounds reasonable. To maintain the same dimensions of the input data, what would the value of the stepped over chunks? np.nan?

This depends on the function that is called, unfortunately this is not consistent even not with the existing functions. One can simple see this by calling obj.rolling(window=0)

eg.

import pandas as pd
pd.Series([1,1,1,]).rolling(0).sum()
# 0    0.0
# 1    0.0
# 2    0.0
# dtype: float64

pd.Series([1,1,1,]).rolling(0).max()
# 0    NaN
# 1    NaN
# 2    NaN
# dtype: float64

well, maybe its a bug in sum ?

@rosagold
Copy link
Contributor Author

P.S. If you are interested on working on other windowing issues I would suggest #26958, #37535, #43405, #43579

thx, i will have a look

@rosagold
Copy link
Contributor Author

should i close this PR or should i just keep it a Draft ?

@mroeschke
Copy link
Member

I suggest closing for now as more discussion should happen in the original issue first.

@rosagold rosagold closed this Oct 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Window rolling, ewma, expanding
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rolling window with step size
4 participants