forward rolling functions #6772

hayd · 2014-04-02T21:45:13Z

https://groups.google.com/forum/#!msg/pydata/LLPnzMfDlSg/OpK-mHJzJwYJ
http://stackoverflow.com/questions/22820292/how-to-use-pandas-rolling-functions-on-a-forward-looking-basis

Applying to reverse Series and reversing could work on all (?) rolling functions, I think sometimes can just do on values array, a kwarg would be nice.

pd.rolling_(s[::-1])[::-1]
pd.Series(pd.rolling_(s.values[::-1])[::-1), s.index, name=s.name)  # faster?

(OP says that this doesn't actually work in some cases: http://stackoverflow.com/questions/22820292/how-to-use-pandas-rolling-functions-on-a-forward-looking-basis/22820689?noredirect=1#comment34813442_22820689... ?)

The text was updated successfully, but these errors were encountered:

luoshao23 · 2018-05-30T06:49:17Z

Is anybody still working on it ?

erfannariman · 2019-07-25T16:40:18Z

This is one I actually miss quite often. With datetime index you can't just reverse the dataframe, since you get the error: ValueError: index must be monotonic

Here's an example:

df = pd.DataFrame(data=[
    [pd.Timestamp('2018-01-01 00:00:00'), 100],
    [pd.Timestamp('2018-01-01 00:00:01'), 101],
    [pd.Timestamp('2018-01-01 00:00:03'), 103],
    [pd.Timestamp('2018-01-01 00:00:04'), 111]
], columns=['time', 'value']).set_index('time')

df[::-1].rolling('2s')

Output

ValueError: index must be monotonic

Tried resampling. Not working either:

df.resample('1s').first().ffill()[::-1].rolling('2s').mean()

Solution to this issue for now:

smeana · 2019-08-03T00:55:20Z

If you have time-series on a fix interval, let's say every second, you can do shifting - Rolling:

shifting(-30) -> 30 seconds back and then rolling to find the mean, max, min... on that window.

df_rol['.......'] = df['........'].shift(-1,freq=dt).rolling(dt).min()

If you don't have a fix interval try Truncate (truncate() is gonna ask you to sort_index()):

df['....'] = df.index.map(
lambda x: df.truncate(before=x, after=x + pd.Timedelta(dt,unit='s'))['....'].max())

With truncate, the computational time is exponential as you have more rows, Let's say 2min for 1 million rows and 10 min for 2 millions.

You can do the slicing based on numeric index that sometimes is faster than compare index datetimes:

init = get_loc(date)
end = get_loc(date+ pd.Timedelta(dt,unit='s'),method='nearest')
df.iloc[init:end]['column'].max()

Looping something like this (not optimum) but probably useful for any custom rolling (try iterrows(), itertuples()):

def window(deltaseconds,df):
    colmax = '{}maxPrice'.format(deltaseconds)
    colmin = '{}minPrice'.format(deltaseconds)
    df_max = pd.DataFrame(columns = [colmax], index=list(range(0,df.shape[0])))
    df_min = pd.DataFrame(columns = [colmin], index=list(range(0,df.shape[0])))

    for i in range(0,df.shape[0]):
        print(i)
        init = df.index[i]
        end = init + pd.Timedelta(deltaseconds,unit='s')
        maxPrice = 0
        minPrice = 0
        for j in range(i,df.shape[0]):
            
            if df.index[i]<=end:
                if df.iloc[j]['LastPrice']>maxPrice:
                    maxPrice = df.iloc[j]['LastPrice']
                elif df.iloc[j]['LastPrice']<minPrice:
                    minPrice = df.iloc[j]['LastPrice']<min
            else:
                break
        df_max.loc[i] = maxPrice - df.iloc[i]['LastPrice']
        df_min.loc[i] = minPrice - df.iloc[i]['LastPrice']
        print("j:" + str(j))
        
    return pd.concat([df,df_max,df_min],axis=1,join_axes=[df.index])

Using Dask with map_overlap as generic rolling function where you can specify before and after:

map_overlap(lambda x:...)

Documentation

Working on billions of rows, even Dask or numpy with numba in any of the above solution is slow.
I found a JAVA custom multi-thread looping on concurrent collections the fasted one, to solve this kind of forward rollings.

Would be really useful to have a forward rolling() in a function.

oldrichsmejkal · 2019-12-12T15:33:48Z

Should be resolved by: #28297

mroeschke · 2019-12-28T05:11:56Z

Yup looks like #28297 addresses this

jfaleiro · 2020-03-11T16:23:46Z

This issue should not be closed since #28297 (what seems to address this bug) isn't closed.

benedekpasztor · 2022-01-11T16:51:15Z

A possible solution is to shift the rolling window aggregation

hayd added the Enhancement label Apr 2, 2014

jreback added the Resample resample method label Mar 8, 2015

jreback added this to the Someday milestone Mar 8, 2015

mroeschke added Window rolling, ewma, expanding and removed Resample resample method labels Dec 28, 2019

mroeschke closed this as completed Dec 28, 2019

mroeschke mentioned this issue Sep 11, 2020

ENH: Add VariableForwardWindowIndexer #34226

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

forward rolling functions #6772

forward rolling functions #6772

hayd commented Apr 2, 2014

luoshao23 commented May 30, 2018

erfannariman commented Jul 25, 2019 •

edited

smeana commented Aug 3, 2019 •

edited

oldrichsmejkal commented Dec 12, 2019

mroeschke commented Dec 28, 2019

jfaleiro commented Mar 11, 2020

benedekpasztor commented Jan 11, 2022

forward rolling functions #6772

forward rolling functions #6772

Comments

hayd commented Apr 2, 2014

luoshao23 commented May 30, 2018

erfannariman commented Jul 25, 2019 • edited

smeana commented Aug 3, 2019 • edited

oldrichsmejkal commented Dec 12, 2019

mroeschke commented Dec 28, 2019

jfaleiro commented Mar 11, 2020

benedekpasztor commented Jan 11, 2022

erfannariman commented Jul 25, 2019 •

edited

smeana commented Aug 3, 2019 •

edited