Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

forward rolling functions #6772

Closed
hayd opened this issue Apr 2, 2014 · 7 comments
Closed

forward rolling functions #6772

hayd opened this issue Apr 2, 2014 · 7 comments
Labels
Enhancement Window rolling, ewma, expanding

Comments

@hayd
Copy link
Contributor

hayd commented Apr 2, 2014

https://groups.google.com/forum/#!msg/pydata/LLPnzMfDlSg/OpK-mHJzJwYJ
http://stackoverflow.com/questions/22820292/how-to-use-pandas-rolling-functions-on-a-forward-looking-basis

Applying to reverse Series and reversing could work on all (?) rolling functions, I think sometimes can just do on values array, a kwarg would be nice.

pd.rolling_(s[::-1])[::-1]
pd.Series(pd.rolling_(s.values[::-1])[::-1), s.index, name=s.name)  # faster?

(OP says that this doesn't actually work in some cases: http://stackoverflow.com/questions/22820292/how-to-use-pandas-rolling-functions-on-a-forward-looking-basis/22820689?noredirect=1#comment34813442_22820689... ?)

@jreback jreback added the Resample resample method label Mar 8, 2015
@jreback jreback added this to the Someday milestone Mar 8, 2015
@luoshao23
Copy link

Is anybody still working on it ?

@erfannariman
Copy link
Member

erfannariman commented Jul 25, 2019

This is one I actually miss quite often. With datetime index you can't just reverse the dataframe, since you get the error: ValueError: index must be monotonic

Here's an example:

df = pd.DataFrame(data=[
    [pd.Timestamp('2018-01-01 00:00:00'), 100],
    [pd.Timestamp('2018-01-01 00:00:01'), 101],
    [pd.Timestamp('2018-01-01 00:00:03'), 103],
    [pd.Timestamp('2018-01-01 00:00:04'), 111]
], columns=['time', 'value']).set_index('time')

df[::-1].rolling('2s')

Output

ValueError: index must be monotonic

Tried resampling. Not working either:

df.resample('1s').first().ffill()[::-1].rolling('2s').mean()

Solution to this issue for now:

@smeana
Copy link

smeana commented Aug 3, 2019

If you have time-series on a fix interval, let's say every second, you can do shifting - Rolling:

shifting(-30) -> 30 seconds back and then rolling to find the mean, max, min... on that window.

df_rol['.......'] = df['........'].shift(-1,freq=dt).rolling(dt).min()

If you don't have a fix interval try Truncate (truncate() is gonna ask you to sort_index()):

df['....'] = df.index.map(
lambda x: df.truncate(before=x, after=x + pd.Timedelta(dt,unit='s'))['....'].max())

With truncate, the computational time is exponential as you have more rows, Let's say 2min for 1 million rows and 10 min for 2 millions.


You can do the slicing based on numeric index that sometimes is faster than compare index datetimes:

init = get_loc(date)
end = get_loc(date+ pd.Timedelta(dt,unit='s'),method='nearest')
df.iloc[init:end]['column'].max()

Looping something like this (not optimum) but probably useful for any custom rolling (try iterrows(), itertuples()):

def window(deltaseconds,df):
    colmax = '{}maxPrice'.format(deltaseconds)
    colmin = '{}minPrice'.format(deltaseconds)
    df_max = pd.DataFrame(columns = [colmax], index=list(range(0,df.shape[0])))
    df_min = pd.DataFrame(columns = [colmin], index=list(range(0,df.shape[0])))

    for i in range(0,df.shape[0]):
        print(i)
        init = df.index[i]
        end = init + pd.Timedelta(deltaseconds,unit='s')
        maxPrice = 0
        minPrice = 0
        for j in range(i,df.shape[0]):
            
            if df.index[i]<=end:
                if df.iloc[j]['LastPrice']>maxPrice:
                    maxPrice = df.iloc[j]['LastPrice']
                elif df.iloc[j]['LastPrice']<minPrice:
                    minPrice = df.iloc[j]['LastPrice']<min
            else:
                break
        df_max.loc[i] = maxPrice - df.iloc[i]['LastPrice']
        df_min.loc[i] = minPrice - df.iloc[i]['LastPrice']
        print("j:" + str(j))
        
    return pd.concat([df,df_max,df_min],axis=1,join_axes=[df.index])

Using Dask with map_overlap as generic rolling function where you can specify before and after:

map_overlap(lambda x:...)

Documentation


Working on billions of rows, even Dask or numpy with numba in any of the above solution is slow.
I found a JAVA custom multi-thread looping on concurrent collections the fasted one, to solve this kind of forward rollings.

Would be really useful to have a forward rolling() in a function.

@oldrichsmejkal
Copy link

Should be resolved by: #28297

@mroeschke mroeschke added Window rolling, ewma, expanding and removed Resample resample method labels Dec 28, 2019
@mroeschke
Copy link
Member

Yup looks like #28297 addresses this

@jfaleiro
Copy link

This issue should not be closed since #28297 (what seems to address this bug) isn't closed.

@benedekpasztor
Copy link

A possible solution is to shift the rolling window aggregation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Window rolling, ewma, expanding
Projects
None yet
Development

No branches or pull requests

9 participants