ENH/API: rolling_apply to pass frames to the rolled function (rather than ndarrays) #5071

Closed
jreback opened this Issue Oct 1, 2013 · 10 comments

Comments

@jseabold

Contributor

jseabold commented Jan 9, 2014

+1

I was just trying to do something similar. It would be nice if rolling_apply and expanding_apply had an option to work over the whole DataFrame. It doesn't even have to pass frames, but rather just roll over the whole 0 axis instead of one series at a time.
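To make the request concrete, here is a minimal sketch of rolling over axis 0 of the whole frame. The helper name `rolling_apply_frame` is hypothetical (it is not a pandas function), and the plain Python loop is an assumption chosen for clarity rather than speed.

```python
import numpy as np
import pandas as pd


def rolling_apply_frame(df, window, func):
    """Hypothetical helper: call func on each full window of rows (a DataFrame)."""
    out = pd.Series(np.nan, index=df.index, dtype="float64")
    for end in range(window, len(df) + 1):
        # Rows [end - window, end) form one window; func sees all columns at once.
        out.iloc[end - 1] = func(df.iloc[end - window:end])
    return out


df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0], "B": [10.0, 20.0, 30.0, 40.0]})

# A cross-column statistic, which per-column rolling cannot express directly.
result = rolling_apply_frame(df, 2, lambda w: (w["A"] * w["B"]).sum())
print(result)
```

The first row stays NaN because no full window ends there; every later row holds the value of `func` for the window ending at that row.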

@ghost

ghost commented Jan 9, 2014

That sounds equivalent to the split-apply(-combine) approach of groupby, only pandas doesn't currently provide that sort of split function.

related #4059

@twiecki

Contributor

twiecki commented Jun 22, 2015

Just ran into the same issue.

@flashhack

flashhack commented Jun 27, 2015

same issue here

@max-sixty

Contributor

max-sixty commented Apr 22, 2016

@jreback What's the best way to do this?

If I try and change the _apply method on _Rolling to take pandas objects rather than numpy arrays, a few of the standard functions fail (e.g. _zsqrt):

...
return _zsqrt(algos.roll_var(arg, window, minp, ddof))
TypeError: Argument 'input' has incorrect type (expected numpy.ndarray, got Series)

Could this be done in roll_generic? Or with an additional path, other than the standard _apply, for user-supplied functions? Neither seems that compelling.

@jreback

Contributor Author

jreback commented Apr 22, 2016

So just to have an example

In [32]: df = DataFrame({'A' : np.random.randn(5), 'B' : np.random.randint(0,10,size=5)})

In [33]: def f(x):
   ....:     print type(x)
   ....:     return x.sum()
   ....: 

In [34]: df.rolling(2).apply(f)
<type 'numpy.ndarray'>
<type 'numpy.ndarray'>
<type 'numpy.ndarray'>
<type 'numpy.ndarray'>
<type 'numpy.ndarray'>
<type 'numpy.ndarray'>
<type 'numpy.ndarray'>
<type 'numpy.ndarray'>
Out[34]: 
          A     B
0       NaN   NaN
1 -0.414646  15.0
2  1.007150   8.0
3  1.822979   2.0
4  0.884894   4.0

The issue is that you need to pass a constructed object to algos.roll_generic (or maybe a new function) which does the windowing.

here

@max-sixty

Contributor

max-sixty commented Apr 22, 2016

Is this doable with roll_generic? It seems that it requires an array:

In [28]: series=pd.Series(range(10),dtype='float64')

In [29]: roll_generic(series, win=2, minp=2, offset=0, func=lambda x: x.sum(), args=[], kwargs={})
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-29-3ec0f9465dad> in <module>()
----> 1 roll_generic(series, win=2, minp=2, offset=0, func=lambda x: x.sum(), args=[], kwargs={})

TypeError: Argument 'input' has incorrect type (expected numpy.ndarray, got Series)

Does that mean we need a parallel function which operates on Series?
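As a rough illustration of what such a parallel function might look like, the sketch below is a pure-Python analogue that hands each window to `func` as a Series. The name `roll_generic_series` and its reduced signature are hypothetical; this is not the Cython `roll_generic`, and it makes no attempt to match its performance.

```python
import numpy as np
import pandas as pd


def roll_generic_series(series, win, minp, func):
    """Hypothetical analogue of roll_generic: func receives a Series, not an ndarray."""
    values = np.full(len(series), np.nan)
    for i in range(len(series)):
        window = series.iloc[max(0, i - win + 1):i + 1]
        # Only evaluate func once the window holds at least minp non-null values.
        if window.count() >= minp:
            values[i] = func(window)
    return pd.Series(values, index=series.index)


seen_types = set()


def f(x):
    # Record what kind of object the rolled function actually receives.
    seen_types.add(type(x).__name__)
    return x.sum()


series = pd.Series(range(10), dtype="float64")
result = roll_generic_series(series, win=2, minp=2, func=f)
print(seen_types)
print(result)
```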

I could imagine having a function that generated the groups - then it would actually be a groupby. But I haven't thought it through enough, and performance may be an issue.
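One way to picture the group-generating idea is to stack every window under a key and then group on that key. The splitter `window_groups` below is a hypothetical name, and the approach copies each row once per window it belongs to, so the performance concern is real.

```python
import pandas as pd


def window_groups(df, window):
    """Hypothetical splitter: stack every full window under a key naming its last row."""
    pieces = {
        df.index[end - 1]: df.iloc[end - window:end]
        for end in range(window, len(df) + 1)
    }
    # The outer index level identifies the window; rows are duplicated across windows.
    return pd.concat(pieces, names=["window_end"])


df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0], "B": [10.0, 20.0, 30.0, 40.0]})
stacked = window_groups(df, window=2)

# Each group is a DataFrame holding one window, so func can use several columns.
result = stacked.groupby(level="window_end").apply(lambda g: (g["A"] * g["B"]).sum())
print(result)
```

With a window of 2, the stacked frame has two rows for each of the three windows, which shows where the memory cost comes from.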

@jreback

Contributor Author

jreback commented Apr 22, 2016

No, you have to change roll_generic to take an object.

Doing this with GroupBy is a whole separate idea - I may do that, but it's orthogonal (and the reason is different than this).

@max-sixty

Contributor

max-sixty commented Apr 22, 2016

OK, I haven't worked with Cython before, and not sure how it handles non-numpy arrays, but I can have a go. Probably won't have immediate results.

@citynorman

citynorman commented Aug 6, 2016

Almost 3 years and it's still an issue :'(

import pandas as pd
import numpy as np

def distance_sum(df):
    print df
    df['norm1']=df.ix[:,0]/df.ix[0,0]
    df['norm2']=df.ix[:,1]/df.ix[0,1]
    return np.sum(np.square(df['norm1']-df['norm2']))

df=pd.DataFrame({'a':np.array([1,2,3]),'b':np.array([10,20,30])})
df.rolling(center=False,window=2).apply(distance_sum)

AttributeError Traceback (most recent call last)
in ()
9
10 df=pd.DataFrame({'a':np.array([1,2,3]),'b':np.array([10,20,30])})
---> 11 df.rolling(center=False,window=2).apply(distance_sum)

/usr/local/lib/python2.7/dist-packages/pandas/core/generic.pyc in __getattr__(self, name)
2358 return self[name]
2359 raise AttributeError("'%s' object has no attribute '%s'" %
-> 2360 (type(self).__name__, name))
2361
2362 def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'rolling'

OR


AttributeError Traceback (most recent call last)
in ()
14
15 t=pd.DataFrame({'a':a,'b':b})
---> 16 t.rolling(center=False,window=2).apply(test_distance_sum)

/usr/local/lib/python2.7/dist-packages/pandas/core/window.pyc in apply(self, func, args, kwargs)

/usr/local/lib/python2.7/dist-packages/pandas/core/window.pyc in apply(self, func, args, kwargs)

/usr/local/lib/python2.7/dist-packages/pandas/core/window.pyc in _apply(self, func, name, window, center, check_minp, how, **kwargs)

/usr/local/lib/python2.7/dist-packages/numpy/lib/shape_base.pyc in apply_along_axis(func1d, axis, arr, *args, **kwargs)
89 outshape = asarray(arr.shape).take(indlist)
90 i.put(indlist, ind)
---> 91 res = func1d(arr[tuple(i.tolist())], *args, **kwargs)
92 # if res is a number, then we have a smaller output array
93 if isscalar(res):

/usr/local/lib/python2.7/dist-packages/pandas/core/window.pyc in calc(x)

/usr/local/lib/python2.7/dist-packages/pandas/core/window.pyc in f(arg, window, min_periods)

pandas/algos.pyx in pandas.algos.roll_generic (pandas/algos.c:51577)()

in test_distance_sum(df)
9 def test_distance_sum(df):
10 print df
---> 11 df['pxnorm1']=df.ix[:,0]/df.ix[0,0]
12 df['pxnorm2']=df.ix[:,1]/df.ix[0,1]
13 return np.mean(df)#np.sum(np.square(df['pxnorm1']-df['pxnorm2']))

AttributeError: 'numpy.ndarray' object has no attribute 'ix'
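The second traceback shows the rolled function receiving a one-column ndarray, so there are no frame methods and no second column to work with. A possible workaround, sketched here under assumptions, is to roll over row positions and use each window of positions to slice the frame. The sketch assumes a recent pandas (`.iloc` instead of `.ix`, and the `raw=True` argument, which needs pandas 0.23.0 or later). The sample data is made up, and `distance_sum` is rewritten so it does not add columns to the window.

```python
import numpy as np
import pandas as pd


def distance_sum(window):
    """Sum of squared differences between two columns, each normalised by its first row."""
    norm1 = window.iloc[:, 0] / window.iloc[0, 0]
    norm2 = window.iloc[:, 1] / window.iloc[0, 1]
    return np.sum(np.square(norm1 - norm2))


# Illustrative data, not taken from the thread.
df = pd.DataFrame({"a": [1, 2, 4], "b": [10, 30, 50]})

# Roll over row positions instead of the data; each window of positions selects rows of df.
positions = pd.Series(np.arange(len(df), dtype="float64"), index=df.index)
result = positions.rolling(window=2).apply(
    lambda idx: distance_sum(df.iloc[idx.astype(int)]), raw=True
)
print(result)
```

The positions are stored as floats because rolling works on float data, hence the cast back to integers before indexing.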

@jreback jreback modified the milestones: Interesting Issues, Next Major Release Sep 11, 2017

@jreback jreback modified the milestones: Interesting Issues, Next Major Release Nov 26, 2017

@jreback jreback added this to API in Interesting Things Nov 26, 2017

jreback added a commit to jreback/pandas that referenced this issue Apr 2, 2018

jreback added a commit to jreback/pandas that referenced this issue Apr 2, 2018

jreback added a commit to jreback/pandas that referenced this issue Apr 10, 2018

jreback added a commit to jreback/pandas that referenced this issue Apr 12, 2018

jreback added a commit to jreback/pandas that referenced this issue Apr 13, 2018

jreback added a commit to jreback/pandas that referenced this issue Apr 14, 2018

jreback added a commit to jreback/pandas that referenced this issue Apr 15, 2018

jreback added a commit to jreback/pandas that referenced this issue Apr 16, 2018

@jreback jreback modified the milestones: Next Major Release, 0.23.0 Apr 16, 2018

jreback added a commit that referenced this issue Apr 16, 2018
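For readers arriving later: assuming pandas 0.23.0 or newer (the milestone this issue was assigned to), `Rolling.apply` accepts a `raw` argument. The short check below records which type the rolled function receives in each mode. Note that the function is still applied one column at a time, so `raw=False` passes a Series per window rather than a frame.

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]})
seen = {"raw": set(), "not_raw": set()}


def make_f(key):
    """Build a rolled function that records the type of its argument under key."""
    def f(x):
        seen[key].add(type(x).__name__)
        return x.sum()
    return f


raw_result = df.rolling(2).apply(make_f("raw"), raw=True)
obj_result = df.rolling(2).apply(make_f("not_raw"), raw=False)

print(seen)
print(obj_result)
```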
