rolling_apply only applicable to numeric columns #4964

aschilling · 2013-09-24T15:40:42Z

Hi everybody,

I discovered that the rolling_apply function is only applicable to numeric columns. I think this should be changed as this seems too limited to me. Let's take the following example,

    import datetime as DT
    df = pd.DataFrame({
    'Buyer': 'Carl Mark Carl Joe Joe Carl'.split(),
    'Quantity': [1,3,5,8,9,3],
    'Date' : [
        DT.datetime(2013,9,1,13,0),
        DT.datetime(2013,9,1,13,5),
        DT.datetime(2013,10,1,20,0),
        DT.datetime(2013,10,3,10,0),
        DT.datetime(2013,12,2,12,0),                                      
        DT.datetime(2013,12,2,14,0),
        ]}).set_index('Date')

Now I want to count all new customers each 10 days.

buyers = []

def novices(x):
    new = [n for n in x if n not in buyers]

    if (len(new) > 0): buyers.extend(new) 

    return len(new)


pd.rolling_apply(df['Buyer'], 10, novices)

throws an exception "ValueError: could not convert string to float: Carl"

However, although meaningless a call with a numeric column such as:

pd.rolling_apply(df['Quantity'], 2, novices)

works.

The text was updated successfully, but these errors were encountered:

jreback · 2013-09-24T15:54:25Z

this is a bit of work, because need right now the rollinging functions only accept float dtypes, so need a generator for object types (not that hard) in the code generator. then need to dispatch based on dtype. will put on the board for 0.14.

aschilling · 2013-09-24T16:03:48Z

Thanks, is there maybe an alternative solution how I could implement the functionality with the current pandas version?

jreback · 2013-09-24T16:13:17Z

There's an embeded bug here (that's why i am astyping)...separate issue

In [28]: df['Quantity'].astype('float').resample('10D',how=np.sum)
Out[28]: 
Date
2013-09-01 13:00:00     4
2013-09-11 13:00:00   NaN
2013-09-21 13:00:00   NaN
2013-10-01 13:00:00    13
2013-10-11 13:00:00   NaN
2013-10-21 13:00:00   NaN
2013-10-31 13:00:00   NaN
2013-11-10 13:00:00   NaN
2013-11-20 13:00:00   NaN
2013-11-30 13:00:00    12
2013-12-10 13:00:00     0
Freq: 10D, dtype: float64

aschilling · 2013-09-24T16:28:45Z

Thanks, but I think you got me wrong, I am looking for a possibility to count at each point the number of new customers in the upcoming 10 days. Hence, I need to compare the names of the customers to see if they are new or already known.

jreback · 2013-09-24T17:11:57Z

I think this will work. It is a 'manual' rolling_apply
you can do anything you want the function (e.g. if you can return a Series with multiple values or a DataFrame)
and can efectively 'group' how you want

The key is return None from the function which concat ignores

In [25]: df
Out[25]: 
                    Buyer  Quantity
Date                               
2013-09-01 13:00:00  Carl         1
2013-09-01 13:05:00  Mark         3
2013-10-01 20:00:00  Carl         5
2013-10-03 10:00:00   Joe         8
2013-12-02 12:00:00   Joe         9
2013-12-02 14:00:00  Carl         3

In [24]: dates = date_range(df.index[0],df.index[-1],freq='D')                                          

In [22]: def f(x,d):
   ....:     if len(x):
   ....:         return Series([x['Quantity'].sum()],index=[d])
   ....:     return None
   ....: 

In [23]: concat([ f(df.loc[(df.index>dates[i-10]) & (df.index<=dates[i])], dates[i]) for i in xrange(10,len(dates)) ])
Out[23]: 
2013-09-11 13:00:00     3
2013-10-02 13:00:00     5
2013-10-03 13:00:00    13
2013-10-04 13:00:00    13
2013-10-05 13:00:00    13
2013-10-06 13:00:00    13
2013-10-07 13:00:00    13
2013-10-08 13:00:00    13
2013-10-09 13:00:00    13
2013-10-10 13:00:00    13
2013-10-11 13:00:00    13
2013-10-12 13:00:00     8
2013-12-02 13:00:00     9
dtype: int64

aschilling · 2013-09-25T13:42:59Z

Thanks, that is a great solution. Is there any possibility to use in your last statement [23] a TimeOffset (e.g. a week or month) to select the rows for the f function while still having the variable 'dates' on a daily basis?

jreback · 2013-09-25T15:08:08Z

In [24]: offset = pd.offsets.MonthEnd('1')

In [25]: concat([ f(df.loc[(df.index>d-offset) & (df.index<=d)], d) for d in dates ])
Out[25]: 
2013-09-01 13:00:00    1
2013-09-02 13:00:00    4
2013-09-03 13:00:00    4
2013-09-04 13:00:00    4
2013-09-05 13:00:00    4
2013-09-06 13:00:00    4
2013-09-07 13:00:00    4
2013-09-08 13:00:00    4
2013-09-09 13:00:00    4
2013-09-10 13:00:00    4
2013-09-11 13:00:00    4
2013-09-12 13:00:00    4
2013-09-13 13:00:00    4
2013-09-14 13:00:00    4
2013-09-15 13:00:00    4
...
2013-10-18 13:00:00    13
2013-10-19 13:00:00    13
2013-10-20 13:00:00    13
2013-10-21 13:00:00    13
2013-10-22 13:00:00    13
2013-10-23 13:00:00    13
2013-10-24 13:00:00    13
2013-10-25 13:00:00    13
2013-10-26 13:00:00    13
2013-10-27 13:00:00    13
2013-10-28 13:00:00    13
2013-10-29 13:00:00    13
2013-10-30 13:00:00    13
2013-10-31 13:00:00    13
2013-12-02 13:00:00     9
Length: 61

dashesy · 2015-06-10T00:22:59Z

is it possible to ignore string columns and apply the rolling function to the rest?

jreback · 2016-03-06T00:02:41Z

going to repurpose this issue to skip nuiscance columns in a dataframe

In [4]: df = DataFrame({'A' : pd.date_range('20130101',periods=3),'B' : range(3)})

In [5]: df
Out[5]: 
           A  B
0 2013-01-01  0
1 2013-01-02  1
2 2013-01-03  2

This is a supported op

In [7]: df.rolling(window=2).count()
Out[7]: 
     A    B
0  1.0  1.0
1  2.0  2.0
2  2.0  2.0

This works

In [9]: df.rolling(window=2).B.sum()
Out[9]: 
0    NaN
1    1.0
2    3.0
Name: B, dtype: float64

df.rolling(window=2).sum()

should work (but excluded A) from the results.

jreback · 2016-03-06T00:04:08Z

closing in favor of #12537

Darel13712 · 2018-03-21T15:30:31Z

@jreback Hello is there any possibility to get rolling.apply for objects in 2018?) It still ignores objects as of 0.22.0.

idx = pd.date_range(pd.to_datetime('2017-01-01'), pd.to_datetime('2017-01-03'))
df = pd.DataFrame([[1, 's'],[2, 't'],[3, 'r']], index=idx)

	0	1
2017-01-01	1	s
2017-01-02	2	t
2017-01-03	3	r

it's ok to ignore objects with built-in functions

df.rolling(2, min_periods=1).sum()

	0	1
2017-01-01	1	s
2017-01-02	3	t
2017-01-03	5	r

but why can't i handle objects myself?

def const(x):
    return 5
df.rolling(2, min_periods=1).apply(const)

	0	1
2017-01-01	5	s
2017-01-02	5	t
2017-01-03	5	r

This example above is not really useful but I want to be able to apply function to a column that contains objects.

The only solution I've found is to create your own DataFrame/Series type https://stackoverflow.com/questions/44479384/pandas-rolling-apply-doesnt-do-anything

But I really think this should be in pandas by default.

jreback · 2018-03-21T15:42:42Z

would always take a PR for this

hayd mentioned this issue Sep 27, 2013

Non-Reducing Return Values in Rolling .apply #4130

Closed

jreback modified the milestones: 0.15.0, 0.14.0 Feb 18, 2014

jreback mentioned this issue Oct 29, 2014

API/ENH: master issue for pd.rolling_apply #8659

Closed

14 tasks

jreback modified the milestones: 0.17.0, 0.16.0 Feb 5, 2015

jreback added Difficulty Intermediate labels Apr 6, 2015

This was referenced Nov 16, 2015

Roll apply nonfloat dtypes #11620

Closed

ENH: Support returning the same dtype as the caller for window ops (including extension dtypes) #11446

Open

jreback mentioned this issue Mar 5, 2016

ENH: support timedeltas with window functions #12536

Open

jreback modified the milestones: 0.18.1, Next Major Release Mar 5, 2016

jreback mentioned this issue Mar 6, 2016

ENH: window functions need to exclude nuiscance columns #12537

Closed

jreback closed this as completed Mar 6, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rolling_apply only applicable to numeric columns #4964

rolling_apply only applicable to numeric columns #4964

aschilling commented Sep 24, 2013

jreback commented Sep 24, 2013

aschilling commented Sep 24, 2013

jreback commented Sep 24, 2013

aschilling commented Sep 24, 2013

jreback commented Sep 24, 2013

aschilling commented Sep 25, 2013

jreback commented Sep 25, 2013

dashesy commented Jun 10, 2015

jreback commented Mar 6, 2016

jreback commented Mar 6, 2016

Darel13712 commented Mar 21, 2018 •

edited

Loading

jreback commented Mar 21, 2018

rolling_apply only applicable to numeric columns #4964

rolling_apply only applicable to numeric columns #4964

Comments

aschilling commented Sep 24, 2013

jreback commented Sep 24, 2013

aschilling commented Sep 24, 2013

jreback commented Sep 24, 2013

aschilling commented Sep 24, 2013

jreback commented Sep 24, 2013

aschilling commented Sep 25, 2013

jreback commented Sep 25, 2013

dashesy commented Jun 10, 2015

jreback commented Mar 6, 2016

jreback commented Mar 6, 2016

Darel13712 commented Mar 21, 2018 • edited Loading

jreback commented Mar 21, 2018

Darel13712 commented Mar 21, 2018 •

edited

Loading