API: provide Rolling/Expanding/EWM objects for deferred rolling type calculations #10702 #11603

Merged
merged 8 commits into from
Dec 19, 2015

Conversation

jreback
Contributor

@jreback jreback commented Nov 14, 2015

closes #10702
closes #9052
xref #4950, removal of depr

So this basically takes all of the pd.rolling_*,pd.expanding_*,pd.ewma_* routines and allows an object oriented interface, similar to groupby.
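A minimal sketch of the "object first, reduction second" shape described above, assuming a pandas version that includes this change. The variable names are illustrative only; the old top-level call is shown as a comment because it is the interface being replaced.

```python
import pandas as pd

s = pd.Series(range(5))

# Old style replaced by this PR: pd.rolling_sum(s, window=2)
# New style: build a deferred Rolling object, then call the reduction on it.
r = s.rolling(window=2)
rolled = r.sum()

# groupby follows the same pattern: build the object, then reduce.
grouped = s.groupby(s % 2).sum()
```

Nothing is computed when `s.rolling(...)` is called; the work happens in `.sum()`, which is what makes tab completion and `.agg` on the intermediate object possible.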

Some benefits:

  • nice tab completions on the Rolling/Expanding/EWM objects
  • much cleaner code internally
  • complete back compat, i.e. everything just works like it did
  • added a .agg/aggregate function, similar to groupby, where you can do multiple aggregations at once
  • added __getitem__ accessing, e.g. df.rolling(....)['A','B'].sum() for a nicer API
  • allows for much of API/ENH: master issue for pd.rolling_apply #8659 to be done very easily
  • fix for coercing Timedeltas properly
  • handling nuisance (string) columns

Other:

  • along with window doc rewrite, fixed doc-strings for groupby/window to provide back-refs

TODO:

  • I think that all of the doc-strings are correct, but need check
  • implement .agg
  • update API.rst, what's new
  • deprecate the pd.expanding_*,pd.rolling_*,pd.ewma_* interface as this is polluting the top-level namespace quite a bit
  • change the docs to use the new API

In [4]: df = DataFrame({'A' : range(5), 'B' : pd.timedelta_range('1 day',periods=5), 'C' : 'foo'})

In [5]: df.rolling(window=2).sum()
Out[5]: 
    A      B    C
0 NaN    NaT  foo
1   1 3 days  foo
2   3 5 days  foo
3   5 7 days  foo
4   7 9 days  foo

In [6]: df.rolling(window=2)['A','C'].sum()
Out[6]: 
    A    C
0 NaN  foo
1   1  foo
2   3  foo
3   5  foo
4   7  foo

In [2]: r = df.rolling(window=3)

In [3]: r.
r.A         r.C         r.corr      r.cov       r.max       r.median    r.name      r.skew      r.sum       
r.B         r.apply     r.count     r.kurt      r.mean      r.min       r.quantile  r.std       r.var       

do rolling/expanding/ewma ops

In [1]: s = Series(range(5))

In [2]: r = s.rolling(2)

# pd.rolling_sum
In [3]: r.sum()
Out[3]: 
0   NaN
1     1
2     3
3     5
4     7
dtype: float64

# nicer repr
In [4]: r
Out[4]: Rolling [window->2,center->False,axis->0]

In [5]: e = s.expanding(min_periods=2)

# pd.expanding_sum
In [6]: e.sum()
Out[6]: 
0   NaN
1     1
2     3
3     6
4    10
dtype: float64

In [7]: em = s.ewm(com=10)

# pd.ewma
In [8]: em.mean()
Out[8]: 
0    0.000000
1    0.523810
2    1.063444
3    1.618832
4    2.189874
dtype: float64
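The `em.mean()` values above can be reproduced by hand. This is a sketch assuming the default `adjust=True` weighting, where `alpha = 1 / (1 + com)` and the k-th most recent point gets weight `(1 - alpha)**k`. The helper `ewm_mean_adjusted` is a hypothetical name for illustration, not part of pandas.

```python
import numpy as np
import pandas as pd

s = pd.Series(range(5), dtype="float64")
com = 10
alpha = 1.0 / (1.0 + com)


def ewm_mean_adjusted(values, alpha):
    """Weighted mean where the k-th most recent point has weight (1 - alpha)**k."""
    out = []
    for t in range(len(values)):
        # Oldest point gets the largest exponent, newest point gets exponent 0.
        weights = (1.0 - alpha) ** np.arange(t, -1, -1)
        out.append(np.dot(weights, values[: t + 1]) / weights.sum())
    return np.array(out)


manual = ewm_mean_adjusted(s.to_numpy(), alpha)
via_pandas = s.ewm(com=com).mean().to_numpy()
```

For the second row this gives (1 * 1 + (10/11) * 0) / (1 + 10/11) = 11/21 ≈ 0.523810, matching the output shown.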

and allow the various aggregation type of ops (similar to groupby)

In [1]: df = DataFrame({'A' : range(5), 'B' : pd.timedelta_range('1 day',periods=5), 'C' : 'foo'})

In [2]: r = df.rolling(2,min_periods=1)

In [3]: r.agg([np.sum,np.mean])
Out[3]: 
    A           B                    C     
  sum mean    sum            mean  sum mean
0   0  0.0 1 days 1 days 00:00:00  foo  foo
1   1  0.5 3 days 1 days 12:00:00  foo  foo
2   3  1.5 5 days 2 days 12:00:00  foo  foo
3   5  2.5 7 days 3 days 12:00:00  foo  foo
4   7  3.5 9 days 4 days 12:00:00  foo  foo

In [4]: r.agg({'A' : 'sum', 'B' : 'mean'})
Out[4]: 
   A               B
0  0 1 days 00:00:00
1  1 1 days 12:00:00
2  3 2 days 12:00:00
3  5 3 days 12:00:00
4  7 4 days 12:00:00

@jreback jreback added Reshaping Concat, Merge/Join, Stack/Unstack, Explode API Design labels Nov 14, 2015
@jreback jreback added this to the 0.17.1 milestone Nov 14, 2015
@shoyer
Member

shoyer commented Nov 15, 2015

Cc @jhamman who has been working on this for xray.

@jreback jreback modified the milestones: 0.18.0, 0.17.1 Nov 15, 2015
@jreback
Contributor Author

jreback commented Nov 15, 2015

pushing to 0.18.0, I think __getitem__ will be a really nice add here. might as well do all of this at once.

@jreback
Contributor Author

jreback commented Nov 22, 2015

ok, this is ready, MUCH bigger rabbit hole than I thought.

note on the doc-strings.

Since now we have much more like a groupby interface, e.g.

s.rolling(....).sum(), the doc-strings for Rolling.sum are minimal but have a See Also back to Series/DataFrame.rolling (we don't have the notion of a RollingSeries/RollingDataFrame class, so this would be quite tricky).

Further I did the same with groupby doc-strings (again don't have the class distinction on the See Also).

@jorisvandenbossche @shoyer @sinhrks @TomAugspurger @cpcloud

@jreback jreback force-pushed the rolling branch 2 times, most recently from f38b713 to af0dc3c Compare November 22, 2015 23:45
@seth-p
Contributor

seth-p commented Nov 25, 2015

One thing I've wanted to add but haven't, which may be easier using this framework, at least interface-wise, is rolling exponentially weighted functions -- i.e. add a window to all the ewm*() parameters. Obviously this would need to be implemented in Cython for performance, but perhaps interface-wise it would be simpler using the scheme proposed here.

@jreback
Contributor Author

jreback commented Nov 25, 2015

yep that would be quite straightforward to do interface wise but yes would need to be added to the cython functions (but not too hard there)
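To make the idea concrete, here is a rough pure-Python sketch of a windowed exponentially weighted mean built on `Rolling.apply`. It is not the Cython implementation discussed above and is not an existing pandas function; `rolling_ewm_mean` is a hypothetical helper, and it assumes a pandas version where `apply` accepts `raw=True` and a series without NaNs.

```python
import numpy as np
import pandas as pd


def rolling_ewm_mean(series, window, com):
    """Exponentially weighted mean restricted to the last `window` points."""
    alpha = 1.0 / (1.0 + com)

    def _weighted(values):
        # `values` arrives oldest-first; the newest point gets weight 1.
        weights = (1.0 - alpha) ** np.arange(len(values) - 1, -1, -1)
        return np.dot(weights, values) / weights.sum()

    return series.rolling(window).apply(_weighted, raw=True)


s = pd.Series(range(5), dtype="float64")
result = rolling_ewm_mean(s, window=3, com=10)
```

Going through `apply` calls a Python function per window, so it is slow on large inputs, which is why a Cython version would be needed in practice.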

@jreback
Contributor Author

jreback commented Nov 25, 2015

any comments?


Generally these methods all have the same interface. The binary operators
(e.g. :func:`rolling_corr`) take two Series or DataFrames. Otherwise, they all
The API for window statistics is quite similar to the way one works with ``Groupby`` objects, see the documentation :ref:`here <groupby>`
Member

should be GroupBy

Contributor Author

done

@jreback
Contributor Author

jreback commented Dec 15, 2015

so now aggregations are consistent with what you'd expect

In [3]:  df = DataFrame({'A' : range(5),'B' : range(0,10,2)})

In [4]: r = df.rolling(window=3)

In [5]: r.agg(['mean','sum'])
Out[5]: 
     A        B    
  mean sum mean sum
0  NaN NaN  NaN NaN
1  NaN NaN  NaN NaN
2    1   3    2   6
3    2   6    4  12
4    3   9    6  18

In [6]: r['A'].agg(['mean','sum'])
Out[6]: 
   mean  sum
0   NaN  NaN
1   NaN  NaN
2     1    3
3     2    6
4     3    9

In [7]: r.agg({'A' : ['mean','sum']})
Out[7]: 
   mean  sum
0   NaN  NaN
1   NaN  NaN
2     1    3
3     2    6
4     3    9

@jorisvandenbossche
Member

For the last one (r.agg({'A' : ['mean','sum']})), in the case of groupby you still have the 'A' in the columns:

In [49]: grouped.agg({'C': {'r':np.sum, 'r2':np.mean}})
Out[49]:
                  C
                  r        r2
A   B
bar one    1.249205  1.249205
    three -0.262759 -0.262759
    two    1.151419  1.151419
foo one   -0.518008 -0.259004
    three  0.588044  0.588044
    two    1.635643  0.817821

There are some problems with the how and freq keyword. freq is still allowed but deprecated, but the accompanying how is not allowed:

this is correct, you cannot pass how to .mean nor .sum. This is simply invalid syntax. (and nor was it allowed in the original impl).

In any case, this was allowed and is in the docstring. Eg the example I gave works for 0.16.2:

In [63]: ser = pd.Series(np.random.randn(20), index=pd.date_range('1/1/2000', periods=20, freq='12H'))
In [64]: pd.rolling_mean(ser, window=5, freq='D', how='max')
Out[64]:
2000-01-01         NaN
2000-01-02         NaN
2000-01-03         NaN
2000-01-04         NaN
2000-01-05    0.314516
2000-01-06    0.511396
2000-01-07    0.343306
2000-01-08    0.561190
2000-01-09    0.772309
2000-01-10    0.266875
Freq: D, dtype: float64

In [65]: pd.rolling_mean(ser, window=5, freq='D', how='min')
Out[65]:
2000-01-01         NaN
2000-01-02         NaN
2000-01-03         NaN
2000-01-04         NaN
2000-01-05   -0.439707
2000-01-06   -0.431944
2000-01-07   -0.538390
2000-01-08   -0.296774
2000-01-09   -0.112988
2000-01-10   -0.425533
Freq: D, dtype: float64
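With the object interface, one way to get the effect of `freq='D', how='max'` is to resample explicitly and then roll. This is a sketch under the assumption that the old keywords resampled the input before applying the window; the seed and variable names are illustrative, so the numbers will differ from the session above.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
ser = pd.Series(
    rng.randn(20),
    index=pd.date_range("2000-01-01", periods=20, freq=pd.Timedelta(hours=12)),
)

# Step 1: downsample to daily values, keeping the max of each day.
daily_max = ser.resample("D").max()

# Step 2: apply the 5-day rolling mean to the resampled series.
result = daily_max.rolling(window=5).mean()
```

Writing the two steps separately makes it explicit which reduction happens at the resampling stage and which at the rolling stage.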

@jreback
Contributor Author

jreback commented Dec 15, 2015

I think my code was correct at the beginning.

In [4]: r.agg(['sum','mean'])
Out[4]: 
    A        B     
  sum mean sum mean
0 NaN  NaN NaN  NaN
1 NaN  NaN NaN  NaN
2   3    1   6    2
3   6    2  12    4
4   9    3  18    6

In [5]: r.agg({'A' : ['sum','mean']})
Out[5]: 
    A     
  sum mean
0 NaN  NaN
1 NaN  NaN
2   3    1
3   6    2
4   9    3

In [6]: r['A'].agg(['sum','mean'])
Out[6]: 
   sum  mean
0  NaN   NaN
1  NaN   NaN
2    3     1
3    6     2
4    9     3

In [7]: r['A'].agg({'s' : 'sum', 'm' : 'mean' })
Out[7]: 
    s   m
0 NaN NaN
1 NaN NaN
2   3   1
3   6   2
4   9   3

In [8]: r.agg({'A' : {'s' : 'sum', 'm' : 'mean' }})
Out[8]: 
    A    
    s   m
0 NaN NaN
1 NaN NaN
2   3   1
3   6   2
4   9   3
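As far as I know, later pandas releases dropped the dict-based renaming forms shown in `In [7]` and `In [8]`, so they may not run on a current install. A sketch of one way to get the same renamed columns without them is to aggregate with a list and rename afterwards:

```python
import pandas as pd

df = pd.DataFrame({"A": range(5), "B": range(0, 10, 2)})
r = df.rolling(window=3)

# Aggregate with a plain list, then rename the resulting columns.
renamed = r["A"].agg(["sum", "mean"]).rename(columns={"sum": "s", "mean": "m"})
```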

@jreback jreback force-pushed the rolling branch 4 times, most recently from 05d1385 to 2b81c88 Compare December 17, 2015 23:04
@jreback
Contributor Author

jreback commented Dec 18, 2015

might be still some small loose ends, but any further comments @jorisvandenbossche @shoyer

jreback added a commit that referenced this pull request Dec 19, 2015
API: provide Rolling/Expanding/EWM objects for deferred rolling type calculations #10702
@jreback jreback merged commit 2a1d9f2 into pandas-dev:master Dec 19, 2015
@jreback
Contributor Author

jreback commented Dec 19, 2015

bombs away!

@jreback
Contributor Author

jreback commented Dec 19, 2015

@jorisvandenbossche http://pandas-docs.github.io/pandas-docs-travis/computation.html#stats-moments

does the :math not render on travis doc builds?

@max-sixty
Contributor

👏

Labels
API Design Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Development

Successfully merging this pull request may close these issues.

DOC: Dict of Dicts for renaming Groupby Aggregations