Refactored Resample API breaking change

closes #11732
closes #12072
closes #9052
closes #12140

Author: Jeff Reback <jeff@reback.net>

Closes #11841 from jreback/resample and squashes the following commits:

b2056ca [Jeff Reback] DOC: clean up aggregations docs, removing from whatsnew
b4dfbc5 [Jeff Reback] fix according to comments
e243f18 [Jeff Reback] API: add doc examples for #9052
750556b [Jeff Reback] raise SpecificationError if we have an invalid aggregator
c54ea69 [Jeff Reback] PEP updates
68428d6 [Jeff Reback] API: disallow renamed nested-dicts
83238ed [Jeff Reback] BUG: timedelta resample idempotency, #12072
e570570 [Jeff Reback] ENH: .resample API to groupby-like class, #11732
pandas-dev · Feb 2, 2016 · 1dc49f5
1 parent 6a32f10

Showing 23 changed files with 2,784 additions and 997 deletions.
diff --git a/doc/source/api.rst b/doc/source/api.rst
@@ -1729,6 +1729,65 @@ The following methods are available only for ``DataFrameGroupBy`` objects.
    DataFrameGroupBy.corrwith
    DataFrameGroupBy.boxplot
 
+Resampling
+----------
+.. currentmodule:: pandas.tseries.resample
+
+Resampler objects are returned by resample calls: :func:`pandas.DataFrame.resample`, :func:`pandas.Series.resample`.
+
+Indexing, iteration
+~~~~~~~~~~~~~~~~~~~
+.. autosummary::
+   :toctree: generated/
+
+   Resampler.__iter__
+   Resampler.groups
+   Resampler.indices
+   Resampler.get_group
+
+Function application
+~~~~~~~~~~~~~~~~~~~~
+.. autosummary::
+   :toctree: generated/
+
+   Resampler.apply
+   Resampler.aggregate
+   Resampler.transform
+
+Upsampling
+~~~~~~~~~~
+
+.. autosummary::
+   :toctree: generated/
+
+   Resampler.ffill
+   Resampler.backfill
+   Resampler.bfill
+   Resampler.pad
+   Resampler.fillna
+   Resampler.asfreq
+
+Computations / Descriptive Stats
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+.. autosummary::
+   :toctree: generated/
+
+   Resampler.count
+   Resampler.nunique
+   Resampler.first
+   Resampler.last
+   Resampler.max
+   Resampler.mean
+   Resampler.median
+   Resampler.min
+   Resampler.ohlc
+   Resampler.prod
+   Resampler.size
+   Resampler.sem
+   Resampler.std
+   Resampler.sum
+   Resampler.var
+
 Style
 -----
 .. currentmodule:: pandas.core.style

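A minimal sketch of the groupby-like ``Resampler`` object documented in the new ``api.rst`` section, assuming a current pandas release and a toy minutely Series (the data and the ``'2min'`` rule are illustrative). Calling ``resample`` only builds the object; iteration and the computation methods do the work.

```python
import numpy as np
import pandas as pd

# Toy data (an assumption): six minutely observations with values 0..5
idx = pd.date_range("2016-01-01", periods=6, freq="min")
s = pd.Series(np.arange(6), index=idx)

# resample() returns a deferred, groupby-like Resampler; nothing is aggregated yet
r = s.resample("2min")
print(type(r).__name__)  # a Resampler subclass, e.g. DatetimeIndexResampler

# Indexing / iteration, as with a GroupBy: (bin label, sub-Series) pairs
group_sizes = [len(group) for _, group in r]

# Computations / descriptive stats are methods on the Resampler
sums = r.sum()      # bins 00:00, 00:02, 00:04 -> 0+1, 2+3, 4+5
counts = r.count()  # observations per bin
```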
diff --git a/doc/source/cookbook.rst b/doc/source/cookbook.rst
@@ -567,7 +567,7 @@ Unlike agg, apply's callable is passed a sub-DataFrame which gives you access to
       return pd.NaT
 
    mhc = {'Mean' : np.mean, 'Max' : np.max, 'Custom' : MyCust}
-   ts.resample("5min",how = mhc)
+   ts.resample("5min").apply(mhc)
    ts
 
 `Create a value counts column and reassign back to the DataFrame

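The cookbook change above moves the reducers from the removed ``how=`` keyword onto the Resampler. A hedged sketch of the same idea with a list of reducers, assuming a current pandas release; ``spread`` is a hypothetical custom function standing in for the cookbook's ``MyCust``, and the data is illustrative.

```python
import numpy as np
import pandas as pd

# Ten minutely observations with values 0..9 (illustrative data)
idx = pd.date_range("2014-10-07 10:00", periods=10, freq="min")
ts = pd.Series(np.arange(10, dtype=float), index=idx)


def spread(x):
    """Hypothetical custom reducer: the range (max - min) of one bin."""
    return x.max() - x.min()


# Built-in reducers by name, mixed with a custom callable
summary = ts.resample("5min").agg(["mean", "max", spread])
```

Callables in the list are labeled by their ``__name__``, so the custom column comes out as ``spread``.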
diff --git a/doc/source/release.rst b/doc/source/release.rst
@@ -48,7 +48,12 @@ users upgrade to this version.
 
 Highlights include:
 
-See the :ref:`v0.17.0 Whatsnew <whatsnew_0180>` overview for an extensive list
+Highlights include:
+
+- Window functions are now methods on ``.groupby``-like objects, see :ref:`here <whatsnew_0180.enhancements.moments>`.
+- API-breaking ``.resample`` changes to make it more ``.groupby``-like, see :ref:`here <whatsnew_0180.resample>`.
+
+See the :ref:`v0.18.0 Whatsnew <whatsnew_0180>` overview for an extensive list
 of all enhancements and bugs that have been fixed in 0.17.1.
 
 Thanks

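To illustrate the API-breaking change called out in the release notes, here is a hypothetical migration shim (``resample_compat`` is not a pandas function) that maps the old ``how=`` / ``fill_method=`` / ``limit=`` keywords onto the new method chains. It assumes a current pandas release, where the ``pad``/``backfill`` spellings are deprecated or removed depending on the version, so they are translated to ``ffill``/``bfill``. It is a sketch, not a drop-in replacement for every pre-0.18 call.

```python
import numpy as np
import pandas as pd

# Old fill_method spellings -> new Resampler method names
_FILL_METHODS = {"pad": "ffill", "ffill": "ffill", "backfill": "bfill", "bfill": "bfill"}


def resample_compat(obj, rule, how=None, fill_method=None, limit=None, **kwargs):
    """Hypothetical shim translating a pre-0.18 style resample call.

    resample_compat(s, '5min', how='sum')          ~ s.resample('5min').sum()
    resample_compat(s, '250ms', fill_method='pad') ~ s.resample('250ms').ffill()
    """
    r = obj.resample(rule, **kwargs)  # closed=, label=, etc. still go here
    if fill_method is not None:
        # Upsampling: fill the newly created slots, at most `limit` of them
        return getattr(r, _FILL_METHODS[fill_method])(limit=limit)
    # Downsampling: the old default aggregation was the mean
    return r.agg("mean" if how is None else how)


idx = pd.date_range("2016-01-01", periods=6, freq="min")
s = pd.Series(np.arange(6.0), index=idx)

summed = resample_compat(s, "2min", how="sum")
averaged = resample_compat(s, "2min")  # old implicit how='mean'

two = pd.Series([1.0, 2.0],
                index=pd.to_datetime(["2016-01-01 00:00:00", "2016-01-01 00:00:01"]))
padded = resample_compat(two, "250ms", fill_method="pad", limit=2)
```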
diff --git a/doc/source/timedeltas.rst b/doc/source/timedeltas.rst
@@ -401,4 +401,4 @@ Similar to :ref:`timeseries resampling <timeseries.resampling>`, we can resample
 
 .. ipython:: python
 
-   s.resample('D')
+   s.resample('D').mean()
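A small sketch of resampling a ``TimedeltaIndex`` with the new API, assuming a current pandas release; the 6-hourly toy data is an assumption. Here the first timedelta is a whole day, so the daily bins line up as noted in the comment.

```python
import numpy as np
import pandas as pd

# Ten observations spaced 6 hours apart, starting at 1 day (illustrative data)
s = pd.Series(np.arange(10), index=pd.timedelta_range("1 day", periods=10, freq="6h"))

# Bins [1d, 2d), [2d, 3d), [3d, 4d): four, four, then the remaining two points
daily = s.resample("D").mean()
```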
diff --git a/doc/source/timeseries.rst b/doc/source/timeseries.rst
@@ -68,7 +68,7 @@ Resample:
 .. ipython:: python
 
    # Daily means
-   ts.resample('D', how='mean')
+   ts.resample('D').mean()
 
 
 .. _timeseries.overview:
@@ -1211,6 +1211,11 @@ Converting to Python datetimes
 Resampling
 ----------
 
+.. warning::
+
+   The interface to ``.resample`` has changed in 0.18.0 to be more groupby-like and hence more flexible.
+   See the :ref:`whatsnew docs <whatsnew_0180.breaking.resample>` for a comparison with prior versions.
+
 Pandas has a simple, powerful, and efficient functionality for
 performing resampling operations during frequency conversion (e.g., converting
 secondly data into 5-minutely data). This is extremely common in, but not
@@ -1226,7 +1231,7 @@ See some :ref:`cookbook examples <cookbook.resample>` for some advanced strategi
 
    ts = Series(randint(0, 500, len(rng)), index=rng)
 
-   ts.resample('5Min', how='sum')
+   ts.resample('5Min').sum()
 
 The ``resample`` function is very flexible and allows you to specify many
 different parameters to control the frequency conversion and resampling
@@ -1237,11 +1242,11 @@ an array and produces aggregated values:
 
 .. ipython:: python
 
-   ts.resample('5Min') # default is mean
+   ts.resample('5Min').mean()
 
-   ts.resample('5Min', how='ohlc')
+   ts.resample('5Min').ohlc()
 
-   ts.resample('5Min', how=np.max)
+   ts.resample('5Min').max()
 
 Any function available via :ref:`dispatching <groupby.dispatch>` can be given to
 the ``how`` parameter by name, including ``sum``, ``mean``, ``std``, ``sem``,
@@ -1252,9 +1257,9 @@ end of the interval is closed:
 
 .. ipython:: python
 
-   ts.resample('5Min', closed='right')
+   ts.resample('5Min', closed='right').mean()
 
-   ts.resample('5Min', closed='left')
+   ts.resample('5Min', closed='left').mean()
 
 Parameters like ``label`` and ``loffset`` are used to manipulate the resulting
 labels. ``label`` specifies whether the result is labeled with the beginning or
@@ -1263,11 +1268,11 @@ labels.
 
 .. ipython:: python
 
-   ts.resample('5Min') # by default label='right'
+   ts.resample('5Min').mean() # by default label='left'
 
-   ts.resample('5Min', label='left')
+   ts.resample('5Min', label='left').mean()
 
-   ts.resample('5Min', label='left', loffset='1s')
+   ts.resample('5Min', label='left', loffset='1s').mean()
 
 The ``axis`` parameter can be set to 0 or 1 and allows you to resample the
 specified axis for a DataFrame.
@@ -1284,18 +1289,17 @@ frequency periods.
 Up Sampling
 ~~~~~~~~~~~
 
-For upsampling, the ``fill_method`` and ``limit`` parameters can be specified
-to interpolate over the gaps that are created:
+For upsampling, you can choose how to fill the gaps that are created (e.g. forward-filling with ``ffill``) and use the ``limit`` parameter to cap how many of the new values are filled:
 
 .. ipython:: python
 
    # from secondly to every 250 milliseconds
 
-   ts[:2].resample('250L')
+   ts[:2].resample('250L').asfreq()
 
-   ts[:2].resample('250L', fill_method='pad')
+   ts[:2].resample('250L').ffill()
 
-   ts[:2].resample('250L', fill_method='pad', limit=2)
+   ts[:2].resample('250L').ffill(limit=2)
 
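To make the upsampling methods above concrete, a sketch assuming a current pandas release (the ``'250L'`` alias from the docs is spelled ``'250ms'`` here, which pandas also accepts). Two observations one second apart are upsampled to 250 ms, and the methods differ only in how they fill the three new slots.

```python
import pandas as pd

# Two secondly observations (illustrative data)
two = pd.Series([1.0, 2.0],
                index=pd.to_datetime(["2016-01-01 00:00:00", "2016-01-01 00:00:01"]))
r = two.resample("250ms")   # slots at .000, .250, .500, .750 and 1.000 seconds

raw = r.asfreq()            # new slots are left as NaN
forward = r.ffill()         # carry 1.0 forward into every new slot
limited = r.ffill(limit=2)  # fill at most 2 new slots
backward = r.bfill()        # pull 2.0 backward into every new slot
```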
 Sparse Resampling
 ~~~~~~~~~~~~~~~~~
@@ -1317,7 +1321,7 @@ If we want to resample to the full range of the series
 
 .. ipython:: python
 
-    ts.resample('3T',how='sum')
+    ts.resample('3T').sum()
 
 We can instead only resample those groups where we have points as follows:
 
@@ -1333,6 +1337,74 @@ We can instead only resample those groups where we have points as follows:
 
     ts.groupby(partial(round, freq='3T')).sum()
 
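A hedged sketch of the contrast drawn above between resampling to the full range and resampling only the occupied bins. It assumes a current pandas release and substitutes ``DatetimeIndex.floor`` for the docs' ``round`` helper (whose definition is not shown here), so each point is assigned to the bin it falls in rather than the nearest one.

```python
import pandas as pd

# Three points with a long gap before the last one (illustrative data)
idx = pd.to_datetime(["2014-01-01 00:00", "2014-01-01 00:01", "2014-01-01 00:30"])
ts = pd.Series([1, 2, 3], index=idx)

# Full-range resampling materializes every 3-minute bin from 00:00 to 00:30
full = ts.resample("3min").sum()

# Grouping on the floored timestamps keeps only the bins that contain data
sparse = ts.groupby(ts.index.floor("3min")).sum()
```

With long gaps between observations, the grouped form avoids allocating the empty bins.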
+Aggregation
+~~~~~~~~~~~
+
+Similar to the :ref:`groupby aggregates <groupby.aggregate>` and :ref:`window functions <stats.aggregate>` APIs, a ``Resampler`` can apply
+aggregations selectively, to chosen columns and with chosen functions.
+
+When resampling a ``DataFrame``, the default is to apply the same function to all columns.
+
+.. ipython:: python
+
+   df = pd.DataFrame(np.random.randn(1000, 3),
+                     index=pd.date_range('1/1/2012', freq='S', periods=1000),
+                     columns=['A', 'B', 'C'])
+   r = df.resample('3T')
+   r.mean()
+
+We can select a specific column or columns using standard getitem.
+
+.. ipython:: python
+
+   r['A'].mean()
+
+   r[['A','B']].mean()
+
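A deterministic variant of the column-selection example above, assuming a current pandas release; the small hand-checkable frame replaces the docs' random data.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2012-01-01", periods=6, freq="min")
df = pd.DataFrame({"A": np.arange(6.0),
                   "B": np.arange(6.0) * 10,
                   "C": np.ones(6)}, index=idx)
r = df.resample("3min")         # two bins: 00:00-00:02 and 00:03-00:05

all_means = r.mean()            # every column, same function
a_mean = r["A"].mean()          # a Series for one column
ab_mean = r[["A", "B"]].mean()  # a DataFrame restricted to A and B
```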
+You can pass a list or dict of functions to ``agg`` to compute several aggregations at once; the result is a DataFrame:
+
+.. ipython:: python
+
+   r['A'].agg([np.sum, np.mean, np.std])
+
+If a dict is passed, the keys will be used to name the columns. Otherwise the
+function's name (stored in the function object) will be used.
+
+.. ipython:: python
+
+   r['A'].agg({'result1' : np.sum,
+               'result2' : np.mean})
+
+On a resampled DataFrame, you can pass a list of functions to apply to each
+column, which produces an aggregated result with a hierarchical index:
+
+.. ipython:: python
+
+   r.agg([np.sum, np.mean])
+
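A sketch of the hierarchical result, using a small deterministic frame (an assumption, not the docs' random data) and passing the function names as strings, assuming a current pandas release.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2012-01-01", periods=6, freq="min")
df = pd.DataFrame({"A": np.arange(6.0), "B": np.arange(6.0) * 10}, index=idx)

result = df.resample("3min").agg(["sum", "mean"])
# Columns form a two-level MultiIndex of (column, function) pairs
print(result.columns.tolist())
```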
+By passing a dict to ``aggregate`` you can apply a different aggregation to the
+columns of a DataFrame:
+
+.. ipython:: python
+   :okexcept:
+
+   r.agg({'A' : np.sum,
+          'B' : lambda x: np.std(x, ddof=1)})
+
+The function names can also be strings. In order for a string to be valid, it
+must be implemented on the ``Resampler`` object:
+
+.. ipython:: python
+
+   r.agg({'A' : 'sum', 'B' : 'std'})
+
+You can also specify multiple aggregation functions for each column separately.
+
+.. ipython:: python
+
+   r.agg({'A' : ['sum','std'], 'B' : ['mean','std'] })
+
+
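A sketch contrasting the two dict forms above on a small deterministic frame (an assumption, not the docs' random data), assuming a current pandas release: one function name per column gives flat columns, while lists of names give ``(column, function)`` columns.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2012-01-01", periods=6, freq="min")
df = pd.DataFrame({"A": np.arange(6.0), "B": np.arange(6.0) * 10}, index=idx)
r = df.resample("3min")  # bins hold A=[0,1,2], [3,4,5] and B=[0,10,20], [30,40,50]

# One function (given by name) per column: flat columns A and B
flat = r.agg({"A": "sum", "B": "std"})

# Several functions per column: MultiIndex columns of (column, function)
nested = r.agg({"A": ["sum", "std"], "B": ["mean", "std"]})
```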
 .. _timeseries.periods:
 
 Time Span Representation

diff --git a/doc/source/whatsnew/v0.10.0.txt b/doc/source/whatsnew/v0.10.0.txt
@@ -70,16 +70,59 @@ nfrequencies are unaffected. The prior defaults were causing a great deal of
 confusion for users, especially resampling data to daily frequency (which
 labeled the aggregated group with the end of the interval: the next day).
 
-Note:
-
-.. ipython:: python
-
-    dates = pd.date_range('1/1/2000', '1/5/2000', freq='4h')
-    series = Series(np.arange(len(dates)), index=dates)
-    series
-    series.resample('D', how='sum')
-    # old behavior
-    series.resample('D', how='sum', closed='right', label='right')
+.. code-block:: python
+
+   In [1]: dates = pd.date_range('1/1/2000', '1/5/2000', freq='4h')
+
+   In [2]: series = Series(np.arange(len(dates)), index=dates)
+
+   In [3]: series
+   Out[3]:
+   2000-01-01 00:00:00     0
+   2000-01-01 04:00:00     1
+   2000-01-01 08:00:00     2
+   2000-01-01 12:00:00     3
+   2000-01-01 16:00:00     4
+   2000-01-01 20:00:00     5
+   2000-01-02 00:00:00     6
+   2000-01-02 04:00:00     7
+   2000-01-02 08:00:00     8
+   2000-01-02 12:00:00     9
+   2000-01-02 16:00:00    10
+   2000-01-02 20:00:00    11
+   2000-01-03 00:00:00    12
+   2000-01-03 04:00:00    13
+   2000-01-03 08:00:00    14
+   2000-01-03 12:00:00    15
+   2000-01-03 16:00:00    16
+   2000-01-03 20:00:00    17
+   2000-01-04 00:00:00    18
+   2000-01-04 04:00:00    19
+   2000-01-04 08:00:00    20
+   2000-01-04 12:00:00    21
+   2000-01-04 16:00:00    22
+   2000-01-04 20:00:00    23
+   2000-01-05 00:00:00    24
+   Freq: 4H, dtype: int64
+
+   In [4]: series.resample('D', how='sum')
+   Out[4]:
+   2000-01-01     15
+   2000-01-02     51
+   2000-01-03     87
+   2000-01-04    123
+   2000-01-05     24
+   Freq: D, dtype: int64
+
+   In [5]: # old behavior
+   In [6]: series.resample('D', how='sum', closed='right', label='right')
+   Out[6]:
+   2000-01-01      0
+   2000-01-02     21
+   2000-01-03     57
+   2000-01-04     93
+   2000-01-05    129
+   Freq: D, dtype: int64
 
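For comparison with the frozen 0.10.0 output above, a sketch of the same two calls under the post-0.18 API, assuming a current pandas release (``pd.Series`` replaces the bare ``Series`` import, and ``how='sum'`` becomes ``.sum()``).

```python
import numpy as np
import pandas as pd

dates = pd.date_range("1/1/2000", "1/5/2000", freq="4h")
series = pd.Series(np.arange(len(dates)), index=dates)

# Defaults for 'D': bins closed on the left, labeled with the left edge
daily = series.resample("D").sum()

# The pre-0.10 behavior, requested explicitly
old_style = series.resample("D", closed="right", label="right").sum()
```

The values match the ``Out[4]`` and ``Out[6]`` listings above.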
 - Infinity and negative infinity are no longer treated as NA by ``isnull`` and
   ``notnull``. That they ever were was a relic of early pandas. This behavior
@@ -354,4 +397,3 @@ Adding experimental support for Panel4D and factory functions to create n-dimens
 See the :ref:`full release notes
 <release>` or issue tracker
 on GitHub for a complete list.
-