Refactored Resample API breaking change
closes #11732
closes #12072
closes #9052
closes #12140

Author: Jeff Reback <jeff@reback.net>

Closes #11841 from jreback/resample and squashes the following commits:

b2056ca [Jeff Reback] DOC: clean up aggregations docs, removing from whatsnew
b4dfbc5 [Jeff Reback] fix according to comments
e243f18 [Jeff Reback] API: add doc examples for #9052
750556b [Jeff Reback] raise SpecificationError if we have an invalid aggregator
c54ea69 [Jeff Reback] PEP updates
68428d6 [Jeff Reback] API: disallow renamed nested-dicts
83238ed [Jeff Reback] BUG: timedelta resample idempotency, #12072
e570570 [Jeff Reback] ENH: .resample API to groupby-like class, #11732
jreback committed Feb 2, 2016
1 parent 6a32f10 commit 1dc49f5
Showing 23 changed files with 2,784 additions and 997 deletions.
59 changes: 59 additions & 0 deletions doc/source/api.rst
@@ -1729,6 +1729,65 @@ The following methods are available only for ``DataFrameGroupBy`` objects.
DataFrameGroupBy.corrwith
DataFrameGroupBy.boxplot

Resampling
----------
.. currentmodule:: pandas.tseries.resample

Resampler objects are returned by resample calls: :func:`pandas.DataFrame.resample`, :func:`pandas.Series.resample`.

Indexing, iteration
~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: generated/

Resampler.__iter__
Resampler.groups
Resampler.indices
Resampler.get_group
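
The indexing and iteration methods listed above can be sketched as follows. This is a minimal illustration, not part of the commit: the sample data and the variable names (``ts``, ``r``, ``bin_labels``) are assumptions, and the ``'min'`` frequency alias is used in place of older spellings.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: ten one-minute observations with values 0..9.
rng = pd.date_range('2012-01-01', periods=10, freq='min')
ts = pd.Series(np.arange(10), index=rng)

# resample() returns a Resampler; nothing is aggregated yet.
r = ts.resample('5min')

# Iterating yields (bin label, sub-Series) pairs, much like a groupby.
bin_labels = []
bin_sizes = []
for label, chunk in r:
    bin_labels.append(label)
    bin_sizes.append(len(chunk))

# get_group returns the observations that fall into a single bin.
first_bin = r.get_group(bin_labels[0])
print(bin_labels, bin_sizes)
print(first_bin)
```

Each five-minute bin here holds five of the one-minute observations, so iteration produces two pairs.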

Function application
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: generated/

Resampler.apply
Resampler.aggregate
Resampler.transform

Upsampling
~~~~~~~~~~

.. autosummary::
:toctree: generated/

Resampler.ffill
Resampler.backfill
Resampler.bfill
Resampler.pad
Resampler.fillna
Resampler.asfreq

Computations / Descriptive Stats
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: generated/

Resampler.count
Resampler.nunique
Resampler.first
Resampler.last
Resampler.max
Resampler.mean
Resampler.median
Resampler.min
Resampler.ohlc
Resampler.prod
Resampler.size
Resampler.sem
Resampler.std
Resampler.sum
Resampler.var

Style
-----
.. currentmodule:: pandas.core.style
2 changes: 1 addition & 1 deletion doc/source/cookbook.rst
@@ -567,7 +567,7 @@ Unlike agg, apply's callable is passed a sub-DataFrame which gives you access to
return pd.NaT
mhc = {'Mean' : np.mean, 'Max' : np.max, 'Custom' : MyCust}
ts.resample("5min",how = mhc)
ts.resample("5min").apply(mhc)
ts
`Create a value counts column and reassign back to the DataFrame
7 changes: 6 additions & 1 deletion doc/source/release.rst
@@ -48,7 +48,12 @@ users upgrade to this version.

Highlights include:

See the :ref:`v0.17.0 Whatsnew <whatsnew_0180>` overview for an extensive list
Highlights include:

- Window functions are now methods on ``.groupby`` like objects, see :ref:`here <whatsnew_0180.enhancements.moments>`.
- API breaking ``.resample`` changes to make it more ``.groupby`` like, see :ref:`here <whatsnew_0180.resample>`.

See the :ref:`v0.18.0 Whatsnew <whatsnew_0180>` overview for an extensive list
of all enhancements and bugs that have been fixed in 0.18.0.

Thanks
2 changes: 1 addition & 1 deletion doc/source/timedeltas.rst
@@ -401,4 +401,4 @@ Similar to :ref:`timeseries resampling <timeseries.resampling>`, we can resample

.. ipython:: python
s.resample('D')
s.resample('D').mean()
104 changes: 88 additions & 16 deletions doc/source/timeseries.rst
@@ -68,7 +68,7 @@ Resample:
.. ipython:: python
# Daily means
ts.resample('D', how='mean')
ts.resample('D').mean()
.. _timeseries.overview:
@@ -1211,6 +1211,11 @@ Converting to Python datetimes
Resampling
----------

.. warning::

The interface to ``.resample`` has changed in 0.18.0 to be more groupby-like and hence more flexible.
See the :ref:`whatsnew docs <whatsnew_0180.breaking.resample>` for a comparison with prior versions.

Pandas has simple, powerful, and efficient functionality for
performing resampling operations during frequency conversion (e.g., converting
secondly data into 5-minutely data). This is extremely common in, but not
@@ -1226,7 +1231,7 @@ See some :ref:`cookbook examples <cookbook.resample>` for some advanced strategi
ts = Series(randint(0, 500, len(rng)), index=rng)
ts.resample('5Min', how='sum')
ts.resample('5Min').sum()
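
A minimal sketch of the new call style, assuming small deterministic data rather than the random series used in the docs: ``.resample()`` only builds a deferred ``Resampler`` object, and the reduction is a separate method call on it. The names ``r``, ``five_min_sums`` and ``five_min_means`` are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical data: ten one-minute observations with values 0..9.
rng = pd.date_range('2012-01-01', periods=10, freq='min')
ts = pd.Series(np.arange(10), index=rng)

# Step 1: describe the resampling; this returns a Resampler, not a Series.
r = ts.resample('5min')
resampler_type = type(r).__name__

# Step 2: pick the reduction explicitly instead of passing how=...
five_min_sums = r.sum()
five_min_means = r.mean()

print(resampler_type)
print(five_min_sums)
print(five_min_means)
```

Because the reduction is chosen on the returned object, the same ``r`` can be reused for several aggregations without re-specifying the frequency.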
The ``resample`` function is very flexible and allows you to specify many
different parameters to control the frequency conversion and resampling
@@ -1237,11 +1242,11 @@ an array and produces aggregated values:

.. ipython:: python
ts.resample('5Min') # default is mean
ts.resample('5Min').mean()
ts.resample('5Min', how='ohlc')
ts.resample('5Min').ohlc()
ts.resample('5Min', how=np.max)
ts.resample('5Min').max()
Any function available via :ref:`dispatching <groupby.dispatch>` can be given to
the ``how`` parameter by name, including ``sum``, ``mean``, ``std``, ``sem``,
@@ -1252,9 +1257,9 @@ end of the interval is closed:

.. ipython:: python
ts.resample('5Min', closed='right')
ts.resample('5Min', closed='right').mean()
ts.resample('5Min', closed='left')
ts.resample('5Min', closed='left').mean()
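
The effect of ``closed`` is easier to see on deterministic data. The sketch below is an illustration under assumed inputs (ten one-minute values 0..9), and it sets ``label`` explicitly to match ``closed`` so each bin is named by its closed edge; the docs' examples use random data and ``.mean()`` instead.

```python
import numpy as np
import pandas as pd

# Hypothetical data: values 0..9 stamped at 00:00, 00:01, ..., 00:09.
rng = pd.date_range('2012-01-01', periods=10, freq='min')
ts = pd.Series(np.arange(10), index=rng)

# Left-closed bins: [00:00, 00:05) and [00:05, 00:10).
left = ts.resample('5min', closed='left', label='left').sum()

# Right-closed bins: (23:55, 00:00], (00:00, 00:05], (00:05, 00:10].
# The 00:00 observation lands in a bin of its own.
right = ts.resample('5min', closed='right', label='right').sum()

print(left)
print(right)
```

Both results account for every observation; only the assignment of points that sit exactly on a bin edge changes.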
Parameters like ``label`` and ``loffset`` are used to manipulate the resulting
labels. ``label`` specifies whether the result is labeled with the beginning or
@@ -1263,11 +1268,11 @@ labels.

.. ipython:: python
ts.resample('5Min') # by default label='right'
ts.resample('5Min').mean() # by default label='right'
ts.resample('5Min', label='left')
ts.resample('5Min', label='left').mean()
ts.resample('5Min', label='left', loffset='1s')
ts.resample('5Min', label='left', loffset='1s').mean()
The ``axis`` parameter can be set to 0 or 1 and allows you to resample the
specified axis for a DataFrame.
@@ -1284,18 +1289,17 @@ frequency periods.
Up Sampling
~~~~~~~~~~~

For upsampling, the ``fill_method`` and ``limit`` parameters can be specified
to interpolate over the gaps that are created:
For upsampling, you can specify a way to upsample and the ``limit`` parameter to interpolate over the gaps that are created:

.. ipython:: python
# from secondly to every 250 milliseconds
ts[:2].resample('250L')
ts[:2].resample('250L').asfreq()
ts[:2].resample('250L', fill_method='pad')
ts[:2].resample('250L').ffill()
ts[:2].resample('250L', fill_method='pad', limit=2)
ts[:2].resample('250L').ffill(limit=2)
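
The three upsampling calls can be compared on a two-point series. This is a sketch under assumptions: the data are made up, and the ``'250ms'`` alias is used as an equivalent spelling of the ``'250L'`` rule shown in the docs.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two observations one second apart.
rng = pd.date_range('2012-01-01', periods=2, freq='s')
ts = pd.Series([1.0, 2.0], index=rng)

r = ts.resample('250ms')

# asfreq() only reindexes to the finer grid; the new slots are NaN.
as_is = r.asfreq()

# ffill() carries the last observation forward into every gap.
filled = r.ffill()

# ffill(limit=2) fills at most two consecutive gaps after each observation.
limited = r.ffill(limit=2)

print(as_is)
print(filled)
print(limited)
```

With three new slots between the two observations, ``limit=2`` leaves exactly one of them unfilled.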
Sparse Resampling
~~~~~~~~~~~~~~~~~
@@ -1317,7 +1321,7 @@ If we want to resample to the full range of the series

.. ipython:: python
ts.resample('3T',how='sum')
ts.resample('3T').sum()
We can instead only resample those groups where we have points as follows:

@@ -1333,6 +1337,74 @@ We can instead only resample those groups where we have points as follows:
ts.groupby(partial(round, freq='3T')).sum()
Aggregation
~~~~~~~~~~~

Similar to :ref:`groupby aggregates <groupby.aggregate>` and the :ref:`window functions <stats.aggregate>`, a ``Resampler`` can be selectively
resampled.

When resampling a ``DataFrame``, the default is to apply the same function to all columns.

.. ipython:: python
df = pd.DataFrame(np.random.randn(1000, 3),
index=pd.date_range('1/1/2012', freq='S', periods=1000),
columns=['A', 'B', 'C'])
r = df.resample('3T')
r.mean()
We can select a specific column or columns using standard getitem.

.. ipython:: python
r['A'].mean()
r[['A','B']].mean()
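
Column selection on a ``Resampler`` can be checked on a small frame with known values. The sketch below assumes a hypothetical six-row frame and the ``'3min'`` alias in place of ``'3T'``; it is an illustration, not the docs' random example.

```python
import numpy as np
import pandas as pd

# Hypothetical data: six one-minute rows with easily checked values.
idx = pd.date_range('2012-01-01', periods=6, freq='min')
df = pd.DataFrame({'A': np.arange(6.0),
                   'B': np.arange(6.0) * 10,
                   'C': np.ones(6)}, index=idx)

r = df.resample('3min')

# Without selection, the reduction applies to every column.
all_means = r.mean()

# A single label selects one column and yields a Series result.
a_mean = r['A'].mean()

# A list of labels selects several columns and yields a DataFrame.
ab_mean = r[['A', 'B']].mean()

print(all_means)
print(a_mean)
print(ab_mean)
```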
You can pass a list or dict of functions to do aggregation with, outputting a DataFrame:

.. ipython:: python
r['A'].agg([np.sum, np.mean, np.std])
If a dict is passed, the keys will be used to name the columns. Otherwise the
function's name (stored in the function object) will be used.

.. ipython:: python
r['A'].agg({'result1' : np.sum,
'result2' : np.mean})
On a resampled DataFrame, you can pass a list of functions to apply to each
column, which produces an aggregated result with a hierarchical index:

.. ipython:: python
r.agg([np.sum, np.mean])
By passing a dict to ``aggregate`` you can apply a different aggregation to the
columns of a DataFrame:

.. ipython:: python
:okexcept:
r.agg({'A' : np.sum,
'B' : lambda x: np.std(x, ddof=1)})
The function names can also be strings. For a string to be valid, it
must name a method implemented on the ``Resampler`` object.

.. ipython:: python
r.agg({'A' : 'sum', 'B' : 'std'})
Furthermore, you can also specify multiple aggregation functions for each column separately.

.. ipython:: python
r.agg({'A' : ['sum','std'], 'B' : ['mean','std'] })
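
The list and dict forms of ``agg`` on a resampled DataFrame can be sketched on deterministic data, so the shape of each result is easy to verify. The frame, its values, and the result names (``by_list``, ``by_dict``, ``per_column``) are assumptions for illustration; string function names are used throughout.

```python
import numpy as np
import pandas as pd

# Hypothetical data: six one-minute rows, resampled into two 3-minute bins.
idx = pd.date_range('2012-01-01', periods=6, freq='min')
df = pd.DataFrame({'A': np.arange(6.0),
                   'B': np.arange(6.0) * 10}, index=idx)
r = df.resample('3min')

# A list applies every function to every column (hierarchical columns).
by_list = r.agg(['sum', 'mean'])

# A dict maps each column to its own aggregation.
by_dict = r.agg({'A': 'sum', 'B': 'std'})

# Dict values may themselves be lists, one set of functions per column.
per_column = r.agg({'A': ['sum', 'std'], 'B': ['mean', 'std']})

print(by_list)
print(by_dict)
print(per_column)
```

The dict forms restrict the output to the columns named as keys, while the list form keeps every column and adds a second column level for the function names.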
.. _timeseries.periods:

Time Span Representation
64 changes: 53 additions & 11 deletions doc/source/whatsnew/v0.10.0.txt
@@ -70,16 +70,59 @@ frequencies are unaffected. The prior defaults were causing a great deal of
confusion for users, especially resampling data to daily frequency (which
labeled the aggregated group with the end of the interval: the next day).

Note:

.. ipython:: python

dates = pd.date_range('1/1/2000', '1/5/2000', freq='4h')
series = Series(np.arange(len(dates)), index=dates)
series
series.resample('D', how='sum')
# old behavior
series.resample('D', how='sum', closed='right', label='right')
.. code-block:: python

In [1]: dates = pd.date_range('1/1/2000', '1/5/2000', freq='4h')

In [2]: series = Series(np.arange(len(dates)), index=dates)

In [3]: series
Out[3]:
2000-01-01 00:00:00 0
2000-01-01 04:00:00 1
2000-01-01 08:00:00 2
2000-01-01 12:00:00 3
2000-01-01 16:00:00 4
2000-01-01 20:00:00 5
2000-01-02 00:00:00 6
2000-01-02 04:00:00 7
2000-01-02 08:00:00 8
2000-01-02 12:00:00 9
2000-01-02 16:00:00 10
2000-01-02 20:00:00 11
2000-01-03 00:00:00 12
2000-01-03 04:00:00 13
2000-01-03 08:00:00 14
2000-01-03 12:00:00 15
2000-01-03 16:00:00 16
2000-01-03 20:00:00 17
2000-01-04 00:00:00 18
2000-01-04 04:00:00 19
2000-01-04 08:00:00 20
2000-01-04 12:00:00 21
2000-01-04 16:00:00 22
2000-01-04 20:00:00 23
2000-01-05 00:00:00 24
Freq: 4H, dtype: int64

In [4]: series.resample('D', how='sum')
Out[4]:
2000-01-01 15
2000-01-02 51
2000-01-03 87
2000-01-04 123
2000-01-05 24
Freq: D, dtype: int64

In [5]: # old behavior
In [6]: series.resample('D', how='sum', closed='right', label='right')
Out[6]:
2000-01-01 0
2000-01-02 21
2000-01-03 57
2000-01-04 93
2000-01-05 129
Freq: D, dtype: int64

- Infinity and negative infinity are no longer treated as NA by ``isnull`` and
``notnull``. That they ever were was a relic of early pandas. This behavior
@@ -354,4 +397,3 @@ Adding experimental support for Panel4D and factory functions to create n-dimens
See the :ref:`full release notes
<release>` or issue tracker
on GitHub for a complete list.
