ENH: add Series & DataFrame .agg/.aggregate to provide convenient
function application that mimics the groupby(..).agg/.aggregate
interface

.apply is now a synonym for .agg, and will accept dict/list-likes
for aggregations

CLN: rename .name attr -> ._selection_name from SeriesGroupBy for compat (didn't exist on DataFrameGroupBy)
resolves conflicts w.r.t. setting .name on a groupby object

closes pandas-dev#1623
closes pandas-dev#14464

custom .describe
closes pandas-dev#14483
closes pandas-dev#7014
jreback committed Dec 26, 2016
1 parent 72786cc commit e58ead6
Showing 15 changed files with 884 additions and 57 deletions.
4 changes: 4 additions & 0 deletions doc/source/api.rst
@@ -306,6 +306,8 @@ Function application, GroupBy & Window
   :toctree: generated/

   Series.apply
   Series.aggregate
   Series.transform
   Series.map
   Series.groupby
   Series.rolling
@@ -825,6 +827,8 @@ Function application, GroupBy & Window

   DataFrame.apply
   DataFrame.applymap
   DataFrame.aggregate
   DataFrame.transform
   DataFrame.groupby
   DataFrame.rolling
   DataFrame.expanding
220 changes: 212 additions & 8 deletions doc/source/basics.rst
@@ -702,7 +702,8 @@ on an entire ``DataFrame`` or ``Series``, row- or column-wise, or elementwise.

1. `Tablewise Function Application`_: :meth:`~DataFrame.pipe`
2. `Row or Column-wise Function Application`_: :meth:`~DataFrame.apply`
-3. Elementwise_ function application: :meth:`~DataFrame.applymap`
3. `Aggregation API`_: :meth:`~DataFrame.agg` and :meth:`~DataFrame.transform`
4. `Applying Elementwise Functions`_: :meth:`~DataFrame.applymap`

.. _basics.pipe:

@@ -778,6 +779,13 @@ statistics methods, take an optional ``axis`` argument:
   df.apply(np.cumsum)
   df.apply(np.exp)

``.apply()`` will also dispatch on a string method name.

.. ipython:: python

   df.apply('mean')
   df.apply('mean', axis=1)

Depending on the return type of the function passed to :meth:`~DataFrame.apply`,
the result will either be of lower dimension or the same dimension.
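
As a quick check of the string dispatch, here is a minimal sketch on a small, hypothetical frame ``df`` with fixed values (the data is ours, not from the text above). ``apply('mean')`` reduces each column to one value, so the result drops to a Series that matches ``df.mean()``, while a cumulative function keeps the original shape.

```python
import pandas as pd

# Hypothetical, deterministic data for illustration only
df = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [4.0, 5.0, 6.0]})

# A string name is dispatched to the DataFrame method of that name
col_means = df.apply('mean')          # one value per column -> Series
row_means = df.apply('mean', axis=1)  # one value per row -> Series

# A non-reducing function keeps the same dimension as the input
cumulative = df.apply(lambda col: col.cumsum())

print(col_means)
print(row_means)
print(cumulative.shape)
```
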

@@ -827,16 +835,212 @@ set to True, the passed function will instead receive an ndarray object, which
has positive performance implications if you do not need the indexing
functionality.

-.. seealso::
.. _basics.aggregate:

Aggregation API
~~~~~~~~~~~~~~~

.. versionadded:: 0.20.0

The aggregation API allows one to express possibly multiple aggregation operations in a single concise way.
This API is similar across pandas objects; see :ref:`groupby aggregates <groupby.aggregate>`,
:ref:`window functions <stats.aggregate>`, and the :ref:`resample API <timeseries.aggregate>`.

We will use a starting frame similar to the one above.

.. ipython:: python

   tsdf = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'],
                       index=pd.date_range('1/1/2000', periods=10))
   tsdf.iloc[3:7] = np.nan
   tsdf

Using a single function is equivalent to ``.apply``. You can also pass named methods as strings.
These will return a ``Series`` of the aggregated output.

.. ipython:: python

   tsdf.agg(np.sum)
   tsdf.agg('sum')
   # these are equivalent to .sum() because we are aggregating with a single function
   tsdf.sum()

On a Series this will result in a scalar value.

.. ipython:: python

   tsdf.A.agg('sum')

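
A minimal sketch of these equivalences, using a hypothetical stand-in for ``tsdf`` built from fixed values (so the sums are predictable) instead of random numbers. NaN values are skipped by ``'sum'``, just as they are by the ``.sum()`` method itself.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for tsdf: fixed values with some NaNs
tsdf = pd.DataFrame({'A': [1.0, 2.0, np.nan, 4.0],
                     'B': [0.5, np.nan, np.nan, 1.5],
                     'C': [-1.0, 0.0, 1.0, 2.0]},
                    index=pd.date_range('1/1/2000', periods=4))

# A single function name reduces each column, like calling the method
by_name = tsdf.agg('sum')
direct = tsdf.sum()

# On a single column (a Series) the same call collapses to a scalar
a_total = tsdf['A'].agg('sum')

print(by_name)
print(a_total)
```
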
Aggregating multiple functions at once
++++++++++++++++++++++++++++++++++++++

You can pass multiple aggregation arguments as a list. The result of each passed function will be a row in the resulting DataFrame.
These rows are naturally named after the aggregation function.

.. ipython:: python

   tsdf.agg(['sum'])

Multiple functions yield multiple rows.

.. ipython:: python

   tsdf.agg(['sum', 'mean'])

On a Series, multiple functions return a Series.

.. ipython:: python

   tsdf.A.agg(['sum', 'mean'])

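
A minimal sketch of the list form on a small, hypothetical frame with fixed values (the names and data are ours). The row labels of the DataFrame result, and the index of the Series result, are the function names.

```python
import pandas as pd

# Hypothetical, deterministic stand-in for tsdf
tsdf = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [10.0, 20.0, 30.0]})

# A list of functions gives one row per function, labelled by name
table = tsdf.agg(['sum', 'mean'])

# The same list on a Series gives a Series indexed by function name
a_stats = tsdf['A'].agg(['sum', 'mean'])

print(table)
print(a_stats)
```
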
Aggregating with a dict of functions
++++++++++++++++++++++++++++++++++++

Passing a dictionary mapping column names to a function or list of functions to ``DataFrame.agg``
allows you to customize which functions are applied to which columns.

.. ipython:: python

   tsdf.agg({'A': 'mean', 'B': 'sum'})

Passing a list-like as a dict value will generate a DataFrame output. You will get a matrix-like output
of all of the aggregators; entries for functions that were not applied to a column are missing values.

.. ipython:: python

   tsdf.agg({'A': ['mean', 'min'], 'B': 'sum'})

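
A minimal sketch of both dict forms on a small, hypothetical frame (data is ours). With scalar values the result is a Series keyed by column; once any value is a list, the result is a DataFrame in which a function not requested for a column shows up as NaN. The sketch looks cells up with ``.loc`` rather than relying on row order.

```python
import pandas as pd

# Hypothetical, deterministic stand-in for tsdf
tsdf = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [10.0, 20.0, 30.0]})

# One function per column: a Series keyed by column name
per_column = tsdf.agg({'A': 'mean', 'B': 'sum'})

# Lists as values: a DataFrame; unrequested combinations become NaN
matrix = tsdf.agg({'A': ['mean', 'min'], 'B': 'sum'})

print(per_column)
print(matrix)
```
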
For a Series, you can pass a dict; the keys will set the names of the resulting columns.

.. ipython:: python

   tsdf.A.agg({'foo' : ['sum', 'mean']})

Alternatively, using a dict with one function per key, you can rename each element of the aggregation.

.. ipython:: python

   tsdf.A.agg({'foo' : 'sum', 'bar':'mean'})

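
A minimal sketch of the renaming form on a hypothetical Series ``s`` (ours): each key labels the result of the function it maps to. The sketch deliberately sticks to one function per key; later pandas releases stopped accepting lists as the values of a Series dict, so the list-valued forms in this section may raise there.

```python
import pandas as pd

# Hypothetical, deterministic stand-in for tsdf.A
s = pd.Series([1.0, 2.0, 3.0, 4.0], name='A')

# Each key names the output of the function it maps to
renamed = s.agg({'foo': 'sum', 'bar': 'mean'})

print(renamed)
```
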
Multiple keys will yield multiple columns.

.. ipython:: python

   tsdf.A.agg({'foo' : ['sum', 'mean'], 'bar': ['min', 'max', lambda x: x.sum()+1]})

.. _basics.custom_describe:

Custom describe
+++++++++++++++

With ``.agg()`` it is possible to easily create a custom describe function, similar
to the built-in :ref:`describe function <basics.describe>`.

.. ipython:: python

   from functools import partial
   q_25 = partial(pd.Series.quantile, q=0.25)
   q_25.__name__ = '25%'
   q_75 = partial(pd.Series.quantile, q=0.75)
   q_75.__name__ = '75%'
   tsdf.agg(['count', 'mean', 'std', 'min', q_25, 'median', q_75, 'max'])

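
A minimal sketch that checks the custom describe against the built-in one, assuming a small hypothetical frame with fixed values. The ``__name__`` set on each ``partial`` becomes its row label, and the quantile rows match the corresponding rows of ``describe()``; the custom version labels the middle row ``median`` rather than ``50%``.

```python
from functools import partial

import pandas as pd

# Hypothetical, deterministic stand-in for tsdf
tsdf = pd.DataFrame({'A': [1.0, 2.0, 3.0, 4.0, 5.0],
                     'B': [2.0, 4.0, 6.0, 8.0, 10.0]})

# partial objects accept attributes, so __name__ can supply the row label
q_25 = partial(pd.Series.quantile, q=0.25)
q_25.__name__ = '25%'
q_75 = partial(pd.Series.quantile, q=0.75)
q_75.__name__ = '75%'

custom = tsdf.agg(['count', 'mean', 'std', 'min', q_25, 'median', q_75, 'max'])
builtin = tsdf.describe()

print(custom)
```
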
.. _basics.transform:

Transform API
~~~~~~~~~~~~~

.. versionadded:: 0.20.0

The ``transform`` method returns an object that is indexed the same (same size)
as the original. This API allows you to provide *multiple* operations at the same
time rather than one-by-one. Its API is quite similar to the ``.agg`` API.

We use a frame similar to the one in the sections above.

-   The section on :ref:`GroupBy <groupby>` demonstrates related, flexible
-   functionality for grouping by some criterion, applying, and combining the
-   results into a Series, DataFrame, etc.
.. ipython:: python

   tsdf = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'],
                       index=pd.date_range('1/1/2000', periods=10))
   tsdf.iloc[3:7] = np.nan
   tsdf

Transform the entire frame. ``.transform()`` accepts a NumPy function, a string
function name, or a user-defined function as input.

.. ipython:: python
-.. _Elementwise:

   tsdf.transform(np.abs)
   tsdf.transform('abs')
   tsdf.transform(lambda x: x.abs())

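
A minimal sketch, assuming a small hypothetical frame with one NaN, showing that the three spellings produce the same result, that it matches the plain ufunc application, and that the output keeps the input's index and shape.

```python
import numpy as np
import pandas as pd

# Hypothetical, deterministic stand-in for tsdf
tsdf = pd.DataFrame({'A': [-1.0, 2.0, np.nan], 'B': [3.0, -4.0, 5.0]})

# Three spellings of the same elementwise operation
via_ufunc = tsdf.transform(np.abs)
via_name = tsdf.transform('abs')
via_lambda = tsdf.transform(lambda x: x.abs())

print(via_name)
```
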
Since this is a single function, this is equivalent to a ufunc application.

.. ipython:: python

   np.abs(tsdf)

Passing a single function to ``.transform()`` with a Series will yield a single Series in return.

.. ipython:: python
-Applying elementwise Python functions
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

   tsdf.A.transform(np.abs)

Transform with multiple functions
+++++++++++++++++++++++++++++++++

Passing multiple functions will yield a column multi-indexed DataFrame.
The first level will be the original frame column names; the second level
will be the names of the transforming functions.

.. ipython:: python

   tsdf.transform([np.abs, lambda x: x+1])

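
A minimal sketch of the resulting column layout on a small, hypothetical frame (data is ours). Level 0 holds the original column names and level 1 the function names; note that ``np.abs`` reports its name as ``'absolute'`` and a lambda as ``'<lambda>'``, so those are the labels that appear.

```python
import numpy as np
import pandas as pd

# Hypothetical, deterministic stand-in for tsdf
tsdf = pd.DataFrame({'A': [-1.0, 2.0], 'B': [3.0, -4.0]})

# Two functions per column -> a two-level column index
result = tsdf.transform([np.abs, lambda x: x + 1])

# Level 0: original columns; level 1: names of the transforming functions
top_level = result.columns.get_level_values(0).unique().tolist()
func_names = set(result.columns.get_level_values(1))

print(result)
```
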
Passing multiple functions to a Series will yield a DataFrame. The
resulting column names will be the transforming functions.

.. ipython:: python

   tsdf.A.transform([np.abs, lambda x: x+1])

Transforming with a dict of functions
+++++++++++++++++++++++++++++++++++++


Passing a dict of functions will allow selective transforming per column.

.. ipython:: python

   tsdf.transform({'A': np.abs, 'B': lambda x: x+1})

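
A minimal sketch on a small, hypothetical three-column frame (data is ours): only the columns named in the dict are transformed and returned, so column ``C`` does not appear in the result.

```python
import numpy as np
import pandas as pd

# Hypothetical, deterministic stand-in for tsdf
tsdf = pd.DataFrame({'A': [-1.0, 2.0], 'B': [3.0, -4.0], 'C': [5.0, 6.0]})

# Each column gets its own transform; unlisted columns are dropped
selected = tsdf.transform({'A': np.abs, 'B': lambda x: x + 1})

print(selected)
```
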
Passing a dict of lists will generate a multi-indexed DataFrame with these
selective transforms.

.. ipython:: python

   tsdf.transform({'A': np.abs, 'B': [lambda x: x+1, 'sqrt']})

On a Series, passing a dict allows renaming as in ``.agg()``.

.. ipython:: python

   tsdf.A.transform({'foo': np.abs})
   tsdf.A.transform({'foo': np.abs, 'bar': [lambda x: x+1, 'sqrt']})

.. _basics.elementwise:

Applying Elementwise Functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since not all functions can be vectorized (accept NumPy arrays and return
another array or value), the methods :meth:`~DataFrame.applymap` on DataFrame
4 changes: 3 additions & 1 deletion doc/source/computation.rst
@@ -565,7 +565,9 @@ Aggregation
-----------

Once the ``Rolling``, ``Expanding`` or ``EWM`` objects have been created, several methods are available to
-perform multiple computations on the data. This is very similar to a ``.groupby(...).agg`` seen :ref:`here <groupby.aggregate>`.
perform multiple computations on the data. These operations are similar to the :ref:`aggregating API <basics.aggregate>`,
:ref:`groupby aggregates <groupby.aggregate>`, and :ref:`resample API <timeseries.aggregate>`.
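
A minimal sketch of the rolling case, assuming a small hypothetical frame ``df`` and a window of 2 (names and values are ours): a list gives one ``(column, function)`` pair per output column, and a dict selects a function per column, as with the other aggregation APIs. The first row is NaN because the window is not yet full.

```python
import pandas as pd

# Hypothetical, deterministic data
df = pd.DataFrame({'A': [1.0, 2.0, 3.0, 4.0], 'B': [10.0, 20.0, 30.0, 40.0]})
r = df.rolling(window=2)

# Several functions at once: one (column, function) pair per result column
stats = r.agg(['sum', 'mean'])

# A dict picks a function per column
mixed = r.agg({'A': 'sum', 'B': 'mean'})

print(stats)
print(mixed)
```
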


.. ipython:: python
4 changes: 3 additions & 1 deletion doc/source/groupby.rst
@@ -439,7 +439,9 @@ Aggregation
-----------

Once the GroupBy object has been created, several methods are available to
-perform a computation on the grouped data.
perform a computation on the grouped data. These operations are similar to the
:ref:`aggregating API <basics.aggregate>`, :ref:`window functions <stats.aggregate>`,
and :ref:`resample API <timeseries.aggregate>`.

An obvious one is aggregation via the ``aggregate`` or equivalently ``agg`` method:

6 changes: 4 additions & 2 deletions doc/source/timeseries.rst
@@ -1470,11 +1470,13 @@ We can instead only resample those groups where we have points as follows:
   ts.groupby(partial(round, freq='3T')).sum()

.. _timeseries.aggregate:

Aggregation
~~~~~~~~~~~

-Similar to :ref:`groupby aggregates <groupby.aggregate>` and the :ref:`window functions <stats.aggregate>`, a ``Resampler`` can be selectively
-resampled.
Similar to the :ref:`aggregating API <basics.aggregate>`, :ref:`groupby aggregates <groupby.aggregate>`, and :ref:`window functions <stats.aggregate>`,
a ``Resampler`` can be selectively resampled.

When resampling a ``DataFrame``, the default is to act on all columns with the same function.
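
A minimal sketch, assuming a hypothetical daily frame ``df`` resampled into 3-day bins (names and values are ours): the resampler applies one function to every column by default, while ``.agg()`` with a dict selects a function per column.

```python
import pandas as pd

# Hypothetical daily data: six days fall into two 3-day bins
idx = pd.date_range('2000-01-01', periods=6, freq='D')
df = pd.DataFrame({'A': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                   'B': [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]}, index=idx)
r = df.resample('3D')

# Default: the same function on every column
totals = r.sum()

# Selective: a different function per column
mixed = r.agg({'A': 'sum', 'B': 'mean'})

print(totals)
print(mixed)
```
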

55 changes: 55 additions & 0 deletions doc/source/whatsnew/v0.20.0.txt
@@ -9,6 +9,7 @@ users upgrade to this version.

Highlights include:

- new ``.agg()`` API for Series/DataFrame similar to the groupby-rolling-resample APIs, see :ref:`here <whatsnew_0200.enhancements.agg>`
- Building pandas for development now requires ``cython >= 0.23`` (:issue:`14831`)

Check the :ref:`API Changes <whatsnew_0200.api_breaking>` and :ref:`deprecations <whatsnew_0200.deprecations>` before updating.
@@ -22,6 +23,60 @@ Check the :ref:`API Changes <whatsnew_0200.api_breaking>` and :ref:`deprecations
New features
~~~~~~~~~~~~

.. _whatsnew_0200.enhancements.agg:

``agg`` API
^^^^^^^^^^^

Series & DataFrame have been enhanced to support the aggregation API. This is an already familiar API that
is supported for groupby, window operations, and resampling. This allows one to express possibly multiple
aggregation operations in a single concise way by using ``.agg()`` and ``.transform()``. The
full documentation is :ref:`here <basics.aggregate>` (:issue:`1623`).

Here is a sample:

.. ipython:: python

   df = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'],
                     index=pd.date_range('1/1/2000', periods=10))
   df.iloc[3:7] = np.nan
   df

One can operate using string function names, callables, lists, or dictionaries of these.

Using a single function is equivalent to ``.apply``.

.. ipython:: python

   df.agg('sum')

Multiple functions in lists.

.. ipython:: python

   df.agg(['sum', 'min'])

Dictionaries provide the ability to select which calculations are applied to which columns.

.. ipython:: python

   df.agg({'A' : ['sum', 'min'], 'B' : ['min', 'max']})

When operating on a Series, passing a dictionary allows one to rename multiple
function aggregates; this will return a DataFrame.

.. ipython:: python

   df.A.agg({'foo':['sum', 'min'], 'bar' : ['count','max']})

The API also supports a ``.transform()`` function for broadcasting results.

.. ipython:: python

   df.transform(['abs', lambda x: x-x.min()])


.. _whatsnew_0200.enhancements.dtype:

.. _whatsnew_0200.enhancements.dataio_dtype:
