ENH: add Series & DataFrame .agg/.aggregate to provide convenient
function application that mimics the groupby(..).agg/.aggregate
interface

.apply is now a synonym for .agg, and will accept dict/list-likes
for aggregations

CLN: rename .name attr -> ._selection_name from SeriesGroupby for compat (didn't exist on DataFrameGroupBy)
resolves conflicts w.r.t. setting .name on a groupby object

closes pandas-dev#1623
closes pandas-dev#14464

custom .describe
closes pandas-dev#14483
closes pandas-dev#15015
closes pandas-dev#7014
jreback committed Mar 22, 2017
1 parent 94720d9 commit 90930eb
Showing 16 changed files with 1,000 additions and 76 deletions.
4 changes: 4 additions & 0 deletions doc/source/api.rst
@@ -313,6 +313,8 @@ Function application, GroupBy & Window
:toctree: generated/

Series.apply
Series.aggregate
Series.transform
Series.map
Series.groupby
Series.rolling
@@ -830,6 +832,8 @@ Function application, GroupBy & Window

DataFrame.apply
DataFrame.applymap
DataFrame.aggregate
DataFrame.transform
DataFrame.groupby
DataFrame.rolling
DataFrame.expanding
242 changes: 234 additions & 8 deletions doc/source/basics.rst
@@ -702,7 +702,8 @@ on an entire ``DataFrame`` or ``Series``, row- or column-wise, or elementwise.

1. `Tablewise Function Application`_: :meth:`~DataFrame.pipe`
2. `Row or Column-wise Function Application`_: :meth:`~DataFrame.apply`
3. `Aggregation API`_: :meth:`~DataFrame.agg` and :meth:`~DataFrame.transform`
4. `Applying Elementwise Functions`_: :meth:`~DataFrame.applymap`

.. _basics.pipe:

@@ -778,6 +779,13 @@ statistics methods, take an optional ``axis`` argument:
   df.apply(np.cumsum)
   df.apply(np.exp)

``.apply()`` will also dispatch on a string method name.

.. ipython:: python

   df.apply('mean')
   df.apply('mean', axis=1)

Depending on the return type of the function passed to :meth:`~DataFrame.apply`,
the result will either be of lower dimension or the same dimension.

@@ -827,16 +835,234 @@ set to True, the passed function will instead receive an ndarray object, which
has positive performance implications if you do not need the indexing
functionality.

.. _basics.aggregate:

Aggregation API
~~~~~~~~~~~~~~~

.. versionadded:: 0.20.0

The aggregation API allows one to express possibly multiple aggregation operations in a single concise way.
This API is similar across pandas objects, :ref:`groupby aggregates <groupby.aggregate>`,
:ref:`window functions <stats.aggregate>`, and the :ref:`resample API <timeseries.aggregate>`.

We will use a frame similar to the one above.

.. ipython:: python

   tsdf = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'],
                       index=pd.date_range('1/1/2000', periods=10))
   tsdf.iloc[3:7] = np.nan
   tsdf

Using a single function is equivalent to ``.apply``. You can also pass a method name as a string.
Either form returns a Series of the aggregated output.

.. ipython:: python

   tsdf.agg(np.sum)
   tsdf.agg('sum')
   # these are equivalent to a ``.sum()`` because we are aggregating on a single function
   tsdf.sum()

On a Series, a single aggregation results in a scalar value.

.. ipython:: python

   tsdf.A.agg('sum')

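
The equivalence described above can be checked with a minimal sketch. The frame ``df`` and its values are made up for illustration and are not part of the original text; one row is set to ``NaN`` to mirror the missing block in ``tsdf``.

```python
import numpy as np
import pandas as pd

# Hypothetical small frame; values are illustrative only.
df = pd.DataFrame({'A': [1.0, 2.0, 3.0, 4.0],
                   'B': [10.0, 20.0, 30.0, 40.0]},
                  index=pd.date_range('1/1/2000', periods=4))
df.iloc[1] = np.nan  # one missing row, skipped by the reductions below

by_name = df.agg('sum')  # aggregation given as a string method name
direct = df.sum()        # the plain reduction

# Both are Series indexed by column name and hold the same values.
same = by_name.equals(direct)

# On a single column (a Series) the result is a scalar.
total_a = df['A'].agg('sum')
```

Passing the method name as a string keeps the sketch independent of how a particular pandas version treats NumPy callables.
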
Aggregating multiple functions at once
++++++++++++++++++++++++++++++++++++++

You can pass multiple aggregation functions as a list. The result of each passed function will be a row in the resulting DataFrame.
The rows are named after the aggregation functions.

.. ipython:: python

   tsdf.agg(['sum'])

Multiple functions yield multiple rows.

.. ipython:: python

   tsdf.agg(['sum', 'mean'])

On a Series, multiple functions return a Series, indexed by the function names.

.. ipython:: python

   tsdf.A.agg(['sum', 'mean'])

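
The shape of the list-based results can be sketched as follows. The frame ``df`` is a hypothetical stand-in with made-up values, chosen so the labels and numbers are easy to verify by hand.

```python
import pandas as pd

# Hypothetical frame; values are illustrative only.
df = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [4.0, 5.0, 6.0]})

# DataFrame + list of functions -> DataFrame with one row per function.
frame_result = df.agg(['sum', 'mean'])
row_labels = list(frame_result.index)    # function names
col_labels = list(frame_result.columns)  # original column names

# Series + list of functions -> Series indexed by function name.
series_result = df['A'].agg(['sum', 'mean'])
```
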
Aggregating with a dict of functions
++++++++++++++++++++++++++++++++++++

Passing a dictionary that maps column names to a function or a list of functions to ``DataFrame.agg``
allows you to customize which functions are applied to which columns.

.. ipython:: python

   tsdf.agg({'A': 'mean', 'B': 'sum'})

Passing a list-like as a dictionary value will generate a DataFrame output. The output has one row per
aggregator and one column per selected column; combinations that were not requested are filled with missing values.

.. ipython:: python

   tsdf.agg({'A': ['mean', 'min'], 'B': 'sum'})

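
A minimal sketch of the dict form, using a hypothetical frame ``df`` with made-up values (not from the original text), shows both output shapes and where the missing values appear.

```python
import pandas as pd

# Hypothetical frame; column 'C' is deliberately left out of the dicts below.
df = pd.DataFrame({'A': [1.0, 2.0, 3.0],
                   'B': [4.0, 5.0, 6.0],
                   'C': [7.0, 8.0, 9.0]})

# One function per column -> Series indexed by the selected columns.
per_column = df.agg({'A': 'mean', 'B': 'sum'})

# A list-like value for any column -> DataFrame; function/column pairs
# that were not requested hold NaN.
mixed = df.agg({'A': ['mean', 'min'], 'B': 'sum'})
sum_of_a_missing = pd.isna(mixed.loc['sum', 'A'])
```

Only the columns named in the dict appear in the result, so ``C`` is absent from both outputs.
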
For a Series, you can pass a dict. You will get back a MultiIndex Series; the outer level will
be the keys, the inner level the names of the functions.

.. ipython:: python

   tsdf.A.agg({'foo': ['sum', 'mean']})

Alternatively, passing a dict with several keys, each mapped to a single function, renames the
elements of the aggregation.

.. ipython:: python

   tsdf.A.agg({'foo': 'sum', 'bar': 'mean'})

Multiple keys, each mapped to a list of functions, will yield a MultiIndex Series. The outer level
will be the keys, the inner level the names of the functions.

.. ipython:: python

   tsdf.A.agg({'foo': ['sum', 'mean'], 'bar': ['min', 'max', lambda x: x.sum()+1]})

.. _basics.aggregation.mixed_dtypes:

Mixed Dtypes
++++++++++++

When presented with mixed dtypes that cannot aggregate, ``.agg`` will only take the valid
aggregations. This is similar to how groupby ``.agg`` works.

.. ipython:: python

   mdf = pd.DataFrame({'A': [1, 2, 3],
                       'B': [1., 2., 3.],
                       'C': ['foo', 'bar', 'baz'],
                       'D': pd.date_range('20130101', periods=3)})
   mdf.dtypes

.. ipython:: python

   mdf.agg(['min', 'sum'])

.. _basics.aggregation.custom_describe:

Custom describe
+++++++++++++++

With ``.agg()`` it is possible to easily create a custom describe function, similar
to the built-in :ref:`describe function <basics.describe>`.

.. ipython:: python

   from functools import partial

   q_25 = partial(pd.Series.quantile, q=0.25)
   q_25.__name__ = '25%'
   q_75 = partial(pd.Series.quantile, q=0.75)
   q_75.__name__ = '75%'

   tsdf.agg(['count', 'mean', 'std', 'min', q_25, 'median', q_75, 'max'])

.. _basics.transform:

Transform API
~~~~~~~~~~~~~

.. versionadded:: 0.20.0

The ``transform`` method returns an object that is indexed the same (same size)
as the original. This API allows you to provide *multiple* operations at the same
time rather than one-by-one. Its API is quite similar to the ``.agg`` API.

We use a frame similar to the one in the sections above.

.. ipython:: python

   tsdf = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'],
                       index=pd.date_range('1/1/2000', periods=10))
   tsdf.iloc[3:7] = np.nan
   tsdf

Transform the entire frame. ``.transform()`` accepts a NumPy function, a string
function name, or a user-defined function.

.. ipython:: python

   tsdf.transform(np.abs)
   tsdf.transform('abs')
   tsdf.transform(lambda x: x.abs())

Since this is a single function, this is equivalent to a ufunc application.

.. ipython:: python

   np.abs(tsdf)

Passing a single function to ``.transform()`` with a Series will yield a single Series in return.

.. ipython:: python

   tsdf.A.transform(np.abs)

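
The claim that a single-function transform matches a ufunc application and preserves the index can be checked with a minimal sketch. The frame ``df`` is hypothetical, with made-up values.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with mixed signs; values are illustrative only.
df = pd.DataFrame({'A': [-1.0, 2.0, -3.0], 'B': [4.0, -5.0, 6.0]},
                  index=pd.date_range('1/1/2000', periods=3))

by_func = df.transform(np.abs)  # NumPy function
by_name = df.transform('abs')   # string method name
by_ufunc = np.abs(df)           # direct ufunc application

# The transformed object keeps the index of the original.
same_index = by_func.index.equals(df.index)

# A Series transformed by a single function stays a Series.
series_result = df['A'].transform(np.abs)
```
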
Transform with multiple functions
+++++++++++++++++++++++++++++++++

Passing multiple functions will yield a column multi-indexed DataFrame.
The first level will be the original frame column names; the second level
will be the names of the transforming functions.

.. ipython:: python

   tsdf.transform([np.abs, lambda x: x+1])

Passing multiple functions to a Series will yield a DataFrame. The
resulting column names will be the transforming functions.

.. ipython:: python

   tsdf.A.transform([np.abs, lambda x: x+1])

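
The column layout produced by several transforming functions can be sketched as follows. The frame ``df`` and the helper ``plus_one`` are hypothetical; a named function is used instead of a lambda so that the second column level has a predictable label.

```python
import numpy as np
import pandas as pd


def plus_one(x):
    """Hypothetical helper: shift every value by one."""
    return x + 1


# Hypothetical frame; values are illustrative only.
df = pd.DataFrame({'A': [-1.0, 2.0], 'B': [3.0, -4.0]})

# DataFrame + list of functions -> two-level columns:
# level 0 holds the original column names, level 1 the function names.
frame_result = df.transform([np.abs, plus_one])
outer = list(frame_result.columns.get_level_values(0).unique())

# Series + list of functions -> DataFrame with one column per function.
series_result = df['A'].transform([np.abs, plus_one])
```
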
Transforming with a dict of functions
+++++++++++++++++++++++++++++++++++++


Passing a dict of functions will allow selective transforming per column.

.. ipython:: python

   tsdf.transform({'A': np.abs, 'B': lambda x: x+1})

Passing a dict of lists will generate a multi-indexed DataFrame with these
selective transforms.

.. ipython:: python

   tsdf.transform({'A': np.abs, 'B': [lambda x: x+1, 'sqrt']})

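
A minimal sketch of the per-column dict form, using a hypothetical frame ``df`` with made-up values, shows that only the columns named in the dict are transformed and returned.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; column 'C' is deliberately left out of the dict below.
df = pd.DataFrame({'A': [-1.0, 2.0],
                   'B': [3.0, -4.0],
                   'C': [5.0, 6.0]})

# Each key selects a column; each value is the function applied to it.
selected = df.transform({'A': np.abs, 'B': lambda x: x + 1})
```
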
On a Series, passing a dict allows renaming as in ``.agg()``.

.. ipython:: python

   tsdf.A.transform({'foo': np.abs})
   tsdf.A.transform({'foo': np.abs, 'bar': [lambda x: x+1, 'sqrt']})

.. _basics.elementwise:

Applying Elementwise Functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since not all functions can be vectorized (accept NumPy arrays and return
another array or value), the methods :meth:`~DataFrame.applymap` on DataFrame
4 changes: 3 additions & 1 deletion doc/source/computation.rst
@@ -565,7 +565,9 @@ Aggregation
-----------

Once the ``Rolling``, ``Expanding`` or ``EWM`` objects have been created, several methods are available to
perform multiple computations on the data. These operations are similar to the :ref:`aggregating API <basics.aggregate>`,
:ref:`groupby aggregates <groupby.aggregate>`, and :ref:`resample API <timeseries.aggregate>`.


.. ipython:: python
4 changes: 3 additions & 1 deletion doc/source/groupby.rst
@@ -439,7 +439,9 @@ Aggregation
-----------

Once the GroupBy object has been created, several methods are available to
perform a computation on the grouped data. These operations are similar to the
:ref:`aggregating API <basics.aggregate>`, :ref:`window functions <stats.aggregate>`,
and :ref:`resample API <timeseries.aggregate>`.

An obvious one is aggregation via the ``aggregate`` or equivalently ``agg`` method:

6 changes: 4 additions & 2 deletions doc/source/timeseries.rst
@@ -1470,11 +1470,13 @@ We can instead only resample those groups where we have points as follows:
   ts.groupby(partial(round, freq='3T')).sum()

.. _timeseries.aggregate:

Aggregation
~~~~~~~~~~~

Similar to the :ref:`aggregating API <basics.aggregate>`, :ref:`groupby aggregates <groupby.aggregate>`, and :ref:`window functions <stats.aggregate>`,
a ``Resampler`` can be selectively resampled.

Resampling a ``DataFrame``, the default will be to act on all columns with the same function.
