Added example for grouping by combination of index level and column
Jon M. Mease committed Nov 5, 2016
1 parent 8281982 commit da7b406
Showing 2 changed files with 77 additions and 15 deletions.
79 changes: 64 additions & 15 deletions doc/source/groupby.rst
@@ -105,9 +105,9 @@ consider the following DataFrame:
.. versionadded:: 0.20

A string passed to ``groupby`` may refer to either a column or an index level.
If a string matches both a column and an index level then a warning is issued
and the column takes precedence. This will result in an ambiguity error in a
future version.
If a string matches both a column name and an index level name then a warning is
issued and the column takes precedence. This will result in an ambiguity error
in a future version.
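
To make the ambiguity concrete, the following is a minimal sketch of how one might detect a key that names both a column and an index level before calling ``groupby``. The helper ``is_ambiguous_key`` and the sample frame are hypothetical and are not part of pandas; the check relies only on ``DataFrame.columns`` and ``Index.names``.

```python
import pandas as pd

# Hypothetical frame: 'key' is both a column name and the index level name.
df = pd.DataFrame({'key': ['a', 'b', 'a'], 'val': [1, 2, 3]},
                  index=pd.Index(['x', 'y', 'x'], name='key'))


def is_ambiguous_key(frame, key):
    """Return True if `key` names both a column and an index level."""
    return key in frame.columns and key in frame.index.names


ambiguous = is_ambiguous_key(df, 'key')

# Renaming the index level removes the ambiguity, so the string 'key'
# can only refer to the column.
renamed = df.rename_axis('key_level')
result = renamed.groupby('key')['val'].sum()
```

Renaming the index level (or the column) up front is one way to avoid depending on the precedence rule described above.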

.. ipython:: python
@@ -247,17 +247,6 @@ the length of the ``groups`` dict, so it is largely just a convenience:
gb.aggregate gb.count gb.cumprod gb.dtype gb.first gb.groups gb.hist gb.max gb.min gb.nth gb.prod gb.resample gb.sum gb.var
gb.apply gb.cummax gb.cumsum gb.fillna gb.gender gb.head gb.indices gb.mean gb.name gb.ohlc gb.quantile gb.size gb.tail gb.weight


.. ipython:: python
:suppress:
df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
'foo', 'bar', 'foo', 'foo'],
'B' : ['one', 'one', 'two', 'three',
'two', 'two', 'one', 'three'],
'C' : np.random.randn(8),
'D' : np.random.randn(8)})
.. _groupby.multiindex:

GroupBy with MultiIndex
@@ -299,7 +288,9 @@ chosen level:
s.sum(level='second')
Also as of v0.6, grouping with multiple levels is supported.
.. versionadded:: 0.6

Grouping with multiple levels is supported.

.. ipython:: python
:suppress:
@@ -316,15 +307,73 @@ Also as of v0.6, grouping with multiple levels is supported.
s
s.groupby(level=['first', 'second']).sum()
.. versionadded:: 0.20

Index level names may be supplied as keys.

.. ipython:: python
s.groupby(['first', 'second']).sum()
More on the ``sum`` function and aggregation later.
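
As a quick check that the two spellings agree, the sketch below builds a Series with the same ``first``/``second`` index as above and compares grouping via ``level=`` with grouping via index level names passed as keys. The values in ``s`` are made up for illustration.

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
s = pd.Series(np.arange(8), index=index)  # illustrative values only

# Group by both levels: once with level=, once with the names as keys.
by_level = s.groupby(level=['first', 'second']).sum()
by_key = s.groupby(['first', 'second']).sum()
pd.testing.assert_series_equal(by_level, by_key)

# The same holds for a single level name.
second_by_level = s.groupby(level='second').sum()
second_by_key = s.groupby('second').sum()
pd.testing.assert_series_equal(second_by_level, second_by_key)
```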

Grouping DataFrame with Index Levels and Columns
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A DataFrame may be grouped by a combination of columns and index levels by
specifying the column names as strings and the index levels as ``pd.Grouper``
objects.

.. ipython:: python
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
df = pd.DataFrame({'A': [1, 1, 1, 1, 2, 2, 3, 3],
'B': np.arange(8)},
index=index)
df
The following example groups ``df`` by the ``second`` index level and
the ``A`` column.

.. ipython:: python
df.groupby([pd.Grouper(level=1), 'A']).sum()
Index levels may also be specified by name.

.. ipython:: python
df.groupby([pd.Grouper(level='second'), 'A']).sum()
.. versionadded:: 0.20

Index level names may be specified as keys directly to ``groupby``.

.. ipython:: python
df.groupby(['second', 'A']).sum()
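
The three forms shown above should produce the same result. A self-contained sketch that rebuilds the same ``df`` and verifies this with ``pd.testing.assert_frame_equal`` (the variable names are chosen here for illustration):

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
df = pd.DataFrame({'A': [1, 1, 1, 1, 2, 2, 3, 3],
                   'B': np.arange(8)},
                  index=index)

# Index level given by position, by name, and directly as a key.
by_position = df.groupby([pd.Grouper(level=1), 'A']).sum()
by_name = df.groupby([pd.Grouper(level='second'), 'A']).sum()
by_key = df.groupby(['second', 'A']).sum()

# assert_frame_equal raises AssertionError if the frames differ.
pd.testing.assert_frame_equal(by_position, by_name)
pd.testing.assert_frame_equal(by_name, by_key)
```

Passing the level name directly as a key is the shortest form, but it only works when the name does not also match a column.
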
DataFrame column selection in GroupBy
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Once you have created the GroupBy object from a DataFrame, for example, you
might want to do something different for each of the columns. Thus, using
``[]`` similar to getting a column from a DataFrame, you can do:

.. ipython:: python
:suppress:
df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
'foo', 'bar', 'foo', 'foo'],
'B' : ['one', 'one', 'two', 'three',
'two', 'two', 'one', 'three'],
'C' : np.random.randn(8),
'D' : np.random.randn(8)})
.. ipython:: python
grouped = df.groupby(['A'])
13 changes: 13 additions & 0 deletions doc/source/whatsnew/v0.20.0.txt
@@ -31,6 +31,19 @@ Other enhancements
^^^^^^^^^^^^^^^^^^
- Strings passed to ``DataFrame.groupby()`` as the ``by`` parameter may now reference either column names or index level names (:issue:`5677`)

.. ipython:: python

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]

index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])

df = pd.DataFrame({'A': [1, 1, 1, 1, 2, 2, 3, 3],
'B': np.arange(8)},
index=index)

df.groupby(['second', 'A']).sum()


.. _whatsnew_0200.api_breaking:
