DOC: Dict of Dicts for renaming Groupby Aggregations #9052

TomAugspurger · 2014-12-10T19:47:53Z

I didn't realize this was possible, and didn't see it in the docs.

df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'a', 'b'], 'C': [3, 4, 5]})
df.groupby('B').agg({'A': {'mean1': 'mean', 'med1': 'median'}, 'C': {'mean2': 'mean', 'med2': 'median'}})
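For readers on newer pandas: the nested-dict renaming shown above was deprecated in 0.20 and removed in 1.0. A minimal sketch of the same renaming with named aggregation, assuming pandas 0.25 or later (the result has flat columns rather than a two-level column index):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": ["a", "a", "b"], "C": [3, 4, 5]})

# Named aggregation: new_column_name=(source_column, aggregation_function).
renamed = df.groupby("B").agg(
    mean1=("A", "mean"),
    med1=("A", "median"),
    mean2=("C", "mean"),
    med2=("C", "median"),
)
print(renamed)
```

The keyword names become the output column labels, so no post-hoc renaming is needed.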


jreback · 2014-12-10T22:06:25Z

xref is #8593 (which would replace / enhance this)

aimboden · 2014-12-11T06:59:42Z

Thanks for the tip. Didn't realize this was possible either, this will save me from building my multicolumns "by hand".

@jreback are you planning any API change for 0.16.0 on this? #8593 does not seem to interfere with this behaviour, but maybe a deeper change is planned?

I'd rather not rely on this if it's not tested atm. Or would you accept a test for this?

jreback · 2014-12-11T11:11:35Z

@Gimli510 this IS implemented. It's basically the same as the following (except the name determination is slightly different).

In [5]: df.groupby('B').agg({'A': ['mean','median'], 'C': ['mean','median']})
Out[5]: 
     A           C       
  mean median mean median
B                        
a  1.5    1.5  3.5    3.5
b  3.0    3.0  5.0    5.0

I haven't looked through carefully, but I suspect there is at least one test. I would certainly accept a PR that makes these tests more prominent (e.g. test_agg_api or something).

pd.Summary will enhance this API; the existing behaviour will remain.
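If only the dict-of-lists form is available, custom names can still be produced by flattening the resulting two-level columns. A small sketch, where the underscore-joined naming scheme is just an illustrative choice:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": ["a", "a", "b"], "C": [3, 4, 5]})

# Dict of lists: the result has MultiIndex columns such as ('A', 'mean').
result = df.groupby("B").agg({"A": ["mean", "median"], "C": ["mean", "median"]})

# Iterating over MultiIndex columns yields tuples; join each into one flat label.
result.columns = ["_".join(pair) for pair in result.columns]
print(result)
```

This keeps the aggregation call unchanged and moves the naming decision into a separate, explicit step.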

jreback · 2015-11-12T23:34:17Z

from mailing list

In [2]: df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
   ...:                           'foo', 'bar', 'foo', 'foo'],
   ...:                    'B' : ['one', 'one', 'two', 'three',
   ...:                           'two', 'two', 'one', 'three'],
   ...:                    'C' : np.random.randn(8),
   ...:                    'D' : np.random.randn(8)})

In [3]: grouped = df.groupby(['A', 'B'])

In [4]: grouped[['D','C']].agg({'r':np.sum, 'r2':np.mean})
Out[4]: 
                    r        r2
A   B                          
bar one   D -0.460078 -0.460078
          C  0.798220  0.798220
    three D  1.599986  1.599986
          C -0.554798 -0.554798
    two   D  0.124900  0.124900
          C  0.084758  0.084758
foo one   D -0.466082 -0.233041
          C -0.585512 -0.292756
    three D -0.184726 -0.184726
          C  0.130756  0.130756
    two   D -1.985586 -0.992793
          C  1.275138  0.637569

In [5]: grouped[['D','C']].agg({'r': { 'C' : np.sum }, 'r2' : { 'D' : np.mean }})
Out[5]: 
                    r        r2
                    C         D
A   B                          
bar one   D -0.460078 -0.460078
          C  0.798220  0.798220
    three D  1.599986  1.599986
          C -0.554798 -0.554798
    two   D  0.124900  0.124900
          C  0.084758  0.084758
foo one   D -0.466082 -0.233041
          C -0.585512 -0.292756
    three D -0.184726 -0.184726
          C  0.130756  0.130756
    two   D -1.985586 -0.992793
          C  1.275138  0.637569

In [6]: grouped[['D','C']].agg([np.sum, np.mean])
Out[6]: 
                  D                   C          
                sum      mean       sum      mean
A   B                                            
bar one   -0.460078 -0.460078  0.798220  0.798220
    three  1.599986  1.599986 -0.554798 -0.554798
    two    0.124900  0.124900  0.084758  0.084758
foo one   -0.466082 -0.233041 -0.585512 -0.292756
    three -0.184726 -0.184726  0.130756  0.130756
    two   -1.985586 -0.992793  1.275138  0.637569

with a trivial patch

diff --git a/pandas/core/groupby.py b/pandas/core/groupby.py
index add5080..b885b6f 100644
--- a/pandas/core/groupby.py
+++ b/pandas/core/groupby.py
@@ -2837,9 +2837,6 @@ class NDFrameGroupBy(GroupBy):
             keys = []
             if self._selection is not None:
                 subset = obj
-                if isinstance(subset, DataFrame):
-                    raise NotImplementedError("Aggregating on a DataFrame is "
-                                              "not supported")

                 for fname, agg_how in compat.iteritems(arg):
                     colg = SeriesGroupBy(subset, selection=self._selection,

of course need some tests......

jreback · 2015-12-19T13:52:34Z

actually not closing this

xflr6 · 2016-02-14T23:15:42Z

The following raises SpecificationError in 0.18.0, although there is no ambiguity (it is a SeriesGroupBy):

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three',
                         'two', 'two', 'one', 'three'],
                   'C': np.random.randn(8),
                   'D': np.arange(8)})

grouped = df.groupby(['A', 'B'])

grouped['D'].agg({'D': np.sum, 'result2': np.mean})

Is this intended or a bug (I'd prefer to be able to reuse the series column name)?

jorisvandenbossche · 2016-02-14T23:25:09Z

This should work (it is also a regression, as it worked before).
I think this should work because, for a SeriesGroupBy, the dict keys can/should always be interpreted as new column names, not as selecting existing column names.

closes pandas-dev#9052

jreback · 2016-02-15T15:22:50Z

@xflr6

this is fixed in #12329

In [3]: grouped['D'].agg({'D': np.sum, 'result2': np.mean})
Out[3]: 
           result2  D
A   B                
bar one          1  1
    three        3  3
    two          5  5
foo one          3  6
    three        7  7
    two          3  6

Note that this works as well, though maybe not as the user intended (i.e. the C is purely a label here; it has nothing to do with the actual aggregation columns).

In [4]: grouped['D'].agg({'D': np.sum, 'C': np.mean})
Out[4]: 
           C  D
A   B          
bar one    1  1
    three  3  3
    two    5  5
foo one    3  6
    three  7  7
    two    3  6
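On newer pandas the same "keys are output labels" idea for a SeriesGroupBy can be written with keyword arguments. A minimal sketch, assuming pandas 0.25 or later; the random C column from the example is left out because it plays no part in the result:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["foo", "bar", "foo", "bar",
                         "foo", "bar", "foo", "foo"],
                   "B": ["one", "one", "two", "three",
                         "two", "two", "one", "three"],
                   "D": np.arange(8)})

# Each keyword is an output column label; the value names the aggregation.
# Reusing the series name 'D' as a label is allowed.
out = df.groupby(["A", "B"])["D"].agg(D="sum", result2="mean")
print(out)
```

Here the labels never select input columns, which is the interpretation argued for above.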

arita37 · 2017-01-28T14:50:11Z

For reference, on complex groupby:
We sometimes have two-dimensional data like
date, user_id, val1, val2, val3

and need to transform it into a 'groupby' form:
user_id, mycol1, mycol2, ...

Usually, this is done by

user_dict = {}
for x in user_id_list:
    dfi = df[df.user_id == x]  # rows belonging to this user
    user_dict[x] = {'mycol1': myfun(dfi), 'mycol2': myfun2(dfi)}

Is there a way to do this kind of complex and generic grouping with pandas groupby?

jreback · 2017-01-28T15:02:37Z

http://pandas.pydata.org/pandas-docs/stable/groupby.html#aggregation
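One possible way to express the per-user loop from the question, as a sketch rather than the only approach: groupby(...).apply with a function that returns a Series gives one row per user and one column per computed value. The sample data and the summarize helper are made up for illustration and stand in for myfun / myfun2.

```python
import pandas as pd

# Hypothetical sample data with the column layout from the question.
df = pd.DataFrame({
    "date": pd.to_datetime(["2017-01-01", "2017-01-02",
                            "2017-01-01", "2017-01-02"]),
    "user_id": [1, 1, 2, 2],
    "val1": [1.0, 2.0, 3.0, 4.0],
    "val2": [10.0, 20.0, 30.0, 40.0],
})

def summarize(dfi):
    """Compute several custom values from one user's rows (hypothetical helper)."""
    return pd.Series({
        "mycol1": dfi["val1"].sum(),
        "mycol2": (dfi["val1"] * dfi["val2"]).mean(),
    })

# Select the value columns explicitly, then apply the helper per user.
result = df.groupby("user_id")[["val1", "val2"]].apply(summarize)
print(result)
```

Because the helper receives the whole sub-frame, each output column can depend on several input columns at once, which plain agg with per-column functions cannot do.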

TomAugspurger added Groupby API Design labels Dec 10, 2014

TomAugspurger added this to the 0.16.0 milestone Dec 10, 2014

jreback mentioned this issue Mar 4, 2015

ENH: groupby aggregate with multi-level columns #9585

Open

jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015

jreback mentioned this issue May 11, 2015

API: specification of functions in .agg #8593

Closed

3 tasks

jreback added Difficulty Novice labels Nov 12, 2015

jreback mentioned this issue Nov 21, 2015

API: provide Rolling/Expanding/EWM objects for deferred rolling type calculations #10702 #11603

Merged

5 tasks

jreback closed this as completed in #11603 Dec 19, 2015

jreback reopened this Dec 19, 2015

jreback mentioned this issue Jan 23, 2016

Refactored Resample API breaking change #11841

Closed

2 tasks

jreback added a commit to jreback/pandas that referenced this issue Feb 2, 2016

API: add doc examples for pandas-dev#9052

e243f18

jreback closed this as completed in 1dc49f5 Feb 2, 2016

jorisvandenbossche reopened this Feb 14, 2016

jorisvandenbossche modified the milestones: 0.18.0, Next Major Release Feb 14, 2016

jreback added a commit to jreback/pandas that referenced this issue Feb 15, 2016

BUG: addtl fix for compat summary of groupby/resample with dicts

66c23aa

closes pandas-dev#9052

jreback mentioned this issue Feb 15, 2016

BUG: addtl fix for compat summary of groupby/resample with dicts #12329

Closed

jreback closed this as completed in cac5f8b Feb 15, 2016

jorisvandenbossche mentioned this issue Oct 14, 2016

cryptic DataFrame.agg error when using dictionaries #14421

Closed

jorisvandenbossche mentioned this issue Jan 23, 2017

ENH: add Series & DataFrame .agg/.aggregate #14668

Merged

4 tasks

rhshadrach mentioned this issue Jan 12, 2023

DEPR: SeriesGroupBy.agg with dict argument #50684

Closed
