DOC: Dict of Dicts for renaming Groupby Aggregations #9052

Closed
TomAugspurger opened this issue Dec 10, 2014 · 12 comments · Fixed by #11603

@TomAugspurger TomAugspurger commented Dec 10, 2014

I didn't realize this was possible, and didn't see it in the docs.

df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'a', 'b'], 'C': [3, 4, 5]})
df.groupby('B').agg({'A': {'mean1': 'mean', 'med1': 'median'}, 'C': {'mean2': 'mean', 'med2': 'median'}})
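
For readers on newer pandas: as far as I know, this nested-dict renaming was later deprecated and then removed, so the call above may raise on current releases. Below is a minimal sketch of a comparable result using named aggregation (keyword arguments to `agg`, assumed to be available in pandas 0.25 or later). Note that the columns come out flat rather than grouped under `A` and `C`.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'a', 'b'], 'C': [3, 4, 5]})

# Each keyword is the new column name; the value is (source column, function).
result = df.groupby('B').agg(
    mean1=('A', 'mean'),
    med1=('A', 'median'),
    mean2=('C', 'mean'),
    med2=('C', 'median'),
)
print(result)
```

The output names are chosen by the caller here, which is the same goal the dict of dicts served.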
@TomAugspurger TomAugspurger added this to the 0.16.0 milestone Dec 10, 2014

@jreback jreback commented Dec 10, 2014

xref is #8593 (which would replace / enhance this)

@aimboden aimboden commented Dec 11, 2014

Thanks for the tip. I didn't realize this was possible either; this will save me from building my multicolumns "by hand".

@jreback are you planning any API change for 0.16.0 on this? #8593 does not seem to interfere with this behaviour, but maybe a deeper change is planned?

I'd rather not rely on this if it's not tested atm. Or would you accept a test for this?

@jreback jreback commented Dec 11, 2014

@Gimli510 this IS implemented. It's basically the same as the following (except the name determination is slightly different).

In [5]: df.groupby('B').agg({'A': ['mean','median'], 'C': ['mean','median']})
Out[5]: 
     A           C       
  mean median mean median
B                        
a  1.5    1.5  3.5    3.5
b  3.0    3.0  5.0    5.0

I haven't looked through carefully, but I suspect there is at least one test. I would certainly accept a PR which makes these tests more prominent (e.g. test_agg_api or something).

pd.Summary will enhance this API; the existing behavior will remain.
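
A minimal sketch of the list form, for anyone who wants single-level column names afterwards. The flattening step and the `flat` name are illustrative assumptions, not something the list form does by itself.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'a', 'b'], 'C': [3, 4, 5]})

# A dict of lists yields two-level columns: (source column, function name).
result = df.groupby('B').agg({'A': ['mean', 'median'], 'C': ['mean', 'median']})

# Join the two levels into single names such as 'A_mean'.
flat = result.copy()
flat.columns = ['_'.join(col) for col in result.columns]
print(flat)
```

Joining the levels keeps the source column visible in the name, which the dict-of-dicts renaming leaves to the caller.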

@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
@jreback jreback mentioned this issue May 11, 2015

@jreback jreback commented Nov 12, 2015

from mailing list

In [2]: df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
   ...:                           'foo', 'bar', 'foo', 'foo'],
   ...:                    'B' : ['one', 'one', 'two', 'three',
   ...:                           'two', 'two', 'one', 'three'],
   ...:                    'C' : np.random.randn(8),
   ...:                    'D' : np.random.randn(8)})

In [3]: grouped = df.groupby(['A', 'B'])

In [4]: grouped[['D','C']].agg({'r':np.sum, 'r2':np.mean})
Out[4]: 
                    r        r2
A   B                          
bar one   D -0.460078 -0.460078
          C  0.798220  0.798220
    three D  1.599986  1.599986
          C -0.554798 -0.554798
    two   D  0.124900  0.124900
          C  0.084758  0.084758
foo one   D -0.466082 -0.233041
          C -0.585512 -0.292756
    three D -0.184726 -0.184726
          C  0.130756  0.130756
    two   D -1.985586 -0.992793
          C  1.275138  0.637569

In [5]: grouped[['D','C']].agg({'r': { 'C' : np.sum }, 'r2' : { 'D' : np.mean }})
Out[5]: 
                    r        r2
                    C         D
A   B                          
bar one   D -0.460078 -0.460078
          C  0.798220  0.798220
    three D  1.599986  1.599986
          C -0.554798 -0.554798
    two   D  0.124900  0.124900
          C  0.084758  0.084758
foo one   D -0.466082 -0.233041
          C -0.585512 -0.292756
    three D -0.184726 -0.184726
          C  0.130756  0.130756
    two   D -1.985586 -0.992793
          C  1.275138  0.637569

In [6]: grouped[['D','C']].agg([np.sum, np.mean])
Out[6]: 
                  D                   C          
                sum      mean       sum      mean
A   B                                            
bar one   -0.460078 -0.460078  0.798220  0.798220
    three  1.599986  1.599986 -0.554798 -0.554798
    two    0.124900  0.124900  0.084758  0.084758
foo one   -0.466082 -0.233041 -0.585512 -0.292756
    three -0.184726 -0.184726  0.130756  0.130756
    two   -1.985586 -0.992793  1.275138  0.637569

with a trivial patch

diff --git a/pandas/core/groupby.py b/pandas/core/groupby.py
index add5080..b885b6f 100644
--- a/pandas/core/groupby.py
+++ b/pandas/core/groupby.py
@@ -2837,9 +2837,6 @@ class NDFrameGroupBy(GroupBy):
             keys = []
             if self._selection is not None:
                 subset = obj
-                if isinstance(subset, DataFrame):
-                    raise NotImplementedError("Aggregating on a DataFrame is "
-                                              "not supported")

                 for fname, agg_how in compat.iteritems(arg):
                     colg = SeriesGroupBy(subset, selection=self._selection,

Of course, this needs some tests.
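
A rough sketch of the kind of check such a test might contain, assuming a reasonably recent pandas. It only covers the list-of-functions form from `In [6]`, uses fixed values instead of random ones, and does not exercise the patched dict path; the data and the loop are illustrative, not taken from the pandas test suite.

```python
import numpy as np
import pandas as pd
import pandas.testing as tm

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three',
                         'two', 'two', 'one', 'three'],
                   'C': np.arange(8, dtype=float),
                   'D': np.arange(8, dtype=float) * 2})

grouped = df.groupby(['A', 'B'])
result = grouped[['D', 'C']].agg(['sum', 'mean'])

# Each (column, function) pair should match the single-column aggregation.
for col in ['D', 'C']:
    for func in ['sum', 'mean']:
        expected = getattr(grouped[col], func)()
        tm.assert_series_equal(result[(col, func)], expected, check_names=False)
```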

jreback added a commit to jreback/pandas that referenced this issue Dec 19, 2015

@jreback jreback commented Dec 19, 2015

actually not closing this

@jreback jreback reopened this Dec 19, 2015
@jreback jreback mentioned this issue Jan 23, 2016
jreback added a commit to jreback/pandas that referenced this issue Feb 2, 2016
@jreback jreback closed this in 1dc49f5 Feb 2, 2016

@xflr6 xflr6 commented Feb 14, 2016

The following raises SpecificationError in 0.18.0, although there is no ambiguity (SeriesGroupby):

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three',
                         'two', 'two', 'one', 'three'],
                   'C': np.random.randn(8),
                   'D': np.arange(8)})

grouped = df.groupby(['A', 'B'])

grouped['D'].agg({'D': np.sum, 'result2': np.mean})

Is this intended or a bug (I'd prefer to be able to reuse the series column name)?

@jorisvandenbossche jorisvandenbossche commented Feb 14, 2016

This should work (it is also a regression, as it worked before).
I think this should work because, for a SeriesGroupBy, the dict keys can/should always be interpreted as new column names, and not as selecting existing column names.

@jorisvandenbossche jorisvandenbossche modified the milestones: 0.18.0, Next Major Release Feb 14, 2016
jreback added a commit to jreback/pandas that referenced this issue Feb 15, 2016

@jreback jreback commented Feb 15, 2016

@xflr6

this is fixed in #12329

In [3]: grouped['D'].agg({'D': np.sum, 'result2': np.mean})
Out[3]: 
           result2  D
A   B                
bar one          1  1
    three        3  3
    two          5  5
foo one          3  6
    three        7  7
    two          3  6

Note that this works as well, though maybe not as the user intended (e.g. the C here is purely a label; it has nothing to do with the actual aggregation columns).

In [4]: grouped['D'].agg({'D': np.sum, 'C': np.mean})
Out[4]: 
           C  D
A   B          
bar one    1  1
    three  3  3
    two    5  5
foo one    3  6
    three  7  7
    two    3  6
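
A minimal sketch of the same renaming on a SeriesGroupBy using keyword arguments to `agg`, assuming a pandas version that supports named aggregation (0.25 or later, as far as I know). The dict form shown above may not be accepted on such versions. The data is the frame from the earlier comment.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three',
                         'two', 'two', 'one', 'three'],
                   'C': np.random.randn(8),
                   'D': np.arange(8)})

grouped = df.groupby(['A', 'B'])

# Keywords are output column names only; both aggregate the selected 'D'.
result = grouped['D'].agg(D='sum', result2='mean')
print(result)
```

As in the thread, the keyword `D` is just a label for the output column and happens to match the source column name.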

@arita37 arita37 commented Jan 28, 2017

For reference, on complex groupby:
We sometimes have two-dimensional data like
date, user_id, val1, val2, val3

and need to transform it with a 'groupby' into:
user_id, mycol1, mycol2, ...

Usually, this is done by

for x in user_id_list:
    dfi = df[df.user_id == x]
    user_dict[x]['mycol1'] = myfun(dfi)
    user_dict[x]['mycol2'] = myfun2(dfi)

Is there a way to do this kind of complex and generic grouping with pandas groupby?
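
One possible approach, sketched under assumptions: `groupby(...).apply` with a function that returns a `pd.Series` of named results per group. The sample data and the `summarize` function are made up for illustration; they stand in for `myfun` and `myfun2` above.

```python
import pandas as pd

# Made-up sample data in the shape described above.
df = pd.DataFrame({
    'date': pd.to_datetime(['2017-01-01', '2017-01-02',
                            '2017-01-01', '2017-01-02']),
    'user_id': ['u1', 'u1', 'u2', 'u2'],
    'val1': [1, 2, 3, 4],
    'val2': [10, 30, 20, 50],
})

def summarize(group):
    # Hypothetical per-user computations; each key becomes an output column.
    return pd.Series({
        'mycol1': group['val1'].sum(),
        'mycol2': group['val2'].max() - group['val2'].min(),
    })

# Select the value columns explicitly so the function sees only those.
result = df.groupby('user_id')[['val1', 'val2']].apply(summarize)
print(result)
```

The result is indexed by `user_id` with one column per computed value, which replaces the explicit loop and the nested dict.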
