Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug on pivot_table with margins and dict aggfunc #8349

Closed
RenzoBertocchi opened this issue Sep 22, 2014 · 8 comments
Closed

Bug on pivot_table with margins and dict aggfunc #8349

RenzoBertocchi opened this issue Sep 22, 2014 · 8 comments
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Milestone

Comments

@RenzoBertocchi
Copy link
Contributor

I think there is a bug on pandas 0.14.1 pivot_table using dictionary aggfunc and margins.
Test code:
df=pandas.DataFrame([
{'JOB':'Worker','NAME':'Bob' ,'YEAR':2013,'MONTH':12,'DAYS': 3,'SALARY': 17},
{'JOB':'Employ','NAME':'Mary','YEAR':2013,'MONTH':12,'DAYS': 5,'SALARY': 23},
{'JOB':'Worker','NAME':'Bob' ,'YEAR':2014,'MONTH': 1,'DAYS':10,'SALARY':100},
{'JOB':'Worker','NAME':'Bob' ,'YEAR':2014,'MONTH': 1,'DAYS':11,'SALARY':110},
{'JOB':'Employ','NAME':'Mary','YEAR':2014,'MONTH': 1,'DAYS':15,'SALARY':200},
{'JOB':'Worker','NAME':'Bob' ,'YEAR':2014,'MONTH': 2,'DAYS': 8,'SALARY': 80},
{'JOB':'Employ','NAME':'Mary','YEAR':2014,'MONTH': 2,'DAYS': 5,'SALARY':190}
])

df=df.set_index(['JOB','NAME','YEAR','MONTH'],drop=False,append=False)

df=df.pivot_table(index=['JOB','NAME'],columns=['YEAR','MONTH'],values=['DAYS','SALARY'],aggfunc={'DAYS':'mean','SALARY':'sum'})

All works fine but raise error using margins:

df=df.pivot_table(index=['JOB','NAME'],columns=['YEAR','MONTH'],values=['DAYS','SALARY'],aggfunc={'DAYS':'mean','SALARY':'sum'},margins=True)

df=df.pivot_table(index=['JOB','NAME'],columns=['YEAR','MONTH'],values=['DAYS','SALARY'],aggfunc={'DAYS':'mean','SALARY':'sum'},margins=True)
File "/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/util/decorators.py", line 60, in wrapper
return func(_args, *_kwargs)
File "/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/util/decorators.py", line 60, in wrapper
return func(_args, *_kwargs)
File "/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/tools/pivot.py", line 147, in pivot_table
cols=columns, aggfunc=aggfunc)
File "/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/tools/pivot.py", line 191, in _add_margins
row_margin[k] = grand_margin[k[0]]

KeyError: 'SALARY'

@jreback
Copy link
Contributor

jreback commented Sep 22, 2014

The issue is that your aggregation function applied different to different columns,

your example will work if you do this:

(Pdb) p df.pivot_table(index=['JOB','NAME'],columns=['YEAR','MONTH'],values=['DAYS','SALARY'],aggfunc=sum,margins='mean')
            DAYS              SALARY               
YEAR        2013 2014     All   2013 2014       All
MONTH         12    1   2         12    1    2     
JOB    NAME                                        
Employ Mary    5   15   5  25     23  200  190  413
Worker Bob     3   21   8  32     17  210   80  307
All            8   36  13  57     40  410  270  720
(Pdb) p df.pivot_table(index=['JOB','NAME'],columns=['YEAR','MONTH'],values=['DAYS','SALARY'],aggfunc='mean',margins=True)
            DAYS                      SALARY                             
YEAR        2013  2014            All   2013        2014              All
MONTH         12     1    2               12           1    2            
JOB    NAME                                                              
Employ Mary    5  15.0  5.0  8.333333     23  200.000000  190  137.666667
Worker Bob     3  10.5  8.0  8.000000     17  105.000000   80   76.750000
All            4  12.0  6.5  8.142857     20  136.666667  135  102.857143

So right now this just silently bombs on the aggregation function (as its a dict) for the grand margins and then raises a later error.

So the choices are:

  • raise an earlier error that you can't use a multi-column aggregator (e.g. a dict)
  • allow margins per column (e.g. the row 'All'), but NOT the 'All' on the columns as that doesn't make sense since you have sum on one column and mean on another.

So calling this an error-reportng issue

@jreback jreback added Error Reporting Incorrect or improved errors from pandas Good as first PR Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Sep 22, 2014
@jreback jreback added this to the 0.15.1 milestone Sep 22, 2014
@RenzoBertocchi
Copy link
Contributor Author

Ok but seems to works well if i change "pandas/tools/pivot.py"

form:


if isinstance(aggfunc, compat.string_types):
        grand_margin[k] = getattr(v, aggfunc)()
else: 
        grand_margin[k] = aggfunc(v)

to:


if isinstance(aggfunc, compat.string_types):
     grand_margin[k] = getattr(v, aggfunc)()
elif isinstance(aggfunc, compat.types.ObjectType):
     if isinstance(aggfunc[k], compat.string_types):
           grand_margin[k] = getattr(v, aggfunc[k])()
     else:
           grand_margin[k] = aggfunc[k](v)
else:
     grand_margin[k] = aggfunc(v)

Is it not a solution?

@jreback
Copy link
Contributor

jreback commented Sep 22, 2014

I think the grand margin is wrong no?

Why don't you show the output as you envision

@RenzoBertocchi
Copy link
Contributor Author

Seems good:

df=df.pivot_table(index=['JOB','NAME'],columns=['YEAR','MONTH'],values=['DAYS','SALARY'],aggfunc={'DAYS':'mean','SALARY':'sum'},margins=True)

SALARY DAYS
YEAR 2013 2014 All 2013 2014 All
MONTH 12 1 2 12 1 2
JOB NAME
Employ Mary 23 200 190 413 5 15.0 5.0 8.333333
Worker Bob 17 210 80 307 3 10.5 8.0 8.000000
All 40 410 270 720 4 12.0 6.5 8.142857

@jreback
Copy link
Contributor

jreback commented Sep 22, 2014

hmm, yeh that looks ok.

pls submit a pull-request (with tests!)

@jreback jreback added Bug and removed Error Reporting Incorrect or improved errors from pandas labels Sep 23, 2014
@RenzoBertocchi
Copy link
Contributor Author

Pull-request done

@jreback
Copy link
Contributor

jreback commented Sep 29, 2014

@RenzoBertocchi when the PR is merged this issue will be automatically closed.

@jreback jreback modified the milestones: 0.15.0, 0.15.1 Sep 29, 2014
@jreback
Copy link
Contributor

jreback commented Sep 30, 2014

closed by #8354

@jreback jreback closed this as completed Sep 30, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants