Bug on pivot_table with margins and dict aggfunc #8349

Closed
RenzoBertocchi opened this Issue Sep 22, 2014 · 8 comments

Comments

Projects
None yet
2 participants
Contributor

RenzoBertocchi commented Sep 22, 2014

I think there is a bug on pandas 0.14.1 pivot_table using dictionary aggfunc and margins.
Test code:
df=pandas.DataFrame([
{'JOB':'Worker','NAME':'Bob' ,'YEAR':2013,'MONTH':12,'DAYS': 3,'SALARY': 17},
{'JOB':'Employ','NAME':'Mary','YEAR':2013,'MONTH':12,'DAYS': 5,'SALARY': 23},
{'JOB':'Worker','NAME':'Bob' ,'YEAR':2014,'MONTH': 1,'DAYS':10,'SALARY':100},
{'JOB':'Worker','NAME':'Bob' ,'YEAR':2014,'MONTH': 1,'DAYS':11,'SALARY':110},
{'JOB':'Employ','NAME':'Mary','YEAR':2014,'MONTH': 1,'DAYS':15,'SALARY':200},
{'JOB':'Worker','NAME':'Bob' ,'YEAR':2014,'MONTH': 2,'DAYS': 8,'SALARY': 80},
{'JOB':'Employ','NAME':'Mary','YEAR':2014,'MONTH': 2,'DAYS': 5,'SALARY':190}
])

df=df.set_index(['JOB','NAME','YEAR','MONTH'],drop=False,append=False)

df=df.pivot_table(index=['JOB','NAME'],columns=['YEAR','MONTH'],values=['DAYS','SALARY'],aggfunc={'DAYS':'mean','SALARY':'sum'})

All works fine but raise error using margins:

df=df.pivot_table(index=['JOB','NAME'],columns=['YEAR','MONTH'],values=['DAYS','SALARY'],aggfunc={'DAYS':'mean','SALARY':'sum'},margins=True)

df=df.pivot_table(index=['JOB','NAME'],columns=['YEAR','MONTH'],values=['DAYS','SALARY'],aggfunc={'DAYS':'mean','SALARY':'sum'},margins=True)
File "/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/util/decorators.py", line 60, in wrapper
return func(_args, *_kwargs)
File "/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/util/decorators.py", line 60, in wrapper
return func(_args, *_kwargs)
File "/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/tools/pivot.py", line 147, in pivot_table
cols=columns, aggfunc=aggfunc)
File "/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/tools/pivot.py", line 191, in _add_margins
row_margin[k] = grand_margin[k[0]]

KeyError: 'SALARY'

Contributor

jreback commented Sep 22, 2014

The issue is that your aggregation function applied different to different columns,

your example will work if you do this:

(Pdb) p df.pivot_table(index=['JOB','NAME'],columns=['YEAR','MONTH'],values=['DAYS','SALARY'],aggfunc=sum,margins='mean')
            DAYS              SALARY               
YEAR        2013 2014     All   2013 2014       All
MONTH         12    1   2         12    1    2     
JOB    NAME                                        
Employ Mary    5   15   5  25     23  200  190  413
Worker Bob     3   21   8  32     17  210   80  307
All            8   36  13  57     40  410  270  720
(Pdb) p df.pivot_table(index=['JOB','NAME'],columns=['YEAR','MONTH'],values=['DAYS','SALARY'],aggfunc='mean',margins=True)
            DAYS                      SALARY                             
YEAR        2013  2014            All   2013        2014              All
MONTH         12     1    2               12           1    2            
JOB    NAME                                                              
Employ Mary    5  15.0  5.0  8.333333     23  200.000000  190  137.666667
Worker Bob     3  10.5  8.0  8.000000     17  105.000000   80   76.750000
All            4  12.0  6.5  8.142857     20  136.666667  135  102.857143

So right now this just silently bombs on the aggregation function (as its a dict) for the grand margins and then raises a later error.

So the choices are:

  • raise an earlier error that you can't use a multi-column aggregator (e.g. a dict)
  • allow margins per column (e.g. the row 'All'), but NOT the 'All' on the columns as that doesn't make sense since you have sum on one column and mean on another.

So calling this an error-reportng issue

jreback added this to the 0.15.1 milestone Sep 22, 2014

Contributor

RenzoBertocchi commented Sep 22, 2014

Ok but seems to works well if i change "pandas/tools/pivot.py"

form:


if isinstance(aggfunc, compat.string_types):
        grand_margin[k] = getattr(v, aggfunc)()
else: 
        grand_margin[k] = aggfunc(v)

to:


if isinstance(aggfunc, compat.string_types):
     grand_margin[k] = getattr(v, aggfunc)()
elif isinstance(aggfunc, compat.types.ObjectType):
     if isinstance(aggfunc[k], compat.string_types):
           grand_margin[k] = getattr(v, aggfunc[k])()
     else:
           grand_margin[k] = aggfunc[k](v)
else:
     grand_margin[k] = aggfunc(v)

Is it not a solution?

Contributor

jreback commented Sep 22, 2014

I think the grand margin is wrong no?

Why don't you show the output as you envision

Contributor

RenzoBertocchi commented Sep 22, 2014

Seems good:

df=df.pivot_table(index=['JOB','NAME'],columns=['YEAR','MONTH'],values=['DAYS','SALARY'],aggfunc={'DAYS':'mean','SALARY':'sum'},margins=True)

SALARY DAYS
YEAR 2013 2014 All 2013 2014 All
MONTH 12 1 2 12 1 2
JOB NAME
Employ Mary 23 200 190 413 5 15.0 5.0 8.333333
Worker Bob 17 210 80 307 3 10.5 8.0 8.000000
All 40 410 270 720 4 12.0 6.5 8.142857
Contributor

jreback commented Sep 22, 2014

hmm, yeh that looks ok.

pls submit a pull-request (with tests!)

@jreback jreback added Bug and removed Error Reporting labels Sep 23, 2014

Contributor

RenzoBertocchi commented Sep 29, 2014

Pull-request done

jreback reopened this Sep 29, 2014

Contributor

jreback commented Sep 29, 2014

@RenzoBertocchi when the PR is merged this issue will be automatically closed.

@jreback jreback modified the milestone: 0.15.0, 0.15.1 Sep 29, 2014

Contributor

jreback commented Sep 30, 2014

closed by #8354

jreback closed this Sep 30, 2014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment