Pivot table margins brittleness #3334

wesm · 2013-04-13T00:52:37Z

I'm trying to find the most efficient way to tabulate responses to a survey using pandas. Here's a sample survey:
df = DataFrame({'Response' : ['Y', 'N' ,'N', 'Y', 'Y', 'N'],
                'Type' : ['A', 'A', 'B', 'B', 'B', 'C']})
I want to count the number of responses of each value (Y or N), with the different types across the top.

A simple way to do this is

df.groupby('Type')['Response'].value_counts().unstack(level=0)

Type     A      B        C
N       1       1       1
Y       1       2       NaN

This is almost what I want, except I'd like row and column totals as well. Of course, I could calculate these manually, but that seems ugly. What I'd really like to do is create a pivot table with margins=True, but this isn't quite as easy as I thought it would be.

Again without totals, I can use

    pivot_table(df, rows='Response',cols='Type',aggfunc=len)

This produces almost the same output as above, though now the index has a name:

Type            A       B       C
Response
N                       1       1        1
Y                       1       2       NaN

So far so good, but adding margins=True results in an error.

To get exactly what I want, the following works, but it seems there should be a nicer way:

df['Count'] = 1
pivot_table(df, rows='Response',cols='Type',values='Count',aggfunc=len,margins=True)     # Can also use sum

I have three questions:

Why does adding 'margins=True' in reindex_like function #2 above result in an error, while it works in Binary operations on int DataMatrix #3? I'm not sure I understand what's happening here.

If this behavior is intentional, is there a better way to get the output I'm looking for, without introducing an extra, unwanted column?

How do I remove the name of the index from a dataframe (to get the output from reindex_like function #2 to look the same as in Enable element-wise comparison operations in DataMatrix objects #1)?

The text was updated successfully, but these errors were encountered:

guyrt · 2013-07-03T02:18:21Z

I've traced this bug to the type of index we have on the columns of the pivoted table. The pivoted table is first constructed here:

https://github.com/pydata/pandas/blob/master/pandas/tools/pivot.py#L100

If the rows and columns account for the entire frame, then the index is a single level index. Otherwise, it's multilevel with nlevels=2. This difference causes problems later when we group by column:

https://github.com/pydata/pandas/blob/master/pandas/tools/pivot.py#L156

Rather than grouping by level 0 of a multi-index, we are grouping by the only level in a single index. This picks up a particular value in a column rather than the level 0 value in the index, which was the name of a column.

I'll try to push a fix tomorrow or the next day pending time to work on it. We may need to handle this case specially.

guyrt · 2013-07-03T02:56:35Z

I guess the most obvious question is whether a suitable fix is to append an extra column of ones if rows + cols is the entire DataFrame. Thoughts?

…_table Adds support for margin computation when all columns are used in rows and cols.

Adds support for margin computation when all columns are used in rows and cols

…_table Adds support for margin computation when all columns are used in rows and cols.

Adds support for margin computation when all columns are used in rows and cols. DOC: Fixed release notes BUG: Fixed failing test due to use of basestring in python 3

jreback · 2013-08-12T13:21:57Z

closed via #4432

guyrt mentioned this issue Jul 25, 2013

BUG: Fix brittle pivot margins #4358

Closed

guyrt added a commit to guyrt/pandas that referenced this issue Jul 25, 2013

BUG: Fixes issue pandas-dev#3334: brittle margin computation in pivot…

ad976bb

…_table Adds support for margin computation when all columns are used in rows and cols.

guyrt added a commit to guyrt/pandas that referenced this issue Aug 1, 2013

BUG: Fixes issue pandas-dev#3334: brittle margin in pivot_table.

1feaf7b

Adds support for margin computation when all columns are used in rows and cols

guyrt mentioned this issue Aug 1, 2013

BUG: Fixes issue #3334: brittle margin in pivot_table. #4432

Closed

guyrt added a commit to guyrt/pandas that referenced this issue Aug 1, 2013

BUG: Fixes issue pandas-dev#3334: brittle margin computation in pivot…

beb7c97

…_table Adds support for margin computation when all columns are used in rows and cols.

guyrt added a commit to guyrt/pandas that referenced this issue Aug 1, 2013

BUG: Fixes issue pandas-dev#3334: brittle margin computation in pivot…

50790c8

…_table Adds support for margin computation when all columns are used in rows and cols.

jreback closed this as completed Aug 12, 2013

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pivot table margins brittleness #3334

Pivot table margins brittleness #3334

wesm commented Apr 13, 2013

guyrt commented Jul 3, 2013

guyrt commented Jul 3, 2013

jreback commented Aug 12, 2013

Pivot table margins brittleness #3334

Pivot table margins brittleness #3334

Comments

wesm commented Apr 13, 2013

guyrt commented Jul 3, 2013

guyrt commented Jul 3, 2013

jreback commented Aug 12, 2013