Test/fix case where use of "All" causes problems in pivot_table with margins=True #3335

Closed
wesm opened this Issue Apr 13, 2013 · 7 comments

Comments

Projects
None yet
4 participants
Owner

wesm commented Apr 13, 2013

No description provided.

@jreback jreback modified the milestone: 0.15.0, 0.14.0 Mar 11, 2014

@jreback jreback modified the milestone: 0.16.0 Jan 26, 2015

TheInan commented Mar 30, 2015

I'd like to take a stab at this if no one minds

Contributor

jreback commented Mar 30, 2015

go for it!

TheInan commented Apr 27, 2015

Would it be an appropriate solution to have any instances of "All" used as an index or column changed to "All_" in the original dataframe and the resulting pivot table so that having a user use "All" as a column name will not cause issues anymore?

Member

shoyer commented Apr 27, 2015

I'm not sure exactly what "problems" are caused by this issue, but generally speaking it's a better idea to raise an exception and force users to take some fallback action themselves rather than silently proceeding with some guess about user intent.

TheInan commented Apr 27, 2015

Alright then, I think I'll just scan the DataFrame for any instances of All being used as a value or column title, and if there is one, I'll raise an exception. The problem was that when margins was passed in as true, columns with the title "All" were added to the dataframe, and then add a later point, instances of "All" columns had aggregates data filled into them, so if the original data used "All" it was being replaced I think

Contributor

jreback commented Apr 28, 2015

need to define what is the incorrect behavior here first.

jreback added this to the Next Major Release milestone Apr 28, 2015

TheInan commented May 2, 2015

So this is what I have so far showing the issue. What happens is that when aggregating the data, as the script is going through the columns, it looks for All columns to aggregate data under. When "foo" was changed to "All" in the next DataFrame, the output is clearly nothing like that of the first pivot table. As a result, its probably most appropriate to raise an error when "All" is being used as a data point since it would be more trouble than its worth to rewrite the code around just this corner case.

import pandas as pd
df = pd.DataFrame({'A' : ['one', 'one', 'two', 'three'] * 3,
                    'B' : ['A', 'B', 'C'] * 4,
                    'C' : ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 2,
                    'D' : [1,2,3,4,5,6,7,8,9,10,11,12],
                    'E' : [1,2,3,4,5,6,7,8,9,10,11,12]})
df
A B C D E
0 one A foo 1 1
1 one B foo 2 2
2 two C foo 3 3
3 three A bar 4 4
4 one B bar 5 5
5 one C bar 6 6
6 two A foo 7 7
7 three B foo 8 8
8 one C foo 9 9
9 one A bar 10 10
10 two B bar 11 11
11 three C bar 12 12
pd.pivot_table(df, values= 'D', index=['A','B'], columns=['C'], margins=True)
C bar foo All
A B
one A 10 1 5.5
B 5 2 3.5
C 6 9 7.5
three A 4 NaN 4.0
B NaN 8 8.0
C 12 NaN 12.0
two A NaN 7 7.0
B 11 NaN 11.0
C NaN 3 3.0
All 8 5 6.5
df2 = pd.DataFrame({'A' : ['one', 'one', 'two', 'three'] * 3,
                    'B' : ['A', 'B', 'C'] * 4,
                    'C' : ['All', 'All', 'All', 'bar', 'bar', 'bar'] * 2,
                    'D' : [1,2,3,4,5,6,7,8,9,10,11,12],
                    'E' : [1,2,3,4,5,6,7,8,9,10,11,12]})


df2
A B C D E
0 one A All 1 1
1 one B All 2 2
2 two C All 3 3
3 three A bar 4 4
4 one B bar 5 5
5 one C bar 6 6
6 two A All 7 7
7 three B All 8 8
8 one C All 9 9
9 one A bar 10 10
10 two B bar 11 11
11 three C bar 12 12
pd.pivot_table(df2, values= 'D', index=['A','B'], columns=['C'], margins=True)
C All bar
A B
one A 5.5 10
B 3.5 5
C 7.5 6
three A 4.0 4
B 8.0 NaN
C 12.0 12
two A 7.0 NaN
B 11.0 11
C 3.0 NaN
All 6.0 8

@lexual lexual added a commit to lexual/pandas that referenced this issue Jun 6, 2015

@lexual lexual ENH: #3335 Pivot table support for setting name of margins column.
ref #3335.

Adds margin_column parameter to pivot_table so that user can set it to
something other than 'All'.
Raises ValueError exception if there is a conflict between the value of
margin_column and one of the other values appearing in the indices of
the pivot table.
618110c

@lexual lexual added a commit to lexual/pandas that referenced this issue Nov 15, 2015

@lexual lexual ENH: #3335 Pivot table support for setting name of margins column.
ref #3335.

Adds margin_name parameter to pivot_table so that user can set it to
something other than 'All'.
Raises ValueError exception if there is a conflict between the value of
margin_column and one of the other values appearing in the indices of
the pivot table.
1ca006c

@jreback jreback modified the milestone: 0.17.1, Next Major Release Nov 15, 2015

jreback closed this in #11581 Nov 15, 2015

@jreback jreback added a commit that referenced this issue Nov 15, 2015

@jreback jreback Merge pull request #11581 from lexual/issue_3335_pivot_handle_all_for…
…_margins

ENH: #3335 Pivot table support for setting name of margins column.
10fe47e
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment