Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test/fix case where use of "All" causes problems in pivot_table with margins=True #3335

Closed
wesm opened this issue Apr 13, 2013 · 7 comments · Fixed by #11581
Closed

Test/fix case where use of "All" causes problems in pivot_table with margins=True #3335

wesm opened this issue Apr 13, 2013 · 7 comments · Fixed by #11581
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode Testing pandas testing functions or related to the test suite
Milestone

Comments

@wesm
Copy link
Member

wesm commented Apr 13, 2013

No description provided.

@jreback jreback modified the milestones: 0.15.0, 0.14.0 Mar 11, 2014
@jreback jreback modified the milestone: 0.16.0 Jan 26, 2015
@TheInan
Copy link

TheInan commented Mar 30, 2015

I'd like to take a stab at this if no one minds

@jreback
Copy link
Contributor

jreback commented Mar 30, 2015

go for it!

@TheInan
Copy link

TheInan commented Apr 27, 2015

Would it be an appropriate solution to have any instances of "All" used as an index or column changed to "All_" in the original dataframe and the resulting pivot table so that having a user use "All" as a column name will not cause issues anymore?

@shoyer
Copy link
Member

shoyer commented Apr 27, 2015

I'm not sure exactly what "problems" are caused by this issue, but generally speaking it's a better idea to raise an exception and force users to take some fallback action themselves rather than silently proceeding with some guess about user intent.

@TheInan
Copy link

TheInan commented Apr 27, 2015

Alright then, I think I'll just scan the DataFrame for any instances of All being used as a value or column title, and if there is one, I'll raise an exception. The problem was that when margins was passed in as true, columns with the title "All" were added to the dataframe, and then add a later point, instances of "All" columns had aggregates data filled into them, so if the original data used "All" it was being replaced I think

@jreback
Copy link
Contributor

jreback commented Apr 28, 2015

need to define what is the incorrect behavior here first.

@jreback jreback added this to the Next Major Release milestone Apr 28, 2015
@TheInan
Copy link

TheInan commented May 2, 2015

So this is what I have so far showing the issue. What happens is that when aggregating the data, as the script is going through the columns, it looks for All columns to aggregate data under. When "foo" was changed to "All" in the next DataFrame, the output is clearly nothing like that of the first pivot table. As a result, its probably most appropriate to raise an error when "All" is being used as a data point since it would be more trouble than its worth to rewrite the code around just this corner case.

import pandas as pd
df = pd.DataFrame({'A' : ['one', 'one', 'two', 'three'] * 3,
                    'B' : ['A', 'B', 'C'] * 4,
                    'C' : ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 2,
                    'D' : [1,2,3,4,5,6,7,8,9,10,11,12],
                    'E' : [1,2,3,4,5,6,7,8,9,10,11,12]})
df
A B C D E
0 one A foo 1 1
1 one B foo 2 2
2 two C foo 3 3
3 three A bar 4 4
4 one B bar 5 5
5 one C bar 6 6
6 two A foo 7 7
7 three B foo 8 8
8 one C foo 9 9
9 one A bar 10 10
10 two B bar 11 11
11 three C bar 12 12
pd.pivot_table(df, values= 'D', index=['A','B'], columns=['C'], margins=True)
C bar foo All
A B
one A 10 1 5.5
B 5 2 3.5
C 6 9 7.5
three A 4 NaN 4.0
B NaN 8 8.0
C 12 NaN 12.0
two A NaN 7 7.0
B 11 NaN 11.0
C NaN 3 3.0
All 8 5 6.5
df2 = pd.DataFrame({'A' : ['one', 'one', 'two', 'three'] * 3,
                    'B' : ['A', 'B', 'C'] * 4,
                    'C' : ['All', 'All', 'All', 'bar', 'bar', 'bar'] * 2,
                    'D' : [1,2,3,4,5,6,7,8,9,10,11,12],
                    'E' : [1,2,3,4,5,6,7,8,9,10,11,12]})


df2
A B C D E
0 one A All 1 1
1 one B All 2 2
2 two C All 3 3
3 three A bar 4 4
4 one B bar 5 5
5 one C bar 6 6
6 two A All 7 7
7 three B All 8 8
8 one C All 9 9
9 one A bar 10 10
10 two B bar 11 11
11 three C bar 12 12
pd.pivot_table(df2, values= 'D', index=['A','B'], columns=['C'], margins=True)
C All bar
A B
one A 5.5 10
B 3.5 5
C 7.5 6
three A 4.0 4
B 8.0 NaN
C 12.0 12
two A 7.0 NaN
B 11.0 11
C 3.0 NaN
All 6.0 8

lexual added a commit to lexual/pandas that referenced this issue Jun 6, 2015
…column.

ref pandas-dev#3335.

Adds margin_column parameter to pivot_table so that user can set it to
something other than 'All'.
Raises ValueError exception if there is a conflict between the value of
margin_column and one of the other values appearing in the indices of
the pivot table.
lexual added a commit to lexual/pandas that referenced this issue Nov 15, 2015
…column.

ref pandas-dev#3335.

Adds margin_name parameter to pivot_table so that user can set it to
something other than 'All'.
Raises ValueError exception if there is a conflict between the value of
margin_column and one of the other values appearing in the indices of
the pivot table.
@jreback jreback modified the milestones: 0.17.1, Next Major Release Nov 15, 2015
jreback added a commit that referenced this issue Nov 15, 2015
…_margins

ENH: #3335 Pivot table support for setting name of margins column.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode Testing pandas testing functions or related to the test suite
Projects
None yet
4 participants