BUG: pivot_table with margins=True fails for categorical dtype, #10989 #10993
Conversation
jreback
commented on the diff
Sep 4, 2015
@@ -159,6 +159,19 @@ def _add_margins(table, data, values, rows, cols, aggfunc):
     grand_margin = _compute_grand_margin(data, values, aggfunc)
+    # categorical index or columns will fail below when 'All' is added
+    # here we'll convert all categorical indices to object
+    def convert_categorical(ind):
+        _convert = lambda ind: (ind.astype('object')
+                                if ind.dtype.name == 'category' else ind)
jreback
Contributor
I should add one thing: I decided that rather than adding a new item to the category, it would be cleaner to simply change categories to objects rather than trying to track down all the corner cases of hierarchical indices with categorical levels.
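For illustration, the "convert categories to objects" idea can be sketched on a flat index as below. This is a minimal sketch, not the code from the diff: the helper name `to_object_if_categorical` is hypothetical, and hierarchical indices are deliberately left out.

```python
import pandas as pd


def to_object_if_categorical(index):
    """Hypothetical helper: return an object-dtype copy of a categorical
    index and pass any other index through unchanged."""
    if isinstance(index.dtype, pd.CategoricalDtype):
        return index.astype(object)
    return index


idx = pd.CategoricalIndex(["a", "b", "a"], name="z")
converted = to_object_if_categorical(idx)

# Once the index is plain object dtype, a margin label such as 'All'
# can be appended without touching any category list.
with_margin = converted.append(pd.Index(["All"]))
```

The trade-off is that the result no longer carries the categorical dtype, which is the price of not having to extend the categories.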
A much cleaner solution, IMO, would be to add a utility function that does something along the lines of "add a new entry to this index, even if it requires a new category". I imagine there are other places in the package where this sort of thing might happen, and such a utility could be used there as well.
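A rough sketch of what such a utility might look like for a flat index follows. The name `append_label` is hypothetical and nothing like it is claimed to exist in pandas; the sketch only relies on `CategoricalIndex.add_categories` and ignores the MultiIndex case.

```python
import pandas as pd


def append_label(index, label):
    """Hypothetical utility: append `label` to `index`, widening the
    categories first when the index is categorical."""
    if isinstance(index, pd.CategoricalIndex):
        if label not in index.categories:
            # add_categories returns a new index with the extra category
            index = index.add_categories([label])
        return pd.CategoricalIndex(
            list(index) + [label], dtype=index.dtype, name=index.name
        )
    # Non-categorical indices can simply be extended.
    return index.append(pd.Index([label], name=index.name))


cat_idx = pd.CategoricalIndex(["a", "b"], name="z")
cat_with_all = append_label(cat_idx, "All")

obj_idx = pd.Index(["a", "b"], name="z")
obj_with_all = append_label(obj_idx, "All")
```

Unlike the object conversion, this keeps the categorical dtype, at the cost of changing the set of categories.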
I think we should return a regular Index with object dtypes. What happens if the user has a
@TomAugspurger – I checked – it turns out this is another bug in the current codebase, even if you're not using categories! If one of the index entries is called
In [19]: data = pd.DataFrame({'x': np.arange(99),
                             'y': np.arange(99) // 50,
                             'z': np.arange(99) % 3})
In [20]: data.z = np.array(['Any', 'All', 'None'])[data.z]
In [21]: data.pivot_table('x', 'y', 'z')
Out[21]:
z   All   Any  None
y
0  25.0  24.0  24.5
1  74.5  73.5  74.0
In [22]: data.pivot_table('x', 'y', 'z', margins=True)
Out[22]:
z     All   Any  None
y
0    24.5  24.0  24.5
1    74.0  73.5  74.0
All  49.0  48.0  50.0
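One way to sidestep the collision shown above is to give the margins a label that cannot clash with the data. The sketch below assumes a pandas version whose `pivot_table` accepts the `margins_name` parameter (it is not part of this PR), and the label `'Total'` is an arbitrary choice.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'x': np.arange(99),
                     'y': np.arange(99) // 50,
                     'z': np.arange(99) % 3})
data['z'] = np.array(['Any', 'All', 'None'])[data['z'].to_numpy()]

# Assumption: margins_name is available in the installed pandas.
# Using 'Total' keeps the data column 'All' separate from the margins.
table = data.pivot_table(values='x', index='y', columns='z',
                         margins=True, margins_name='Total')
```

With a distinct margins label, the data column `'All'` keeps its own values and the margins land in a separate `'Total'` row and column.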
jreback
added Bug Reshaping Categorical
labels
Sep 5, 2015
jreback
changed the title from
BUG: quick fix for #10989 to BUG: pivot_table with margins=True fails for categorical dtype, #10989
Sep 10, 2015
@jakevdp can you update according to the comments?
jreback
closed this
Oct 18, 2015
I think I've already addressed all comments above – any others you have in mind? Any reason you closed this without merging?
I thought I had put the comments in before: this needs a more general solution, as there is too much if/then on type determination.
jreback
reopened this
Oct 18, 2015
I guess I'm not entirely clear about what you're wanting as a "more general" solution. Any specific ideas?
So there is something more going on here; this bug report is a symptom of a different issue. Namely,
So this is the same as the index in [10]
So I think that
jreback
added the
MultiIndex
label
Oct 18, 2015
jreback
added this to the
0.17.1
milestone
Oct 18, 2015
I suppose we should close this PR then, and leave the issue open. Hacking into the internals of MultiIndex is well beyond my level of comfort with the library.
haha. well, I'll take your tests in any event. So going to leave open for a bit.
jreback
referenced
this pull request
Oct 19, 2015
Merged
BUG: pivot table bug with Categorical indexes, #10993 #11371
Sounds good – thanks!
jreback
closed this
in #11371
Oct 20, 2015
jreback
added a commit
that referenced
this pull request
Oct 20, 2015
jreback db884d9
jakevdp commented Sep 4, 2015
This is a fix for the issue reported in #10989. I suspect this is an example of "fixing the symptom" rather than "fixing the problem", but I think it makes clear what the source of the problem is: to compute margins, the pivot table must add a row and/or column to the result. If the index or column is categorical, a new value cannot be added.
Let me know if you think there are better approaches to this.
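To make the underlying restriction concrete, here is a small sketch on a bare `Categorical` rather than on a pivot table. The exact exception type has varied between pandas versions, so the sketch catches both `TypeError` and `ValueError`.

```python
import pandas as pd

cat = pd.Categorical(["a", "b", "a"])

# Assigning a value that is not among the categories is rejected.
try:
    cat[0] = "All"
    rejected = False
except (TypeError, ValueError):
    rejected = True

# Declaring the new category first makes the same assignment legal.
widened = cat.add_categories(["All"])
widened[0] = "All"
```

This is the same constraint that the margins code runs into when it tries to add an `'All'` entry to a categorical index or column.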