groupby on multiple columns does not preserve (categorical) dtype #13743
Comments
martijnvermaat commented Jul 21, 2016
I thought I'd quickly work around it by converting the resulting
In [6]: df.groupby(['a', 'b']).sum().reset_index().assign(
   ...:     a=lambda df: df.a.astype('category', categories=list('xyz')),
   ...:     b=lambda df: df.b.astype('category', categories=list('xyz'))
   ...: ).set_index(['a', 'b']).reset_index().dtypes
Out[6]:
a     object
b     object
c    float64
dtype: object
So I guess my bug report is now for
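For readers trying the same round trip on a current pandas release, here is a minimal sketch. It assumes a hypothetical frame `df`, since the data behind the session above is not shown in this thread. It also uses `pd.CategoricalDtype` instead of the `astype('category', categories=...)` spelling, which was deprecated in later pandas versions. The final lines only inspect the dtypes; whether they come back as categorical depends on the pandas version in use.

```python
import pandas as pd

# Hypothetical sample data; the original report's frame is not shown.
df = pd.DataFrame({
    "a": list("xxy"),
    "b": list("yzz"),
    "c": [1.0, 2.0, 3.0],
})

# Explicit categorical dtype with a fixed set of categories.
xyz = pd.CategoricalDtype(categories=list("xyz"))

roundtrip = (
    df.groupby(["a", "b"]).sum()
      .reset_index()
      .astype({"a": xyz, "b": xyz})   # convert the key columns to categorical
      .set_index(["a", "b"])          # move them into a MultiIndex ...
      .reset_index()                  # ... and back out into columns
)

# Check whether the categorical dtype survived the MultiIndex round trip.
print(roundtrip.dtypes)
```

If the dtype is preserved, `roundtrip["a"].dtype` is a `CategoricalDtype` and not `object`.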
sinhrks added the Groupby, MultiIndex, and Categorical labels Jul 22, 2016
I still have it in mind and will submit a fix soon.
pijucha referenced this issue Jul 31, 2016: BUG: Preserve categorical dtypes in MultiIndex levels (#13743) #13854 (Closed)
jreback added this to the 0.19.0 milestone Aug 1, 2016
jreback added the Bug label Aug 1, 2016
pijucha referenced this issue Aug 16, 2016: BUG: unstack doesn't preserve categorical dtype #14018 (Open)
pijucha added a commit to pijucha/pandas that referenced this issue Sep 2, 2016: 99e4a52
jreback closed this in d26363b Sep 2, 2016
martijnvermaat commented Jul 21, 2016
When doing a groupby on more than one column, the resulting MultiIndex does not seem to preserve the original column dtypes. I noticed it when working with Categorical columns, expecting CategoricalIndex when grouping on them, but this is only the case when grouping on just one column.
I did see that the behaviour was discussed in a PR, but it ultimately was not addressed.
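The behaviour described above can be checked with a short script. This is an illustrative sketch, not the reporter's original sample (which is not shown in this thread): the frame `df` and its values are hypothetical, and `observed=False` is passed explicitly, which assumes a pandas release that supports that parameter. It compares the index type from grouping on one categorical column against the level types from grouping on two.

```python
import pandas as pd

# Hypothetical frame with two categorical key columns and one value column.
df = pd.DataFrame({
    "a": pd.Categorical(list("xxy"), categories=list("xyz")),
    "b": pd.Categorical(list("yzz"), categories=list("xyz")),
    "c": [1.0, 2.0, 3.0],
})

# Grouping on a single categorical column.
single = df.groupby("a", observed=False)["c"].sum()

# Grouping on two categorical columns produces a MultiIndex.
multi = df.groupby(["a", "b"], observed=False)["c"].sum()

# Inspect what kind of index each result carries.
print(type(single.index).__name__)
print(type(multi.index).__name__)
print([type(level).__name__ for level in multi.index.levels])
```

On a version affected by this issue, the MultiIndex levels would be expected to show up as plain indexes and not as CategoricalIndex; the issue was assigned to the 0.19.0 milestone, so later releases should report categorical levels.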
Code Sample, a copy-pastable example if possible
Expected Output
output of pd.show_versions()