Unstack with mixed dtypes coerces everything to object #11847
Comments
Please show a copy-pastable example.
jreback added the Reshaping and Dtypes labels on Dec 15, 2015
potash commented Dec 15, 2015
Ah, looking for an example helped me narrow down the bug. It is specific to passing a list of levels to unstack, even when that list only has a single entry. E.g. compare:
So a workaround in my case with multiple levels is to replace |
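The comparison described in this comment might look like the minimal sketch below. The frame, its index level names (`a`, `b`) and its column names (`x`, `y`) are assumptions made for illustration, not the reporter's data. On pandas releases affected by this bug the list form is reported to return object columns; releases that include the fix should show the same dtypes for both calls.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with mixed dtypes (float32 and int64) on a two-level index.
index = pd.MultiIndex.from_product(
    [["a1", "a2"], ["b1", "b2"]], names=["a", "b"]
)
df = pd.DataFrame(
    {
        "x": np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32),
        "y": np.array([10, 20, 30, 40], dtype=np.int64),
    },
    index=index,
)

# Level passed as a scalar.
by_name = df.unstack("b")

# Same level passed as a single-entry list, the form the report singles out.
by_list = df.unstack(["b"])

# Compare the resulting dtypes side by side.
print(pd.DataFrame({"scalar": by_name.dtypes, "list": by_list.dtypes}))
```

If the two dtype columns differ on a given installation, that installation shows the behavior reported here.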
So it looks like what you want is pydata#9023, which is almost finished. In fact, if you are looking for something to do... it could use some updating :)
I'll mark this as a bug, which may be independent. Want to see if you can put in a fix with the existing framework?
jreback added this to the Next Major Release milestone on Dec 15, 2015
jreback added the Bug label on Dec 15, 2015
jreback referenced this issue on Dec 15, 2015
ENH/API: DataFrame.stack() support for level=None, sequentially=True/False, and NaN level values. #9023 (Closed)
potash commented Dec 15, 2015
Thanks! I will try, but I do not use pandas from master and I've never played with the source, so it won't be quick.
jreback added the Difficulty Intermediate and Effort Low labels on Dec 16, 2015
jreback referenced this issue on Aug 11, 2016
BUG: unstack(fill_value) does nothing when unstacking multiple columns #13971 (Open)
Picking this up to take a look.
kordek referenced this issue on Aug 20, 2016
BUG: GH11847 Unstack with mixed dtypes coerces everything to object #14053 (Closed)
jreback modified the milestone: 0.19.2, Next Major Release on Nov 21, 2016
jreback closed this in d531718 on Dec 10, 2016
jorisvandenbossche added a commit that referenced this issue on Dec 15, 2016
kordek + jorisvandenbossche 1bc64b1
ischurov added a commit to ischurov/pandas that referenced this issue on Dec 19, 2016
kordek + ischurov eb6e5a1
potash commented Dec 15, 2015
Related to #2929: if I unstack a DataFrame with mixed dtypes, every column is coerced to object, and I have to recast to get the original dtypes back, which is surprisingly slow (30 seconds for 400k rows and 400 np.float32 columns).
Is there any reason pandas doesn't keep the np.float32 dtype? It supports missing values, so missing index/column positions shouldn't pose a problem.
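The recast step mentioned in this report can be sketched as follows, assuming the unstacked frame keeps each original column name as the first level of its column MultiIndex. The sample frame and the `astype(object)` call that imitates the coercion are illustrative assumptions, not the reporter's code.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with mixed dtypes on a two-level index.
index = pd.MultiIndex.from_product(
    [["a1", "a2"], ["b1", "b2"]], names=["a", "b"]
)
df = pd.DataFrame(
    {
        "x": np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32),
        "y": np.array([10, 20, 30, 40], dtype=np.int64),
    },
    index=index,
)

# Stand-in for the coerced result: force every unstacked column to object.
coerced = df.unstack("b").astype(object)

# Cast each unstacked column back to the dtype of the column it came from.
# col is a tuple such as ("x", "b1"); col[0] is the original column name.
original_dtypes = df.dtypes
restored = coerced.copy()
for col in restored.columns:
    restored[col] = restored[col].astype(original_dtypes[col[0]])

print(restored.dtypes)
```

Casting column by column from object is the kind of extra pass the report describes as slow on wide frames, which is why preserving dtypes inside `unstack` is preferable to recasting afterwards.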