Unstack with mixed dtypes coerces everything to object #11847

Closed
potash opened this Issue Dec 15, 2015 · 6 comments


potash commented Dec 15, 2015

Related to #2929: if I unstack a DataFrame with mixed dtypes, every column gets coerced to object, and I have to recast to get the original dtypes back, which is surprisingly slow (30 seconds for 400k rows and 400 np.float32 columns).

Is there any reason pandas doesn't keep the np.float32 dtype? It supports missing values, so even when there are missing index/column positions it shouldn't pose a problem.
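For readers who hit the same coercion, the recast described above might look roughly like the sketch below. The frame `wide` and its column names are made up for illustration and are not the original 400k-row data; the object dtype is forced by hand here to stand in for the result of the problematic unstack.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for an unstacked frame whose columns were coerced to object.
wide = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, np.nan]}, dtype=object)

# Recast every column back to float32; NaN is representable in float32,
# so the missing position survives the cast.
restored = wide.astype(np.float32)
print(restored.dtypes)
```

On a large frame this cast has to convert each object column element by element, which is consistent with the slowness reported above.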

Contributor

jreback commented Dec 15, 2015

pls show a copy-pastable example

potash commented Dec 15, 2015

Ah, looking for an example helped me narrow down the bug. It is specific to passing a list of levels to unstack, even when that list only has a single entry. E.g. compare:

> df = pd.DataFrame({'state':['IL', 'MI'], 'index':['a','a'], 'value1':[1.0,1.0], 'value2':['c','c'] })
> df.set_index(['state','index']).unstack(['index']).dtypes

        index
value1  a        object
value2  a        object
dtype: object

> df.set_index(['state','index']).unstack('index').dtypes
index
value1  a        float64
value2  a         object
dtype: object

So a workaround in my case, with multiple levels, is to replace unstack(['index1', 'index2']) with unstack('index1').unstack('index2'), and I checked that it does work.
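A minimal sketch of that chained workaround, assuming a small frame with two index levels to unstack. The level names `index1` and `index2` come from the comment above; the data itself is invented for illustration.

```python
import pandas as pd

# Illustrative data: one float column and one string column under a 3-level index.
df = pd.DataFrame({
    "state": ["IL", "IL", "MI", "MI"],
    "index1": ["a", "a", "a", "a"],
    "index2": ["x", "y", "x", "y"],
    "value1": [1.0, 2.0, 3.0, 4.0],
    "value2": ["c", "d", "e", "f"],
}).set_index(["state", "index1", "index2"])

# Unstack one level at a time instead of passing a list of levels.
chained = df.unstack("index1").unstack("index2")
print(chained.dtypes)
```

On pandas versions affected by the bug, this form should keep `value1` as a float column where the list form would not; on versions that include the fix, both forms should behave the same.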

Contributor

jreback commented Dec 15, 2015

so looks like what you want is: pydata#9023

which is almost finished. in fact if you are looking for something to do...could use some updating :)

Contributor

jreback commented Dec 15, 2015

i'll mark this as a bug, which may be independent. want to see if you can put in a fix with the existing framework?

jreback added this to the Next Major Release milestone Dec 15, 2015

jreback added the Bug label Dec 15, 2015

potash commented Dec 15, 2015

Thanks! I will try but I do not use pandas from master and I've never played with the source so it won't be quick.

Contributor

kordek commented Aug 20, 2016

Picking this up to take a look

jreback modified the milestone: 0.19.2, Next Major Release Nov 21, 2016

jreback closed this in d531718 Dec 10, 2016

jorisvandenbossche added a commit that referenced this issue Dec 15, 2016

kordek + jorisvandenbossche BUG: GH11847 Unstack with mixed dtypes coerces everything to object
closes #11847

Changed the way in which the original data frame is copied (dropped use of .values, since it does not preserve dtypes).

Author: Pawel Kordek <pawel.kordek@gmail.com>

Closes #14053 from kordek/#11847 and squashes the following commits:

6a381ce [Pawel Kordek] BUG: GH11847 Unstack with mixed dtypes coerces everything to object

(cherry picked from commit d531718)
1bc64b1
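The commit message's remark that .values does not preserve dtypes can be seen on a frame like the one from the report. This is only an illustration of that general pandas behaviour, not the code touched by the fix:

```python
import pandas as pd

# Same kind of mixed-dtype frame as in the report: one float column, one string column.
df = pd.DataFrame({"value1": [1.0, 1.0], "value2": ["c", "c"]})

print(df.dtypes)        # per-column dtypes are kept on the DataFrame
print(df.values.dtype)  # a single 2-D array needs one common dtype for all columns
```

Because `.values` has to return one NumPy array for the whole frame, a float column sitting next to a string column is upcast to the common object dtype, and any frame rebuilt from that array loses the original per-column dtypes.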

ischurov added a commit to ischurov/pandas that referenced this issue Dec 19, 2016

kordek + ischurov BUG: GH11847 Unstack with mixed dtypes coerces everything to object
eb6e5a1