Unstack with mixed dtypes coerces everything to object #11847

Closed
potash opened this Issue Dec 15, 2015 · 6 comments

potash commented Dec 15, 2015

Related to #2929: if I unstack a DataFrame with mixed dtypes, every column gets coerced to object, and I have to recast to get the original dtypes back, which is surprisingly slow (30 seconds for 400k rows and 400 np.float32 columns).

Is there any reason pandas doesn't keep the np.float32 dtype? Since float32 supports missing values (NaN), missing index/column positions shouldn't pose a problem.
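A minimal sketch of that expectation, assuming a pandas version in which float32 survives unstacking. The data and variable names are made up for illustration: two index/column positions are deliberately missing, NaN fills them, and the dtype should stay float32. The final `astype` call is the kind of recast described above, shown only as a fallback.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the ("IL", "b") and ("MI", "a") positions are missing on purpose.
idx = pd.MultiIndex.from_tuples(
    [("IL", "a"), ("MI", "b")], names=["state", "index"]
)
s = pd.Series([1.0, 2.0], index=idx, dtype=np.float32)

# Unstacking a single float32 Series: missing positions become NaN.
wide = s.unstack("index")
print(wide.dtypes)

# Fallback recast, only needed if the dtype was lost along the way.
recast = wide.astype(np.float32)
```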

jreback commented Dec 15, 2015

pls show a copy-pastable example

potash commented Dec 15, 2015

Ah, looking for an example helped me narrow down the bug. It is specific to passing a list of levels to unstack, even when that list only has a single entry. E.g. compare:

> import pandas as pd
> df = pd.DataFrame({'state':['IL', 'MI'], 'index':['a','a'], 'value1':[1.0,1.0], 'value2':['c','c'] })
> df.set_index(['state','index']).unstack(['index']).dtypes

        index
value1  a        object
value2  a        object
dtype: object

> df.set_index(['state','index']).unstack('index').dtypes
index
value1  a        float64
value2  a         object
dtype: object

So a workaround in my case, with multiple levels, is to replace unstack(['index1', 'index2']) with unstack('index1').unstack('index2'), and I checked that it does work.
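The workaround can be sketched as follows. The frame below is hypothetical (the `index1`/`index2` level names come from the comment, the data does not), and what the list form returns depends on the pandas version: on releases that contain the fix it should also keep float64.

```python
import pandas as pd

# Hypothetical frame with two levels to unstack.
df = pd.DataFrame({
    "state": ["IL", "IL", "MI", "MI"],
    "index1": ["a", "b", "a", "b"],
    "index2": ["x", "x", "x", "x"],
    "value1": [1.0, 2.0, 3.0, 4.0],
    "value2": ["c", "d", "e", "f"],
})
indexed = df.set_index(["state", "index1", "index2"])

# Workaround: unstack one level at a time instead of passing a list.
chained = indexed.unstack("index1").unstack("index2")
print(chained.dtypes)

# List form from the report, printed for comparison on your own version.
listed = indexed.unstack(["index1", "index2"])
print(listed.dtypes)
```

Comparing the two printed `dtypes` is a quick way to tell whether a given installation is affected.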

jreback commented Dec 15, 2015

so looks like what you want is: #9023

which is almost finished. in fact if you are looking for something to do...could use some updating :)

jreback commented Dec 15, 2015

i'll mark this as a bug, which may be independent. want to see if you can put in a fix with the existing framework?

potash commented Dec 15, 2015

Thanks! I will try but I do not use pandas from master and I've never played with the source so it won't be quick.

kordek commented Aug 20, 2016

Picking this up to take a look

jreback modified the milestones: 0.19.2, Next Major Release Nov 21, 2016

jreback closed this in d531718 Dec 10, 2016

jorisvandenbossche added a commit that referenced this issue Dec 15, 2016

BUG: GH11847 Unstack with mixed dtypes coerces everything to object
closes #11847

Changed the way in which the original data frame is copied (dropped use of .values, since it does not preserve dtypes).

Author: Pawel Kordek <pawel.kordek@gmail.com>

Closes #14053 from kordek/#11847 and squashes the following commits:

6a381ce [Pawel Kordek] BUG: GH11847 Unstack with mixed dtypes coerces everything to object

(cherry picked from commit d531718)
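To illustrate the point about `.values` in the commit message: this is not the actual patch, only a small sketch with made-up data showing why rebuilding a mixed-dtype frame from `.values` loses the float dtype, while `DataFrame.copy()` keeps it.

```python
import pandas as pd

# Hypothetical mixed-dtype frame: one float column, one string column.
df = pd.DataFrame({"value1": [1.0, 1.0], "value2": ["c", "c"]})

# .values has to fit every column into a single NumPy array,
# so a float column next to a string column yields an object array.
arr = df.values
print(arr.dtype)

# Rebuilding from that array: value1 comes back as object, not float64.
rebuilt = pd.DataFrame(arr, index=df.index, columns=df.columns)
print(rebuilt.dtypes)

# Copying the frame directly keeps the per-column dtypes.
copied = df.copy()
print(copied.dtypes)
```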

ischurov added a commit to ischurov/pandas that referenced this issue Dec 19, 2016
