BUG: convert_dtypes changes BooleanDtype to Int64 #32287

Closed
jiannmeng opened this issue Feb 27, 2020 · 3 comments · Fixed by #32490

jiannmeng commented Feb 27, 2020

Code Sample, a copy-pastable example if possible

>>> import pandas as pd
>>> df = pd.DataFrame(data=[["abc", 123, True]])
>>> print(df)
     0    1     2
0  abc  123  True
>>> print(df.dtypes)
0    object
1     int64
2      bool
dtype: object
>>> df = df.convert_dtypes()
>>> print(df)
     0    1     2
0  abc  123  True
>>> print(df.dtypes)
0     string
1      Int64
2    boolean
dtype: object
>>> df = df.convert_dtypes()
>>> print(df)
        0    1  2
0  b'abc'  123  1
>>> print(df.dtypes)
0    object
1     Int64
2     Int64
dtype: object

Problem description

Calling convert_dtypes() on a column that already has dtype string converts it to dtype object (and the individual values from str to bytes).

Calling convert_dtypes() on a column that already has dtype boolean converts it to dtype Int64 (and the individual values from bool to int).
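The boolean case can be isolated from the rest of the DataFrame. The snippet below is a minimal sketch, assuming the behaviour is the same for a single Series as for a DataFrame column; the variable names are illustrative only. On pandas 1.0.1 the second dtype would presumably show up as Int64, while a release that includes the fix should keep it as boolean.

```python
import pandas as pd

# Start from a Series that already uses the nullable boolean dtype,
# i.e. the state of column 2 after the first convert_dtypes() call.
s = pd.Series([True, False, None], dtype="boolean")

# Converting again should be a no-op for a column that is already nullable.
converted = s.convert_dtypes()

print(s.dtype)
print(converted.dtype)
```

Comparing the two printed dtypes is enough to tell whether an installed pandas version is affected.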

Expected Output

convert_dtypes() should keep StringDtype columns as StringDtype and BooleanDtype columns as BooleanDtype.
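The expectation above amounts to convert_dtypes() being idempotent: a second call should not change any dtype. A rough way to check this is sketched below. The helper name `dtypes_after_each_pass` is hypothetical and not part of pandas; it only records the dtype names after each call so the passes can be compared.

```python
import pandas as pd


def dtypes_after_each_pass(df, passes=2):
    """Call convert_dtypes() repeatedly and record dtype names after each pass.

    Hypothetical helper for illustration; not a pandas API.
    """
    history = []
    for _ in range(passes):
        df = df.convert_dtypes()
        history.append([str(dtype) for dtype in df.dtypes])
    return history


df = pd.DataFrame(data=[["abc", 123, True]])
history = dtypes_after_each_pass(df)

# If convert_dtypes() is idempotent, both passes report the same dtypes.
print(history[0])
print(history[1])
print("stable:", history[0] == history[1])
```

On an affected version the two lists would differ, as in the session shown in the code sample.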

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.8.1.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_Malaysia.1252

pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 19.2.3
setuptools : 41.2.0
Cython : None
pytest : 5.3.5
hypothesis : None
sphinx : None
blosc : None

@jiannmeng (Author)

Oops, the string bug is a duplicate and seems to be fixed, as per #31731.
However, I can't find any mention of the boolean bug.

@jorisvandenbossche (Member)

@jiannmeng Thanks for the report!
The string one is indeed already fixed, the boolean not yet.

jorisvandenbossche added the labels "Bug" and "NA - MaskedArrays" (Related to pd.NA and nullable extension arrays) on Feb 29, 2020
jorisvandenbossche added this to the 1.0.2 milestone on Feb 29, 2020
jorisvandenbossche changed the title from "convert_dtypes changes StringDtype to bytes, and BooleanDtype to Int64" to "BUG: convert_dtypes changes BooleanDtype to Int64" on Feb 29, 2020

@AnnaDaglis (Contributor)

take
