BUG: Data types are not preserved while concatenating DataFrames with nullable integers #27692

Closed
vss888 opened this issue Aug 1, 2019 · 1 comment · Fixed by #33522
Labels: Bug; ExtensionArray (Extending pandas with custom dtypes or arrays.); Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)

vss888 commented Aug 1, 2019

Copy-pastable example

# input data
import pandas as pd
t1 = pd.DataFrame(index=[0], data={'x':[1]}, dtype='UInt8')
t2 = pd.DataFrame(index=[1], data={'y':[1]}, dtype='UInt8')
t3 = pd.concat([t1,t2], join='outer', sort=False)

'''actual result'''
print(t3.dtypes)
# x    object
# y    object
# dtype: object

Problem description

Data types are not preserved while concatenating DataFrames with nullable integers. Instead, the result of the concatenation holds values of mixed types (the entries for rows missing from one input come back as float, presumably NaN), so the column dtypes are object:

>>> type(t3.at[0,'x'])
<class 'int'>
>>> type(t3.at[1,'x'])
<class 'float'>
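
Until `concat` preserves the dtype itself, one possible workaround is to cast the result back explicitly. This is only a minimal sketch: it assumes every column is meant to be `UInt8` and that `astype` accepts the mixed int/NaN object columns, which was not verified against pandas 0.25.0 in this thread.

```python
import pandas as pd

t1 = pd.DataFrame(index=[0], data={'x': [1]}, dtype='UInt8')
t2 = pd.DataFrame(index=[1], data={'y': [1]}, dtype='UInt8')

# Workaround (assumption, not an official recommendation): cast back to the
# nullable dtype after concatenating. Rows absent from one input end up as
# missing values in that input's column.
t3 = pd.concat([t1, t2], join='outer', sort=False).astype('UInt8')
print(t3.dtypes)
```

On a version where the bug is fixed, the `astype` call should simply be a no-op.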

Expected Output

'''expected result'''
print(t3.dtypes)
# x    UInt8
# y    UInt8
# dtype: object
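
The expected behavior can also be written as a small check, for example as the core of a regression test. The helper name `concat_keeps_dtype` is hypothetical; it should return False on affected versions such as 0.25.0 and True once the fix referenced on this issue (#33522) is included.

```python
import pandas as pd

def concat_keeps_dtype(dtype='UInt8'):
    """Hypothetical helper: True if an outer concat keeps the nullable dtype."""
    t1 = pd.DataFrame(index=[0], data={'x': [1]}, dtype=dtype)
    t2 = pd.DataFrame(index=[1], data={'y': [1]}, dtype=dtype)
    result = pd.concat([t1, t2], join='outer', sort=False)
    # Compare by dtype name so the check does not depend on dtype object identity.
    return all(str(d) == dtype for d in result.dtypes)

print(concat_keeps_dtype())
```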

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit : None
python : 3.6.3.final.0
python-bits : 64
OS : Linux
OS-release : 3.10.0-862.11.6.el7.x86_64
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : en_US.utf-8
LANG : en_US.utf-8
LOCALE : en_US.UTF-8

pandas : 0.25.0
numpy : 1.16.3
pytz : 2018.4
dateutil : 2.7.3
pip : 19.1.1
setuptools : 39.0.1
Cython : 0.29.12
pytest : 3.3.2
hypothesis : None
sphinx : 1.6.6
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.3.4
html5lib : 0.9999999
pymysql : None
psycopg2 : None
jinja2 : 2.10
IPython : 6.2.1
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.3.4
matplotlib : 3.0.2
numexpr : 2.6.4
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.11.1
pytables : None
s3fs : None
scipy : 1.1.0
sqlalchemy : None
tables : 3.5.2
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None

vss888 changed the title from "Data types are not preserved data type while concatenating DataFrames with nullable integers" to "Data types are not preserved while concatenating DataFrames with nullable integers" on Aug 2, 2019
@TomAugspurger
Contributor

Tangentially related to #22994, which proposes a solution for getting the right dtypes in this situation.

TomAugspurger added the ExtensionArray and Reshaping labels on Aug 2, 2019
jorisvandenbossche changed the title from "Data types are not preserved while concatenating DataFrames with nullable integers" to "BUG: Data types are not preserved while concatenating DataFrames with nullable integers" on Mar 19, 2020
TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Apr 13, 2020
@jreback jreback added this to the 1.1 milestone Apr 17, 2020
jreback pushed a commit that referenced this issue Jul 1, 2020
* BUG: Fixed concat with reindex and extension types

Closes #27692
Closes #33027

* rebase

* fixup

* cleanup

* fixups
4 participants