pd.concat() changes index values from int to boolean #21108

Open
MajorMajorMajorMajor opened this issue May 17, 2018 · 2 comments
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions Reshaping Concat, Merge/Join, Stack/Unstack, Explode

Comments

MajorMajorMajorMajor commented May 17, 2018

Code Sample, a copy-pastable example if possible

import pandas as pd
df1 = pd.DataFrame([123, 456], columns=['data'], index=[True, False])
#        data
# True    123
# False   456

df2 = pd.DataFrame([55, 983, 69, 112, 0], columns=['data'], index=[1, 2, 3, 4, 99])
#     data
# 1     55
# 2    983
# 3     69
# 4    112
# 99     0

my_dict = {'One': df1, 'Two': df2}

df_combined = pd.concat(my_dict)
#            data
# One True    123
#     False   456
# Two True     55  <-------- this index should be "1" instead of True
#     2       983
#     3        69
#     4       112
#     99        0

Problem description

When concatenating a bool-indexed dataset with an int-indexed dataset, concat() treats True and 1 (and likewise False and 0) as the same label in the hierarchical index. Whichever of the two it saw first is used for both, so an int label 1 that follows a True is shown as True.

A less surprising behavior would be to preserve the original indices.

Is there a workaround?

Expected Output

df_combined:

 One True    123
     False   456
 Two 1        55   <--------
     2       983
     3        69
     4       112
     99        0

Output of pd.show_versions()


INSTALLED VERSIONS

commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 63 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.22.0
pytest: 3.3.2
pip: 9.0.1
setuptools: 38.4.0
Cython: 0.27.3
numpy: 1.14.3
scipy: 1.0.1
pyarrow: 0.7.1
xarray: None
IPython: 6.3.1
sphinx: 1.6.6
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: 2.4.9
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.4
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.0
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@gfyoung gfyoung added Bug Dtype Conversions Unexpected or buggy dtype conversions labels May 21, 2018
gfyoung (Member) commented May 21, 2018

Much agreed that we shouldn't have flipped the dtype.

The workaround in your case is to just "correct" the index afterward (i.e. set the index to the correct MultiIndex), but let's see if we can find the actual cause and patch it.

@mickaelReuze

Hello @MajorMajorMajorMajor, @gfyoung,

In my opinion, the root cause is the call to kh_get_pymap() (file: 'hashtable_class_helper.pxi.in', class: PyObjectHashTable, method: get_labels()).

kh_get_pymap() is called for each index value in the list to determine whether that value was already present earlier in the same list.
I think it does not distinguish between a bool and an int equal to 0 or 1, so in our example 1 is treated as True.

This function is defined in khash.h: SCOPE khint_t kh_get_##name(const kh_##name##_t *h, khkey_t key).
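
To illustrate the suspected mechanism without touching pandas internals, here is a minimal plain-Python sketch. It relies only on the fact that in Python True == 1 and hash(True) == hash(1), so any table keyed on object equality sees them as one key. The helper names first_seen_labels and type_aware_labels are hypothetical and are not pandas functions; the (type, value) key is just one possible way to keep the labels apart, not a proposed patch.

```python
# Index labels from the example: df1's bool labels followed by df2's int labels.
labels = [True, False, 1, 2, 3, 4, 99]

# bool is a subclass of int, so these hold in any Python 3 interpreter.
assert True == 1 and hash(True) == hash(1)


def first_seen_labels(values):
    """Return, for each value, the first equal value seen so far.

    This mimics a hash table keyed on object equality: 1 finds the
    existing key True and is reported as True.
    """
    seen = {}
    return [seen.setdefault(v, v) for v in values]


def type_aware_labels(values):
    """Hypothetical variant: key on (type, value) so True and 1 stay distinct."""
    seen = {}
    return [seen.setdefault((type(v), v), v) for v in values]


collapsed = first_seen_labels(labels)
preserved = type_aware_labels(labels)

print(collapsed)  # [True, False, True, 2, 3, 4, 99]
print(preserved)  # [True, False, 1, 2, 3, 4, 99]
```

The first list reproduces the symptom from the report (the label 1 comes back as True), while the second keeps the original labels.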

For now I have not been able to modify it for debugging. Still investigating...
If anyone knows more about this, input is welcome :)

Regards.

@mroeschke mroeschke added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Oct 27, 2019