Inconsistent behavior when calling join/concat on DataFrames with CategoricalIndex and IntervalIndex #25019

Open
shengpu-tang opened this issue Jan 30, 2019 · 1 comment
Labels: Bug; Categorical (Categorical Data Type); Interval (Interval data type); Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)


shengpu-tang commented Jan 30, 2019

Code Sample

import pandas as pd
df1 = pd.DataFrame(data={'col1': [1, 1]}, index=pd.interval_range(0, 1, 2))
df2 = pd.DataFrame(data={'col2': [1, 1]}, index=pd.interval_range(0, 1, 2))
df2.index = pd.CategoricalIndex(df2.index)

df2.join(df1) # works
df2.join([df1]) # works
df1.join(df2) # works
df1.join([df2]) # `TypeError: object of type 'NoneType' has no len()`
pd.concat([df1, df2], axis=1) # `TypeError: the other index needs to be an IntervalIndex too, but was type CategoricalIndex`

Problem description

Joining DataFrames where one has a CategoricalIndex and the other an IntervalIndex does not always work. In particular, when the DataFrame on which join is called has an IntervalIndex and the argument is a DataFrame with a CategoricalIndex wrapped in a list, code somewhere in concat.py raises a TypeError with an undecipherable error message.

pd.concat on these DataFrames throws a slightly more helpful message.
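
To make it easier to see which of the five calls fail on a given pandas version, the calls can be run one at a time with the outcome recorded for each. This is only a minimal sketch: the `calls` dict and the `outcome` helper are names made up for illustration and are not part of pandas.

```python
import pandas as pd

# Same frames as in the code sample above.
df1 = pd.DataFrame({"col1": [1, 1]}, index=pd.interval_range(0, 1, periods=2))
df2 = pd.DataFrame({"col2": [1, 1]}, index=pd.interval_range(0, 1, periods=2))
df2.index = pd.CategoricalIndex(df2.index)

# Label -> zero-argument callable, so each call is evaluated in isolation.
calls = {
    "df2.join(df1)": lambda: df2.join(df1),
    "df2.join([df1])": lambda: df2.join([df1]),
    "df1.join(df2)": lambda: df1.join(df2),
    "df1.join([df2])": lambda: df1.join([df2]),
    "pd.concat([df1, df2], axis=1)": lambda: pd.concat([df1, df2], axis=1),
}


def outcome(fn):
    """Return 'ok' if fn() succeeds, else the exception type and message."""
    try:
        fn()
    except Exception as exc:  # broad on purpose: we only want to report
        return f"{type(exc).__name__}: {exc}"
    return "ok"


outcomes = {label: outcome(fn) for label, fn in calls.items()}

print("pandas", pd.__version__)
for label, result in outcomes.items():
    print(f"{label:32} {result}")
```

Which calls raise depends on the installed pandas version, so no expected output is shown here.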

Expected Output

I'm not exactly sure what the current rule is for handling these two index types, but the behavior should be consistent. I think that if the indexes can be matched (possibly after conversion), the join/concat operation should be allowed, perhaps with a warning that a conversion was done implicitly.
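
Until the behavior is made consistent, one possible workaround is to do the conversion explicitly before joining, so that both frames carry an IntervalIndex. The sketch below assumes the CategoricalIndex holds only Interval values; `with_interval_index` is a hypothetical helper, not a pandas function.

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [1, 1]}, index=pd.interval_range(0, 1, periods=2))
df2 = pd.DataFrame({"col2": [1, 1]}, index=pd.interval_range(0, 1, periods=2))
df2.index = pd.CategoricalIndex(df2.index)


def with_interval_index(df):
    """Return df with a CategoricalIndex of intervals rebuilt as an IntervalIndex.

    Frames with any other index type are returned unchanged.
    """
    if isinstance(df.index, pd.CategoricalIndex):
        df = df.copy()
        # Iterating the index yields the underlying Interval objects.
        df.index = pd.IntervalIndex(list(df.index), name=df.index.name)
    return df


df2_converted = with_interval_index(df2)

# Both operands now share the same index type.
joined = df1.join([df2_converted])
combined = pd.concat([df1, df2_converted], axis=1)

print(type(df2_converted.index).__name__)
print(joined)
```

The conversion discards the categorical encoding, so this only makes sense when the categories are not needed downstream.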

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-138-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.0
pytest: None
pip: 19.0.1
setuptools: 39.1.0
Cython: None
numpy: 1.16.0
scipy: 1.2.0
pyarrow: None
xarray: None
IPython: 7.1.1
sphinx: None
patsy: None
dateutil: 2.7.2
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.2
openpyxl: None
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml.etree: 4.2.4
bs4: None
html5lib: 0.9999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

jschendel added the Bug, Categorical (Categorical Data Type), and Interval (Interval data type) labels on Jan 30, 2019
jschendel added this to the Contributions Welcome milestone on Jan 30, 2019
shengpu-tang commented

Hi, just to follow up on this: in pandas 1.0.1, none of the commands above raises an error any more 👍. However, the behavior is still inconsistent:

import pandas as pd
df1 = pd.DataFrame(data={'col1': [1, 1]}, index=pd.interval_range(0, 1, 2))
df2 = pd.DataFrame(data={'col2': [1, 1]}, index=pd.interval_range(0, 1, 2))
df2.index = pd.CategoricalIndex(df2.index)

df2.join(df1).index
# Index([(0.0, 0.5], (0.5, 1.0]], dtype='object')

df2.join([df1]).index
# CategoricalIndex([(0.0, 0.5], (0.5, 1.0]], categories=[(0.0, 0.5], (0.5, 1.0]], ordered=False, dtype='category')

df1.join(df2).index
# Index([(0.0, 0.5], (0.5, 1.0]], dtype='object')

df1.join([df2]).index
# IntervalIndex([(0.0, 0.5], (0.5, 1.0]],
#               closed='right',
#               dtype='interval[float64]')

pd.concat([df1, df2], axis=1).index
# Index([(0.0, 0.5], (0.5, 1.0]], dtype='object')

What is the current strategy for determining the index type of a join/concat result? Could you point me to the relevant code?
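
For comparing the resulting index types across pandas versions, the five results above can be collected into one table and checked for agreement. This is a rough sketch; `index_kind` and `kinds` are illustrative names, and the output will differ between versions.

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [1, 1]}, index=pd.interval_range(0, 1, periods=2))
df2 = pd.DataFrame({"col2": [1, 1]}, index=pd.interval_range(0, 1, periods=2))
df2.index = pd.CategoricalIndex(df2.index)


def index_kind(make_frame):
    """Describe the index of the frame built by make_frame() as 'Type[dtype]'."""
    try:
        idx = make_frame().index
    except Exception as exc:  # older versions raise for some of these calls
        return f"raised {type(exc).__name__}"
    return f"{type(idx).__name__}[{idx.dtype}]"


kinds = {
    "df2.join(df1)": index_kind(lambda: df2.join(df1)),
    "df2.join([df1])": index_kind(lambda: df2.join([df1])),
    "df1.join(df2)": index_kind(lambda: df1.join(df2)),
    "df1.join([df2])": index_kind(lambda: df1.join([df2])),
    "pd.concat([df1, df2], axis=1)": index_kind(
        lambda: pd.concat([df1, df2], axis=1)
    ),
}

print("pandas", pd.__version__)
for label, kind in kinds.items():
    print(f"{label:32} {kind}")

# True only if every call produced the same index type and dtype.
consistent = len(set(kinds.values())) == 1
print("consistent:", consistent)
```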

mroeschke added the Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) label on Jun 26, 2021
mroeschke removed this from the Contributions Welcome milestone on Oct 13, 2022