BUG: join operation fails on overlapping IntervalIndex levels #45661

Closed
johannes-mueller opened this issue Jan 27, 2022 · 2 comments · Fixed by #45662
Labels
Bug Regression Functionality that used to work in a prior pandas version Reshaping Concat, Merge/Join, Stack/Unstack, Explode

Comments

johannes-mueller commented Jan 27, 2022

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

range_index = pd.RangeIndex(3, name="range_index")

interval_index = pd.IntervalIndex.from_tuples([
    (0.0, 1.0), (1.0, 2.0), (1.5, 2.5)
], name='interval_index')

multi_index = pd.MultiIndex.from_product([interval_index, range_index])

print(interval_index.join(multi_index))

# This causes the same issue
print(multi_index.join(interval_index))

Issue Description

Observed output:

Traceback (most recent call last):
  File "/home/jmu3si/tmp/join_index_flipped.py", line 11, in <module>
    print(interval_index.join(multi_index))
  File "/home/jmu3si/miniconda3/envs/myroot/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 216, in join
    join_index, lidx, ridx = meth(self, other, how=how, level=level, sort=sort)
  File "/home/jmu3si/miniconda3/envs/myroot/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 4368, in join
    return self._join_multi(other, how=how)
  File "/home/jmu3si/miniconda3/envs/myroot/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 4531, in _join_multi
    result = self._join_level(other, level, how=how)
  File "/home/jmu3si/miniconda3/envs/myroot/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 4633, in _join_level
    new_level, left_lev_indexer, right_lev_indexer = old_level.join(
  File "/home/jmu3si/miniconda3/envs/myroot/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 216, in join
    join_index, lidx, ridx = meth(self, other, how=how, level=level, sort=sort)
  File "/home/jmu3si/miniconda3/envs/myroot/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 4426, in join
    return self._join_via_get_indexer(other, how, sort)
  File "/home/jmu3si/miniconda3/envs/myroot/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 4456, in _join_via_get_indexer
    lindexer = self.get_indexer(join_index)
  File "/home/jmu3si/miniconda3/envs/myroot/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3721, in get_indexer
    raise InvalidIndexError(self._requires_unique_msg)
pandas.errors.InvalidIndexError: cannot handle overlapping indices; use IntervalIndex.get_indexer_non_unique

The join operation fails because get_indexer() raises on the overlapping intervals. This is very similar to #44096; the difference is probably that here we are not joining two MultiIndexes.
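The failure can be isolated without any join. Below is a minimal sketch, assuming pandas 1.4 or later (error messages may differ between versions). The interval level is unique but overlapping, and that alone is enough to make get_indexer() raise. get_indexer_non_unique(), which the error message recommends, succeeds on the same input. The variable name get_indexer_raised is only for illustration.

```python
import pandas as pd
from pandas.errors import InvalidIndexError

# The interval level from the reproducer: every label is distinct,
# but (1.0, 2.0] and (1.5, 2.5] overlap.
interval_index = pd.IntervalIndex.from_tuples(
    [(0.0, 1.0), (1.0, 2.0), (1.5, 2.5)], name="interval_index"
)

print(interval_index.is_unique)       # True
print(interval_index.is_overlapping)  # True

# get_indexer() refuses overlapping intervals, even for an exact self-lookup.
try:
    interval_index.get_indexer(interval_index)
    get_indexer_raised = False
except InvalidIndexError as err:
    get_indexer_raised = True
    print("get_indexer() failed:", err)

# get_indexer_non_unique() returns (indexer, missing) and copes with overlaps.
indexer, missing = interval_index.get_indexer_non_unique(interval_index)
print(indexer.tolist(), missing.tolist())  # [0, 1, 2] []
```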

Expected Behavior

Expected output:

MultiIndex([((0.0, 1.0], 0),
            ((0.0, 1.0], 1),
            ((0.0, 1.0], 2),
            ((1.0, 2.0], 0),
            ((1.0, 2.0], 1),
            ((1.0, 2.0], 2),
            ((1.5, 2.5], 0),
            ((1.5, 2.5], 1),
            ((1.5, 2.5], 2)],
           names=['interval_index', 'range_index'])
MultiIndex([((0.0, 1.0], 0),
            ((0.0, 1.0], 1),
            ((0.0, 1.0], 2),
            ((1.0, 2.0], 0),
            ((1.0, 2.0], 1),
            ((1.0, 2.0], 2),
            ((1.5, 2.5], 0),
            ((1.5, 2.5], 1),
            ((1.5, 2.5], 2)],
           names=['interval_index', 'range_index'])

Installed Versions

INSTALLED VERSIONS
------------------
commit : bb1f651
python : 3.9.7.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-96-lowlatency
Version : #109-Ubuntu SMP PREEMPT Wed Jan 12 17:51:01 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : de_DE.UTF-8
LOCALE : de_DE.UTF-8

pandas : 1.4.0
numpy : 1.19.5
pytz : 2021.1
dateutil : 2.8.2
pip : 21.2.4
setuptools : 58.0.4
Cython : 0.29.24
pytest : None
hypothesis : None
sphinx : 4.2.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.5.0
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.7.3
sqlalchemy : None
tables : None
tabulate : None
xarray : 0.20.1
xlrd : None
xlwt : None
zstandard : None

@johannes-mueller johannes-mueller added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jan 27, 2022
johannes-mueller (Contributor, Author) commented:

PR underway 😅

@johannes-mueller johannes-mueller changed the title BUG: join operation fail on overlapping IntervalIndex levels BUG: join operation fails on overlapping IntervalIndex levels Jan 27, 2022
johannes-mueller added a commit to boschresearch/pandas that referenced this issue Jan 27, 2022
Replacing calls to `get_indexer()` with `get_indexer_for()` as
`IntervalIndex`es can be unique and overlapping.

Similar to pandas-dev#44588
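The commit swaps get_indexer() for get_indexer_for(). Below is a minimal sketch of why that helps, assuming a recent pandas. get_indexer_for() delegates to get_indexer() when the index can be treated as unique. For an IntervalIndex, that means non-overlapping. Otherwise it falls back to get_indexer_non_unique() and returns only the indexer, so the same call works whether or not the intervals overlap. The two example indexes are illustrative and not taken from the patch.

```python
import pandas as pd

# Same labels as the reproducer: (1.0, 2.0] and (1.5, 2.5] overlap.
overlapping = pd.IntervalIndex.from_tuples([(0.0, 1.0), (1.0, 2.0), (1.5, 2.5)])
# Adjacent right-closed intervals share no points, so these do not overlap.
non_overlapping = pd.IntervalIndex.from_tuples([(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)])

# Non-overlapping: get_indexer_for() gives the same answer as get_indexer().
print(non_overlapping.get_indexer(non_overlapping).tolist())      # [0, 1, 2]
print(non_overlapping.get_indexer_for(non_overlapping).tolist())  # [0, 1, 2]

# Overlapping: get_indexer() would raise InvalidIndexError here, but
# get_indexer_for() falls back to the non-unique lookup.
print(overlapping.get_indexer_for(overlapping).tolist())          # [0, 1, 2]
print(overlapping.get_indexer_for(overlapping.take([2, 0])).tolist())  # [2, 0]
```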
johannes-mueller added a commit to boschresearch/pandas that referenced this issue Jan 27, 2022
@phofl phofl added Regression Functionality that used to work in a prior pandas version Reshaping Concat, Merge/Join, Stack/Unstack, Explode and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Jan 27, 2022
@phofl phofl added this to the 1.4.1 milestone Jan 27, 2022
johannes-mueller added a commit to boschresearch/pandas that referenced this issue Jan 28, 2022
johannes-mueller added a commit to boschresearch/pandas that referenced this issue Jan 28, 2022
johannes-mueller added a commit to boschresearch/pandas that referenced this issue Jan 28, 2022
meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this issue Jan 28, 2022
phofl pushed a commit that referenced this issue Jan 28, 2022
GH-45661) (#45682)

Co-authored-by: Johannes Mueller <johannes.mueller4@de.bosch.com>
simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this issue Feb 8, 2022
simonjayhawkins (Member) commented:

first bad commit: [b490507] REF: implement Index._can_use_libjoin (#43692)
