Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: join MultiIndex with overlapping IntervalIndex level (#44096) #44588

Conversation

johannes-mueller
Copy link
Contributor

@johannes-mueller johannes-mueller commented Nov 23, 2021

BaseMultiIndexCodesEngine._extract_level_codes relies on the assumption that
get_indexer() is always usable as any index level is_unique. However, that
is not the case for overlapping IntervalIndex levels. get_indexer() can
not be used on an IntervalIndex that is_overlapping even if it is_unique.

This patch uses get_indexer_for() instead.

This patch checks if the index level is an overlapping IntervalIndex and then
calls get_indexer_non_unique() instead.

We should not switch to get_indexer_non_unique() for other index levels than
overlapping IntervalIndex, even though that actually would work as well.
That is because get_indexer_non_unique() comes with a performance penalty of
roughly 30% compared to get_indexer(), measured by simple %timeit
measurements.

pandas/_libs/index.pyx Outdated Show resolved Hide resolved
@johannes-mueller johannes-mueller force-pushed the bugfix/join_multiindex-overlapping-intervals-44096 branch from 5e3a141 to 8f1027f Compare November 23, 2021 13:22
Copy link
Member

@phofl phofl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comment, otherwise lgtm

@johannes-mueller johannes-mueller force-pushed the bugfix/join_multiindex-overlapping-intervals-44096 branch from 8f1027f to 7939a68 Compare November 24, 2021 07:47
@jreback jreback added this to the 1.4 milestone Nov 25, 2021
@jreback jreback added Interval Interval data type MultiIndex Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Nov 25, 2021
@jreback
Copy link
Contributor

jreback commented Nov 25, 2021

@johannes-mueller can you merge master and ping on green

…#44096)

`BaseMultiIndexCodesEngine._extract_level_codes` relies on the assumption that
`get_indexer()` is always usable as any index level `is_unique`.  However, that
is not the case for overlapping `IntervalIndex` levels.  `get_indexer()` can
not be used on an `IntervalIndex` that `is_overlapping` even if it `is_unique`.

This patch uses `get_indexer_for()` instead.
@johannes-mueller johannes-mueller force-pushed the bugfix/join_multiindex-overlapping-intervals-44096 branch from 7939a68 to 83d9ab7 Compare November 25, 2021 17:33
@johannes-mueller
Copy link
Contributor Author

@jreback rebased

@jreback jreback merged commit f0d4d8d into pandas-dev:master Nov 25, 2021
@jreback
Copy link
Contributor

jreback commented Nov 25, 2021

thanks @johannes-mueller

@johannes-mueller johannes-mueller deleted the bugfix/join_multiindex-overlapping-intervals-44096 branch November 26, 2021 07:29
johannes-mueller added a commit to boschresearch/pandas that referenced this pull request Jan 27, 2022
Replacing calls to `get_indexer()` with `get_indexer_for()` as
`IntervalIndex`es can be unique and overlapping.

Similar to pandas-dev#44588
johannes-mueller added a commit to boschresearch/pandas that referenced this pull request Jan 27, 2022
Replacing calls to `get_indexer()` with `get_indexer_for()` as
`IntervalIndex`es can be unique and overlapping.

Similar to pandas-dev#44588
johannes-mueller added a commit to boschresearch/pandas that referenced this pull request Jan 28, 2022
Replacing calls to `get_indexer()` with `get_indexer_for()` as
`IntervalIndex`es can be unique and overlapping.

Similar to pandas-dev#44588
johannes-mueller added a commit to boschresearch/pandas that referenced this pull request Jan 28, 2022
Replacing calls to `get_indexer()` with `get_indexer_for()` as
`IntervalIndex`es can be unique and overlapping.

Similar to pandas-dev#44588
johannes-mueller added a commit to boschresearch/pandas that referenced this pull request Jan 28, 2022
Replacing calls to `get_indexer()` with `get_indexer_for()` as
`IntervalIndex`es can be unique and overlapping.

Similar to pandas-dev#44588
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Interval Interval data type MultiIndex Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Joining MultiIndex with IntervalIndex level fails when IntervalIndex level is overlapping
3 participants