BUG: IntervalIndex([np.nan, np.nan]).is_monotonic returns True #41831

Closed
jbrockmendel opened this issue Jun 5, 2021 · 1 comment · Fixed by #41863
Labels: Bug, Index, Interval

@jbrockmendel
Member

I'm pretty sure this is related to other questionable behavior:

import numpy as np
import pandas as pd

index = pd.IntervalIndex([np.nan, np.nan])
other = pd.IntervalIndex([np.nan])

In [4]: index.is_monotonic  # should be False
Out[4]: True

In [5]: index._index_as_unique  # should be False
Out[5]: True

In [6]: index.get_indexer_for(other)  # shouldn't raise
InvalidIndexError: Reindexing only valid with uniquely valued Index objects
@jbrockmendel added the Bug and Needs Triage labels Jun 5, 2021
@jbrockmendel
Member Author

@jschendel I've got a branch trying to fix this and could use your input, as I've broken a bunch of test_is_overlapping cases.

The main thing I've done is define self._na_count = len(mask) - mask.sum() in IntervalTree.__init__, then 1) edit is_monotonic to return False if self._na_count > 0, and 2) edit is_overlapping to return True if self._na_count > 1.
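The two rules above can be illustrated with a small pure-Python sketch. It is not the pandas IntervalTree code: intervals are modeled as `(left, right)` tuples closed on the right, `None` stands in for a missing interval, and `mask` is assumed to mark the non-missing entries (which is what the `len(mask) - mask.sum()` formula implies). All function names here are hypothetical.

```python
from typing import List, Optional, Tuple

# Assumption: None models a missing (NaN) interval; tuples are (left, right].
Interval = Optional[Tuple[float, float]]


def na_count(intervals: List[Interval]) -> int:
    # mask is True for valid entries, so len - sum counts the missing ones.
    mask = [iv is not None for iv in intervals]
    return len(mask) - sum(mask)


def is_monotonic_increasing(intervals: List[Interval]) -> bool:
    # Rule 1: any missing entry means the index is not monotonic.
    if na_count(intervals) > 0:
        return False
    return all(a <= b for a, b in zip(intervals, intervals[1:]))


def is_overlapping(intervals: List[Interval]) -> bool:
    # Rule 2: two or more missing entries are treated as overlapping.
    if na_count(intervals) > 1:
        return True
    valid = sorted(iv for iv in intervals if iv is not None)
    max_right = float("-inf")
    for left, right in valid:
        # Right-closed intervals overlap when one starts before another ends.
        if left < max_right:
            return True
        max_right = max(max_right, right)
    return False
```

Under these rules `[None, None]` is neither monotonic nor non-overlapping, while a single `None` alone does not count as an overlap. Whether that is the right semantics for is_overlapping is exactly the open question in this comment.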

Finally, I've edited IntervalIndex._get_indexer_pointwise to handle np.nan:

         indexer, missing = [], []
         for i, key in enumerate(target):
+            if is_interval_dtype(target.dtype) and isna(key):
+                # self.get_loc(np.nan) will treat it as a float instead of as
+                #  our own dtype.
+                # TODO: handle this in get_loc?
+                locs = self.isna().nonzero()[0]
+                if len(locs) == 0:
+                    missing.append(i)
+            else:
                 [what we have now, indented one more time]

This suffices to fix the three things in the OP, but breaks is_overlapping tests, so I'd like to get your thoughts on whether this is the right way to solve this (and also on how to keep get_indexer consistent with get_loc).
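For readers who want to see the shape of the pointwise NaN handling described above, here is a minimal pure-Python sketch. It does not use pandas: the index and target are plain lists, `None` stands in for a missing interval, and lookups are exact matches only. The behavior when NA positions are found (extend the indexer with all of them) and the `-1` placeholder for a miss are assumptions, since the diff above elides that part; `get_indexer_pointwise` is a hypothetical stand-in name.

```python
from typing import List, Optional, Tuple

# Assumption: None models a missing (NaN) interval.
Interval = Optional[Tuple[float, float]]


def get_indexer_pointwise(
    index: List[Interval], target: List[Interval]
) -> Tuple[List[int], List[int]]:
    indexer: List[int] = []
    missing: List[int] = []
    for i, key in enumerate(target):
        if key is None:
            # A missing key matches every missing position in the index.
            locs = [pos for pos, iv in enumerate(index) if iv is None]
        else:
            # Stand-in for the existing lookup: exact interval matches only.
            locs = [pos for pos, iv in enumerate(index) if iv == key]
        if len(locs) == 0:
            # Assumed convention: record the target position, emit -1.
            missing.append(i)
            locs = [-1]
        indexer.extend(locs)
    return indexer, missing
```

With this sketch, looking up `[None]` in `[None, None]` yields both positions instead of raising, which is the behavior the original report asks of get_indexer_for.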

@jreback added this to the 1.3 milestone Jun 9, 2021
@lithomas1 added the Interval and Index labels and removed the Needs Triage label Jun 9, 2021