Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: RangeIndex.get_indexer is incorrect for some decreasing RangeIndex #28678

Closed
jschendel opened this issue Sep 30, 2019 · 1 comment · Fixed by #28680
Milestone

Comments

@jschendel
Copy link
Member

@jschendel jschendel commented Sep 30, 2019

For some decreasing RangeIndex, the get_indexer method will indicate that all of it's own values are missing, and find matches for values not included in the index:

In [2]: ri = pd.RangeIndex(10, 0, -3)

In [3]: ri.get_indexer(ri)
Out[3]: array([-1, -1, -1, -1])

In [4]: ri.get_indexer(ri - 1)
Out[4]: array([ 1,  2,  3, -1])

This will in turn result in methods like Series.reindex not working properly:

In [5]: s = pd.Series(list('abcd'), index=ri) 

In [6]: s
Out[6]: 
10    a
7     b
4     c
1     d
dtype: object

In [7]: s.reindex([10, 9, 7])
Out[7]: 
10    NaN
9       b
7     NaN
dtype: object

The issue appears to occur specifically for decreasing RangeIndex that are not in their canonical form. By canonical form, I mean when stop is the next valid value in the range that's not included, e.g. when you think of a more standard range like range(1, 7, 1), 7 is the next valid value that's not present, but when the step is larger than 1 you lose uniqueness of representation with stop (i.e. range(1, 6, 2) == range(1, 7, 2)).

Note that the code above works properly for the equivalent RangeIndex in it's canonical form:

In [8]: ri2 = pd.RangeIndex(start=10, stop=-2, step=-3)

In [9]: ri2.equals(ri)
Out[9]: True

In [10]: ri2.get_indexer(ri2)
Out[10]: array([0, 1, 2, 3])

In [11]: ri2.get_indexer(ri2 - 1)
Out[11]: array([-1, -1, -1, -1])

In [12]: s2 = pd.Series(list('abcd'), index=ri2)

In [13]: s2
Out[13]: 
10    a
7     b
4     c
1     d
dtype: object

In [14]: s2.reindex([10, 9, 7])
Out[14]: 
10      a
9     NaN
7       b
dtype: object

The cause of the issue appears to be that the code to determine start, stop, step when dealing with decreasing RangeIndex in get_indexer assumes self.stop is the canonical form:

if self.step > 0:
start, stop, step = self.start, self.stop, self.step
else:
# Work on reversed range for simplicity:
start, stop, step = (self.stop - self.step, self.start + 1, -self.step)

Instead of directly computing the reversed values ourselves, I think we should simply take the values from the reversed underlying range object:

diff --git a/pandas/core/indexes/range.py b/pandas/core/indexes/range.py
index 8783351cc..4c5904e5a 100644
--- a/pandas/core/indexes/range.py
+++ b/pandas/core/indexes/range.py
@@ -387,7 +387,8 @@ class RangeIndex(Int64Index):
             start, stop, step = self.start, self.stop, self.step
         else:
             # Work on reversed range for simplicity:
-            start, stop, step = (self.stop - self.step, self.start + 1, -self.step)
+            reverse = self._range[::-1]
+            start, stop, step = reverse.start, reverse.stop, reverse.step
 
         target_array = np.asarray(target)
         if not (is_integer_dtype(target_array) and target_array.ndim == 1):
@jschendel

This comment has been minimized.

Copy link
Member Author

@jschendel jschendel commented Sep 30, 2019

Note that this is a regression as this worked in 0.24.2:

In [1]: import pandas as pd; pd.__version__
Out[1]: '0.24.2'

In [2]: ri = pd.RangeIndex(10, 0, -3)

In [3]: ri.get_indexer(ri)
Out[3]: array([0, 1, 2, 3])

In [4]: ri.get_indexer(ri - 1)
Out[4]: array([-1, -1, -1, -1])

In [5]: s = pd.Series(list('abcd'), index=ri)

In [6]: s
Out[6]: 
10    a
7     b
4     c
1     d
dtype: object

In [7]: s.reindex([10, 9, 7])
Out[7]: 
10      a
9     NaN
7       b
dtype: object
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
1 participant
You can’t perform that action at this time.