Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Intersection of RangeIndexes fails when the step sign is negative #17296

Closed
jschendel opened this issue Aug 20, 2017 · 2 comments · Fixed by #17374
Closed

BUG: Intersection of RangeIndexes fails when the step sign is negative #17296

jschendel opened this issue Aug 20, 2017 · 2 comments · Fixed by #17374
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves
Milestone

Comments

@jschendel
Copy link
Member

jschendel commented Aug 20, 2017

Code Sample, a copy-pastable example if possible

Setup

In [1]: import pandas as pd
   ...: idx1 = pd.RangeIndex(10)
   ...: idx2 = pd.RangeIndex(9, -1, -1)
   ...: idx3 = pd.RangeIndex(9, 5, -1)
   ...:

Issue 1: Mixed sign steps

In [2]: idx1.intersection(idx2)
Out[2]: RangeIndex(start=0, stop=10, step=1)

In [3]: idx2.intersection(idx1)
Out[3]: RangeIndex(start=0, stop=10, step=-1)

Issue 2: Both steps negative

In [4]: idx2.intersection(idx3)
Out[4]: RangeIndex(start=6, stop=10, step=-1)

Problem description

Issue 1: The output is inconsistent depending on the order of the intersection; one returns an index identical to idx1, the other an empty index.

Issue 2: The returned index is empty, even though idx3 is a subset of idx2.

Expected Output

Issue 1: I'm not 100% sure what the output for Issue 1 should be. To obey the properties of set intersections, all ten values should be returned, but what should the sign of the returned step be?

Seems like the options are:

  • Always default to positive step?
  • Default to the step sign of the first index in the intersection?
  • Or does the step sign not matter as long as the values within the index are correct?

Issue 2: An index identical to idx3 should be returned. Should the step sign be negative in this case, since both inputs have a negative step? Or should we always default to positive?

Seems like this issue could be lumped in with #13432. However, this issue seems restricted to just RangeIndex (all code modifications for a fix would be contained to range.py), whereas the other issue seems to be broader than a specific type of index. Regardless, I'm happy to work on a PR to fix this specific bug.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.2.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.3
pytest: 3.1.2
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.26
numpy: 1.13.1
scipy: 0.19.1
xarray: None
IPython: 6.1.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

@TomAugspurger
Copy link
Contributor

Thanks for the report, this is distinct enough from #13432 to warrant a new issue

Or does the step sign not matter as long as the values within the index are correct?

I think this is correct. RangeIndex is just a performance thing, I don't think we can make any guarantees beyond that the values will be correct (including whether or not you get back a RangeIndex or regular Index). All of our tests should be of the form

assert_frame_equal(idx1 & idx2, Index(list(idx1)) &Index list(idx2)))

with additional checks for whether or not the return value should be a RangeIndex.

Implementation wise, I'd say just do whatever is easiest. I suspect that will be

  • positive step when both are positive
  • negative step when both are negative
  • not sure about mixed

toobaz added a commit to toobaz/pandas that referenced this issue Aug 29, 2017
toobaz added a commit to toobaz/pandas that referenced this issue Aug 29, 2017
@gfyoung gfyoung added the Bug label Aug 29, 2017
toobaz added a commit to toobaz/pandas that referenced this issue Aug 29, 2017
toobaz added a commit to toobaz/pandas that referenced this issue Aug 30, 2017
@toobaz
Copy link
Member

toobaz commented Aug 30, 2017

Implementation wise, I'd say just do whatever is easiest. I suspect that will be

positive step when both are positive
negative step when both are negative
not sure about mixed

In #17374 I followed "negative step iff both are negative", which has the advantage of being symmetric, but it would pretty trivial to change to "negative step iff step of caller is negative".

@jreback jreback added this to the 0.21.0 milestone Aug 30, 2017
toobaz added a commit to toobaz/pandas that referenced this issue Aug 30, 2017
toobaz added a commit to toobaz/pandas that referenced this issue Sep 6, 2017
jbrockmendel pushed a commit to jbrockmendel/pandas that referenced this issue Sep 10, 2017
alanbato pushed a commit to alanbato/pandas that referenced this issue Nov 10, 2017
No-Stream pushed a commit to No-Stream/pandas that referenced this issue Nov 28, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants