Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: Implement new indexing behavior for intervals #27100

Merged
merged 13 commits into from
Jul 2, 2019

Conversation

jschendel
Copy link
Member

@jschendel jschendel commented Jun 28, 2019

I left the test_interval.py and test_interval_new.py files separate for now, in order to make it more obvious what is being changed. Will condense these files in a follow-up.

@jschendel jschendel added Indexing Related to indexing on series/frames, not to indexes themselves API Design Interval Interval data type labels Jun 28, 2019
@jschendel jschendel added this to the 0.25.0 milestone Jun 28, 2019
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks really good, some comments.

doc/source/whatsnew/v0.25.0.rst Outdated Show resolved Hide resolved
doc/source/whatsnew/v0.25.0.rst Show resolved Hide resolved
doc/source/whatsnew/v0.25.0.rst Outdated Show resolved Hide resolved
doc/source/whatsnew/v0.25.0.rst Show resolved Hide resolved
doc/source/whatsnew/v0.25.0.rst Show resolved Hide resolved
pandas/core/indexes/interval.py Show resolved Hide resolved
pandas/core/indexes/interval.py Show resolved Hide resolved
pandas/core/indexes/interval.py Show resolved Hide resolved
pandas/core/indexes/interval.py Outdated Show resolved Hide resolved
pandas/core/indexes/interval.py Outdated Show resolved Hide resolved
@jreback
Copy link
Contributor

jreback commented Jun 28, 2019

if you have other issues you are closing add them as closes in the top of the PR

@codecov
Copy link

codecov bot commented Jun 29, 2019

Codecov Report

Merging #27100 into master will decrease coverage by 49.97%.
The diff coverage is 8.69%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master   #27100       +/-   ##
===========================================
- Coverage   91.87%    41.9%   -49.98%     
===========================================
  Files         180      180               
  Lines       50853    50861        +8     
===========================================
- Hits        46719    21311    -25408     
- Misses       4134    29550    +25416
Flag Coverage Δ
#multiple ?
#single 41.9% <8.69%> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexing.py 53.68% <100%> (-39.62%) ⬇️
pandas/core/indexes/base.py 55.52% <100%> (-41.41%) ⬇️
pandas/core/indexes/interval.py 34.24% <6.66%> (-62.21%) ⬇️
pandas/io/formats/latex.py 0% <0%> (-100%) ⬇️
pandas/io/gcs.py 0% <0%> (-100%) ⬇️
pandas/io/sas/sas_constants.py 0% <0%> (-100%) ⬇️
pandas/core/groupby/categorical.py 0% <0%> (-100%) ⬇️
pandas/tseries/plotting.py 0% <0%> (-100%) ⬇️
pandas/tseries/converter.py 0% <0%> (-100%) ⬇️
pandas/io/s3.py 0% <0%> (-100%) ⬇️
... and 135 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1be0561...50c257c. Read the comment docs.

doc/source/whatsnew/v0.25.0.rst Outdated Show resolved Hide resolved
doc/source/whatsnew/v0.25.0.rst Show resolved Hide resolved
doc/source/whatsnew/v0.25.0.rst Show resolved Hide resolved
pandas/core/indexes/interval.py Show resolved Hide resolved
[1, -1])])
def test_get_indexer_with_interval(self, query, expected):

tuples = [(0, 2.5), (1, 3), (2, 4)]
tuples = [(0, 2), (2, 4), (5, 7)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change here was to have a non-overlapping index?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, since overlapping indexes are invalid for get_indexer and need to use get_indexer_non_unique instead. For intervals, overlapping is the analogue for uniqueness, as uniqueness isn't a strong enough condition since a query on a unique IntervalIndex can still have multiple values returned if they come from the overlap of 2+ intervals.

pandas/tests/indexing/interval/test_interval_new.py Outdated Show resolved Hide resolved
@jorisvandenbossche
Copy link
Member

It would also be good to add some documentation on indexing in the advanced.rst interval section (maybe copy some example from the whatsnew you added)

@jschendel
Copy link
Member Author

@jorisvandenbossche : added a description of the new behavior to advanced.rst.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updates look good!

doc/source/user_guide/advanced.rst Outdated Show resolved Hide resolved
@jschendel
Copy link
Member Author

@jreback @jorisvandenbossche : also removed the warning about indexing behavior being provisional

@ghost ghost mentioned this pull request Jul 1, 2019
@jschendel
Copy link
Member Author

@jreback @jorisvandenbossche : all existing comments have been addressed

@@ -965,6 +964,32 @@ If you select a label *contained* within an interval, this will also select the
df.loc[2.5]
df.loc[[2.5, 3.5]]

Selecting using an ``Interval`` will only return exact matches.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a versionchanged 0.25 here

Indexing an ``IntervalIndex`` with ``Interval`` objects
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Indexing methods for :class:`IntervalIndex` have been modified to require exact matches only for :class:`Interval` queries.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a link the docs you added above in indexing

pandas/core/indexes/interval.py Outdated Show resolved Hide resolved
pandas/core/indexes/interval.py Outdated Show resolved Hide resolved

elif lhs == -1:
# check that target IntervalIndex is compatible
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe worth it to make a helper method for indexing with an II (as this is the same code as in get_indexer for unique)

@jreback
Copy link
Contributor

jreback commented Jul 1, 2019

also merge master

@jreback
Copy link
Contributor

jreback commented Jul 2, 2019

merge master (as the other PR was merged)

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 small changes that remove code done in the precursor and i think good to go

pandas/core/indexes/interval.py Outdated Show resolved Hide resolved
@jreback jreback merged commit c407b73 into pandas-dev:master Jul 2, 2019
@jreback
Copy link
Contributor

jreback commented Jul 2, 2019

thanks @jschendel very nice patch!

@alexlenail
Copy link
Contributor

Glad to see this finally resolved. Looking forward to trying out the new behavior.

@jorisvandenbossche
Copy link
Member

@zfrenchee it would be very welcome to test and give feedback !
(there should also be a release candidate in a few days, if that is easier to test than master)

@jschendel jschendel deleted the ii-new-behavior branch July 3, 2019 15:02
@alexlenail
Copy link
Contributor

@jorisvandenbossche I noticed today

NotImplementedError: contains not implemented for two intervals

Both in 25.3 and master. Is there still room to contribute to intervalIndex?

@jorisvandenbossche
Copy link
Member

@alexlenail I think it is best to open a new issue for this (to ensure it doesn't get lost)
I need to look back whether this was done on purpose or just because it was less straightforward / lack of time (the error type seems to suggest the second). But let's use a new issue to discuss that. Contributions are certainly welcome!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Indexing Related to indexing on series/frames, not to indexes themselves Interval Interval data type
Projects
None yet
4 participants