Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Perform i8 conversion for datetimelike IntervalTree queries #22988

Merged
merged 3 commits into from
Oct 7, 2018

Conversation

jschendel
Copy link
Member

I've seen a few complaints about this on StackOverflow, so wanted to get this into 0.24.0 before I try implementing larger changes like the new behavior specs and interval accessor.

@jschendel jschendel added Bug Datetime Datetime data dtype Indexing Related to indexing on series/frames, not to indexes themselves Interval Interval data type labels Oct 4, 2018
@jschendel jschendel added this to the 0.24.0 milestone Oct 4, 2018
@pep8speaks
Copy link

Hello @jschendel! Thanks for submitting the PR.

@mroeschke
Copy link
Member

Could you also add tests for Periods as well? I think they also need i8 conversion (though possibly less commonly used).

@jschendel
Copy link
Member Author

We currently do not allow an IntervalIndex to have Period endpoints, so I don't think tests are necessary:

In [2]: pd.IntervalIndex.from_breaks(pd.period_range('2018Q1', freq='Q', periods=3))
---------------------------------------------------------------------------
ValueError: Period dtypes are not supported, use a PeriodIndex instead

Interestingly enough, it looks like an Interval can have Period endpoints, which seems like a bug:

In [3]: pd.Interval(pd.Period('2018Q1', freq='Q'), pd.Period('2018Q3', freq='Q'))
Out[3]: Interval(Period('2018Q1', 'Q-DEC'), Period('2018Q3', 'Q-DEC'), closed='right')

@mroeschke
Copy link
Member

Gotcha. Are there plans for IntervalIndex to support Period? That might get tricky since Periods themselves are conceptually 1 label for an interval of time.

@codecov
Copy link

codecov bot commented Oct 4, 2018

Codecov Report

Merging #22988 into master will increase coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #22988      +/-   ##
==========================================
+ Coverage   92.18%    92.2%   +0.01%     
==========================================
  Files         169      169              
  Lines       50823    50865      +42     
==========================================
+ Hits        46853    46898      +45     
+ Misses       3970     3967       -3
Flag Coverage Δ
#multiple 90.62% <100%> (+0.01%) ⬆️
#single 42.33% <13.88%> (-0.04%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/interval.py 94.63% <100%> (+0.3%) ⬆️
pandas/core/dtypes/dtypes.py 95.56% <0%> (-0.3%) ⬇️
pandas/core/reshape/merge.py 93.89% <0%> (-0.26%) ⬇️
pandas/core/strings.py 98.63% <0%> (ø) ⬆️
pandas/core/dtypes/base.py 100% <0%> (ø) ⬆️
pandas/core/dtypes/cast.py 88.58% <0%> (ø) ⬆️
pandas/core/series.py 93.75% <0%> (ø) ⬆️
pandas/core/frame.py 97.2% <0%> (ø) ⬆️
pandas/core/dtypes/common.py 95.03% <0%> (+0.01%) ⬆️
pandas/core/arrays/datetimelike.py 95.56% <0%> (+0.02%) ⬆️
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c282e31...f338f3d. Read the comment docs.

@jschendel
Copy link
Member Author

Are there plans for IntervalIndex to support Period?

I'm not aware of any such plans. I don't think the same type algorithms would apply; a Period is basically a special case of an Interval, so you'd essentially have intervals with interval endpoints (e.g. Period('2018Q1') is conceptually Interval('2018-01-01', '2018-04-01', closed='left') with some additional special properties.). Haven't fully thought it through though.

There is an open issue to have Period/PeriodIndex inherit from Interval/IntervalIndex though, allowing one to use any of the interval methods on periods. Not quite as straight forward as it might initially seem though, as we allow a Period to be outside the bounds of Timestamp, so just using Timestamp endpoints for the Interval representation wouldn't cut it.

@@ -514,6 +519,41 @@ def _maybe_cast_indexed(self, key):

return key

def _maybe_convert_i8(self, key):
if isinstance(key, Interval):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you document this a bit, and can you simplify at all?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Look better?

@jreback
Copy link
Contributor

jreback commented Oct 7, 2018

@mroeschke as indicated above, a Period doesn't make sense as an element of an IntervalIndex, though I don't think we actually preclude it, maybe we should. However implemented Period as a sub-class of Intevral and PeriodIndex as a sub-class of IntervalIndex, does make sense (though may not be straightforward to actually do this).

@jreback jreback merged commit 2a1ef8c into pandas-dev:master Oct 7, 2018
@jreback
Copy link
Contributor

jreback commented Oct 7, 2018

thanks @jschendel

@jschendel jschendel deleted the ivtree-i8 branch October 8, 2018 17:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Datetime Datetime data dtype Indexing Related to indexing on series/frames, not to indexes themselves Interval Interval data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: indexing in datetime IntervalIndex with duplicate values fails
4 participants