BUG: indexing in datetime IntervalIndex with duplicate values fails #20636
Comments
@avnovikov Can you provide a reproducible example? (in this case, provide example code for |
I can reproduce the error on master with duplicated intervals:
In [2]: ii = pd.interval_range(pd.Timestamp('20180101'), periods=2).repeat(2)
In [3]: ii
Out[3]:
IntervalIndex([(2018-01-01, 2018-01-02], (2018-01-01, 2018-01-02], (2018-01-02, 2018-01-03], (2018-01-02, 2018-01-03]]
closed='right',
dtype='interval[datetime64[ns]]')
In [4]: ii.get_loc(pd.Timestamp('20180102'))
---------------------------------------------------------------------------
KeyError: ('datetime64[ns]', 'right')
OK, and for numeric intervals, this works correctly:
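As an illustration of the numeric case (a minimal sketch, not the commenter's original snippet), the same duplicated index can be built with integer endpoints. The return type of `get_loc` for a non-unique index differs between pandas versions (slice, boolean mask, or integer array), so the sketch normalizes it to a list of positions; that normalization step is an addition for illustration only.

```python
import numpy as np
import pandas as pd

# Same construction as the failing datetime case, but with integer endpoints:
# (0, 1], (0, 1], (1, 2], (1, 2]
ii = pd.interval_range(0, periods=2).repeat(2)

# 1.5 falls inside (1, 2], which occupies the last two positions.
loc = ii.get_loc(1.5)

# Normalize whatever get_loc returned (slice, boolean mask, or integer
# array, depending on the pandas version) to a plain list of positions.
positions = np.arange(len(ii))[loc].tolist()
print(positions)
```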
I suspect this will fail for dtypes other than pandas/pandas/_libs/intervaltree.pxi.in Lines 202 to 208 in 4e6aa1c
Using timedelta intervals:
In [2]: ii = pd.interval_range(pd.Timedelta('0 days'), periods=2).repeat(2)
In [3]: ii
Out[3]:
IntervalIndex([(0 days 00:00:00, 1 days 00:00:00], (0 days 00:00:00, 1 days 00:00:00], (1 days 00:00:00, 2 days 00:00:00], (1 days 00:00:00, 2 days 00:00:00]]
closed='right',
dtype='interval[timedelta64[ns]]')
In [4]: ii.get_loc(pd.Timedelta('1 day'))
---------------------------------------------------------------------------
KeyError: ('timedelta64[ns]', 'right')
Using uint64 intervals:
In [11]: ii = pd.interval_range(1, periods=2).repeat(2).astype('interval[uint64]')
In [12]: ii
Out[12]:
IntervalIndex([(1, 2], (1, 2], (2, 3], (2, 3]]
closed='right',
dtype='interval[uint64]')
In [13]: ii.get_loc(1.5)
---------------------------------------------------------------------------
KeyError: ('uint64', 'right')
I'm guessing for datetimelike we'll need to do i8 conversion? Or is there a way to add that directly? I think
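To make the i8 idea concrete, here is a rough user-level sketch of what "convert to int64 and compare" could look like for datetime endpoints. It is not the fix applied inside pandas and does not touch `IntervalTree`; `get_loc_via_i8` is a hypothetical helper name, and the sketch assumes tz-naive datetime endpoints and a linear scan rather than a tree lookup.

```python
import numpy as np
import pandas as pd


def get_loc_via_i8(ii, key):
    """Hypothetical helper: positions of the intervals in `ii` that contain
    `key`, comparing everything as int64 nanoseconds since the epoch."""
    # Normalize the endpoints to nanosecond resolution, then view as int64.
    left = ii.left.values.astype("datetime64[ns]").astype("int64")
    right = ii.right.values.astype("datetime64[ns]").astype("int64")
    key_i8 = pd.Timestamp(key).value  # integer nanoseconds

    # Honor which sides of the intervals are closed.
    lower = left <= key_i8 if ii.closed in ("left", "both") else left < key_i8
    upper = key_i8 <= right if ii.closed in ("right", "both") else key_i8 < right

    positions = np.flatnonzero(lower & upper)
    if positions.size == 0:
        raise KeyError(key)
    return positions


ii = pd.interval_range(pd.Timestamp("20180101"), periods=2).repeat(2)
on_boundary = get_loc_via_i8(ii, pd.Timestamp("20180102")).tolist()
inside_second = get_loc_via_i8(ii, pd.Timestamp("2018-01-02 12:00")).tolist()
print(on_boundary, inside_second)
```

Because the index is closed on the right, the boundary value 2018-01-02 belongs to the first pair of intervals, not the second.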
Code Sample
Problem description
A non-unique IntervalIndex with datetime endpoints cannot be used for lookups. Both Index.get_loc and DataFrame.loc raise the same error from IntervalTree.
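Until the lookup itself works, one possible workaround is to bypass `get_loc` and `.loc` and select rows with a boolean mask built from the interval endpoints. The sketch below assumes a right-closed index, as in the reproductions in this thread; the `value` column and its contents are made up for illustration.

```python
import pandas as pd

# Non-unique datetime IntervalIndex, closed on the right.
ii = pd.interval_range(pd.Timestamp("20180101"), periods=2).repeat(2)
df = pd.DataFrame({"value": [10, 20, 30, 40]}, index=ii)  # illustrative data

key = pd.Timestamp("2018-01-02 12:00")

# For closed='right', an interval (left, right] contains key
# when left < key <= right.
mask = (ii.left < key) & (key <= ii.right)

# Boolean-mask selection does not go through the index lookup machinery.
rows = df[mask]
print(rows)
```

This is a linear scan over the endpoints, so it trades the tree lookup's speed for not depending on the dtype-specific `IntervalTree` classes.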
Expected Output
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
pandas: 0.22.0
pytest: None
pip: 9.0.1
setuptools: 38.5.1
Cython: None
numpy: 1.14.2
scipy: 1.0.0
pyarrow: None
xarray: 0.10.2
IPython: 6.2.1
sphinx: 1.7.1
patsy: 0.5.0
dateutil: 2.7.0
pytz: 2018.3
blosc: None
bottleneck: 1.2.1
tables: None
numexpr: None
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: 1.2.5
pymysql: None
psycopg2: 2.7.4 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None