Should DatetimeIndex indexing with strings ever raise KeyError? #25803

Open
shoyer opened this issue Mar 20, 2019 · 3 comments
Labels
Bug · Indexing (Related to indexing on series/frames, not to indexes themselves) · Timeseries

Comments

@shoyer
Member

shoyer commented Mar 20, 2019

With pandas 0.24:

In [1]: import pandas as pd

In [2]: s = pd.Series([1, 2, 3], pd.to_datetime(['2018-01-01', '2018-02-02T01:01', '2018-02-02T02:02']))

In [3]: s.loc['2018-01-01'].size
Out[3]: 1

In [4]: s.loc['2018-01-02'].size
Out[4]: 0

In [5]: s.loc['2018-02-02'].size
Out[5]: 2

In [6]: s.loc['2018-03-03'].size
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

TypeError: an integer is required

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
~/miniconda3/envs/xarray-py37/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2601             try:
-> 2602                 return self._engine.get_loc(key)
   2603             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine._date_check_type()

KeyError: '2018-03-03'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

TypeError: an integer is required

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
~/miniconda3/envs/xarray-py37/lib/python3.7/site-packages/pandas/core/indexes/datetimes.py in get_loc(self, key, method, tolerance)
    998         try:
--> 999             return Index.get_loc(self, key, method, tolerance)
   1000         except (KeyError, ValueError, TypeError):

~/miniconda3/envs/xarray-py37/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2603             except KeyError:
-> 2604                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2605         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine._date_check_type()

KeyError: '2018-03-03'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 1520035200000000000

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
~/miniconda3/envs/xarray-py37/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2601             try:
-> 2602                 return self._engine.get_loc(key)
   2603             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()

KeyError: Timestamp('2018-03-03 00:00:00')

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 1520035200000000000

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
~/miniconda3/envs/xarray-py37/lib/python3.7/site-packages/pandas/core/indexes/datetimes.py in get_loc(self, key, method, tolerance)
   1011                     stamp = stamp.tz_localize(self.tz)
-> 1012                 return Index.get_loc(self, stamp, method, tolerance)
   1013             except KeyError:

~/miniconda3/envs/xarray-py37/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2603             except KeyError:
-> 2604                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2605         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()

KeyError: Timestamp('2018-03-03 00:00:00')

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-6-92239fa9614e> in <module>()
----> 1 s.loc['2018-03-03'].size

~/miniconda3/envs/xarray-py37/lib/python3.7/site-packages/pandas/core/indexing.py in __getitem__(self, key)
   1498
   1499             maybe_callable = com.apply_if_callable(key, self.obj)
-> 1500             return self._getitem_axis(maybe_callable, axis=axis)
   1501
   1502     def _is_scalar_access(self, key):

~/miniconda3/envs/xarray-py37/lib/python3.7/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1911         # fall thru to straight lookup
   1912         self._validate_key(key, axis)
-> 1913         return self._get_label(key, axis=axis)
   1914
   1915

~/miniconda3/envs/xarray-py37/lib/python3.7/site-packages/pandas/core/indexing.py in _get_label(self, label, axis)
    135             # but will fail when the index is not present
    136             # see GH5667
--> 137             return self.obj._xs(label, axis=axis)
    138         elif isinstance(label, tuple) and isinstance(label[axis], slice):
    139             raise IndexingError('no slices here, handle elsewhere')

~/miniconda3/envs/xarray-py37/lib/python3.7/site-packages/pandas/core/generic.py in xs(self, key, axis, level, drop_level)
   3573                                                       drop_level=drop_level)
   3574         else:
-> 3575             loc = self.index.get_loc(key)
   3576
   3577             if isinstance(loc, np.ndarray):

~/miniconda3/envs/xarray-py37/lib/python3.7/site-packages/pandas/core/indexes/datetimes.py in get_loc(self, key, method, tolerance)
   1012                 return Index.get_loc(self, stamp, method, tolerance)
   1013             except KeyError:
-> 1014                 raise KeyError(key)
   1015             except ValueError as e:
   1016                 # list-like tolerance size must match target index size

KeyError: '2018-03-03'

(side note: this is quite the traceback!)

Bizarrely, whether indexing with a string raises a KeyError or returns an array of size 0 depends upon the value.

But more generally, does it ever make sense to raise an error? It's arguably more consistent to only return size 0 arrays.

xref #7827 and pydata/xarray#2825
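As a possible workaround for the inconsistency above, an explicit string slice can be passed instead of a bare string. This is only a sketch: it assumes a sorted (monotonic) DatetimeIndex, and `select_day` is a hypothetical helper name, not a pandas API. The index is built with uniformly formatted strings so that the example does not depend on how `pd.to_datetime` handles mixed formats.

```python
import pandas as pd

# Same data as in the report, with uniformly formatted timestamps.
s = pd.Series(
    [1, 2, 3],
    index=pd.to_datetime(
        ["2018-01-01T00:00", "2018-02-02T01:01", "2018-02-02T02:02"]
    ),
)


def select_day(series, day):
    """Hypothetical helper: return all rows falling on `day` (possibly none)."""
    # Assumption: the index is sorted. An explicit slice is then resolved by
    # position, so bounds outside the index yield an empty result.
    return series.loc[day:day]


sizes = {
    day: select_day(s, day).size
    for day in ["2018-01-01", "2018-01-02", "2018-02-02", "2018-03-03"]
}
print(sizes)
```

With this form, the size of the result should equal the number of index values on the requested day, including zero for '2018-03-03'.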

@WillAyd
Member

WillAyd commented Mar 22, 2019

Why wouldn't you want to raise a KeyError here? Due to assumed alignment on daily precision?

IMO it's unexpected for line 4 to return a 0 sized array

@shoyer
Member Author

shoyer commented Mar 22, 2019

I guess this is similar to indexing with an index with duplicate values (which is probably a separate issue). It's nice to be able to rely on invariants, like the size of the result matching the number of matching values in the index.

KeyError makes sense for indexes without duplicates, because the alternative is returning a scalar, which isn't possible if there isn't a match.

@jorisvandenbossche
Member

jorisvandenbossche commented Mar 22, 2019

> IMO it's unexpected for line 4 to return a 0 sized array

The explanation here is that the string '2018-01-02' is to be considered as a slice, because the resolution of the string is coarser (lower) than the resolution of the index:

In [31]: s.index.resolution
Out[31]: 'minute'

In [32]: _, _, resolution = parsing.parse_time_string('2018-01-02', freq=None)

In [33]: resolution 
Out[33]: 'day'

So if the strings '2018-01-01', '2018-02-02', etc. are considered as slices, why not '2018-03-03'?
The only difference is that it is "out of range" for the index. But with normal slicing, out of range slice bounds return an empty object, and don't raise an error:

In [34]: s.iloc[0:2]
Out[34]: 
2018-01-01 00:00:00    1
2018-02-02 01:01:00    2
dtype: int64

In [35]: s.iloc[10:12]
Out[35]: Series([], dtype: int64)

So given that, I agree with @shoyer that it would be more consistent (and reliable) to return an empty object here instead of raising an error.
Although, Stephan, note that it would still depend on the resolution of the passed string. So the outcome would still depend to some extent on the value of the key, and you can't be sure that an arbitrary string will not raise an error; but at least for datetime strings of the same resolution, it would no longer depend on the exact value.
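The slice interpretation discussed in this comment can be modelled in plain Python. This is a simplified sketch, not how pandas implements it: `lookup_day` is a hypothetical name, only day-resolution strings are handled, and the index is assumed to be sorted. A day string is turned into a half-open interval [start, end), and the matching positions are found by bisection, so an out-of-range day yields an empty result rather than an error.

```python
from bisect import bisect_left
from datetime import datetime, timedelta

# Sorted index values and data from the example in this thread.
index = [
    datetime(2018, 1, 1, 0, 0),
    datetime(2018, 2, 2, 1, 1),
    datetime(2018, 2, 2, 2, 2),
]
values = [1, 2, 3]


def lookup_day(day):
    """Hypothetical model: treat a day-resolution string as [start, end)."""
    start = datetime.strptime(day, "%Y-%m-%d")
    end = start + timedelta(days=1)
    lo = bisect_left(index, start)  # first position with index >= start
    hi = bisect_left(index, end)    # first position with index >= end
    # When no index value falls in the interval, lo == hi and the slice is
    # empty, whether the day is inside or outside the index range.
    return values[lo:hi]


print(lookup_day("2018-01-01"))  # [1]
print(lookup_day("2018-01-02"))  # []
print(lookup_day("2018-02-02"))  # [2, 3]
print(lookup_day("2018-03-03"))  # []
```

Under this model the result for a day string depends only on which index values fall inside the interval, not on whether the day lies beyond the last index value.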

jbrockmendel added the Indexing label Feb 22, 2020
mroeschke added the Bug label Apr 1, 2020

5 participants