BUG: non-monotonic DatetimeIndex does not raise KeyError for missing labels #7827

Open
shoyer opened this issue Jul 24, 2014 · 10 comments
Labels
Bug Error Reporting Incorrect or improved errors from pandas

@shoyer
Member

shoyer commented Jul 24, 2014

import pandas as pd
index = pd.to_datetime(['2000-01-02', '2000-01-01'])
index.get_loc('1900-01-01')
# returns: array([], dtype=int64)
@jreback
Contributor

jreback commented Jul 24, 2014

This is correct. It cannot tell that the key is out of range because the index is not monotonic, so the lookup is effectively just a mask on the original. I suppose this could raise; what is the use case?

Scalars raise KeyErrors when not found

In [29]: index.get_loc(Timestamp('2000-1-1'))
Out[29]: 1

In [30]: index.get_loc(Timestamp('1900-1-1'))
KeyError: Timestamp('1900-01-01 00:00:00')

@shoyer
Member Author

shoyer commented Jul 24, 2014

Honestly in this case it was mostly a mistake -- non-monotonic time indexes are not very useful.

I think the right behavior is probably to raise a KeyError anytime an empty array would be returned from an indexing method. Returning a length 0 array in some cases means it is possible to end up with unexpected empty series and dataframes. Inevitably, you run into an error later (or worse, maybe not!), but it's harder to debug when it is further from the source.
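
To illustrate the debugging concern above, here is a minimal sketch of how an empty positional result can pass through later steps without any error. The Series `values` and its numbers are made up for illustration, and `empty_positions` is only a stand-in for the `array([], dtype=int64)` shown in the original report.

```python
import numpy as np
import pandas as pd

# Hypothetical data on the index from the original report.
index = pd.to_datetime(["2000-01-02", "2000-01-01"])
values = pd.Series([10.0, 20.0], index=index)

# Stand-in for the empty result that get_loc returned in the report.
empty_positions = np.array([], dtype=np.int64)

# Positional selection with no positions yields an empty Series, not an error.
selected = values.iloc[empty_positions]

# Aggregating the empty Series still "succeeds", hiding the missing label.
total = selected.sum()
```

Nothing in this chain fails, so the missing label only shows up later as a suspicious zero, far from the lookup that caused it.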

@jreback jreback added this to the 0.15.0 milestone Jul 24, 2014
@immerrr
Contributor

immerrr commented Jul 24, 2014

raise a KeyError anytime an empty array would be returned from an indexing method

Please, don't, it is a legitimate use case to work with empty datasets, e.g. that happens to me when I fill a frame with incoming data over time.

@shoyer
Member Author

shoyer commented Jul 24, 2014

@immerrr I should clarify: by "indexing method" I really just meant get_loc (and perhaps get_locs for MultiIndex).

Yes, it's absolutely a legitimate case to work with empty datasets -- but do you expect to be able to index them with scalar labels?
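
The narrowed proposal (raise only from get_loc when nothing matches) could be sketched as a thin wrapper. `get_loc_strict` is a hypothetical helper name, not a pandas API, and the emptiness check is an assumption about the shapes get_loc can return (an integer position array or a boolean mask).

```python
import numpy as np
import pandas as pd


def get_loc_strict(index, key):
    """Hypothetical helper: like index.get_loc, but never returns an empty match."""
    loc = index.get_loc(key)
    if isinstance(loc, np.ndarray):
        # get_loc may hand back an integer position array or a boolean mask.
        no_match = loc.size == 0 or (loc.dtype == bool and not loc.any())
        if no_match:
            raise KeyError(key)
    return loc


index = pd.to_datetime(["2000-01-02", "2000-01-01"])
found = get_loc_strict(index, pd.Timestamp("2000-01-01"))

try:
    get_loc_strict(index, "1900-01-01")
    missing_raised = False
except KeyError:
    missing_raised = True
```

Working with empty datasets is unaffected by this sketch, since only a label lookup that matches nothing is turned into an error.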

@immerrr
Contributor

immerrr commented Jul 24, 2014

Ah, ok, sure. I would indeed expect index.get_locs(timestr) and index.get_locs(Timestamp(timestr)) to behave the same, but I guess I just don't know the whole story.
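
One way to check that expectation is to run the same missing label through get_loc as a string and as a Timestamp and compare the outcomes. This is only a sketch; `lookup_outcome` is a hypothetical helper, and the result for the string key depends on the pandas version (the original report shows an empty array, while a later comment in this thread shows a KeyError).

```python
import pandas as pd


def lookup_outcome(index, key):
    """Hypothetical helper: report what index.get_loc does with key."""
    try:
        return ("found", index.get_loc(key))
    except KeyError:
        return ("KeyError", None)


index = pd.to_datetime(["2000-01-02", "2000-01-01"])
timestr = "1900-01-01"

# Same label, once as a string and once as a Timestamp.
from_string = lookup_outcome(index, timestr)
from_timestamp = lookup_outcome(index, pd.Timestamp(timestr))
```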

@jreback
Contributor

jreback commented Jul 24, 2014

see here: https://github.com/pydata/pandas/blob/master/pandas/tseries/index.py#L1200

I remember putting this in for a monotonic index a while back, where you can easily figure out if it's out of range.

But I didn't test (obviously) for non-monotonic.

It's easy to figure out if it's not there, and I am leaning toward making it raise (as get_loc/s DOES always raise KeyError if all values are out of range, rather than return an empty).

All that said, I don't know if anything will (or should) break.
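
The difference between the monotonic and non-monotonic cases can be sketched as follows. This is not the code at the link above; `is_out_of_range` is a hypothetical helper, it assumes the index contains no NaT, and it only covers keys outside the span of the index (a key inside the span can still be absent).

```python
import pandas as pd


def is_out_of_range(index, key):
    """Hypothetical helper: True if key lies outside the span of index."""
    ts = pd.Timestamp(key)
    if len(index) == 0:
        return True
    if index.is_monotonic_increasing:
        # Sorted ascending: the endpoints are the bounds, a constant-time check.
        low, high = index[0], index[-1]
    elif index.is_monotonic_decreasing:
        # Sorted descending: same idea with the endpoints swapped.
        low, high = index[-1], index[0]
    else:
        # Unsorted: finding the bounds needs a full scan.
        low, high = index.min(), index.max()
    return ts < low or ts > high


unsorted_index = pd.to_datetime(["2000-01-02", "2000-01-01", "2000-01-03"])
far_past = is_out_of_range(unsorted_index, "1900-01-01")
inside = is_out_of_range(unsorted_index, "2000-01-01")
```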

@jreback
Contributor

jreback commented Sep 9, 2014

@shoyer PR for this?

@jreback jreback modified the milestones: 0.15.1, 0.15.0 Sep 9, 2014
@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
@jbrockmendel jbrockmendel added this to Index Methods (e.g. get_loc) in Indexing Feb 18, 2020
@mroeschke mroeschke removed the Indexing Related to indexing on series/frames, not to indexes themselves label Apr 11, 2021
@NumberPiOso
Contributor

Nowadays the error is correctly reported

index = pd.to_datetime(['2000-01-02', '2000-01-01'])
index.get_loc('1900-01-01')
File ~/git/mypandas/pandas/core/indexes/datetimes.py:681, in DatetimeIndex.get_loc(self, key, method, tolerance)
    679     return Index.get_loc(self, key, method, tolerance)
    680 except KeyError as err:
--> 681     raise KeyError(orig_key) from err

KeyError: '1900-01-01'

So this issue is solved and should be closed

@jreback
Contributor

jreback commented Feb 21, 2022

See if we have a direct test for this (we probably do); if not, we would take a PR with one.

@NumberPiOso
Contributor

Indeed, it was highly probable

def test_get_loc_reasonable_key_error(self):
    # GH#1062
    index = DatetimeIndex(["1/3/2000"])
    with pytest.raises(KeyError, match="2000"):
        index.get_loc("1/1/2000")
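
The test quoted above uses a single-element index. A check aimed at the exact case from this issue might look like the sketch below. The function name is hypothetical and nothing in this thread says such a test exists in the pandas suite. It uses a plain try/except instead of `pytest.raises` so that it runs without pytest, and it assumes a pandas version with the behavior shown in the traceback earlier in the thread.

```python
import pandas as pd


def test_get_loc_missing_label_non_monotonic():
    # Index and key taken from the original report in this issue.
    index = pd.to_datetime(["2000-01-02", "2000-01-01"])
    try:
        index.get_loc("1900-01-01")
    except KeyError as err:
        # The missing key should be named in the error message.
        assert "1900" in str(err)
    else:
        raise AssertionError("expected KeyError for a missing label")


test_get_loc_missing_label_non_monotonic()
```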

@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022