Non-monotonic-increasing DatetimeIndex claims not to __contain__ duplicate entries #9512

Closed
ischwabacher opened this issue Feb 18, 2015 · 4 comments · Fixed by #9525

@ischwabacher
Contributor

This was fun to debug.

In [1]: import pandas as pd

In [2]: 0 in pd.Int64Index([0, 0, 1])
Out[2]: True

In [3]: 0 in pd.Int64Index([0, 1, 0])
Out[3]: True

In [4]: 0 in pd.Int64Index([0, 0, -1])
Out[4]: True

In [5]: pd.Timestamp(0) in pd.DatetimeIndex([0, 1, -1])
Out[5]: True

In [6]: pd.Timestamp(0) in pd.DatetimeIndex([0, 1, 0])
Out[6]: False   # BAD

In [7]: pd.Timestamp(0) in pd.DatetimeIndex([0, 0, 1])
Out[7]: True

In [8]: pd.Timestamp(0) in pd.DatetimeIndex([0, 0, -1])
Out[8]: False   # BAD

TimedeltaIndex is also broken.

The problem is in DatetimeIndexOpsMixin.__contains__, which inspects the type of the result of idx.get_loc(key) to decide whether the key was found in the index. If the index contains duplicate entries and is not monotonic increasing (for some reason, monotonic decreasing doesn't cut it), get_loc eventually falls back to Int64Engine._maybe_get_bool_indexer, which returns a boolean ndarray when the key is duplicated. Since __contains__ only treats a scalar or a slice as a match, it reports that the duplicated entry is not present.
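A minimal sketch of the mechanism, assuming a recent pandas release (where Index.get_loc is documented to return an int for a unique match, a slice for a monotonic index, and otherwise a boolean mask). The helper old_style_contains is hypothetical: it only re-creates the "scalar or slice" check described above, it is not the pandas source. The real `in` operator is already fixed in current pandas, so the last assertion-style line shows the correct answer for comparison.

```python
import numpy as np
import pandas as pd

t0 = pd.Timestamp("2015-01-01")
t1 = pd.Timestamp("2015-01-02")

unique_idx = pd.DatetimeIndex([t0, t1])          # key occurs once
increasing_dup = pd.DatetimeIndex([t0, t0, t1])  # duplicated, monotonic increasing
nonmono_dup = pd.DatetimeIndex([t0, t1, t0])     # duplicated, not monotonic

# get_loc's return type depends on the shape of the index, not only on
# whether the key exists: int position, slice, or boolean mask.
for idx in (unique_idx, increasing_dup, nonmono_dup):
    print(repr(idx.get_loc(t0)))


def old_style_contains(index, key):
    """Hypothetical re-creation of the buggy check: only a scalar position
    or a slice returned by get_loc counts as 'found'."""
    try:
        res = index.get_loc(key)
    except (KeyError, TypeError):
        return False
    return np.isscalar(res) or isinstance(res, slice)


print(old_style_contains(unique_idx, t0))      # True: int position
print(old_style_contains(increasing_dup, t0))  # True: slice
print(old_style_contains(nonmono_dup, t0))     # False, although t0 is present
print(t0 in nonmono_dup)                       # True on a fixed pandas
```

The boolean mask is the only return type the old check ignores, which is why only the duplicated, non-monotonic-increasing cases go wrong.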

@shoyer shoyer added the Bug label Feb 18, 2015
@shoyer
Member

shoyer commented Feb 18, 2015

Thanks for the report and the debugging. A PR to fix this would be very welcome!

@ischwabacher
Contributor Author

I'm just afraid that some of the other indexing code accidentally depends on this bug and will blow up...

@shoyer
Member

shoyer commented Feb 18, 2015

The fix here might be as simple as adding "or np.any(res)" to that line... the only way to find out is to try!
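A sketch of what the suggested one-line change amounts to, written as a stand-alone function so it can be tried against the cases from the report without patching pandas. The name patched_contains is hypothetical and this is not the code that landed in the fix; it simply accepts a boolean mask with at least one True in addition to a scalar or slice. The cases mirror In [5] through In [8] (using real dates instead of integer nanoseconds) plus a TimedeltaIndex, since that was reported as broken too.

```python
import numpy as np
import pandas as pd


def patched_contains(index, key):
    """Hypothetical __contains__ with the suggested 'or np.any(res)' clause.

    get_loc may return an int (unique key), a slice (duplicates in a
    monotonic-increasing index) or a boolean mask (other duplicates);
    any of these means the key is present.
    """
    try:
        res = index.get_loc(key)
    except (KeyError, TypeError, ValueError):
        return False
    return np.isscalar(res) or isinstance(res, slice) or bool(np.any(res))


t0 = pd.Timestamp("2015-01-01")
t1 = pd.Timestamp("2015-01-02")
tm = pd.Timestamp("2014-12-31")

# Same orderings as In [5]-[8] above.
cases = {
    "unique": pd.DatetimeIndex([t0, t1, tm]),
    "dup, non-monotonic": pd.DatetimeIndex([t0, t1, t0]),
    "dup, increasing": pd.DatetimeIndex([t0, t0, t1]),
    "dup, decreasing": pd.DatetimeIndex([t0, t0, tm]),
}
for name, idx in cases.items():
    print(name, patched_contains(idx, t0))

# TimedeltaIndex with a duplicated, non-monotonic key.
td_idx = pd.to_timedelta(["0s", "1s", "0s"])
print("timedelta", patched_contains(td_idx, pd.Timedelta(0)))

# A key that is genuinely absent must still report False.
print("missing", patched_contains(cases["dup, non-monotonic"], pd.Timestamp("2020-01-01")))
```

Checking an absent key alongside the present ones is a cheap guard against the worry that other indexing code relies on the old behaviour: the extra clause only changes the answer when get_loc has already found at least one match.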

@ischwabacher
Contributor Author

Truuuue....

The embarrassing truth is that I lost my SSH key for uploading to GitHub, and haven't gotten around to regenerating it yet. So I won't be able to squash. /sheepish
