Index.isin() always True for nans #7066

Closed
goyodiaz opened this issue May 7, 2014 · 6 comments · Fixed by #7068
Labels: Bug, Indexing (Related to indexing on series/frames, not to indexes themselves)
Milestone: 0.14.0

Comments

goyodiaz (Contributor) commented May 7, 2014

This looks like a regression to me. The following code used to print [False], which I think is the correct behaviour, but prints [ True] in recent development versions (0.13.1.dev).

import numpy as np
import pandas as pd

i = pd.Index([np.nan])
print(i.isin({0}))  # prints [ True], should be [False]
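For comparison, here is a minimal pure-Python sketch of the behaviour the report expects. It assumes that a nan in the index should match only when `values` itself contains a nan; `isin_nan_aware` is a hypothetical helper for illustration, not pandas' implementation.

```python
import math


def _is_nan(v):
    # Covers Python floats (and float subclasses such as numpy.float64).
    return isinstance(v, float) and math.isnan(v)


def isin_nan_aware(index_values, values):
    """Hypothetical sketch: element-wise membership where nan matches only nan."""
    values = list(values)
    has_nan = any(_is_nan(v) for v in values)
    # Ordinary values go through normal set lookup (hash + equality).
    non_nan = {v for v in values if not _is_nan(v)}
    result = []
    for x in index_values:
        if _is_nan(x):
            # nan is handled explicitly instead of relying on set lookup.
            result.append(has_nan)
        else:
            result.append(x in non_nan)
    return result


print(isin_nan_aware([float("nan")], {0}))                      # [False]
print(isin_nan_aware([float("nan"), 0.0], [0, float("nan")]))   # [True, True]
```

Because nan is detected with `math.isnan` rather than looked up in a set, the result does not depend on whether two nan objects happen to be the same object.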

I do not think the following is relevant but just in case:

>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-24-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: es_ES.UTF-8

pandas: 0.13.1.dev
nose: 1.3.1
Cython: None
numpy: 1.8.1
scipy: 0.13.3
statsmodels: 0.6.0.dev-Unknown
IPython: 3.0.0-dev
sphinx: None
patsy: 0.2.1
scikits.timeseries: None
dateutil: 1.5
pytz: 2012c
bottleneck: 0.8.0
tables: 3.1.1
numexpr: 2.2.2
matplotlib: 1.3.1
openpyxl: 1.7.0
xlrd: 0.9.2
xlwt: 0.7.5
xlsxwriter: 0.5.2
lxml: 3.3.3
bs4: 4.2.1
html5lib: 0.999
bq: None
apiclient: None
rpy2: None
sqlalchemy: None
pymysql: None
psycopg2: 2.4.5 (dt dec mx pq3 ext)
cpcloud (Member) commented May 7, 2014

This is intentional, to allow speedups in Float64Index

cpcloud (Member) commented May 7, 2014

If you have a nan in your index and it can be converted to a Float64Index, it will be. Before #6879, the backend used object dtype, which allowed nan checks internally, but isin relied on the behavior of Index, which uses a set. If you look at the beginning of the issue thread, you'll see I was pulling my hair out trying to figure out why nan behaves like an object in some cases and like a large integer in others.
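A minimal sketch of a set-based membership test, in plain Python rather than pandas. It assumes the standard-library `array("d")` as a stand-in for a float64 backend (raw doubles, boxed into a new Python float on every access) and a list as a stand-in for an object-dtype backend; `isin_via_set` is a hypothetical helper, not the pandas code.

```python
from array import array

nan = float("nan")

# Stand-in for object dtype: the list holds a reference to the very same nan object.
object_backed = [nan]
# Stand-in for float64 storage: array("d") keeps raw C doubles and creates a
# fresh Python float object each time an element is read.
float_backed = array("d", [nan])


def isin_via_set(storage, values):
    """Hypothetical set-based isin: test each stored element against a set."""
    value_set = set(values)
    return [x in value_set for x in storage]


print(isin_via_set(object_backed, [nan]))  # [True]: same object, identity check succeeds
print(isin_via_set(float_backed, [nan]))   # [False]: freshly boxed float, and nan != nan
```

The same nan gives different answers depending only on whether the storage hands back the original object or a newly boxed float.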

Here's why:

It's because in a float64 array nan is stored as a raw 64-bit value (a large number if you read its bits as an integer), but when you use it as an object, it's a pointer to a Python float. When it's used as an "object", it's okay to use in a set: you'll get True if that nan object is in the set and False otherwise. If you use it as a float value, each time it is boxed into a Python object you get a different object (large-ish integers in Python behave like this as well). For ordinary numbers this doesn't matter, because equality still works: 1 == 1 is True, so 1 is found in a set containing 1. However, nan by definition has the property that nan == nan is False, so for these "different" nans both the identity (address) comparison and the equality comparison return False, and thus set membership is False.
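The identity-then-equality rule can be checked directly in plain Python. This is a small sketch; the integer comparison relies on CPython typically building a new object for `int("1000000")`, which is an implementation detail, so only the equality-based membership result is shown for it.

```python
# Two nan values created separately are distinct Python float objects.
a = float("nan")
b = float("nan")

print(a == a)      # False: nan never compares equal, even to itself
print(a in {a})    # True: set lookup checks identity before equality
print(b in {a})    # False: different object, and nan != nan

# Equal integers are found in a set even when they are different objects
# (in CPython, int("1000000") typically builds a new object), because the
# equality check succeeds.
x = 1000000
y = int("1000000")
print(y in {x})    # True
```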

cpcloud (Member) commented May 7, 2014

Oh wow, I just realized I misread what you wrote.

jreback (Contributor) commented May 7, 2014

Yeah, I think we should mark this as a bug.

cpcloud (Member) commented May 7, 2014

This is a bug.

cpcloud added the Bug label and removed the Internals label on May 7, 2014
cpcloud added this to the 0.14.0 milestone on May 7, 2014
cpcloud self-assigned this on May 7, 2014
cpcloud (Member) commented May 7, 2014

@goyodiaz Thanks for catching this! A dumb logic error on my part.
