
Checking that a pandas.Series.index contains a value #22085

Closed
samorani opened this issue Jul 27, 2018 · 7 comments

5 participants
@samorani

commented Jul 27, 2018

Code Sample, a copy-pastable example if possible

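import numpy as np
import pandas as pd

some_numbers = np.random.randint(0, 4, size=10)  # 10 random ints in [0, 3]
print(some_numbers)
s = pd.Series(some_numbers)
# Relative frequency of each value; the index holds the distinct integers
gb = s.groupby(s).size() / len(s)
print(gb)
# `in` on a Series tests the index labels, so 1.3 should not be found
print(1.3 in gb)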

Problem description

I reported it here. The last line evaluates to True, but it should be False: 1.3 is not one of the (integer) index labels of gb.

Expected Output

False

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.2.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.3
pytest: 3.2.1
pip: 9.0.1
setuptools: 40.0.0
Cython: 0.26.1
numpy: 1.15.0
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.2.2
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 0.9.8
lxml: 3.8.0
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@jorisvandenbossche

Member

commented Jul 27, 2018

Thanks for the report!

A simpler example:

In [11]: 1.3 in pd.Index([0, 1, 2, 3])
Out[11]: True

It seems to be specific to the Int64Engine (float or object indexes correctly give False).
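A quick way to see the dtype dependence is to run the same membership test against integer-, float- and object-backed indexes. This is a minimal sketch that assumes a pandas release containing the fix for this issue (it is milestoned for 0.24.0 further down the thread); on the 0.23.3 release from the report, the integer-index check would print True.

```python
import pandas as pd

int_idx = pd.Index([0, 1, 2, 3])                # integer-backed index
float_idx = pd.Index([0.0, 1.0, 2.0, 3.0])      # float64-backed index
obj_idx = pd.Index([0, 1, 2, 3], dtype=object)  # object-backed index

for name, idx in [("int", int_idx), ("float", float_idx), ("object", obj_idx)]:
    # 1.3 is not a label in any of these indexes, so each test should be False
    print(name, idx.dtype, 1.3 in idx)

# Genuine integer labels are still found
print(1 in int_idx)
```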

@fjdiod

Contributor

commented Jul 30, 2018

The problem seems to be here:

def __contains__(self, object val):
    self._ensure_mapping_populated()
    hash(val)  # raises TypeError early for unhashable values
    return val in self.mapping

For Int64Engine, the mapping is an Int64HashTable:

>>> import numpy as np
>>> from pandas._libs.hashtable import Int64HashTable
>>> ht = Int64HashTable(3)
>>> ht.map_locations(np.array([1, 2, 3], dtype=np.int64))
>>> 1.1 in ht
True
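The effect can be reproduced without Cython using a toy table. The class below is hypothetical (it is not pandas' Int64HashTable); it only models the one property that matters here: the probe value is converted to an integer before the lookup, so its fractional part is discarded.

```python
class CoercingInt64Table:
    """Toy int-keyed hash table (hypothetical, not pandas' Int64HashTable)."""

    def __init__(self):
        self._locations = {}  # integer key -> position

    def map_locations(self, values):
        # Record the position of each value, converted to a plain int.
        for pos, value in enumerate(values):
            self._locations[int(value)] = pos

    def __contains__(self, val):
        # The probe is converted to int *before* the lookup,
        # so 1.1 becomes 1 and 2.9 becomes 2.
        return int(val) in self._locations


ht = CoercingInt64Table()
ht.map_locations([1, 2, 3])
print(1.1 in ht)  # True, although 1.1 was never stored
print(5 in ht)    # False
```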
@jorisvandenbossche

Member

commented Aug 13, 2018

This is because Cython casts the input float to int.

A small example showing that behaviour:

In [7]: %load_ext cython

In [9]: %%cython
   ...: def contains(int val):
   ...:     return val in [1, 2, 3]

In [10]: contains(1)
Out[10]: True

In [11]: contains(5)
Out[11]: False

In [12]: contains(1.3)
Out[12]: True

In [13]: contains('a')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-13-0aaf064190fe> in <module>()
----> 1 contains('a')

_cython_magic_32cffc084fddd25975b4666794b79f3a.pyx in _cython_magic_32cffc084fddd25975b4666794b79f3a.contains()

TypeError: an integer is required

I'm not sure whether Cython can be told not to do such a cast (cc @jreback).
Otherwise, we should check the type of the value in, or before, the call to the HashTable's contains.
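In plain Python, the difference between a silently truncating conversion and a strict one is the difference between int() and operator.index(). The two helper functions below are hypothetical and only model the Cython example above: contains_truncating mimics the typed int val argument (assuming it truncates, as the session output for 1.3 suggests), while contains_strict refuses non-integers before looking anything up.

```python
import operator

STORED = {1, 2, 3}  # the labels being searched


def contains_truncating(val):
    # int() drops the fractional part, so 1.3 is looked up as 1.
    return int(val) in STORED


def contains_strict(val):
    # operator.index() only accepts true integers and raises TypeError
    # for floats, strings, etc., so nothing is silently truncated.
    try:
        key = operator.index(val)
    except TypeError:
        return False
    return key in STORED


print(contains_truncating(1.3))  # True
print(contains_strict(1.3))      # False
print(contains_strict(2))        # True
```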

@jreback

Contributor

commented Aug 14, 2018

These get cast to int by Cython;
you need to check the type with is_integer first.
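A sketch of the suggested check, using pandas' public is_integer helper (pandas.api.types.is_integer). The function guarded_int_contains and the dict standing in for the engine's mapping are hypothetical; the point is only the ordering: the type is checked before the value reaches a lookup that could cast it. This strict version also rejects integral floats such as 1.0, which may or may not be the desired behaviour for an integer index.

```python
import numpy as np
from pandas.api.types import is_integer


def guarded_int_contains(val, mapping):
    """Hypothetical membership test for an int64-keyed mapping."""
    hash(val)  # keep raising TypeError for unhashable values
    # Reject non-integers *before* the lookup, so 1.3 is never truncated to 1
    if not is_integer(val):
        return False
    return val in mapping


# Stand-in for the engine's mapping: label -> position
mapping = {label: pos for pos, label in enumerate([0, 1, 2, 3])}

print(guarded_int_contains(1.3, mapping))          # False
print(guarded_int_contains(1, mapping))            # True
print(guarded_int_contains(np.int64(2), mapping))  # True
```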

@a1shadows


commented Aug 14, 2018

Can I take this issue? I want to start contributing to this repository.

@jorisvandenbossche

Member

commented Aug 14, 2018

Yes, go ahead. If you have any questions related to the fix, don't hesitate to ask here.

@a1shadows


commented Aug 14, 2018

@jorisvandenbossche thanks a lot.

yeojin-dev added a commit to yeojin-dev/pandas that referenced this issue Aug 15, 2018

a1shadows added a commit to a1shadows/pandas that referenced this issue Aug 15, 2018

@jreback jreback added the Indexing label Aug 16, 2018

yeojin-dev added a commit to yeojin-dev/pandas that referenced this issue Aug 16, 2018

yeojin-dev added a commit to yeojin-dev/pandas that referenced this issue Aug 16, 2018

@jreback jreback modified the milestones: Contributions Welcome, 0.24.0 Aug 22, 2018

yeojin-dev added a commit to yeojin-dev/pandas that referenced this issue Sep 5, 2018

yeojin-dev added a commit to yeojin-dev/pandas that referenced this issue Sep 5, 2018

yeojin-dev added a commit to yeojin-dev/pandas that referenced this issue Sep 5, 2018

yeojin-dev added a commit to yeojin-dev/pandas that referenced this issue Sep 5, 2018

yeojin-dev added a commit to yeojin-dev/pandas that referenced this issue Sep 5, 2018

yeojin-dev added a commit to yeojin-dev/pandas that referenced this issue Sep 5, 2018

jreback added a commit that referenced this issue Sep 19, 2018
