BUG: ValueError with Series.isin and tuples #16394

Closed
wmp3 opened this Issue May 20, 2017 · 3 comments


wmp3 commented May 20, 2017

Code Sample, a copy-pastable example if possible

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
df['C'] = list(zip(df['A'], df['B']))
df['C'].isin([(1, 'a')])

Problem description

Returns ValueError:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/anaconda/envs/pandas_dev/lib/python3.6/site-packages/pandas/core/series.py", line 2555, in isin
    result = algorithms.isin(_values_from_object(self), values)
  File "/anaconda/envs/pandas_dev/lib/python3.6/site-packages/pandas/core/algorithms.py", line 421, in isin
    return f(comps, values)
  File "/anaconda/envs/pandas_dev/lib/python3.6/site-packages/pandas/core/algorithms.py", line 399, in <lambda>
    f = lambda x, y: htable.ismember_object(x, values)
  File "pandas/_libs/hashtable_func_helper.pxi", line 428, in pandas._libs.hashtable.ismember_object (pandas/_libs/hashtable.c:29677)
ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

Expected Output

In pandas 0.19.2 returns:
0     True
1    False
2    False
Name: C, dtype: bool

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Darwin
OS-release: 16.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.0rc2
pytest: None
pip: 9.0.1
setuptools: 27.2.0
Cython: None
numpy: 1.12.1
scipy: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
pandas_gbq: None
pandas_datareader: None

Contributor

jreback commented May 20, 2017

This code was refactored to be more general, and this case was missed. Easy fix, I think: np.array treats nested tuples as an extra dimension rather than as single elements, which is not nice, so use the change below instead.

If you'd like to submit a PR with this change and an added test (and make sure nothing else breaks), that would be great.

diff --git a/pandas/core/algorithms.py b/pandas/core/algorithms.py
index a745ec6..77d79c9 100644
--- a/pandas/core/algorithms.py
+++ b/pandas/core/algorithms.py
@@ -388,7 +388,7 @@ def isin(comps, values):
                         "[{0}]".format(type(values).__name__))
 
     if not isinstance(values, (ABCIndex, ABCSeries, np.ndarray)):
-        values = np.array(list(values), dtype='object')
+        values = lib.list_to_object_array(list(values))
 
     comps, dtype, _ = _ensure_data(comps)
     values, _, _ = _ensure_data(values, dtype=dtype)
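
To illustrate the dimension problem described above, here is a minimal NumPy-only sketch. The helper `to_1d_object_array` is hypothetical and written only for this example; it is not the implementation of `lib.list_to_object_array`, just one way to get the same kind of 1-D result.

```python
import numpy as np

values = [(1, 'a'), (2, 'b')]

# np.array interprets each tuple as a row, so the result is 2-D.
as_2d = np.array(values, dtype=object)
print(as_2d.shape)  # (2, 2)


def to_1d_object_array(items):
    """Hypothetical helper: store each item (tuples included) as one element."""
    out = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        out[i] = item  # integer indexing stores the object itself
    return out


as_1d = to_1d_object_array(values)
print(as_1d.shape)  # (2,)
```

A hash-table lookup that expects a 1-D buffer can work with the second array, whereas the first one triggers the "expected 1, got 2" dimension error reported above.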

jreback added this to the Next Major Release milestone May 20, 2017

jreback changed the title from "ValueError with Series.isin and tuples" to "BUG: ValueError with Series.isin and tuples" May 20, 2017

jorisvandenbossche modified the milestones: 0.20.2, Next Major Release May 20, 2017

Contributor

jaredsnyder commented May 22, 2017

I'm taking a crack at this. Is the solution to just add lib.list_to_object_array back in along with a test for the tuple case, or should we check if comps contains tuples and use lib.list_to_object_array only if it does?

Member

jorisvandenbossche commented May 22, 2017

@jaredsnyder I think you can try the exact change that @jreback showed above. When the values are not tuples, both approaches should normally behave the same, so I don't think it is necessary to check whether they contain tuples. And definitely add a test!
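
A regression test for the tuple case could look roughly like the sketch below. It reuses the example from the original report; the test name is an assumption, and this is not necessarily the test that was eventually merged. It should pass on pandas versions that include the fix.

```python
import pandas as pd


def test_isin_with_tuples():
    # Same setup as the report: column C holds (A, B) tuples.
    df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
    df['C'] = list(zip(df['A'], df['B']))

    result = df['C'].isin([(1, 'a')])
    expected = pd.Series([True, False, False], name='C')

    pd.testing.assert_series_equal(result, expected)


test_isin_with_tuples()
```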

jorisvandenbossche added a commit that referenced this issue May 23, 2017

BUG: fix isin with Series of tuples values (#16394) (#16434)
* Switched out "values = np.array(list(values), dtype='object')" for "values = lib.list_to_object_array(list(values))" in the isin() method found in core/algorithms.py
Added test for comparing to a list of tuples

pvomelveny added a commit to pvomelveny/pandas that referenced this issue May 23, 2017

BUG: fix isin with Series of tuples values (#16394) (#16434)
* Switched out "values = np.array(list(values), dtype='object')" for "values = lib.list_to_object_array(list(values))" in the isin() method found in core/algorithms.py
Added test for comparing to a list of tuples

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue May 29, 2017

BUG: fix isin with Series of tuples values (#16394) (#16434)
* Switched out "values = np.array(list(values), dtype='object')" for "values = lib.list_to_object_array(list(values))" in the isin() method found in core/algorithms.py
Added test for comparing to a list of tuples

(cherry picked from commit e053ee3)

TomAugspurger added a commit that referenced this issue May 30, 2017

BUG: fix isin with Series of tuples values (#16394) (#16434)
* Switched out "values = np.array(list(values), dtype='object')" for "values = lib.list_to_object_array(list(values))" in the isin() method found in core/algorithms.py
Added test for comparing to a list of tuples

(cherry picked from commit e053ee3)

stangirala added a commit to stangirala/pandas that referenced this issue Jun 11, 2017

BUG: fix isin with Series of tuples values (#16394) (#16434)
* Switched out "values = np.array(list(values), dtype='object')" for "values = lib.list_to_object_array(list(values))" in the isin() method found in core/algorithms.py
Added test for comparing to a list of tuples