Regression from 0.19.2 to 0.20.1 in pandas.unique() when applied to list of tuples #16519

Closed
jotterbach opened this Issue May 27, 2017 · 1 comment

jotterbach commented May 27, 2017

Code Sample, a copy-pastable example if possible

import pandas as pd

input = [(0, 0), (0, 1), (1, 0), (1, 1), (0, 0), (0, 1), (1, 0), (1, 1)]
print(pd.unique(input))

Problem description

The code exits unexpectedly with a ValueError:

Traceback (most recent call last):
  File "pandas_bug.py", line 6, in <module>
    pd.unique(input)
  File "/Users/johannes/.virtualenvs/pandas/lib/python2.7/site-packages/pandas/core/algorithms.py", line 351, in unique
    uniques = table.unique(values)
  File "pandas/_libs/hashtable_class_helper.pxi", line 1271, in pandas._libs.hashtable.PyObjectHashTable.unique (pandas/_libs/hashtable.c:21384)
ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

Expected Output

The same code works on pandas 0.19.2 and produces the expected output:

[(0, 0) (0, 1) (1, 0) (1, 1)]

Moreover, this problem is not limited to macOS; it was also encountered on an Ubuntu CI server.
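For anyone who needs the 0.19.2 result before a fixed release is available, one version-independent option is to deduplicate in plain Python while keeping first-seen order. This is only a sketch and not a drop-in replacement for pd.unique: `unique_in_order` is a hypothetical helper name, and it returns a list rather than an array.

```python
def unique_in_order(items):
    """Return items without duplicates, keeping first-seen order.

    Hypothetical helper for illustration; items must be hashable.
    """
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


pairs = [(0, 0), (0, 1), (1, 0), (1, 1), (0, 0), (0, 1), (1, 0), (1, 1)]
print(unique_in_order(pairs))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```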

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.20.1
pytest: None
pip: 9.0.1
setuptools: 35.0.2
Cython: None
numpy: 1.12.1
scipy: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
pandas_gbq: None
pandas_datareader: None
None


jreback commented May 30, 2017

This is related to #16394 and needs the same fix, along with some tests to ensure that nothing else breaks.

diff --git a/pandas/core/algorithms.py b/pandas/core/algorithms.py
index 77d79c9..9cfaf04 100644
--- a/pandas/core/algorithms.py
+++ b/pandas/core/algorithms.py
@@ -163,7 +163,7 @@ def _ensure_arraylike(values):
                                ABCIndexClass, ABCSeries)):
         inferred = lib.infer_dtype(values)
         if inferred in ['mixed', 'string', 'unicode']:
-            values = np.asarray(values, dtype=object)
+            values = lib.list_to_object_array(values)
         else:
             values = np.asarray(values)
     return values
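
The "expected 1, got 2" in the traceback is consistent with how NumPy converts a list of equal-length tuples: each tuple becomes a row, so the result is 2-D even with dtype=object. The NumPy-only sketch below shows that behavior next to an element-by-element fill that keeps each tuple as a single object. `to_object_array_1d` is a hypothetical helper written for illustration; it is assumed to be similar in spirit to `lib.list_to_object_array`, not a copy of the pandas implementation.

```python
import numpy as np

values = [(0, 0), (0, 1), (1, 0), (1, 1), (0, 0), (0, 1), (1, 0), (1, 1)]

# np.asarray treats equal-length tuples as rows, so the array is 2-D.
as_2d = np.asarray(values, dtype=object)
print(as_2d.shape)  # (8, 2)


def to_object_array_1d(items):
    """Hypothetical helper: build a 1-D object array with one slot per item."""
    out = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        out[i] = item  # stores the tuple itself, not its elements
    return out


as_1d = to_object_array_1d(values)
print(as_1d.shape)  # (8,)
```

A 1-D object array of this kind is the shape the hashtable call in the traceback expects.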

jreback added this to the 0.20.2 milestone May 30, 2017

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue May 30, 2017

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue May 31, 2017

jreback added a commit to TomAugspurger/pandas that referenced this issue May 31, 2017

jreback closed this in #16543 Jun 1, 2017
