BUG: in Python3 MultiIndex.from_tuples cannot take "zipped" tuples #18434

Closed
Xbar opened this Issue Nov 22, 2017 · 8 comments

3 participants
@Xbar
Contributor

Xbar commented Nov 22, 2017

Code Sample, a copy-pastable example if possible

# This code fails in python 3
import pandas as pd
my_index = pd.MultiIndex.from_tuples(zip(['a', 'b'], ['c', 'd'], ['e', 'f']), names=['A','B', 'C'])

Problem description

The code above raises an exception:

TypeError: object of type 'zip' has no len()

This happens because in Python 3, unlike in Python 2, zip returns an iterator rather than a list, so len() cannot be called on it.

In pandas, there are multiple places in MultiIndex and related classes where the code calls len() on its arguments. Those arguments are valid input, but in Python 3 they no longer support len().
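
The difference can be reproduced without pandas. A minimal sketch in plain Python 3 (the variable names are illustrative only):

```python
# Python 3: zip() returns a lazy iterator, not a list.
zipped = zip(['a', 'b'], ['c', 'd'], ['e', 'f'])

try:
    len(zipped)
    error_message = None
except TypeError as exc:
    # Iterators do not define __len__, so len() raises TypeError.
    error_message = str(exc)

print(error_message)

# Materializing the iterator gives a sequence that has a length.
as_list = list(zip(['a', 'b'], ['c', 'd'], ['e', 'f']))
print(len(as_list))
```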

Expected Output

Same as in Python 2.
For the example above, that is:

MultiIndex(levels=[['a', 'b'], ['c', 'd'], ['e', 'f']],
           labels=[[0, 1], [0, 1], [0, 1]],
           names=['A', 'B', 'C'])

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.21.0
pytest: None
pip: 9.0.1
setuptools: 36.5.0
Cython: None
numpy: 1.13.3
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.0.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@bashtage

Contributor

bashtage commented Nov 22, 2017

I'm not sure this is a bug. An iterator is not a list/sequence of tuples.

@Xbar

Contributor

Xbar commented Nov 22, 2017

Pandas is supposed to behave consistently between Python 2 and Python 3. If the code works in Python 2, then this is a bug.

@bashtage

Contributor

bashtage commented Nov 22, 2017

zip in Python 2 returns a list. Pandas will work with list(zip(a,b)).
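
Applied to the example from the report, the workaround looks like the sketch below. It assumes pandas is installed. The printed representation of a MultiIndex varies between pandas versions, so only the tuples and names are printed:

```python
import pandas as pd

# Materialize the zip iterator so from_tuples receives a list of tuples.
tuples = list(zip(['a', 'b'], ['c', 'd'], ['e', 'f']))
my_index = pd.MultiIndex.from_tuples(tuples, names=['A', 'B', 'C'])

print(list(my_index))
print(list(my_index.names))
```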

@bashtage

Contributor

bashtage commented Nov 22, 2017

To put it another way, the "bug" is in zip, which changed its behavior. Pandas from_tuples requires a list or other sequence on both Python 2 and Python 3.

@Xbar

Contributor

Xbar commented Nov 22, 2017

Granted, the issue is due to the different behavior of zip, but should iterators be supported here?

Python 3 changed the behavior of a number of functions, so iterators and the like are encountered more often than before. Should they be accepted as valid arguments?

@bashtage

Contributor

bashtage commented Nov 22, 2017

It is hard/impossible to allocate contiguous data blocks like NumPy arrays unless you know the data size (hence the need for len). This happens in both NumPy and Pandas with iterators.
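
To illustrate the allocation point on the NumPy side, here is a small sketch, assuming NumPy is installed. `np.array` does not consume a generator, whereas `np.fromiter` does, and its optional `count` argument lets it allocate the result up front:

```python
import numpy as np

# np.array does not iterate over a generator; it wraps the generator
# object itself in a 0-dimensional object array.
wrapped = np.array(x * x for x in range(3))
print(wrapped.shape)
print(wrapped.dtype)

# np.fromiter consumes an iterator; passing count lets it allocate
# the output array once instead of growing it as items arrive.
squares = np.fromiter((x * x for x in range(3)), dtype=np.int64, count=3)
print(squares.tolist())
```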

@jreback

Contributor

jreback commented Nov 22, 2017

We generally support list-likes / iterables for most operations (for certain types of indexing a tuple is treated distinctly). For .from_product / .from_tuples / .from_arrays the docstring says we accept a list / sequence, so I would be OK with accepting an Iterable here.

@Xbar would you do a PR for this?

@jreback jreback added this to the Next Major Release milestone Nov 22, 2017

@Xbar

Contributor

Xbar commented Nov 22, 2017

An easy fix might be to test, at the beginning of from_tuples, whether the argument is an iterator and, if so, convert it to a list. I can make the fix and test it.

Xbar added a commit to Xbar/pandas that referenced this issue Nov 24, 2017

Fix pandas-dev#18434
MultiIndex.from_tuples/from_arrays/from_product accept iterators in
python 3. Ensures compatibility
between 2 and 3.

@jreback jreback modified the milestones: Next Major Release, 0.22.0 Nov 24, 2017
