BUG: IndexError on read_csv/read_table when using usecols/names parameters and omitting last column #5766

Closed
wrenoud opened this issue Dec 23, 2013 · 4 comments
Labels
Bug Regression Functionality that used to work in a prior pandas version

@wrenoud
Contributor

wrenoud commented Dec 23, 2013

Example code:

from StringIO import StringIO
import pandas as pd

names = ["a","b","c"]

data = """\
0,1,2
3,4,5
6,7,8"""

# usecols works as expected if all columns are named
print pd.read_csv(StringIO(data), header=None, usecols=[1,2], names=names)
print pd.read_csv(StringIO(data), header=None, usecols=[0,1], names=names)

# naming only columns selected with usecols works when last column is included
print pd.read_csv(StringIO(data), header=None, usecols=[1,2], names=names[1:])
# causes IndexError
print pd.read_csv(StringIO(data), header=None, usecols=[0,1], names=names[:-1])

Output:

   b  c
0  1  2
1  4  5
2  7  8

[3 rows x 2 columns]
   a  b
0  0  1
1  3  4
2  6  7

[3 rows x 2 columns]
   b  c
0  1  2
1  4  5
2  7  8

[3 rows x 2 columns]
Traceback (most recent call last):
  File "pandas_test2.py", line 18, in <module>
    print pd.read_csv(StringIO(data), header=None, usecols=[0,1], names=names[:-1])
  File "/home/weston/pandas/pandas/io/parsers.py", line 404, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/weston/pandas/pandas/io/parsers.py", line 212, in _read
    return parser.read()
  File "/home/weston/pandas/pandas/io/parsers.py", line 610, in read
    ret = self._engine.read(nrows)
  File "/home/weston/pandas/pandas/io/parsers.py", line 1050, in read
    data = self._reader.read(nrows)
  File "parser.pyx", line 727, in pandas.parser.TextReader.read (pandas/parser.c:6475)
  File "parser.pyx", line 749, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:6695)
  File "parser.pyx", line 824, in pandas.parser.TextReader._read_rows (pandas/parser.c:7517)
  File "parser.pyx", line 902, in pandas.parser.TextReader._convert_column_data (pandas/parser.c:8296)
  File "parser.pyx", line 1139, in pandas.parser.TextReader._get_column_name (pandas/parser.c:11353)
IndexError: list index out of range

print_versions.py output:

INSTALLED VERSIONS
------------------
Python: 2.7.3.final.0
OS: Linux 3.2.0-51-generic #77-Ubuntu SMP Wed Jul 24 20:21:10 UTC 2013 i686
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8

pandas: 0.13.0rc1-119-g2485e09
Cython: 0.15.1
Numpy: 1.6.1
Scipy: 0.9.0
statsmodels: Not installed
    patsy: Not installed
scikits.timeseries: Not installed
dateutil: 1.5
pytz: 2011k
bottleneck: Not installed
PyTables: Not Installed
    numexpr: Not Installed
matplotlib: 1.1.1rc
openpyxl: Not installed
xlrd: Not installed
xlwt: Not installed
xlsxwriter: Not installed
sqlalchemy: Not installed
lxml: Not installed
bs4: Not installed
html5lib: Not installed
bigquery: Not installed
apiclient: Not installed
wrenoud added a commit to wrenoud/pandas that referenced this issue Dec 23, 2013
This is an issue in read_csv/read_table where there is no header and
both usecols and names are assigned but the last column is not included.
This caused an IndexError after reaching the last column specified in usecols.
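
To make the failure mode in the commit message easier to follow, here is a minimal pure-Python sketch. It is not the pandas source: the function names `names_lookup_first` and `names_filter_first` are hypothetical, and the model only covers the case where `names` lists just the selected columns. It assumes, based on the traceback order (`_convert_column_data` calling `_get_column_name`), that a name is looked up for every column in the file before the `usecols` filter is applied, so the lookup runs past the end of `names` once the last selected column has been consumed.

```python
def names_lookup_first(table_width, usecols, names):
    """Hypothetical model of the failing order: look up a name, then filter."""
    selected = []
    nused = 0  # how many selected columns have been named so far
    for i in range(table_width):
        # Assumption: the name is fetched even for columns that get skipped.
        # Once nused == len(names), this raises IndexError.
        name = names[nused]
        if i not in usecols:
            continue
        selected.append(name)
        nused += 1
    return selected


def names_filter_first(table_width, usecols, names):
    """Hypothetical alternative order: filter by usecols, then look up a name."""
    selected = []
    nused = 0
    for i in range(table_width):
        if i not in usecols:
            continue  # skipped columns never touch `names`
        selected.append(names[nused])
        nused += 1
    return selected


if __name__ == "__main__":
    # Last column included: both orders succeed.
    print(names_lookup_first(3, [1, 2], ["b", "c"]))
    # Last column omitted: the lookup-first order runs off the end of `names`.
    try:
        names_lookup_first(3, [0, 1], ["a", "b"])
    except IndexError as exc:
        print("IndexError:", exc)
    print(names_filter_first(3, [0, 1], ["a", "b"]))
```

Under these assumptions the sketch reproduces the reported pattern: `usecols=[1,2]` succeeds because the trailing column is selected, while `usecols=[0,1]` fails on the unselected third column.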
@wrenoud
Contributor Author

wrenoud commented Dec 23, 2013

This originated from the stackoverflow question "IndexError when trying to read_table with pandas"

@wrenoud
Contributor Author

wrenoud commented Dec 24, 2013

Updated the example to the simplest case.

@ghost

ghost commented Dec 25, 2013

I can repro.
Bisected to d05f3b1 #4406
Looking into it.

@ghost

ghost commented Dec 25, 2013

Merged #5770, just in time for Christmas. Cheers.

@ghost ghost closed this as completed Dec 25, 2013