Handling of trailing delimiters in read_csv #2442

Closed
wesm opened this issue Dec 6, 2012 · 10 comments
Labels: IO Data (IO issues that don't fit into a more specific label)
Comments

@wesm (Member) commented Dec 6, 2012

xref http://stackoverflow.com/questions/13719946/python-pandas-trailing-delimiter-confuses-read-csv

@edwardw commented Dec 8, 2012

To reproduce the bug, create a two-line CSV file: the first line is the header, without a trailing delimiter; the second line is data, with a trailing delimiter. read_csv then creates a DataFrame in which the headers and columns are offset by one.
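
A minimal sketch of that repro (the contents and printed output are illustrative; exact behaviour can vary by pandas version):

import pandas as pd
from io import StringIO

# Header has two names; each data row ends with a trailing comma, i.e. three fields.
data = "A,B\n1,2,\n3,4,\n"
print(pd.read_csv(StringIO(data)))
#    A   B
# 1  2 NaN
# 3  4 NaN   <- 1 and 3 become the index, so A/B are shifted by one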

@wesm (Member, Author) commented Dec 10, 2012

This is very annoying because the index/row-name inference is very useful in most cases but breaks down when you have a malformed file. I'll think about it some.

@changhiskhan (Contributor) commented:

Hmmm... custom dialect option?

@wesm (Member, Author) commented Dec 10, 2012

We should probably add an option like index_col=False and deal with the empty column. I have the latest FEC file (which has ballooned, remarkably, to 900+ MB) to try it out.
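
A sketch of how that index_col=False behaviour could look on the two-line example above (this mirrors the documented handling of trailing delimiters; output may differ across versions):

import pandas as pd
from io import StringIO

data = "A,B\n1,2,\n3,4,\n"
# index_col=False stops the parser from inferring an index from the extra
# field, so the trailing empty value is discarded instead of shifting the header.
print(pd.read_csv(StringIO(data), index_col=False))
#    A  B
# 0  1  2
# 1  3  4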

@edwardw commented Dec 10, 2012

While we are on it, may I suggest a read_csv feature? The FEC file I used (it was 700+ MB a week ago) was too large for the 4 GB of memory on my MacBook. If I tried to read the file in one run, it took forever because of page faults. But since not all of the 20 or so columns were used, I read the file in 4 chunks, dropped the unused columns, appended each chunk to an accumulator, and ended up with a big DataFrame that held all the rows but only the columns I was interested in (roughly the workflow sketched below).

So an option telling pandas.read_csv to read only certain columns could be very useful.
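
A sketch of that chunked workflow using read_csv's chunksize argument (the file name and column names here are placeholders, not the real FEC schema):

import pandas as pd

keep = ["cand_nm", "contbr_st", "contb_receipt_amt"]  # placeholder column names
pieces = []
# Read the large file in chunks and keep only the columns of interest.
for chunk in pd.read_csv("fec_contributions.csv", chunksize=500000):
    pieces.append(chunk[keep])
df = pd.concat(pieces, ignore_index=True)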

@wesm (Member, Author) commented Dec 10, 2012

This is already done in the development version of pandas; you should install it. See the usecols option.
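
For reference, a sketch of usecols (file name and column names are placeholders):

import pandas as pd

# Only the listed columns are parsed, which keeps memory usage down on very large files.
df = pd.read_csv("fec_contributions.csv", usecols=["cand_nm", "contbr_st", "contb_receipt_amt"])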

wesm closed this as completed in 648d581 on Dec 10, 2012
@wesm (Member, Author) commented Dec 10, 2012

I wrote a blog post about this, enjoy: http://wesmckinney.com/blog/?p=635

@johannesschweig commented:

Blog link is dead.

@wesm (Member, Author) commented Dec 4, 2018

see http://wesmckinney.com/blog/update-on-upcoming-pandas-v0-10-new-file-parser-other-performance-wins/

@smcinerney commented Feb 11, 2020

Trailing delimiters on data rows confusing the parser is still an unresolved issue as of pandas 1.0:

import pandas as pd
from io import StringIO

bad_dat = """A,B\n1,2,\n"""
df = pd.read_csv(StringIO(bad_dat), sep=',', header=0, index_col='A')

Traceback (most recent call last):
  File "read_csv_trailing_delimiter_bug2.py", line 6, in <module>
    df = pd.read_csv(StringIO(bad_dat), sep=',', header=0, index_col='A')
  File "/opt/anaconda/lib/python3.7/site-packages/pandas/io/parsers.py", line 676, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/opt/anaconda/lib/python3.7/site-packages/pandas/io/parsers.py", line 454, in _read
    data = parser.read(nrows)
  File "/opt/anaconda/lib/python3.7/site-packages/pandas/io/parsers.py", line 1133, in read
    ret = self._engine.read(nrows)
  File "/opt/anaconda/lib/python3.7/site-packages/pandas/io/parsers.py", line 2078, in read
    values = data.pop(self.index_col[i])
KeyError: 'A'

and also, if we try to explicitly specify the (single) header row:

>>> pd.read_csv(StringIO(bad_dat), sep=',', header=[0], index_col='A')
...
ValueError: index_col must only contain row numbers when specifying a multi-index header
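
A possible workaround, building on the index_col=False option discussed earlier in this thread (a sketch reusing bad_dat from the snippet above, not verified against every version): read without inferring an index so the trailing empty field is dropped, then set the index afterwards.

df = pd.read_csv(StringIO(bad_dat), sep=',', header=0, index_col=False).set_index('A')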
