skipfooter doesn't really "skip" in read_csv #13879

Closed
gfyoung opened this Issue Aug 2, 2016 · 2 comments

Member

gfyoung commented Aug 2, 2016 edited

On master:

from pandas import read_csv
from pandas.compat import StringIO
data = 'a,b,c\ncat,foo,bar\ndog,foo,"baz'  # Note the stray quotation mark
read_csv(StringIO(data), engine='python', skipfooter=1)
...
_csv.Error: unexpected end of data

If we were truly "skipping" the last row, no error should have been raised. The error occurs because the Python engine first parses all of the data in memory with Python's csv library, so the malformed last row is read before it can be skipped.
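A minimal sketch of the difference, using only Python's csv module rather than pandas internals. The helper names `parse_all_then_trim` and `trim_then_parse` are hypothetical, and `strict=True` is an assumption made here so that the csv module reports the unterminated quote:

```python
import csv
from io import StringIO

data = 'a,b,c\ncat,foo,bar\ndog,foo,"baz'  # same stray quotation mark as above


def parse_all_then_trim(text, skipfooter):
    # Hypothetical helper: every row, including the footer, goes through
    # csv.reader before anything is dropped.
    rows = list(csv.reader(StringIO(text), strict=True))
    return rows[:-skipfooter] if skipfooter else rows


def trim_then_parse(text, skipfooter):
    # Hypothetical helper: drop the last physical lines first, so the
    # malformed footer never reaches csv.reader.
    lines = text.splitlines()
    kept = lines[:-skipfooter] if skipfooter else lines
    return list(csv.reader(kept, strict=True))


try:
    parse_all_then_trim(data, 1)
except csv.Error as err:
    print("parse-then-trim failed:", err)

print(trim_then_parse(data, 1))
```

Trimming raw lines first is only safe when no quoted field spans more than one line, so this is an illustration rather than a proposed fix.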

Whether this is intended behaviour or not has implications for the C engine in terms of implementing analogous skipfooter behaviour. Or perhaps it has something to do with the fact that the error_bad_lines and warn_bad_lines parameters are not supported by the Python engine?

xref #5232

jreback added this to the Next Major Release milestone Aug 2, 2016

gfyoung added a commit to gfyoung/pandas that referenced this issue Nov 26, 2016

gfyoung BUG: Improve error message for skipfooter malformed rows in Python engine

Python's native CSV library does not respect the
skipfooter parameter, so if one of those skipped
rows is malformed, it will still raise an error.

Closes gh-13879.
8ef5aa5

If this feature were implemented in the C engine, I would expect it to work in this case, so that the skipped lines would not need to parse correctly. But I am not sure whether this is actually possible.

Questions about how to treat quotation marks (are they parsed or not when determining the number of lines to skip?), similar to those in the recent issues about skiprows, will also come up. So for this to be consistent, the skipped lines may need to be parsed to some extent.
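To illustrate why quotation marks matter when counting footer lines, here is a small sketch with made-up data in which a quoted field contains a newline. It uses only Python's csv module and is not taken from either pandas engine:

```python
import csv
from io import StringIO

# Made-up data: the last record has a quoted field that spans two lines.
data = 'a,b\n1,x\n2,"multi\nline"'

physical_lines = data.splitlines()           # split on newlines, quotes ignored
records = list(csv.reader(StringIO(data)))   # quotes are parsed

print(len(physical_lines), "physical lines")
print(len(records), "parsed records")

# Dropping one physical line cuts the last record in half and leaves a
# dangling quotation mark, whereas dropping one parsed record does not.
print(physical_lines[:-1])
print(records[:-1])
```

Whether "skip the last line" refers to a physical line or to a parsed record gives a different result here, which is the consistency question raised above.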

Member

gfyoung commented Nov 26, 2016

@jorisvandenbossche : You are correct. This code should not break, though whether that is possible is another story, as some parsing might be needed. In any case, I am not sure yet how to implement this for the C engine, but that can be dealt with separately from this issue.

gfyoung added a commit to gfyoung/pandas that referenced this issue Nov 28, 2016

gfyoung BUG: Improve error message for skipfooter malformed rows in Python engine

Python's native CSV library does not respect the
skipfooter parameter, so if one of those skipped
rows is malformed, it will still raise an error.

Closes gh-13879.
8bcfb77

gfyoung added a commit to gfyoung/pandas that referenced this issue Nov 28, 2016

gfyoung BUG: Improve error message for skipfooter malformed rows in Python engine

Python's native CSV library does not respect the
skipfooter parameter, so if one of those skipped
rows is malformed, it will still raise an error.

Closes gh-13879.
9b1d065

gfyoung added a commit to gfyoung/pandas that referenced this issue Nov 29, 2016

gfyoung BUG: Improve error message for skipfooter malformed rows in Python engine

Python's native CSV library does not respect the
skipfooter parameter, so if one of those skipped
rows is malformed, it will still raise an error.

Closes gh-13879.
8aae4fe

jorisvandenbossche added a commit that referenced this issue Nov 29, 2016

gfyoung + jorisvandenbossche BUG: Improve error message for skipfooter malformed rows in Python engine (#14749)

Python's native CSV library does not respect the
skipfooter parameter, so if one of those skipped
rows is malformed, it will still raise an error.

Closes gh-13879.
dfeae39
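The idea in the commit message, sketched outside pandas: since the csv library still sees the footer rows, a parsing error can be re-raised with a hint that the footer may be the cause. The helper `read_rows`, the use of `ValueError`, and the wording of the hint are all hypothetical and are not copied from the actual change:

```python
import csv
from io import StringIO


def read_rows(text, skipfooter=0):
    # Hypothetical helper: parse everything, then drop the footer rows.
    try:
        rows = list(csv.reader(StringIO(text), strict=True))
    except csv.Error as err:
        msg = str(err)
        if skipfooter > 0:
            # Illustrative wording only, not the message used by pandas.
            msg += (". The error may come from a malformed row in the "
                    "skipped footer, because skipfooter is applied only "
                    "after all rows have been parsed")
        raise ValueError(msg) from err
    return rows[:-skipfooter] if skipfooter else rows


try:
    read_rows('a,b,c\ncat,foo,bar\ndog,foo,"baz', skipfooter=1)
except ValueError as err:
    print(err)
```

This only improves the message; the malformed footer row still causes a failure.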

jorisvandenbossche added a commit that referenced this issue Dec 15, 2016

gfyoung + jorisvandenbossche [Backport #14749] BUG: Improve error message for skipfooter malformed rows in Python engine (#14749)

Python's native CSV library does not respect the
skipfooter parameter, so if one of those skipped
rows is malformed, it will still raise an error.

Closes gh-13879.
(cherry picked from commit dfeae39)
8fda0c9