skipfooter doesn't really "skip" in read_csv #13879

Closed
gfyoung opened this Issue Aug 2, 2016 · 2 comments

gfyoung commented Aug 2, 2016

On master:

from pandas import read_csv
from pandas.compat import StringIO
data = 'a,b,c\ncat,foo,bar\ndog,foo,"baz'  # Note the stray quotation mark
read_csv(StringIO(data), engine='python', skipfooter=1)
...
_csv.Error: unexpected end of data

If we were truly "skipping" the last row, no error should have been raised. However, this occurs because the data is all parsed in memory first with Python's csv library.
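The ordering problem can be reproduced with the standard library alone. The following is a minimal sketch, not the pandas implementation: `parse_then_skip` and `skip_then_parse` are hypothetical helper names, and `strict=True` is an assumption made here so that the `csv` module reports the unterminated quote instead of silently accepting it.

```python
import csv
from io import StringIO

data = 'a,b,c\ncat,foo,bar\ndog,foo,"baz'  # stray quotation mark in the last row


def parse_then_skip(text, skipfooter):
    """Hypothetical helper: parse every row first, then drop the footer rows."""
    rows = list(csv.reader(StringIO(text), strict=True))
    return rows[:len(rows) - skipfooter]


def skip_then_parse(text, skipfooter):
    """Hypothetical helper: drop the footer lines first, then parse the rest."""
    lines = text.splitlines()
    kept = lines[:len(lines) - skipfooter]
    return list(csv.reader(kept, strict=True))


# Parsing first hits the malformed footer row before it can be dropped.
try:
    parse_then_skip(data, 1)
    parse_first_failed = False
except csv.Error:
    parse_first_failed = True

# Dropping the footer first never shows the malformed row to the parser.
rows = skip_then_parse(data, 1)
```

Only the second ordering behaves like a true "skip", but it counts physical lines rather than parsed records, which matters once quoted fields can contain newlines.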

Whether this is intended behaviour or not has implications for the C engine in terms of implementing analogous skipfooter behaviour. Or perhaps it has something to do with the fact that the error_bad_lines and warn_bad_lines parameters are not supported with the Python engine?

xref #5232

@jreback jreback added this to the Next Major Release milestone Aug 2, 2016

gfyoung added a commit to gfyoung/pandas that referenced this issue Nov 26, 2016

BUG: Improve error message for skipfooter malformed rows in Python engine

Python's native CSV library does not respect the
skipfooter parameter, so if one of those skipped
rows is malformed, it will still raise an error.

Closes gh-13879.

jorisvandenbossche commented Nov 26, 2016

If this feature were implemented in the C engine, I would expect it to work in this case, so that the skipped lines need not parse correctly. But I am not sure whether this is actually possible.

Questions on how to treat quotation marks (are they parsed or not when determining the number of lines to skip?), similar to those in the recent issues about skiprows, will also come up. So for this to be consistent, they may need to be parsed to some extent.
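To illustrate why quotation marks matter when counting footer lines, here is a small sketch using only the standard library. The helper names `drop_footer_physical` and `drop_footer_records` are hypothetical and neither is taken from pandas. It compares dropping the last physical line with dropping the last parsed record when the footer row contains a quoted newline.

```python
import csv
from io import StringIO

# The footer record spans two physical lines because of the quoted newline.
data = 'a,b\n1,2\n"foo\nbar",3\n'


def drop_footer_physical(text, skipfooter):
    """Hypothetical helper: drop the last physical lines, ignoring quotes."""
    lines = text.splitlines()
    return lines[:len(lines) - skipfooter]


def drop_footer_records(text, skipfooter):
    """Hypothetical helper: parse first, then drop the last records."""
    rows = list(csv.reader(StringIO(text)))
    return rows[:len(rows) - skipfooter]


physical = drop_footer_physical(data, 1)
records = drop_footer_records(data, 1)
```

Counting physical lines leaves the dangling fragment `"foo` behind, while counting records removes the whole footer row but requires the footer to parse cleanly. This is the trade-off raised above.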

gfyoung commented Nov 26, 2016

@jorisvandenbossche : You are correct. This code should not break, though whether that is possible is another story, since some parsing might be needed. In any case, I'm not sure yet how to implement this for the C engine, but that can be dealt with separately from this issue.

gfyoung added a commit to gfyoung/pandas that referenced this issue Nov 28, 2016

BUG: Improve error message for skipfooter malformed rows in Python engine

Python's native CSV library does not respect the
skipfooter parameter, so if one of those skipped
rows is malformed, it will still raise an error.

Closes gh-13879.

gfyoung added a commit to gfyoung/pandas that referenced this issue Nov 29, 2016

BUG: Improve error message for skipfooter malformed rows in Python engine

Python's native CSV library does not respect the
skipfooter parameter, so if one of those skipped
rows is malformed, it will still raise an error.

Closes gh-13879.

jorisvandenbossche added a commit that referenced this issue Nov 29, 2016

BUG: Improve error message for skipfooter malformed rows in Python engine (#14749)

Python's native CSV library does not respect the
skipfooter parameter, so if one of those skipped
rows is malformed, it will still raise an error.

Closes gh-13879.
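The idea behind the commit message, keeping the parse-then-skip order but explaining the failure, could look roughly like the sketch below. This is an assumption-laden illustration, not the code from the commit: `read_rows` is a hypothetical helper, the hint wording is invented here, and `strict=True` is set so that the `csv` module reports the malformed row.

```python
import csv
from io import StringIO


def read_rows(text, skipfooter=0):
    """Hypothetical reader: parse everything, then drop the footer rows."""
    try:
        rows = list(csv.reader(StringIO(text), strict=True))
    except csv.Error as err:
        msg = str(err)
        if skipfooter > 0:
            # The footer is dropped only after parsing, so a malformed
            # footer row can still be the cause of this error.
            msg += (". The error may come from a malformed row in the "
                    "skipped footer, which is parsed before it is dropped")
        raise ValueError(msg) from err
    return rows[:len(rows) - skipfooter]
```

With this shape the malformed footer still raises, but the message points the user at skipfooter as a likely cause.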

@jorisvandenbossche jorisvandenbossche modified the milestones: 0.19.2, Next Major Release Nov 29, 2016

jorisvandenbossche added a commit that referenced this issue Dec 15, 2016

[Backport #14749] BUG: Improve error message for skipfooter malformed rows in Python engine (#14749)

Python's native CSV library does not respect the
skipfooter parameter, so if one of those skipped
rows is malformed, it will still raise an error.

Closes gh-13879.
(cherry picked from commit dfeae39)