skipfooter doesn't really "skip" in read_csv #13879
    from pandas import read_csv
    from pandas.compat import StringIO

    data = 'a,b,c\ncat,foo,bar\ndog,foo,"baz'  # Note the stray quotation mark
    read_csv(StringIO(data), engine='python', skipfooter=1)
    ...
    _csv.Error: unexpected end of data
If we were truly "skipping" the last row, no error should have been raised. The error occurs because all of the data is first parsed in memory with Python's
Whether this is intended behaviour or not has implications for the C engine in terms of implementing analogous
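To make the distinction concrete, here is a minimal sketch that uses only the standard-library `csv` module, which the `_csv.Error` in the traceback points to. It is not pandas' actual implementation: the helper names `parse_all_then_trim` and `trim_lines_then_parse` are hypothetical, and `strict=True` is an assumption chosen so that the unterminated quote raises as it does in the report. The first helper parses every row and then drops the footer; the second drops the footer lines as raw text and parses only what remains.

```python
import csv
from io import StringIO

# Same data as in the report: the last row has a stray quotation mark.
data = 'a,b,c\ncat,foo,bar\ndog,foo,"baz'


def parse_all_then_trim(text, skipfooter):
    """Parse every row first, then drop the footer rows.

    A malformed footer row still reaches the parser, so it can raise.
    """
    rows = list(csv.reader(StringIO(text), strict=True))
    return rows[:len(rows) - skipfooter]


def trim_lines_then_parse(text, skipfooter):
    """Drop footer lines as raw text, then parse only the remaining lines.

    Assumes no quoted field spans more than one physical line.
    """
    lines = text.splitlines(keepends=True)
    kept = lines[:len(lines) - skipfooter]
    return list(csv.reader(kept, strict=True))


if __name__ == "__main__":
    try:
        parse_all_then_trim(data, skipfooter=1)
    except csv.Error as exc:
        print("parse-then-trim failed:", exc)

    print("trim-then-parse:", trim_lines_then_parse(data, skipfooter=1))
```

Trimming at the line level sidesteps the malformed footer, but it counts physical lines rather than parsed rows, so the two approaches can disagree when a quoted field contains a newline.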
If this feature were implemented in the C engine, I would expect it to work in this case, so that the skipped lines need not parse correctly. But I am not sure whether this is actually possible.
Questions about how to treat quotation marks (are they parsed or not when determining the number of lines to skip?), similar to those in the recent issues about skiprows, will also come up. So for this to be consistent, the skipped lines may need to be parsed to some extent.
@jorisvandenbossche : You are correct. This code should not break, though whether that is possible is another story, as some parsing might be needed. In any case, I am not yet sure how to implement this for the C engine, though that can be dealt with separately from this issue.