Pandas 0.19 read_csv with header=[0, 1] on an empty df throws error #14515
Comments
jorisvandenbossche added the IO CSV label Oct 27, 2016
@kaloramik So the change is not on the read side but on the write side. In versions < 0.19.0, the file looks like:
while in 0.19.0 it looks like (what you showed above):
So previously there was an extra line with empty values. Reading this in with 0.19.0 still gives your desired result of an empty frame:
(however, one could argue that this should actually give you one row of NaNs). So the change is on the write side: in 0.18.0 there was an extra line of commas after the header.
This was a bug (since you don't have any data, there should not be a line of missing values), and this bug was fixed in 0.19.0, see #6618.
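To see what the written file looks like on a given install, a minimal sketch along these lines may help. The column labels `a`/`x` and `b`/`y` are made up for illustration, and `index=False` is an assumption to keep the output small; the number of lines printed depends on the pandas version, as discussed above.

```python
import pandas as pd

# Hypothetical two-level column labels, chosen only for illustration.
columns = pd.MultiIndex.from_tuples([("a", "x"), ("b", "y")])
empty = pd.DataFrame(columns=columns)

# With no path argument, to_csv returns the CSV text as a string.
csv_text = empty.to_csv(index=False)
header_lines = csv_text.splitlines()

# Print each line with its position so a trailing line of commas,
# if the installed version writes one, is easy to spot.
for number, line in enumerate(header_lines):
    print(number, repr(line))
```

Any line after the two header lines would be the extra row of empty values that older versions wrote.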
jorisvandenbossche added this to the No action milestone Oct 27, 2016
kaloramik commented Oct 27, 2016
@jorisvandenbossche Hmm, really? That's not what I'm seeing at all. Is it possible I have a package that's screwing something up? Can you post your pd.show_versions? But looking at the behavior, shouldn't the expected behavior be what I posted? As in, if you read in a file of length 2, and your headers take up those 2 lines, then it should return an empty df with those columns. I believe the same behavior applies for a single header. The error message doesn't seem to make sense:
it DOES have 2 lines in the file, so it should be able to construct the header. In addition, the source code has the following comment
According to the comment, the function should fail if the file has fewer than len(header) lines, implying that the function should succeed if len(header) == len(lines). Does that sound right?
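The case described here, a file whose only lines are the two header lines, can be checked with a short sketch like the one below. The helper name `read_header_only` and the sample text are hypothetical. Since the result depends on the pandas version (0.19.0 raises, as reported in this issue), the helper catches the error instead of assuming either outcome.

```python
from io import StringIO

import pandas as pd


def read_header_only(csv_text):
    """Return the parsed frame, or None if this pandas version rejects the input."""
    try:
        return pd.read_csv(StringIO(csv_text), header=[0, 1])
    except ValueError as exc:
        # pandas parser errors derive from ValueError.
        print("read_csv failed:", exc)
        return None


# Exactly two lines, both consumed by header=[0, 1]; no data rows follow.
result = read_header_only("a,b\nx,y\n")
if result is not None:
    print(result.columns.tolist(), len(result))
```

If the read succeeds, the frame should have zero rows and the two-level columns taken from the header lines.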
kaloramik commented Oct 27, 2016
Oh actually, scratch that, you are right about 0.18.1 returning an extra line of commas (and so the read_csv succeeds, I guess). But this breaks behavior now: in my data pipelines, I am unable to write and then read empty dataframes as before. I think the behavior I described above is still the desired one? Unless you have better workarounds? (I don't think replicating the old behavior by forcibly adding a row of commas would be a good idea.)
Possibly. But I am just pointing out that it is not a change on the read side. Apart from that, it is worth discussing if we should allow this. IMO returning an empty frame is indeed more logical to do.
The bug fix in
Note that also for a single header, once you pass the
kaloramik commented Oct 27, 2016
Got it. Thanks for the clarification! Actually, as a temporary workaround, I guess forcing a write of an empty row on empty data frames should be OK. Do you know if there are any other workarounds, perhaps from the read side?
Hmm, I don't directly see a workaround on the read side. If you want to end up with the MultiIndex, I don't think there is an easy solution. It is probably easier to temporarily fix this on the write side, as you point out.
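The write-side workaround mentioned here could look roughly like the sketch below. Both helper names are hypothetical, and `index=False` is an assumption. Dropping all-missing rows after reading is an extra step not discussed in the thread; it is added here because versions differ on whether the placeholder line comes back as a row of NaNs.

```python
from io import StringIO

import pandas as pd


def to_csv_with_placeholder(df):
    # Hypothetical helper: an empty frame gets one all-missing row, so the
    # file has a line after the header rows, as files written before 0.19.0 did.
    frame = df.reindex([0]) if len(df) == 0 else df
    return frame.to_csv(index=False)


def read_dropping_placeholder(csv_text):
    # Rows that are entirely missing are treated as placeholders and dropped.
    # Caveat: this also drops genuine rows in which every value is missing.
    frame = pd.read_csv(StringIO(csv_text), header=[0, 1])
    return frame.dropna(how="all")


columns = pd.MultiIndex.from_tuples([("a", "x"), ("b", "y")])
csv_text = to_csv_with_placeholder(pd.DataFrame(columns=columns))
restored = read_dropping_placeholder(csv_text)
print(restored.columns.tolist(), len(restored))
```

This keeps the placeholder logic in one place, so it is easy to remove once the reader accepts header-only files.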
jreback modified the milestone: 0.19.2, No action Nov 22, 2016
jreback closed this in f862b52 Nov 22, 2016
amolkahat added a commit to amolkahat/pandas that referenced this issue Nov 26, 2016
e48e61a
jorisvandenbossche added a commit to jorisvandenbossche/pandas that referenced this issue Dec 14, 2016
57482f7
kaloramik commented Oct 27, 2016 (edited by jorisvandenbossche)
Pandas 0.19 incorrectly handles empty dataframe files with multi index columns
What the file looks like
Expected Output
yields what we expect, an empty MultiIndex data frame
Throws
Expected Output
Output of pd.show_versions()
For pandas 0.18.1
For pandas 0.19