BUG: to_csv extra header line with multiindex columns #6618

dsm054 · 2014-03-12T16:22:58Z

This seems strange to me, but I don't often use a MultiIndex so I might be missing something obvious.

>>> pd.__version__
'0.13.1-420-g6899ed6'
>>> df2 = pd.DataFrame([1], columns=pd.MultiIndex.from_arrays([[1],[2]]))
>>> df2
   1
   2
0  1

[1 rows x 1 columns]
>>> df2.columns
MultiIndex(levels=[[1], [2]],
           labels=[[0], [0]])
>>> print df2.to_csv()
,1
,2
,
0,1

Is there supposed to be that empty line at the end of the header? Compare

>>> print df2.to_csv(header=False)
0,1


jreback · 2014-03-12T16:24:08Z

Yes, it's for the row index names (they are None here). In theory we could skip printing it, since read_csv will try both ways, but that's the 'original' format.
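
To illustrate the point about row index names, here is a minimal sketch. It assumes a recent pandas in which the line is no longer written when the index has no name (see the later events in this thread); the variable names are only illustrative.

```python
import pandas as pd

# Same frame as in the report: one column under a two-level MultiIndex.
df2 = pd.DataFrame([1], columns=pd.MultiIndex.from_arrays([[1], [2]]))

# Index without a name: on pandas versions that include the fix,
# the header consists of the column levels only.
unnamed_lines = df2.to_csv().splitlines()

# Index with a name: to_csv writes the name on its own line
# after the column-level lines.
named = df2.copy()
named.index.name = "idx"
named_lines = named.to_csv().splitlines()

print(unnamed_lines)
print(named_lines)
```

Comparing the two lists shows which header line carries the index name and which lines carry the column levels.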

dsm054 · 2014-03-12T16:28:50Z

Ah, okay.

jreback · 2014-03-12T16:29:51Z

@dsm054 I think it's reasonable to do a PR that takes out the line and see if anything breaks (obviously a test that is supposed to match the output exactly won't pass), but my point is that read_csv should still work correctly.

And I guess it's more in line with what you'd expect.

meloun · 2014-06-23T19:18:41Z

Hi, I have the same issue. Is there any workaround to avoid having this empty line there?

jreback · 2014-06-23T19:24:21Z

pandas will read this format
what version?

meloun · 2014-06-23T19:32:21Z

Yes, pandas will, but I need output without this extra line (it's input for another application).
see http://stackoverflow.com/questions/24372993/pandas-dataframe-with-2-rows-header-and-export-to-csv
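
For a version-independent way to get exactly one header line per column level, one option is to write the header lines by hand and let pandas write only the body. This is a rough sketch, not a pandas feature: `to_csv_plain_header` is a hypothetical helper name, and it assumes a single-level row index.

```python
import csv
import io

import pandas as pd


def to_csv_plain_header(df):
    """Hypothetical helper: one CSV line per column level, then the data rows."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for level in range(df.columns.nlevels):
        # The leading empty cell lines up with the (single-level) row index column.
        writer.writerow([""] + list(df.columns.get_level_values(level)))
    # Body only: header=False makes pandas skip its own header lines.
    buf.write(df.to_csv(header=False))
    return buf.getvalue()


df2 = pd.DataFrame([1], columns=pd.MultiIndex.from_arrays([[1], [2]]))
csv_text = to_csv_plain_header(df2)
print(csv_text)
```

Because the header is produced outside of to_csv, the result does not depend on whether the installed pandas writes the extra line.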

jreback · 2014-06-23T20:05:06Z

You can use tupleize_cols=True to write the header on a single line.
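
As far as I know, tupleize_cols was deprecated in later pandas releases and is not accepted by current ones. A similar single-line header can be sketched by flattening the column tuples before writing. The "_" separator and the variable names are arbitrary choices for this example.

```python
import pandas as pd

df2 = pd.DataFrame([1], columns=pd.MultiIndex.from_arrays([[1], [2]]))

flat = df2.copy()
# Join each column tuple into one label, e.g. (1, 2) -> "1_2".
flat.columns = ["_".join(str(part) for part in col) for col in df2.columns]

flat_lines = flat.to_csv().splitlines()
print(flat_lines)
```

Note that this changes the header format, so a reader on the other side has to split the labels again if it needs the levels.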

Closes pandas-devgh-6618.

closes #14515

This commit fixes a bug where `read_csv` failed when given a file with a multiindex header and empty content. Because pandas reads index names as a separate line following the header lines, the reader looks for the line with index names in it. If the content of the dataframe is empty, the reader will choke. This bug surfaced after #6618 stopped writing an extra line after multiindex columns, which led to a situation where pandas could write CSV's that it couldn't then read. This commit changes that behavior by explicitly checking if the index name row exists, and processing it correctly if it doesn't.

Author: Ben Kandel <ben.kandel@gmail.com>

Closes #14596 from bkandel/fix-parse-empty-df and squashes the following commits:

32e3b0a [Ben Kandel] lint
e6b1237 [Ben Kandel] lint
fedfff8 [Ben Kandel] fix multiindex column parsing
518982d [Ben Kandel] move to 0.19.2
fc23e5c [Ben Kandel] fix errant this_columns
3d9bbdd [Ben Kandel] whatsnew
68eadf3 [Ben Kandel] Modify test.
17e44dd [Ben Kandel] fix python parser too
72adaf2 [Ben Kandel] remove unnecessary test
bfe0423 [Ben Kandel] typo
2f64d57 [Ben Kandel] pep8
b8200e4 [Ben Kandel] BUG: read_csv with empty df
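
The case this commit message describes, a file with a multiindex header and no data rows, can be sketched as follows. This assumes a pandas version that includes the fix; the literal CSV text is made up for the example.

```python
import io

import pandas as pd

# Two header lines and no data rows.
header_only = "a,b\nc,d\n"

# header=[0, 1] asks read_csv to build two-level columns from the first two lines.
result = pd.read_csv(io.StringIO(header_only), header=[0, 1])

print(result.columns.tolist())
print(len(result))
```

With the fix in place, the reader should return an empty frame with two-level columns instead of raising.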

The same commit message also appears with its references written as pandas-dev#14515, pandas-dev#6618, and pandas-dev#14596 (cherry picked from commit f862b52).

ronanpaixao · 2017-10-25T13:08:43Z

It appears this bug still manifests itself when using to_excel:

>>> df = pd.DataFrame([[1,2,3],[4,5,6]], columns=pd.MultiIndex.from_tuples([('A',''),('B','C'),('B','D')]))
>>> df
   A  B
      C  D
0  1  2  3
1  4  5  6
>>> df.to_excel("out.xlsx")

This outputs the spreadsheet with an additional (and pretty much useless) blank row (row 3).

This is some workaround, but is really worse for cases like my example, where the upper row in the MultiIndex gets repeated (see df['B']).

tsznxx · 2018-04-18T14:24:43Z

A workaround to fix this is to save the headers and table contents separately.

import pandas

# Using ExcelWriter as a context manager saves and closes the file on exit
with pandas.ExcelWriter("test.xlsx") as writer:
    # write the headers as a data frame
    df.columns.to_frame().transpose().to_excel(writer, sheet_name="test")
    # write the table body without headers
    df.to_excel(writer, sheet_name="test", header=False, startrow=2)

This trick can also be used when saving pandas style to Excel, because pandas style doesn't support multiindex.

jstefaniakk · 2020-01-13T12:43:32Z

The solution given above by tsznxx worked for me. However, the line
df.columns.to_frame().transpose().to_excel(writer,"test")
also writes two rows of data containing the column names, and when my dataframe had fewer than 3 rows of data, those unneeded rows remained in the Excel file. Instead, I propose first writing an empty dataframe to Excel:
df.drop(df.index).to_excel(writer,"test")
and then following it with the dataframe data without headers. The whole example would look like:

import pandas

# Using ExcelWriter as a context manager saves and closes the file on exit
with pandas.ExcelWriter("test.xlsx") as writer:
    # write only the column names
    df.drop(df.index).to_excel(writer, sheet_name="test")
    # write the table body without headers
    df.to_excel(writer, sheet_name="test", header=False, startrow=2)

dsm054 closed this as completed Mar 12, 2014

jreback reopened this Mar 12, 2014

jreback added CSV labels Mar 12, 2014

jreback added this to the 0.14.0 milestone Mar 12, 2014

jreback modified the milestones: 0.15.0, 0.14.0 Mar 28, 2014

jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015

chris-b1 mentioned this issue Sep 3, 2015

ENH: read_excel MultiIndex #4679 #10967

Merged

jreback mentioned this issue Sep 4, 2015

BUG? Parser adds empty MultiIndex level names #10984

Closed

This was referenced May 2, 2016

Extra empty row when saving to CSV with MultiIndex columns #13053

Closed

Unable to read MultiIndex columns from CSV if empty levels #13054

Closed

gfyoung mentioned this issue Sep 1, 2016

BUG: Don't print stray newline with MultiIndex #14132

Closed

gfyoung added a commit to forking-repos/pandas that referenced this issue Sep 2, 2016

BUG: Don't print stray newline with MultiIndex

d1a600f

Closes pandas-devgh-6618.

jreback modified the milestones: 0.19.0, Next Major Release Sep 2, 2016

jreback closed this as completed in 362a561 Sep 2, 2016

jorisvandenbossche mentioned this issue Oct 27, 2016

Pandas 0.19 read_csv with header=[0, 1] on an empty df throws error #14515

Closed

bkandel mentioned this issue Nov 15, 2016

Fix parse empty df #14596

Closed

4 tasks

ronanpaixao mentioned this issue Oct 25, 2017

multiindex column in to_excel #2701

Closed
