Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: to_csv extra header line with multiindex columns #6618

Closed
dsm054 opened this issue Mar 12, 2014 · 10 comments
Closed

BUG: to_csv extra header line with multiindex columns #6618

dsm054 opened this issue Mar 12, 2014 · 10 comments
Labels
IO CSV read_csv, to_csv Output-Formatting __repr__ of pandas objects, to_string
Milestone

Comments

@dsm054
Copy link
Contributor

dsm054 commented Mar 12, 2014

This seems strange to me, but I don't often use a MultiIndex so I might be missing something obvious.

>>> pd.__version__
'0.13.1-420-g6899ed6'
>>> df2 = pd.DataFrame([1], columns=pd.MultiIndex.from_arrays([[1],[2]]))
>>> df2
   1
   2
0  1

[1 rows x 1 columns]
>>> df2.columns
MultiIndex(levels=[[1], [2]],
           labels=[[0], [0]])
>>> print df2.to_csv()
,1
,2
,
0,1

Is there supposed to be that empty line at the end of the header? Compare

>>> print df2.to_csv(header=False)
0,1
@jreback
Copy link
Contributor

jreback commented Mar 12, 2014

yes, its for the row index-names (they are None here), in theory could could not print it as the read_csv will try both ways, but that's the 'original' format.

@dsm054
Copy link
Contributor Author

dsm054 commented Mar 12, 2014

Ah, okay.

@dsm054 dsm054 closed this as completed Mar 12, 2014
@jreback
Copy link
Contributor

jreback commented Mar 12, 2014

@dsm054 I think its reasonable to do a PR which takes out the line and see if anything breaks....(obviously a tests which exactly is supposed to match won't), but I am talking about the read_csv should still work correctly.

and I guess its more in-line with what you'd except.

@jreback jreback reopened this Mar 12, 2014
@jreback jreback added this to the 0.14.0 milestone Mar 12, 2014
@jreback jreback modified the milestones: 0.15.0, 0.14.0 Mar 28, 2014
@meloun
Copy link

meloun commented Jun 23, 2014

hi i have the same issue, any workarround how to not have this empty line there?

@jreback
Copy link
Contributor

jreback commented Jun 23, 2014

pandas will read this format
what version?

@meloun
Copy link

meloun commented Jun 23, 2014

yes pandas will, but I need an output without this extra line (it's an input for other application)
see http://stackoverflow.com/questions/24372993/pandas-dataframe-with-2-rows-header-and-export-to-csv

@jreback
Copy link
Contributor

jreback commented Jun 23, 2014

you can use tupleize_cols=True to make the header in a single line

@bkandel bkandel mentioned this issue Nov 15, 2016
4 tasks
jreback pushed a commit that referenced this issue Nov 22, 2016
closes #14515

This commit fixes a bug where `read_csv` failed when given a file with
a multiindex header and empty content. Because pandas reads index
names as a separate line following the header lines, the reader looks
for the line with index names in it. If the content of the dataframe
is empty, the reader will choke. This bug surfaced after
#6618 stopped writing an
extra line after multiindex columns, which led to a situation where
pandas could write CSV's that it couldn't then read.     This commit
changes that behavior by explicitly checking if the index name row
exists, and processing it correctly if it doesn't.

Author: Ben Kandel <ben.kandel@gmail.com>

Closes #14596 from bkandel/fix-parse-empty-df and squashes the following commits:

32e3b0a [Ben Kandel] lint
e6b1237 [Ben Kandel] lint
fedfff8 [Ben Kandel] fix multiindex column parsing
518982d [Ben Kandel] move to 0.19.2
fc23e5c [Ben Kandel] fix errant this_columns
3d9bbdd [Ben Kandel] whatsnew
68eadf3 [Ben Kandel] Modify test.
17e44dd [Ben Kandel] fix python parser too
72adaf2 [Ben Kandel] remove unnecessary test
bfe0423 [Ben Kandel] typo
2f64d57 [Ben Kandel] pep8
b8200e4 [Ben Kandel] BUG: read_csv with empty df
jorisvandenbossche pushed a commit to jorisvandenbossche/pandas that referenced this issue Dec 14, 2016
closes pandas-dev#14515

This commit fixes a bug where `read_csv` failed when given a file with
a multiindex header and empty content. Because pandas reads index
names as a separate line following the header lines, the reader looks
for the line with index names in it. If the content of the dataframe
is empty, the reader will choke. This bug surfaced after
pandas-dev#6618 stopped writing an
extra line after multiindex columns, which led to a situation where
pandas could write CSV's that it couldn't then read.     This commit
changes that behavior by explicitly checking if the index name row
exists, and processing it correctly if it doesn't.

Author: Ben Kandel <ben.kandel@gmail.com>

Closes pandas-dev#14596 from bkandel/fix-parse-empty-df and squashes the following commits:

32e3b0a [Ben Kandel] lint
e6b1237 [Ben Kandel] lint
fedfff8 [Ben Kandel] fix multiindex column parsing
518982d [Ben Kandel] move to 0.19.2
fc23e5c [Ben Kandel] fix errant this_columns
3d9bbdd [Ben Kandel] whatsnew
68eadf3 [Ben Kandel] Modify test.
17e44dd [Ben Kandel] fix python parser too
72adaf2 [Ben Kandel] remove unnecessary test
bfe0423 [Ben Kandel] typo
2f64d57 [Ben Kandel] pep8
b8200e4 [Ben Kandel] BUG: read_csv with empty df

(cherry picked from commit f862b52)
@ronanpaixao
Copy link

It appears this bugs still manifests itself if using to_excel:

>>> df = pd.DataFrame([[1,2,3],[4,5,6]], columns=pd.MultiIndex.from_tuples([('A',''),('B','C'),('B','D')]))
>>> df
   A  B
      C  D
0  1  2  3
1  4  5  6
>>> df.to_excel("out.xlsx")

This outputs the spreadsheet with an additional (and pretty much useless) blank line (3):
pandas_to_excel_bug

This is some workaround, but is really worse for cases like my example, where the upper row in the MultiIndex gets repeated (see df['B']).

@tsznxx
Copy link

tsznxx commented Apr 18, 2018

A workaround to fix this is to save the headers and table contents separately.

writer = pandas.ExcelWriter("test.xlsx")
# writer headers as data frame
df.columns.to_frame().transpose().to_excel(writer,"test")
# writer table body without headers
df.to_excel(writer,"test",header=False,startrow=2))
writer.save()
writer.close()

This trick can also be used when saving pandas style to Excel, because pandas style doesn't support multiindex.

@jstefaniakk
Copy link

jstefaniakk commented Jan 13, 2020

The solution given above by tsznxx worked for me, however a line:
df.columns.to_frame().transpose().to_excel(writer,"test")
generates two rows of data with names of the columns and when I had less than 3 rows of data in my dataframe, the unnecessary columns data remained printed into the excel. Instead, I propose to first write your empty dataframe to excel :
df.drop(df.index).to_excel(writer,"test")
and then follow it up with dataframe data without headers. So the whole example would look like:

writer = pandas.ExcelWriter("test.xlsx")
# write only column names
df.drop(df.index).to_excel(writer,"test")
# writer table body without headers
df.to_excel(writer,"test",header=False,startrow=2))
writer.save()
writer.close()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO CSV read_csv, to_csv Output-Formatting __repr__ of pandas objects, to_string
Projects
None yet
Development

No branches or pull requests

6 participants