Pandas exporting to Excel (xls, xlsx) with multilevel columns #11408

dinya · 2015-10-22T06:54:32Z

Hello all,

I want to combine several tables into one with pandas. I use the code below:

import pandas as pd
import itertools

df = None
for frame in ['x', 'y']:
    df_ = pd.read_excel(u'%s.xlsx' % frame)
    df_ = df_.set_index([u'time'])

    parameters = list(df_.columns)
    tuples = []
    for tup in itertools.product([frame,], parameters):
        tuples.append(tup)

    columns = pd.MultiIndex.from_tuples(tuples, names=[u'Frames',u'Parameters'])

    df_new = pd.DataFrame(columns=columns, index=df_.index)
    for par in parameters:
        df_new[frame, par] = df_[par]
    del df_

    if df is None:
        df = df_new
    else:
        df = pd.concat([df, df_new], axis=1)

df.to_excel('merged_xlsx.xlsx')
df.to_excel('merged_xls.xls')

The source data are the x.xlsx and y.xlsx files.

The XLS engine works well (merged_xls.xls):

But the cell merging is wrong with the XLSX engine (merged_xlsx.xlsx):

Manually unmerging the cells in Excel works:

Is this a bug in the pandas XLSX engine (openpyxl), or is something wrong in my code?

Versions: python-2.7.10, pandas-0.17.0, openpyxl-2.3.0.

P.S. This issue is a copy of my question on Stack Overflow. It was suggested there that this is a bug, and I was advised to post it here.
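For reference, the column-building loop in the code above can probably be written more compactly by passing a dict to pd.concat, which uses the dict keys as the outer column level. This is only a sketch: the two small in-memory tables are hypothetical stand-ins for x.xlsx and y.xlsx, and it does not change how any Excel engine merges header cells.

```python
import pandas as pd

# Hypothetical stand-ins for the tables read from x.xlsx and y.xlsx.
frames = {
    "x": pd.DataFrame(
        {"time": [0, 1, 2], "a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]}
    ).set_index("time"),
    "y": pd.DataFrame(
        {"time": [0, 1, 2], "a": [7.0, 8.0, 9.0], "b": [0.5, 1.5, 2.5]}
    ).set_index("time"),
}

# The dict keys become the outer level of the column MultiIndex.
merged = pd.concat(frames, axis=1)
merged.columns.names = ["Frames", "Parameters"]

print(merged)
```

The resulting frame has the same (Frames, Parameters) column layout as df in the original code, so it can be passed to to_excel in the same way.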


jorisvandenbossche · 2015-10-22T09:05:05Z

@dinya Thanks for the report!

It is always easier if you provide a simpler, reproducible example (as we cannot run your code above), but I made a small one:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 6),
                  columns=pd.MultiIndex.from_product([['x', 'y'], ['a', 'b', 'c']], names=[u'Frames', u'Parameters']),
                  index=pd.Index(range(5), name='time'))
df.to_excel('test.xls')
df.to_excel('test.xlsx', engine='xlsxwriter')
df.to_excel('test2.xlsx', engine='openpyxl')

Can you see if this reproduces your issue?

I can't reproduce it with 0.17.0, but I was using openpyxl 1.6.1

dinya · 2015-10-22T09:32:16Z

@jorisvandenbossche,
OK, thanks for the advice, I'll be more careful next time.

I ran your code and got the following:

test.xls looks OK
test.xlsx looks OK (I got xlsxwriter==0.7.7 from pip)
test2.xlsx shows the same cell merging problem
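To make "cell merging problem" concrete, here is a rough sketch of which header columns one would expect to share a merged cell for the test frame: each run of identical top-level labels should cover one contiguous span. The helper name top_level_spans is hypothetical, and the positions are 0-based offsets among the data columns, not Excel cell references.

```python
from itertools import groupby

import pandas as pd


def top_level_spans(columns):
    """Return (label, first, last) for each run of equal top-level labels."""
    spans = []
    pos = 0
    for label, run in groupby(columns.get_level_values(0)):
        n = len(list(run))
        spans.append((label, pos, pos + n - 1))
        pos += n
    return spans


columns = pd.MultiIndex.from_product(
    [["x", "y"], ["a", "b", "c"]], names=["Frames", "Parameters"]
)
spans = top_level_spans(columns)
print(spans)
```

For this frame, 'x' should span the first three data columns and 'y' the next three; a file whose header merges differ from that is showing the problem described here.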

jorisvandenbossche · 2015-10-22T12:07:49Z

Yes, I can confirm it with a more recent version of openpyxl

jorisvandenbossche · 2015-10-22T12:08:37Z

cc @chris-b1 another multi-index one

Dr-Irv · 2015-10-22T16:06:09Z

I just tested this with Python 3.4.3, pandas 0.17.0, xlsxwriter 0.7.3, and openpyxl 1.8.5 and cannot replicate. I tested with Python 3.4.3, a dev copy of pandas, xlsxwriter 0.7.6, and openpyxl 2.0.2 and cannot replicate. I tested with Python 2.7.10, a dev copy of pandas, xlsxwriter 0.7.6, and openpyxl 1.6.2, and cannot replicate.

However, I tested with Python 2.7.10, a dev copy of pandas, xlsxwriter 0.7.6, and openpyxl 2.3, and I do replicate.

I wonder if this relates to my pull request and the different behaviors based on python/openpyxl combinations detailed here (scroll to the bottom): #11328

@chris-b1 would like your opinion

Dr-Irv · 2015-10-22T17:28:09Z

I found the bug in pandas/io/excel.py:_Openpyxl22Writer.write_cells and have fixed it in my branch that is in pull request #11328. So if one of you guys (@chris-b1, @jreback) will answer my question there, I'll do a commit and fire off the tests, and this bug can be put to rest.

Dr-Irv · 2015-10-22T17:30:35Z

@dinya If you use a version of openpyxl earlier than 2.2, your problem will disappear with pandas 0.17.0.
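Another possible workaround, based only on the observation earlier in this thread that the xlsxwriter output looked OK, is to prefer xlsxwriter when it is installed. The sketch below uses a hypothetical helper, pick_xlsx_engine, that returns the first importable engine name; it is not part of pandas.

```python
import importlib.util


def pick_xlsx_engine(preferred=("xlsxwriter", "openpyxl")):
    """Return the first importable module name from `preferred`, or None."""
    for name in preferred:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


engine = pick_xlsx_engine()
print(engine)

# Assumed usage with the test frame from above:
# df.to_excel('test.xlsx', engine=engine)
```

If neither package is importable the helper returns None, and passing engine=None leaves the choice to pandas. Falling back to openpyxl 2.2+ with pandas 0.17.0 would still hit the merging problem.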

If using openpyxl < 2, and value is a string that could be a number, force a string to be written out. If using openpyxl >= 2.2, then fix issue pandas-dev#11408 to do with merging cells

Use set_value_explicit instead of set_explicit_value

set_value_explicit is in openpyxl 1.6, changed in openpyxl 1.8, but there is code in 1.8 to set set_value_explicit to set_explicit_value for compatibility

Add line in whatsnew for issue 11408

Bug in to_excel with openpyxl 2.2+ and merging #11408

jreback · 2015-10-25T14:03:18Z

closed by #11328

jorisvandenbossche added the IO Excel read_excel, to_excel label Oct 22, 2015

jorisvandenbossche added the Bug label Oct 22, 2015

jorisvandenbossche added this to the 0.17.1 milestone Oct 22, 2015

Dr-Irv mentioned this issue Oct 23, 2015

Fix for BUG: multi-index excel header fails if all numeric #11328

Closed

jreback pushed a commit that referenced this issue Oct 25, 2015

Bug in read_excel with multi-index containing integers #11317

d5a04c1

Bug in to_excel with openpyxl 2.2+ and merging #11408

jreback closed this as completed Oct 25, 2015
