BUG: multi-index excel header fails if all numeric #11317

Closed

Contributor

Dr-Irv commented Oct 13, 2015

If the multi-line headers come from Excel and a header value is not a string, the line all(['Unnamed' in c[n] for c in columns]): fails with a TypeError, because c[n] is an int and the 'in' test needs an iterable on its right-hand side. So either force the headers to be strings (the proposed change) or come up with some other test when looking for 'Unnamed'.
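
To illustrate the failure described above, here is a minimal sketch in plain Python. The column tuples and the two helper names are hypothetical and are not the code in parsers.py. Using plain str() for the coercion is also an assumption; later commits in this thread note that the string conversion needed extra care on Python 2.7.

```python
# Hypothetical two-level header as it might arrive from Excel: the second
# level holds ints rather than strings.
columns = [("A", 1), ("A", 2), ("B", 1), ("C", 3)]
n = 1  # header level being inspected


def level_is_unnamed(columns, n):
    """Original-style check: assumes every label is a string."""
    return all(["Unnamed" in c[n] for c in columns])


def level_is_unnamed_safe(columns, n):
    """Same check, with each label coerced to str first (assumed fix)."""
    return all(["Unnamed" in str(c[n]) for c in columns])


try:
    level_is_unnamed(columns, n)
    raised = False
except TypeError:
    # 'in' cannot search inside an int, so the original check raises.
    raised = True

safe_result = level_is_unnamed_safe(columns, n)
print(raised, safe_result)
```

With the coercion in place, the check returns a plain boolean instead of raising, so a numeric level is simply treated as a named level.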

@Dr-Irv Dr-Irv Update parsers.py
e429ca6
Contributor

jreback commented Oct 13, 2015

you need a test where this is actually failing

Contributor

Dr-Irv commented Oct 13, 2015

For some reason, I can't submit the simple XLSX file that illustrates the problem, or a CSV representation as a file. Here is the CSV version:
Id,A,A,B,C
is,1,2,1,3
1,13,23,33,43
2,24,44,64,84
3,35,65,95,125
4,46,86,126,166
5,57,107,157,207
6,68,128,188,248

Read this into Excel and save it as "testmulti.xlsx".

Then this fails:
import pandas as pd
df = pd.read_excel('testmulti.xlsx', header=[0, 1], index_col=0, parse_cols=4)
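
To see why this particular file triggers the problem, the sketch below rebuilds the two header rows from the CSV text using only the standard library. The as_cell helper is hypothetical: it assumes that a spreadsheet stores digit-only cells as numbers, which is what makes the second header level numeric. It does not call pandas or read a real .xlsx file.

```python
import csv
import io

# First rows of the CSV from the report above.
CSV_TEXT = """Id,A,A,B,C
is,1,2,1,3
1,13,23,33,43
"""


def as_cell(text):
    """Mimic a spreadsheet storing digit-only cells as numbers (assumption)."""
    return int(text) if text.isdigit() else text


rows = [[as_cell(v) for v in row] for row in csv.reader(io.StringIO(CSV_TEXT))]
level0, level1 = rows[0], rows[1]

# Drop the first column, which the report uses as the index (index_col=0),
# and pair the two header rows into per-column tuples.
header = list(zip(level0[1:], level1[1:]))
print(header)
```

Every second-level label in the resulting tuples is an int, which is the situation the 'Unnamed' check cannot handle.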

Contributor

jreback commented Oct 13, 2015

so add that as an example in test_excel.py. It should fail w/o your fix and pass afterwards

jreback changed the title from Update parsers.py to BUG: multi-index excel header fails if all numeric Oct 13, 2015

@Dr-Irv Dr-Irv added a commit to Dr-Irv/pandas that referenced this pull request Oct 14, 2015

@Dr-Irv Dr-Irv Fix for issue #11317
This includes updates to 3 Excel files, plus a test in test_excel.py,
plus the fix in parsers.py
6a4c85d
Contributor

Dr-Irv commented Oct 14, 2015

I had to create a new pull request containing all the files that illustrate the bug and show that it is fixed: pydata#11328

Contributor

jreback commented Oct 14, 2015

replaced by #11328

jreback closed this Oct 14, 2015

Dr-Irv deleted the Dr-Irv:patch-1 branch Oct 14, 2015

@Dr-Irv Dr-Irv added a commit to Dr-Irv/pandas that referenced this pull request Oct 16, 2015

@Dr-Irv Dr-Irv Merge remote-tracking branch 'refs/remotes/pydata/master' into Fix-for-
…#11317

Conflicts:
	doc/source/whatsnew/v0.17.1.txt

Bringing it up to date with the current master
7c0e6f7

jreback added this to the 0.17.1 milestone Oct 23, 2015

@Dr-Irv Dr-Irv added a commit to Dr-Irv/pandas that referenced this pull request Oct 24, 2015

@Dr-Irv @Dr-Irv Dr-Irv + Dr-Irv Fix for issue #11317
This includes updates to 3 Excel files, plus a test in test_excel.py,
plus the fix in parsers.py

issue when read_html with previous fix

With read_html, the fix didn't work on Python 2.7.  Handle the string
conversion correctly

Add bug fixed to what's new

Revert "Add bug fixed to what's new"

This reverts commit 05b2344.

Revert "issue when read_html with previous fix"

This reverts commit d1bc296.

Add what's new to describe bug.  fix issue with original fix

Added text to describe the bug.
Fixed issue so that it works correctly in Python 2.7

Add round trip test

Added round trip test and fixed error in writing sheets when
merge_cells=false and columns have multi index

DEPR: deprecate pandas.io.ga, #11308

DEPR: deprecate engine keyword from to_csv #11274

remove warnings from the tests for deprecation of engine in to_csv

PERF: Checking monotonic-ness before sorting on an index #11080

BUG: Bug in list-like indexing with a mixed-integer Index, #11320

Add hex color strings test

CLN: GH11271 move _get_handle, UTF encoders to io.common

TST: tests for list skiprows in read_excel

BUG: Fix to_dict() problem when using only datetime #11247

Fix a bug where to_dict() does not return Timestamp when there is only
datetime dtype present.

Undo change for when columns are multiindex

There is still something wrong here in the format of the file when there
are multiindex columns, but that's for another day

Fix formatting in test_excel and remove spurious test

See title

BUG: bug in comparisons vs tuples, #11339

bug#10442 : fix, adding note and test

BUG #10442(test) : Convert datetimelike index to strings with astype(str)

BUG#10422: note added

bug#10442 : tests added

bug#10442 : note updated

BUG #10442(test) : Convert datetimelike index to strings with astype(str)

bug#10442: fix, adding note and test

bug#10442: fix, adding note and test

Adjust test so that merge_cells=False works correctly

Adjust the test so that if merge_cells=false, it does a proper
formatting of the columns in the single row header, and puts the row
header in the first row

Fix test for Python 2.7 and 3.5

The test is failing on Python 2.7 and 3.5, which appear to read in the
values as floats, and I cannot replicate this.  So force the tests to pass by
just making the column names equal when merge_cells=False

Fix for openpyxl < 2, and for issue #11408

If using openpyxl < 2, and value is a string that could be a number,
force a string to be written out.  If using openpyxl >= 2.2, then fix
issue #11408 to do with merging cells

Use set_value_explicit instead of set_explicit_value

set_value_explicit is in openpyxl 1.6, changed in openpyxl 1.8, but
there is code in 1.8 to set set_value_explicit to set_explicit_value for
compatibility

Add line in whatsnew for issue 11408

ENH: added capability to handle Path/LocalPath objects, #11033

DOC: typo in whatsnew/0.17.1.txt

PERF: Release GIL on some datetime ops

BUG: Bug in DataFrame.replace with a datetime64[ns, tz] and a non-compat to_replace #11326

CLN: clean up internal impl of fillna/replace, xref #11153

PERF: fast inf checking in to_excel

PERF: Series.dropna with non-nan dtypes

fixed pathlib tests on windows

DEPR: remove some SparsePanel deprecation warnings in testing

DEPR: avoid numpy comparison to None warnings

API: indexing with a null key will raise a TypeError rather than a ValueError, #11356

WARN: elementwise comparisons with index names, xref #11162

DEPR warning in io/data.py w.r.t. order->sort_values

WARN: more elementwise comparisons to object

WARN: more uncomparables of numeric array vs object

BUG: quick fix for #10989

TST: add test case from Issue #10989

API: add _to_safe_for_reshape to allow safe insert/append with embedded CategoricalIndexes

Signed-off-by: Jeff Reback <jeff@reback.net>

BLD: conda

Revert "BLD: conda"

This reverts commit 0c8a8e1.

TST: remove invalid symbol warnings

TST: move some tests to slow

TST: fix some warnings filters

TST: import pandas_datareader, use for tests

TST: remove some deprecation warnings from imports

DEPR: fix VisibleDeprecationWarnings in sparse

TST: remove some warnings in test_nanops

ENH: Improve the error message in to_gbq when the DataFrame schema does not match #11359

add libgfortran to 1.8.1 build

binstar -> anaconda

remove link to issue 11328 in whatsnew

Fixes to document issue in code, small efficiency fix

Try to resolve rebase conflict in whats new
4f62b99

@jreback jreback added a commit that referenced this pull request Oct 25, 2015

@Dr-Irv @jreback Dr-Irv + jreback Bug in read_excel with multi-index containing integers #11317
Bug in to_excel with openpyxl 2.2+ and merging #11408
d5a04c1