fix #54564 #54567

burnpanck · 2023-08-15T21:58:33Z

closes BUG: pd.read_excel with engine "xlrd" fails when a cell contains NaN/inf #54564
Tests added and passed
All code checks passed
Added type annotations to new arguments/methods/functions: No new functions.
Added an entry in the latest doc/source/whatsnew/v2.1.0.rst file. Could this fix go into a potential v2.0.4 instead?

burnpanck · 2023-08-15T22:27:38Z

Note that I had to add a new test6.xls file to support the test, as pandas doesn't write .xls files, and I didn't want to add xlwt to the dev requirements just to generate the file on the fly. However, the existing test files test1 to test5 all exist in several Excel file formats. Should test6 be available in the other formats too? How were those previous test files generated?

pandas/io/excel/_xlrd.py

rhshadrach · 2023-08-16T18:44:34Z

doc/source/whatsnew/v2.1.0.rst

@@ -749,6 +749,7 @@ I/O
 - Bug in :func:`json_normalize`, fix json_normalize cannot parse metadata fields list type (:issue:`37782`)
 - Bug in :func:`read_csv` where it would error when ``parse_dates`` was set to a list or dictionary with ``engine="pyarrow"`` (:issue:`47961`)
 - Bug in :func:`read_csv`, with ``engine="pyarrow"`` erroring when specifying a ``dtype`` with ``index_col`` (:issue:`53229`)
+- Bug in :func:`read_excel`, with ``engine="xlrd"`` (``xls`` files) erroring when file contains NaNs/Infs (:issue:`54564`)


@mroeschke - should this be going into 2.2 at this point?

Yeah probably. (Maybe "higher impact" bugs can go in 2.1 still)

rhshadrach · 2023-08-18T12:48:31Z

pandas/io/excel/_xlrd.py

+                    # GH54564 - if the cell contents are NaN/Inf, we get an exception;
+                    # that is just another case where we don't want to convert.
+                    # The exception filter is quite general on purpose: whenever
+                    # the cell content cannot be converted to int - just don't.


This appears to be pretty verbose to me. Would something like "Don't convert NaN/Inf values" suffice?
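For readers following the thread, here is a minimal sketch of the two guard styles under discussion (the exception-based filter described in the comment above, and the "LBYL instead of EAFP" variant from a later commit). The function names `maybe_to_int_eafp` and `maybe_to_int_lbyl` are hypothetical and only illustrate the idea; this is not the code in `pandas/io/excel/_xlrd.py`.

```python
import math


def maybe_to_int_eafp(value: float):
    """EAFP: attempt the conversion and fall back if it raises.

    int(nan) raises ValueError and int(inf) raises OverflowError,
    so both are caught and the float is returned unchanged.
    """
    try:
        as_int = int(value)
    except (ValueError, OverflowError):
        return value
    # Only keep the int when no information is lost (e.g. 2.0 -> 2).
    return as_int if as_int == value else value


def maybe_to_int_lbyl(value: float):
    """LBYL: check that the value is finite before converting."""
    if math.isfinite(value):
        as_int = int(value)
        if as_int == value:
            return as_int
    # NaN, +/-inf, and non-integral floats are returned unchanged.
    return value
```

Both variants leave NaN/Inf cells as floats; the LBYL form states the condition explicitly, which also makes a short comment such as "Don't convert NaN/Inf values" sufficient.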

pandas/tests/io/excel/test_xlrd.py

rhshadrach

lgtm; cc @mroeschke

mroeschke · 2023-08-18T20:23:25Z

Looks like the added test needs to handle Windows' 32-bit default

rhshadrach · 2023-08-18T20:25:01Z

Yea, just noticed that. @burnpanck - NumPy has a bit of an oddity where ints default to 32-bit on Windows (they're fixing this in 2.0 I believe). Can you specify dtype="int64" when you create expected in your test?

https://github.com/pandas-dev/pandas/actions/runs/5902921221/job/16011817499?pr=54567#step:5:1041

Edit: I forgot this has a mix of columns. You can use astype on the integer one.
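A minimal sketch of the suggestion above, assuming an expected frame with one integer and one float column. The column names `ints` and `floats` and the values are hypothetical, not taken from the actual test in this PR.

```python
import numpy as np
import pandas as pd

# Hypothetical expected frame with a mix of column dtypes.
# With NumPy < 2.0 on Windows, np.array([0, 2]) defaults to int32.
expected = pd.DataFrame(
    {
        "ints": np.array([0, 2]),
        "floats": np.array([1.0, np.nan]),
    }
)

# Pin only the integer column to int64 so the comparison is
# platform-independent; the float column is left as float64.
expected["ints"] = expected["ints"].astype("int64")
```

Passing `dtype="int64"` to the `DataFrame` constructor would apply to every column, which fails on the column holding NaN, hence `astype` on the integer column alone.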

mroeschke · 2023-08-21T18:40:21Z

Thanks @burnpanck

added unit-test to highlight issue pandas-dev#54564

d3c7200

burnpanck marked this pull request as draft August 15, 2023 21:58

burnpanck added 2 commits August 16, 2023 00:13

fixed pandas-dev#54564

87fd933

added whatsnew entry

d033318

burnpanck changed the title ~~WIP: fix #54564~~ fix #54564 Aug 15, 2023

burnpanck marked this pull request as ready for review August 15, 2023 22:28

burnpanck requested a review from rhshadrach as a code owner August 15, 2023 22:28

mroeschke reviewed Aug 16, 2023

mroeschke added the IO Excel read_excel, to_excel label Aug 16, 2023

rhshadrach reviewed Aug 16, 2023

burnpanck added 2 commits August 17, 2023 19:42

anticipate this fix going into v2.2.0 instead of v2.1.0

2cf1b38

LBYL instead of EAFP

b773529

rhshadrach reviewed Aug 18, 2023

address review request

f0e92aa

rhshadrach approved these changes Aug 18, 2023

rhshadrach added Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate labels Aug 18, 2023

rhshadrach added this to the 2.2 milestone Aug 18, 2023

fix tests on windows

aef4d2b

mroeschke approved these changes Aug 21, 2023

mroeschke merged commit 0b58c62 into pandas-dev:main Aug 21, 2023
37 of 38 checks passed
