Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix #54564 #54567

Merged
merged 7 commits into from
Aug 21, 2023
Merged

fix #54564 #54567

merged 7 commits into from
Aug 21, 2023

Conversation

burnpanck
Copy link
Contributor

@burnpanck burnpanck commented Aug 15, 2023

@burnpanck burnpanck marked this pull request as draft August 15, 2023 21:58
@burnpanck
Copy link
Contributor Author

Note that I had to add a new test6.xls file to support the test, as pandas doesn't write .xls files, and I didn't want to add xlwt to the dev requirements just to generate the file on the fly. However, all test1 to test5 test-files exist in various excel file variants. Should test6 be available in other versions too? How have those previous test files been generated?

@burnpanck burnpanck changed the title WIP: fix #54564 fix #54564 Aug 15, 2023
@burnpanck burnpanck marked this pull request as ready for review August 15, 2023 22:28
@mroeschke mroeschke added the IO Excel read_excel, to_excel label Aug 16, 2023
pandas/io/excel/_xlrd.py Show resolved Hide resolved
@@ -749,6 +749,7 @@ I/O
- Bug in :func:`json_normalize`, fix json_normalize cannot parse metadata fields list type (:issue:`37782`)
- Bug in :func:`read_csv` where it would error when ``parse_dates`` was set to a list or dictionary with ``engine="pyarrow"`` (:issue:`47961`)
- Bug in :func:`read_csv`, with ``engine="pyarrow"`` erroring when specifying a ``dtype`` with ``index_col`` (:issue:`53229`)
- Bug in :func:`read_excel`, with ``engine="xlrd"`` (``xls`` files) erroring when file contains NaNs/Infs (:issue:`54564`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mroeschke - should this be going into 2.2 at this point?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah probably. (Maybe "higher impact" bugs can go in 2.1 still)

Comment on lines 125 to 128
# GH54564 - if the cell contents are NaN/Inf, we get an exception;
# that is just another case where we don't want to convert.
# The exception filter is quite general on purpose: whenever
# the cell content cannot be converted to int - just don't.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to be pretty verbose to me. Would something like "Don't convert NaN/Inf values" suffice?

Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm; cc @mroeschke

@rhshadrach rhshadrach added Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate labels Aug 18, 2023
@rhshadrach rhshadrach added this to the 2.2 milestone Aug 18, 2023
@mroeschke
Copy link
Member

Looks like the added test needs to handle Window's 32 bit default

@rhshadrach
Copy link
Member

rhshadrach commented Aug 18, 2023

Yea, just noticed that. @burnpanck - NumPy has a bit of an oddity where ints default to 32 bit on Windows (they're fixing this in 2.0 I believe). Can you specify dtype="int64" when you create expected in your test.

https://github.com/pandas-dev/pandas/actions/runs/5902921221/job/16011817499?pr=54567#step:5:1041

Edit: I forgot this has a mix of columns. You can use astype on the integer one.

@mroeschke mroeschke merged commit 0b58c62 into pandas-dev:main Aug 21, 2023
37 of 38 checks passed
@mroeschke
Copy link
Member

Thanks @burnpanck

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug IO Excel read_excel, to_excel Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: pd.read_excel with engine "xlrd" fails when a cell contains NaN/inf
3 participants