ENH: Adding engine_kwargs to Excel engines for issue #40274 #52214

rmhowe425 · 2023-03-26T02:02:06Z

closes #40274 (ENH: read_excel (xlrd engine) add parameter for ignore_workbook_corruption)
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

rhshadrach

Thanks for the PR!

I think we need to add documentation on which function pandas is using for each engine. I'd recommend adding it to the User Guide and then adding a link to this in the docstring.

Otherwise, generally looks good!
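As a rough illustration of the documentation being requested, the sketch below lists which third-party function would receive `engine_kwargs` for each reader engine. The function names match the error messages in this PR's tests (`load_workbook`, `open_workbook`, `load`); the module paths are an assumption based on each library's public API, and `read_possibly_corrupt_xls` is a hypothetical helper that assumes pandas 2.1.0 or later and an xlrd version that accepts `ignore_workbook_corruption`.

```python
# Engine name -> third-party function assumed to receive engine_kwargs.
# Module paths are an assumption; the function names come from the PR's tests.
ENGINE_LOADERS = {
    "openpyxl": "openpyxl.load_workbook",
    "xlrd": "xlrd.open_workbook",
    "pyxlsb": "pyxlsb.open_workbook",
    "odf": "odf.opendocument.load",
}


def read_possibly_corrupt_xls(path):
    """Hypothetical helper: forward an xlrd option through engine_kwargs."""
    # Imported lazily so the mapping above is usable without pandas installed.
    import pandas as pd

    return pd.read_excel(
        path,
        engine="xlrd",
        # Passed on to the engine's loader rather than interpreted by pandas.
        engine_kwargs={"ignore_workbook_corruption": True},
    )
```

This is the use case from the linked issue: the option belongs to the engine, so pandas only needs to pass it along.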

pandas/tests/io/excel/test_readers.py

rhshadrach

Thanks for the changes! There were a few things I missed on the first pass.

doc/source/user_guide/io.rst

doc/source/whatsnew/v2.1.0.rst

pandas/io/excel/_base.py

pandas/tests/io/excel/test_readers.py

rmhowe425 · 2023-03-29T03:15:38Z

@rhshadrach Changes have been made! Please let me know if I have missed anything else!

Quick question: once this PR is approved, could it also close out issue #43053?

rmhowe425 · 2023-03-30T18:41:20Z

@rhshadrach Wondering if I should submit a GH issue to implement the same feature for the to_excel method?

doc/source/user_guide/io.rst

doc/source/whatsnew/v2.1.0.rst

pandas/io/excel/_base.py

pandas/io/excel/_odfreader.py

pandas/tests/io/excel/test_readers.py

rhshadrach · 2023-03-31T03:07:41Z

> @rhshadrach Wondering if I should submit a GH issue to implement the same feature for the to_excel method?

Yea - I think that makes sense to do!

rhshadrach · 2023-03-31T18:37:13Z

@rmhowe425 Please be aware that each time you merge main into this branch, it kicks off all the tests (30+ hours of compute). This should be done only when needed, for example:

- if there are conflicts
- if you're aware of a PR that has been merged to main that touches on similar parts of the code
- right before we merge this branch into main (to make sure all the tests still pass)

I don't think it needs to be done daily, or even once every couple of days.

rmhowe425 · 2023-04-08T15:23:58Z

@MarcoGorelli Looks like we're good to go! I think we just need your sign off?

mroeschke · 2023-04-10T20:10:00Z

doc/source/whatsnew/v2.1.0.rst

@@ -71,9 +71,10 @@ to ``na_action=None``, like for all the other array types.

 Other enhancements
 ^^^^^^^^^^^^^^^^^^
- :meth:`Categorical.map` and :meth:`CategoricalIndex.map` now have a ``na_action`` parameter.
-  :meth:`Categorical.map` implicitly had a default value of ``"ignore"`` for ``na_action``. This has formally been deprecated and will be changed to ``None`` in the future.
+- :meth:`Categorical.map` implicitly had a default value of ``"ignore"`` for ``na_action``. This has formally been deprecated and will be changed to ``None`` in the future.


Why are a lot of unrelated entries modified in this diff? I think only one entry should have been added

@mroeschke Mistakes caused by handling merge conflicts. They should have been fixed from previous comments left by reviewers. Are we still seeing issues?

Ideally for this pull request we should just see one new entry but other entries in this diff appear changed

@mroeschke I'll pull the latest version of whatsnew/v2.1.0.rst from master, add my change for this PR and push. That should guarantee that all incorrect modifications to the file have been fixed

@mroeschke Updated whatsnew/v2.1.0.rst should be good

MarcoGorelli

thanks @rmhowe425

just left some comments

sorry for the conflicting reviews regarding removing the trailing period from the whatsnew note. if you wanted to handle that in a separate pull request, that would be welcome, but for this one, let's try to keep the diff minimal

MarcoGorelli · 2023-04-11T13:47:27Z

pandas/tests/io/excel/test_readers.py

+        elif read_ext[1:] == "ods":
+            msg = re.escape(r"load() got an unexpected keyword argument 'foo'")
+
+        if engine is not None and expected_defaults[read_ext[1:]]:


why is the `and expected_defaults[read_ext[1:]]` condition necessary?

Not necessary! Fixed!

MarcoGorelli · 2023-04-11T13:48:18Z

pandas/tests/io/excel/test_readers.py

+        msg = re.escape(r"load_workbook() got an unexpected keyword argument 'foo'")
+
+        if read_ext[1:] == "xls" or read_ext[1:] == "xlsb":
+            msg = re.escape(r"open_workbook() got an unexpected keyword argument 'foo'")
+
+        elif read_ext[1:] == "ods":
+            msg = re.escape(r"load() got an unexpected keyword argument 'foo'")


the extra newlines make this hard to read, how about

    if read_ext[1:] == "xls" or read_ext[1:] == "xlsb":
        msg = ...
    elif read_ext[1:] == "ods":
        msg = ...
    else:
        msg = ...

?
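A minimal sketch of how that if/elif/else suggestion might look when filled in, using the messages from the diff above. The helper name `expected_error_message` is hypothetical and not part of the PR; the actual test assigns `msg` inline.

```python
import re


def expected_error_message(read_ext: str) -> str:
    """Return the escaped error message expected for a file extension."""
    ext = read_ext[1:]  # drop the leading dot, e.g. ".xlsx" -> "xlsx"
    if ext in ("xls", "xlsb"):
        func = "open_workbook"
    elif ext == "ods":
        func = "load"
    else:
        func = "load_workbook"
    # re.escape makes the parentheses literal so the result is usable as a regex.
    return re.escape(f"{func}() got an unexpected keyword argument 'foo'")
```

A pattern built this way can be passed as the `match` argument of `pytest.raises`, which treats it as a regular expression.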

MarcoGorelli · 2023-04-11T13:48:53Z

doc/source/whatsnew/v2.1.0.rst

@@ -339,7 +340,6 @@ Period
 - Bug in :meth:`arrays.PeriodArray.map` and :meth:`PeriodIndex.map`, where the supplied callable operated array-wise instead of element-wise (:issue:`51977`)
 - Bug in :func:`read_csv` not processing empty strings as a null value, with ``engine="pyarrow"`` (:issue:`52087`)
 - Bug in :func:`read_csv` returning ``object`` dtype columns instead of ``float64`` dtype columns with ``engine="pyarrow"`` for columns that are all null with ``engine="pyarrow"`` (:issue:`52087`)
- Bug in incorrectly allowing construction of :class:`Period` or :class:`PeriodDtype` with :class:`CustomBusinessDay` freq; use :class:`BusinessDay` instead (:issue:`52534`)


something went wrong when merging here, I presume you didn't mean to remove this line?

@MarcoGorelli Last night after I pushed the new whatsnew file there was a PR conflict and I removed that entry to fix the conflict. Shouldn't my PR only contain my contribution? Should I have added that entry to fix the conflict?

> Shouldn't my PR only contain my contribution?

yes, exactly - whereas currently, it's showing that you also deleted another line

please check https://github.com/pandas-dev/pandas/pull/52214/files to verify that the PR only contains your changes

@MarcoGorelli Looking at that URL, it looks correct to me.

The PR only contains my changes. However, when I run into merge conflicts after other people's PRs are approved, do I need to add their changes when I examine the merge conflict diff, or remove them? In this case I removed them, which is why the line below is shaded in red. Previously I added them and that seemed to cause problems as well. hmmmm

Bug in incorrectly allowing construction of :class:`Period` or :class:`PeriodDtype` with :class:`CustomBusinessDay` freq; use :class:`BusinessDay` instead (:issue:`52534`)

depends on the merge conflict - if it's a whatsnew note, you probably need to select "keep both changes"

Understood! Thank you for the clarity! I'll add the above entry back and keep this in mind moving forward!

MarcoGorelli

Nice!

Looks like the previous comments have been addressed. This looks like it should be fine, but I'm not too familiar with the Excel code and so would prefer it if someone with more expertise in it were to merge

pandas/io/excel/_base.py

mroeschke · 2023-04-12T15:52:13Z

Awesome! Thanks @rmhowe425

rmhowe425 · 2023-04-12T16:22:06Z

Thanks for all the help guys! Really appreciate it!

samukweku · 2023-08-30T11:01:09Z

Hi team, which pandas version is this targeted for? I can't find it in pandas 2.0.3.

rmhowe425 · 2023-08-30T12:35:13Z

@samukweku 2.1.0, which I believe will be released later today. If you check the "Files Changed", you'll see an entry in whatsnew/v2.1.0

rmhowe425 changed the title ~~Adding engine_kwargs to Excel engines for issue #40274~~ ENH: Adding engine_kwargs to Excel engines for issue #40274 Mar 26, 2023

mroeschke requested a review from rhshadrach March 27, 2023 20:58

mroeschke added the IO Excel read_excel, to_excel label Mar 27, 2023

rhshadrach requested changes Mar 27, 2023

rmhowe425 requested a review from rhshadrach March 28, 2023 20:31

rhshadrach requested changes Mar 28, 2023

rmhowe425 requested a review from rhshadrach March 29, 2023 03:15

rmhowe425 closed this Mar 30, 2023

Fixing merge conflicts

817199f

rmhowe425 reopened this Mar 30, 2023

rmhowe425 and others added 2 commits March 29, 2023 23:35

Fixing merge conflict

1333165

Merge branch 'main' into dev/read_excel

1cc54cd

rmhowe425 added 2 commits March 30, 2023 17:12

Merge branch 'main' into dev/read_excel

8391425

Merge branch 'main' into dev/read_excel

bec1da2

rhshadrach requested changes Mar 31, 2023

rmhowe425 and others added 3 commits March 30, 2023 23:40

Merge branch 'pandas-dev:main' into dev/read_excel

c0988d6

Fixing documentation issues

2267d30

Merge branch 'main' into dev/read_excel

db13a39

rmhowe425 and others added 8 commits April 1, 2023 11:12

Merge branch 'pandas-dev:main' into dev/read_excel

229954e

standardized usage of engine_kwargs, fixed unit tests & doc strings

14b4be0

Fixing documentation issues

057d5a2

Fixing implementation logic and unit tests

c05f182

Fixing implementation logic

9065261

Fixing formatting issues

45589bb

Fixing error for test Docstring validation, typing, and other manual …

93c6e60

…pre-commit hooks

Fixing documentation error

d60aa97

rhshadrach approved these changes Apr 8, 2023

Merge branch 'main' into dev/read_excel

cef90f4

mroeschke reviewed Apr 10, 2023

rmhowe425 requested a review from mroeschke April 10, 2023 21:47

rmhowe425 and others added 5 commits April 10, 2023 19:07

Fixing documentation issues

f692c8e

Fixing formatting errors

96c6fe0

Fixing formatting errors

0391c9f

Fixing formatting errors

f2c8e2a

Merge branch 'main' into dev/read_excel

af55880

MarcoGorelli requested changes Apr 11, 2023

rmhowe425 added 2 commits April 11, 2023 11:25

Fixing logic and formatting issues in unit tests

679ab4b

Merge branch 'dev/read_excel' of github.com:rmhowe425/pandas into dev…

f9be828

…/read_excel

MarcoGorelli mentioned this pull request Apr 11, 2023

STYLE sort whatsnew entries alphabeticaly, allow for trailing full stops #52598

Merged

rmhowe425 added 2 commits April 11, 2023 12:27

Fixing issues with merge conflict

3412af0

Fixing formatting issue

f379120

mroeschke approved these changes Apr 11, 2023

rmhowe425 requested a review from MarcoGorelli April 11, 2023 18:29

MarcoGorelli approved these changes Apr 12, 2023

mroeschke reviewed Apr 12, 2023

Update pandas/io/excel/_base.py

8d7933c

mroeschke approved these changes Apr 12, 2023

mroeschke merged commit 7eeec0d into pandas-dev:main Apr 12, 2023

rmhowe425 mentioned this pull request Jun 25, 2023

ENH: Allow for passing in engine_kwargs to read_excel #43053

Closed

rhshadrach mentioned this pull request Oct 7, 2023

BUG: "with pd.ExcelWriter" produces a corrupt Excel file in case of .xlsm extension #44868

Open

rmhowe425 mentioned this pull request Nov 26, 2023

Refactor shodan convert command to use Pandas achillean/shodan-python#200

Closed
