Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Adding engine_kwargs to Excel engines for issue #40274 #52214

Merged
merged 39 commits into from
Apr 12, 2023
Merged

ENH: Adding engine_kwargs to Excel engines for issue #40274 #52214

merged 39 commits into from
Apr 12, 2023

Conversation

rmhowe425
Copy link
Contributor

@rmhowe425 rmhowe425 commented Mar 26, 2023

@rmhowe425 rmhowe425 changed the title Adding engine_kwargs to Excel engines for issue #40274 ENH: Adding engine_kwargs to Excel engines for issue #40274 Mar 26, 2023
@mroeschke mroeschke added the IO Excel read_excel, to_excel label Mar 27, 2023
Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

Ref: #40274 (comment)

I think we need to add documentation on which function pandas is using for each engine. I'd recommend adding it to the User Guide and then adding a link to this in the docstring.

Otherwise, generally looks good!

pandas/tests/io/excel/test_readers.py Outdated Show resolved Hide resolved
pandas/tests/io/excel/test_readers.py Show resolved Hide resolved
Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes! There were a few things I missed on the first pass.

doc/source/user_guide/io.rst Outdated Show resolved Hide resolved
doc/source/whatsnew/v2.1.0.rst Outdated Show resolved Hide resolved
pandas/io/excel/_base.py Outdated Show resolved Hide resolved
pandas/io/excel/_base.py Outdated Show resolved Hide resolved
pandas/io/excel/_base.py Outdated Show resolved Hide resolved
pandas/tests/io/excel/test_readers.py Outdated Show resolved Hide resolved
pandas/tests/io/excel/test_readers.py Outdated Show resolved Hide resolved
@rmhowe425
Copy link
Contributor Author

rmhowe425 commented Mar 29, 2023

@rhshadrach Changes have been made! Please let me know if I have missed anything else!

Quick question. Once this PR is approved, could this also close out issue 43053

@rmhowe425 rmhowe425 closed this Mar 30, 2023
@rmhowe425 rmhowe425 reopened this Mar 30, 2023
@rmhowe425
Copy link
Contributor Author

@rhshadrach Wondering if I should submit a GH issue to implement the same feature for the to_excel method?

doc/source/user_guide/io.rst Outdated Show resolved Hide resolved
doc/source/user_guide/io.rst Outdated Show resolved Hide resolved
doc/source/user_guide/io.rst Outdated Show resolved Hide resolved
doc/source/user_guide/io.rst Outdated Show resolved Hide resolved
doc/source/whatsnew/v2.1.0.rst Outdated Show resolved Hide resolved
doc/source/whatsnew/v2.1.0.rst Outdated Show resolved Hide resolved
pandas/io/excel/_base.py Outdated Show resolved Hide resolved
pandas/io/excel/_base.py Outdated Show resolved Hide resolved
pandas/io/excel/_odfreader.py Outdated Show resolved Hide resolved
pandas/tests/io/excel/test_readers.py Outdated Show resolved Hide resolved
@rhshadrach
Copy link
Member

@rhshadrach Wondering if I should submit a GH issue to implement the same feature for the to_excel method?

Yea - I think that makes sense to do!

@rhshadrach
Copy link
Member

rhshadrach commented Mar 31, 2023

@rmhowe425 Please be aware that each time you merge main into this branch, it kicks off all the tests (30+hours of compute). This should be done only when needed - for example:

  • if there are conflicts
  • if you're aware of a PR that has been merged to main that touches on similar parts of the code
  • right before we merge this branch into main (to make sure all the tests still pass)

I don't think it needs to be done daily, or even once every couple of days.

@rmhowe425
Copy link
Contributor Author

@MarcoGorelli Looks like we're good to go! I think we just need your sign off?

@@ -71,9 +71,10 @@ to ``na_action=None``, like for all the other array types.

Other enhancements
^^^^^^^^^^^^^^^^^^
- :meth:`Categorical.map` and :meth:`CategoricalIndex.map` now have a ``na_action`` parameter.
:meth:`Categorical.map` implicitly had a default value of ``"ignore"`` for ``na_action``. This has formally been deprecated and will be changed to ``None`` in the future.
- :meth:`Categorical.map` implicitly had a default value of ``"ignore"`` for ``na_action``. This has formally been deprecated and will be changed to ``None`` in the future.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are a lot of unrelated entries modified in this diff? I think only one entry should have been added

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mroeschke Mistakes caused by handling merge conflicts. They should have been fixed from previous comments left by reviewers. Are we still seeing issues?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally for this pull request we should just see one new entry but other entries in this diff appear changed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mroeschke I'll pull the latest version of whatsnew/v2.1.0.rst from master, add my change for this PR and push. That should guarantee that all incorrect modifications to the file have been fixed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mroeschke Updated whatsnew/v2.1.0.rst should be good

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @rmhowe425

just left some comments

sorry for the conflicting reviews regarding removing the trailing period from the whatsnew note. if you wanted to handle that in a separate pull request, that would be welcome, but for this one, let's try to keep the diff minimal

elif read_ext[1:] == "ods":
msg = re.escape(r"load() got an unexpected keyword argument 'foo'")

if engine is not None and expected_defaults[read_ext[1:]]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is the and expected_defaults[read_ext[1:]] condition necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessary! Fixed!

Comment on lines 162 to 168
msg = re.escape(r"load_workbook() got an unexpected keyword argument 'foo'")

if read_ext[1:] == "xls" or read_ext[1:] == "xlsb":
msg = re.escape(r"open_workbook() got an unexpected keyword argument 'foo'")

elif read_ext[1:] == "ods":
msg = re.escape(r"load() got an unexpected keyword argument 'foo'")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the extra newlines makes this hard to read, how about

if read_ext[1:] == "xls" or read_ext[1:] == "xlsb":
    msg = ...
elif read_ext[1:] == "ods":
    msg = ...
else:
    msg = ...

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed!

@@ -339,7 +340,6 @@ Period
- Bug in :meth:`arrays.PeriodArray.map` and :meth:`PeriodIndex.map`, where the supplied callable operated array-wise instead of element-wise (:issue:`51977`)
- Bug in :func:`read_csv` not processing empty strings as a null value, with ``engine="pyarrow"`` (:issue:`52087`)
- Bug in :func:`read_csv` returning ``object`` dtype columns instead of ``float64`` dtype columns with ``engine="pyarrow"`` for columns that are all null with ``engine="pyarrow"`` (:issue:`52087`)
- Bug in incorrectly allowing construction of :class:`Period` or :class:`PeriodDtype` with :class:`CustomBusinessDay` freq; use :class:`BusinessDay` instead (:issue:`52534`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something went wrong when merging here, I presume you didn't mean to remove this line?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MarcoGorelli Last night after I pushed the new whatsnew file there was a PR conflict and I removed that entry to fix the conflict. Shouldn't my PR only contain my contribution? Should I have added that entry to fix the conflict?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't my PR only contain my contribution?

yes, exactly - whereas currently, it's showing that you also deleted another line

please check https://github.com/pandas-dev/pandas/pull/52214/files to verify that the PR only contains your changes

Copy link
Contributor Author

@rmhowe425 rmhowe425 Apr 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MarcoGorelli Looking at that URL, it looks correct to me.

The PR only contains my changes. However, when I run into merge conflicts after other people's PRs are approved, do I need to add their changes when I examine the merge conflict diff, or remove them? In this case I removed them, which is why the line below is shaded in red. Previously I added them and that seemed to cause problems as well. hmmmm

Bug in incorrectly allowing construction of :class:Period or :class:PeriodDtype with :class:CustomBusinessDay freq; use :class:BusinessDay instead (:issue:52534)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

depends on the merge conflict - if it's a whatsnew note, you probably need to select "keep both changes"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood! Thank you for the clarity! I'll add the above entry back and keep this in mind moving forward!

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Looks like the previous comments have been addressed. This looks like it should be fine, but I'm not too familiar with the Excel code and so would prefer it if someone with more expertise in it were to merge

pandas/io/excel/_base.py Outdated Show resolved Hide resolved
@mroeschke mroeschke merged commit 7eeec0d into pandas-dev:main Apr 12, 2023
@mroeschke
Copy link
Member

Awesome! Thanks @rmhowe425

@rmhowe425
Copy link
Contributor Author

Thanks for all the help guys! Really appreciate it!

@samukweku
Copy link
Contributor

hi team, pls what pandas version is this targeted for? I cant find it on pandas 2.0.3

@rmhowe425
Copy link
Contributor Author

@samukweku 2.1.0, which I believe will be released later today. If you check the "Files Changed", you'll see an entry in whatsnew/v2.1.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO Excel read_excel, to_excel
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: read_excel (xlrd engine) add parameter for ignore_workbook_corruption
5 participants