Allow toxlsx() to add or replace a worksheet. #502

chrullrich · 2020-07-14T17:07:59Z

This PR has the objective of enabling toxlsx() to be used to produce xlsx files with multiple worksheets.

The single unit test I added is successful, but I was not able to get the entire test suite to run.

Changes

Added the overwrite argument to io.xlsx.toxlsx() and added a minimal docstring.

Checklist

Checklist for pull requests including new code and/or changes to existing code...

coveralls · 2020-07-14T17:33:15Z

Coverage increased (+0.01%) to 91.925% when pulling 80e38e1 on chrullrich:toxlsx-add-sheet into 9ae3fc4 on petl-developers:master.

alimanfoo

Thanks @chrullrich, this looks good to me. Just had one question.

alimanfoo · 2020-07-15T16:44:24Z

petl/io/xlsx.py

@@ -86,15 +87,32 @@ def __iter__(self):
            pass


-def toxlsx(tbl, filename, sheet=None, write_header=True):
+def toxlsx(tbl, filename, sheet=None, write_header=True, overwrite=True):


Perhaps the default should be overwrite=False? This would mean that it's less likely a user will accidentally overwrite an entire workbook when all they wanted to do was replace a single sheet?

Compatibility with existing code that does not know about the new argument.

Setting the default to False would mean that a new sheet would always be added unless there was a sheet name argument that matched a sheet in the workbook. Worst case scenario, but not implausible: Two cron jobs exchange data through Excel files (for some reason that someone, somewhere, will have come up with). One uses toxlsx() without an explicit sheet name, the next one fromxlsx(), also without one. Then petl is upgraded. The file will keep growing and the second job will forevermore read only the first sheet containing the last data written before the upgrade.
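The worst-case scenario above can be played through with a toy model. This is only a sketch: the workbook is modeled as a plain list of (name, rows) pairs, and `write_without_sheet_name` and `read_first_sheet` are hypothetical stand-ins for the two cron jobs, not petl functions.

```python
# Toy model: a workbook is a list of (sheet_name, rows) pairs, in sheet order.

def write_without_sheet_name(workbook, rows):
    """Writer job under the assumed overwrite=False default: always adds a sheet."""
    workbook.append(("Sheet%d" % (len(workbook) + 1), rows))


def read_first_sheet(workbook):
    """Reader job: without a sheet name it only ever sees the first sheet."""
    return workbook[0][1]


# State of the file at the time of the upgrade: one sheet with the last old data.
workbook = [("Sheet1", ["data from before the upgrade"])]

# Three writer runs after the upgrade.
for day in (1, 2, 3):
    write_without_sheet_name(workbook, ["data from day %d" % day])

seen_by_reader = read_first_sheet(workbook)
sheet_count = len(workbook)
print(sheet_count, seen_by_reader)
```

The file grows by one sheet per run, while the reader keeps returning the pre-upgrade rows.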

I'm all for validating and tracking dependency versions, but that doesn't mean I actually do it ...

Nice improvement! 👍

Some options here:

Maybe change overwrite from True/False to:

overwrite the file

replace the sheet content in the file. If the sheet argument is None recreate the file.

append to the sheet content in the file

Perhaps the default should be overwrite='replace' to reduce compatibility problems.

Calls where sheet is None will behave as they do currently.

Calls from multiple jobs that pass constant sheet names will have limited impact: the workbook ends up with only a fixed number of sheets.

Calls from multiple jobs that pass random sheet names will be affected, and the file will keep growing. However, this usage may not be common.

Sounds good. It feels a bit weird to replace a single sheet if one is named, and the entire file if not, but it does make sense. ~~However, replace with no sheet name also replaces overwrite because they are entirely identical, so the latter isn't needed.~~ (Nonsense. overwrite is needed to specify a sheet name.)

Calling the operation that adds sheets to existing files append invites confusion with appendxlsx() that adds rows to existing sheets, so I'd prefer to call it add.
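To make the three modes concrete, here is a minimal sketch of the semantics as discussed so far. It models a workbook as a dict of sheet name to rows instead of touching a real file. `apply_mode` is a hypothetical name, and both the fallback sheet name "Sheet" and raising on a name collision in "add" mode are assumptions of the sketch, not confirmed behavior of the PR.

```python
def apply_mode(existing, rows, sheet=None, mode="replace"):
    """Return the workbook contents after one write.

    existing: dict of sheet name -> rows, or None if the file does not exist.
    """
    if mode not in ("overwrite", "replace", "add"):
        raise ValueError("unknown mode: %r" % (mode,))

    name = sheet if sheet is not None else "Sheet"  # assumed default name

    # A fresh workbook is written when there is no file yet, when the caller
    # asks for "overwrite", or when "replace" is used without a sheet name.
    if existing is None or mode == "overwrite" or (mode == "replace" and sheet is None):
        return {name: list(rows)}

    workbook = dict(existing)  # keep the other sheets
    if mode == "add" and name in workbook:
        # Assumption: a collision is an error rather than a silent rename.
        raise ValueError("sheet %r already exists" % (name,))
    workbook[name] = list(rows)  # "replace" swaps the content, "add" creates it
    return workbook


old = {"a": [1], "b": [2]}
replaced_named = apply_mode(old, [9], sheet="a", mode="replace")
replaced_unnamed = apply_mode(old, [9], mode="replace")
overwritten = apply_mode(old, [9], sheet="c", mode="overwrite")
added = apply_mode(old, [9], sheet="c", mode="add")
```

With a sheet name, "replace" leaves sheet "b" alone; without one, it behaves like the current toxlsx() and recreates the file.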

Updated PR coming up, either today or tomorrow.

The "add" mode is tricky because openpyxl will silently uniqify a colliding sheet name when attempting to create it, i.e. the resulting sheet name may not be the one requested. For predictability, the exact name is enforced.
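For illustration, the renaming can be reproduced on an in-memory workbook. The sketch below assumes openpyxl is installed; `create_sheet_exact` is a hypothetical helper showing one way an exact name could be enforced, not necessarily how this PR does it.

```python
import openpyxl


def create_sheet_exact(wb, title):
    """Create a sheet named exactly `title`, or raise if the name is taken.

    Hypothetical helper, not part of petl or openpyxl.
    """
    # Compare case-insensitively to be conservative about what counts as a clash.
    taken = {name.lower() for name in wb.sheetnames}
    if title.lower() in taken:
        raise ValueError("sheet %r already exists" % (title,))
    return wb.create_sheet(title=title)


wb = openpyxl.Workbook()                 # in-memory workbook with a default sheet
first = wb.create_sheet(title="Data")
second = wb.create_sheet(title="Data")   # collides with the first sheet
print(first.title, second.title)         # the second title is not "Data"

try:
    create_sheet_exact(wb, "Data")
    rejected = False
except ValueError:
    rejected = True
```

The second create_sheet() call succeeds but returns a sheet under a different title, which is why a caller that needs a predictable name has to check wb.sheetnames first.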

"replace" with a sheet name expects the file to either not be there, or to be a valid .xlsx file, not the empty file it used to get here.

juarezr · 2020-07-20T21:09:25Z

petl/io/xlsx.py

-    wb = openpyxl.Workbook(write_only=True)
-    ws = wb.create_sheet(title=sheet)
+
+    if not os.path.exists(filename) or mode == "overwrite" \


if not os.path.exists(filename)

Since v1.6.0, petl can read and write files on remote sources such as remote servers or cloud providers when the fsspec package is also installed.

However, this code will work only with local files.

Until now that has not been a problem, because fromxls() and toxls() do not work with remote sources.

But PR #506 adds support for remote sources and also conflicts with this PR.

Any ideas on the best way to reconcile these two changes? 😃

So if I had sent the PR four weeks earlier, it would be your problem now? Thanks, I think. :-)

I have no clue at all how the remote sources code works, and cannot make any sense of it right now. I'll try to find the time.

Hi @chrullrich,

It's absolutely not a problem at all. 😃

When using remote sources, petl only calls open/read/write on Python file objects.

So os.path.exists() does not work when reading from or writing to remote sources such as HTTP, S3, or SMB.

Do you think there is any trick to avoid using os.path.exists() in this functionality?
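One possible trick, sketched under assumptions: instead of asking whether the file exists, try to open it for reading and treat a failure as "not there". `read_existing` and its `open_source` parameter (a zero-argument callable returning a binary file object) are hypothetical names, and the sketch assumes a missing source raises FileNotFoundError, which may not hold for every remote backend.

```python
import os
import tempfile


def read_existing(open_source):
    """Return the bytes behind open_source(), or None if the source is missing.

    Only open() and read() are used, so no os.path.exists() call is needed.
    """
    try:
        with open_source() as f:
            return f.read()
    except FileNotFoundError:
        # Assumption: a missing source surfaces as FileNotFoundError.
        return None


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "book.xlsx")

    missing = read_existing(lambda: open(path, "rb"))   # file not written yet

    with open(path, "wb") as f:
        f.write(b"PK")                                  # placeholder bytes only

    present = read_existing(lambda: open(path, "rb"))
```

The returned bytes could then be wrapped in io.BytesIO and handed to the workbook loader, so the same code path would serve local and remote sources.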

My current plan is something like:

Wait for @alimanfoo to review Allow toxlsx() to add or replace a worksheet. #502

Find a workaround for os.path.exists()

Merge Allow reading and writing Excel files in remote sources #506

Adapt and merge this PR (Allow toxlsx() to add or replace a worksheet. #502)

juarezr · 2020-08-06T16:27:29Z

Hi @chrullrich ,

What do you think about this possible resolution?

juarezr · 2020-08-13T23:42:20Z

Closing this PR because its changes were merged in #509.

Allow toxlsx() to add or replace a worksheet.

c34113b

alimanfoo approved these changes Jul 15, 2020

tradlux-chul added 2 commits July 20, 2020 11:55

Add "mode" argument as per discussion.

1d61dfa

Fix failing tests.

80e38e1

juarezr reviewed Jul 20, 2020

juarezr mentioned this pull request Aug 6, 2020

Allow reading and writing Excel files in remote sources #506

Closed

14 tasks

juarezr mentioned this pull request Aug 7, 2020

Allow toxlsx() to overwrite add or replace a worksheet #509

Merged

14 tasks

juarezr closed this Aug 13, 2020

chrullrich deleted the toxlsx-add-sheet branch March 26, 2021 20:20

chrullrich commented Jul 14, 2020 (edited by juarezr)

coveralls commented Jul 14, 2020 (edited)

alimanfoo left a comment

alimanfoo Jul 15, 2020

chrullrich Jul 15, 2020

juarezr Jul 15, 2020

chrullrich Jul 18, 2020 (edited)

juarezr Jul 20, 2020 (edited)

chrullrich Jul 23, 2020

juarezr Jul 24, 2020

juarezr commented Aug 6, 2020

juarezr commented Aug 13, 2020
