-
Notifications
You must be signed in to change notification settings - Fork 189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow toxlsx() to add or replace a worksheet. #502
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @chrullrich, this looks good to me. Just had one question.
petl/io/xlsx.py
Outdated
@@ -86,15 +87,32 @@ def __iter__(self): | |||
pass | |||
|
|||
|
|||
def toxlsx(tbl, filename, sheet=None, write_header=True): | |||
def toxlsx(tbl, filename, sheet=None, write_header=True, overwrite=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps the default should be overwrite=False
? This would mean that it's less likely a user will accidentally overwrite an entire workbook when all they wanted to do was replace a single sheet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Compatibility with existing code that does not know about the new argument.
Setting the default to False would mean that a new sheet would always be added unless there was a sheet name argument that matched a sheet in the workbook. Worst case scenario, but not implausible: Two cron jobs exchange data through Excel files (for some reason that someone, somewhere, will have come up with). One uses toxlsx() without an explicit sheet name, the next one fromxlsx(), also without one. Then petl is upgraded. The file will keep growing and the second job will forevermore read only the first sheet containing the last data written before the upgrade.
I'm all for validating and tracking dependency versions, but that doesn't mean I actually do it ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice improvement! 👍
Some options here:
- Maybe change
overwrite
fromTrue
/False
to:overwrite
the filereplace
the sheet content in the file. If thesheet
argument isNone
recreate the file.append
to the sheet content the file
- Perhaps the default should be
overwrite='replace'
for reducing compatibility problems.- Calls with argument
sheet
withNone
will behave as currently. - Calls with argument
sheet
filled with constant names by multiplejobs
will have some impact by only a fixed number of sheets. - Calls with argument
sheet
filled with random names by multiplejobs
will be impacted and will keep growing. However this usage may not be common.
- Calls with argument
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. It feels a bit weird to replace a single sheet if one is named, and the entire file if not, but it does make sense. However, (Nonsense. replace
with no sheet name also replaces overwrite
because they are entirely identical, so the latter isn't needed.overwrite
is needed to specify a sheet name.)
Calling the operation that adds sheets to existing files append
invites confusion with appendxlsx()
that adds rows to existing sheets, so I'd prefer to call it add
.
Updated PR coming up, either today or tomorrow.
The "add" mode is tricky because openpyxl will silently uniqify a colliding sheet name when attempting to create it, i.e. the resulting sheet name may not be the one requested. For predictability, the exact name is enforced.
"replace" with a sheet name expects the file to either not be there, or to be a valid .xlsx file, not the empty file it used to get here.
wb = openpyxl.Workbook(write_only=True) | ||
ws = wb.create_sheet(title=sheet) | ||
|
||
if not os.path.exists(filename) or mode == "overwrite" \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if not os.path.exists(filename)
Since v1.6.0 petl
allows reading and writing files to remote sources like remote servers or cloud providers when also using fsspec package.
However, this code will work only with local files.
Until now that is not a problem because the codes of fromxls()
and toxls()
are not working with remote sources.
But the PR #506 adds support for remote sources and also conflict with this PR.
Any ideas of what is the best way to conciliate these two changes? 😃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if I had sent the PR four weeks earlier, it would be your problem now? Thanks, I think. :-)
I have no clue at all how the remote sources code works, and cannot make any sense of it right now. I'll try to find the time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @chrullrich,
It's absolutely not a problem at all. 😃
When using remote sources, petl
only call open/read/write on python file objects.
So os.path.exists()
does not work when reading or writing to HTTP
, S3
, SMB
remote sources for example.
Do you think there is any trick for not using os.path.exists()
in this functionality?
My current plan is something like:
- Waith for @alimanfoo reviewing Allow toxlsx() to add or replace a worksheet. #502
- Find a workaround for
os.path.exists()
- Merge Allow reading and writing Excel files in remote sources #506
- Adapt and merge this PR (Allow toxlsx() to add or replace a worksheet. #502)
Hi @chrullrich , What you think about this possible resolution ? |
Closing this PR because it's changes were merged in #509. |
This PR has the objective of enabling toxlsx() to be used to produce xlsx files with multiple worksheets.
The single unit test I added is successful, but I was not able to get the entire test suite to run.
Changes
overwrite
argument to io.xlsx.toxlsx() and added a minimal docstring.Checklist
Checklist for for pull requests including new code and/or changes to existing code...