-
-
Notifications
You must be signed in to change notification settings - Fork 30.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-51067: Add remove() in ZipInfo #19358
base: main
Are you sure you want to change the base?
Conversation
yudilevi2
commented
Apr 4, 2020
•
edited by terryjreedy
Loading
edited by terryjreedy
- Issue: remove/delete method for zipfile/tarfile objects #51067
Hello, and thanks for your contribution! I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA). CLA MissingOur records indicate the following people have not signed the CLA: @nergall2 For legal reasons we need all the people listed to sign the CLA before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue. If you have recently signed the CLA, please wait at least one business day You can check yourself to see if the CLA has been received. Thanks again for the contribution, we look forward to reviewing it! |
This addition would be useful for my projects. Thanks! I think this should change should include unit tests in https://github.com/python/cpython/blob/master/Lib/test/test_zipfile.py -- perhaps a test to remove a middle entry and one to remove the last entry in a zip file. Does removing an entry truncate the zipfile to the new file length? If not, should it? Either way, a comment in the code and a test for verification would be helpful too. Does this code work on both 32-bit and 64-bit zip files? |
@yudilevi2, please sign the CLA as requested in the comment from @the-knights-who-say-ni . Thank you! |
@csabella done |
Hi @yudilevi2, thanks for contributing to Python! It looks like you signed the CLA but @the-knights-who-say-ni cannot find it. Can you make sure that you your GitHub username is listed in your details on https://bugs.python.org (https://cloud.githubusercontent.com/assets/2680980/23276970/d14a380c-f9d1-11e6-883d-e13b6b211239.png)? Also to be accepted, this change will need some tests and documentation. This change will also require a NEWS entry, you can add one by using https://blurb-it.herokuapp.com/ |
@yudilevi2, it seems that you haven't created a user on bugs.python.org (or if you have, then I can't find it.) For our CLA bot to recognize your signature you'll have to have a user there. See the above comments for links with more details. Sorry for requiring this bureaucracy, but please just let us know if you run into any trouble or would like more help with this. |
hey @taleinat my username there is yudilevi |
@yudilevi2, on b.p.o. the username yudilevi has a different GitHub username configured. That's likely why, for this PR, our bot isn't picking up on the fact that you've signed the CLA. It seems that your options are:
|
Could we get an update on this? Is it just being held back by the CLA? |
Sorry, there has been some confusion because I renamed the account.
I did sign the CLA multiple times already so it should be good but I'm not
sure it's all been registered correctly.
How do I check?
…On Thu, Jan 21, 2021 at 4:04 AM Peter DeVita ***@***.***> wrote:
Could we get an update on this? Is it just being held back by the CLA?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#19358 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/APBPVNVUAUZOTQ5FZZKTNC3S26RV3ANCNFSM4L542K3Q>
.
|
@yudilevi2 Checking here https://check-python-cla.herokuapp.com/ says you don't have an account associated with your Github account. |
Was there any progress on this? |
Sorry for the mess, the correct github name is now updated in my python bugs profile. |
As to progress, nothing yet on my end, can't find the time to dig into the code/flows yet in order to write proper tests, will try to get to it soon. |
Since this has gotten a bit stuck, I'm just chiming in to share that I came here while researching a legislative proposal to generate massive bulk data files of American legal documents. One of the sticking points I anticipate with the proposal to create bulk files is that sealed content will need to occasionally be removed from them. Currently, doing so with Python requires decompressing the zip, removing a file, and then recompressing it (doing it with the All this to say, I'm excited to see this PR and hope it can move forward. Thank you all. |
I hope this |
As mentioned, it will not get merged without tests and documentation. |
if i == len(filelist) - 1: | ||
entry_size = self.start_dir - info.header_offset | ||
else: | ||
entry_size = filelist[i + 1].header_offset - info.header_offset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there anything in zipfile spec which says the header offsets have to be contiguous?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't matter anyway. The implementation moves all bytes between the start of a section and the start of the next section. If there were to be random bytes between the end of a section and the start of the next section, they will be moved too.
# Make sure we have an info object | ||
if isinstance(member, ZipInfo): | ||
# 'member' is already an info object | ||
zinfo = member |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unsafe. The ZipInfo
instances in self.filelist
will not compare equal with this zinfo even if the filename is matching - they compare by identity.
Additionally, if remove
is able to accept a ZipInfo
instance, then there needs to be handling of the possibility that the instance is not present in the zipfile at all.
I recommend to document that member is a string, and then use self.getinfo(member)
, avoiding these issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comparing by identity when passing a zinfo is a feature and is consistent to the behavior of writestr
etc. I don't think this need to change.
It actually is a problem if the zinfo/filename does not exist in the zipfile at all, my revision has added a check to prevent that.