Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-51067: Add remove() in ZipInfo #19358

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Conversation

yudilevi2
Copy link

@yudilevi2 yudilevi2 commented Apr 4, 2020

@the-knights-who-say-ni
Copy link

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

CLA Missing

Our records indicate the following people have not signed the CLA:

@nergall2

For legal reasons we need all the people listed to sign the CLA before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

If you have recently signed the CLA, please wait at least one business day
before our records are updated.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

@yudilevi2 yudilevi2 changed the title enhancement-6818: Add remove() in ZipInfo enhancement 6818: Add remove() in ZipInfo Apr 4, 2020
@yudilevi2 yudilevi2 changed the title enhancement 6818: Add remove() in ZipInfo bpo-6818: Add remove() in ZipInfo Apr 4, 2020
@cmeyer
Copy link

cmeyer commented Apr 4, 2020

This addition would be useful for my projects. Thanks!

I think this should change should include unit tests in https://github.com/python/cpython/blob/master/Lib/test/test_zipfile.py -- perhaps a test to remove a middle entry and one to remove the last entry in a zip file.

Does removing an entry truncate the zipfile to the new file length? If not, should it? Either way, a comment in the code and a test for verification would be helpful too.

Does this code work on both 32-bit and 64-bit zip files?

@csabella
Copy link
Contributor

@yudilevi2, please sign the CLA as requested in the comment from @the-knights-who-say-ni . Thank you!

@yudilevi2
Copy link
Author

@csabella done

@remilapeyre
Copy link
Contributor

Hi @yudilevi2, thanks for contributing to Python!

It looks like you signed the CLA but @the-knights-who-say-ni cannot find it. Can you make sure that you your GitHub username is listed in your details on https://bugs.python.org (https://cloud.githubusercontent.com/assets/2680980/23276970/d14a380c-f9d1-11e6-883d-e13b6b211239.png)?

Also to be accepted, this change will need some tests and documentation.

This change will also require a NEWS entry, you can add one by using https://blurb-it.herokuapp.com/

@taleinat
Copy link
Contributor

@yudilevi2, it seems that you haven't created a user on bugs.python.org (or if you have, then I can't find it.) For our CLA bot to recognize your signature you'll have to have a user there. See the above comments for links with more details.

Sorry for requiring this bureaucracy, but please just let us know if you run into any trouble or would like more help with this.

@yudilevi2
Copy link
Author

hey @taleinat my username there is yudilevi

@taleinat
Copy link
Contributor

taleinat commented Sep 25, 2020

@yudilevi2, on b.p.o. the username yudilevi has a different GitHub username configured. That's likely why, for this PR, our bot isn't picking up on the fact that you've signed the CLA.

It seems that your options are:

  1. Change the GitHub user associated with your b.p.o. account to "yudilevi2"
  2. Create a new "yudilevi2" user on b.p.o., associate it with your yudilevi2 GitHub username, and sign the CLA again with that account.

@pmdevita
Copy link

pmdevita commented Jan 21, 2021

Could we get an update on this? Is it just being held back by the CLA?

@yudilevi2
Copy link
Author

yudilevi2 commented Jan 21, 2021 via email

@peterstanton
Copy link

Sorry, there has been some confusion because I renamed the account. I did sign the CLA multiple times already so it should be good but I'm not sure it's all been registered correctly. How do I check?

@yudilevi2 Checking here https://check-python-cla.herokuapp.com/ says you don't have an account associated with your Github account.

@marchinidavide
Copy link

Was there any progress on this?

@yudilevi2
Copy link
Author

Hi @yudilevi2, thanks for contributing to Python!

It looks like you signed the CLA but @the-knights-who-say-ni cannot find it. Can you make sure that you your GitHub username is listed in your details on https://bugs.python.org (https://cloud.githubusercontent.com/assets/2680980/23276970/d14a380c-f9d1-11e6-883d-e13b6b211239.png)?

Also to be accepted, this change will need some tests and documentation.

This change will also require a NEWS entry, you can add one by using https://blurb-it.herokuapp.com/

Sorry for the mess, the correct github name is now updated in my python bugs profile.

@yudilevi2
Copy link
Author

As to progress, nothing yet on my end, can't find the time to dig into the code/flows yet in order to write proper tests, will try to get to it soon.

@mlissner
Copy link

mlissner commented Dec 8, 2021

Since this has gotten a bit stuck, I'm just chiming in to share that I came here while researching a legislative proposal to generate massive bulk data files of American legal documents. One of the sticking points I anticipate with the proposal to create bulk files is that sealed content will need to occasionally be removed from them. Currently, doing so with Python requires decompressing the zip, removing a file, and then recompressing it (doing it with the zip CLI tool would work sans Python).

All this to say, I'm excited to see this PR and hope it can move forward. Thank you all.

@quyentruong
Copy link

I hope this remove() get merge soon.

@wimglenn
Copy link
Contributor

As mentioned, it will not get merged without tests and documentation.

if i == len(filelist) - 1:
entry_size = self.start_dir - info.header_offset
else:
entry_size = filelist[i + 1].header_offset - info.header_offset
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there anything in zipfile spec which says the header offsets have to be contiguous?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't matter anyway. The implementation moves all bytes between the start of a section and the start of the next section. If there were to be random bytes between the end of a section and the start of the next section, they will be moved too.

# Make sure we have an info object
if isinstance(member, ZipInfo):
# 'member' is already an info object
zinfo = member
Copy link
Contributor

@wimglenn wimglenn Apr 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unsafe. The ZipInfo instances in self.filelist will not compare equal with this zinfo even if the filename is matching - they compare by identity.

Additionally, if remove is able to accept a ZipInfo instance, then there needs to be handling of the possibility that the instance is not present in the zipfile at all.

I recommend to document that member is a string, and then use self.getinfo(member), avoiding these issues.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparing by identity when passing a zinfo is a feature and is consistent to the behavior of writestr etc. I don't think this need to change.

It actually is a problem if the zinfo/filename does not exist in the zipfile at all, my revision has added a check to prevent that.

@terryjreedy terryjreedy changed the title bpo-6818: Add remove() in ZipInfo gh-51067: Add remove() in ZipInfo Dec 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.