Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove/delete method for zipfile/tarfile objects #51067

Open
rossmclendon mannequin opened this issue Sep 2, 2009 · 47 comments
Open

remove/delete method for zipfile/tarfile objects #51067

rossmclendon mannequin opened this issue Sep 2, 2009 · 47 comments
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@rossmclendon
Copy link
Mannequin

rossmclendon mannequin commented Sep 2, 2009

BPO 6818
Nosy @loewis, @rhettinger, @terryjreedy, @gustaebel, @tiran, @merwok, @sandrotosi, @serhiy-storchaka, @denfromufa, @yudilevi2
PRs
  • bpo-6818: Add remove() in ZipInfo #19336
  • gh-51067: Add remove() in ZipInfo #19358
  • Files
  • delete.tar.gz: Includes 2 files,the one named delete.py is the main file.
  • zipfile.remove.patch: bugs, docs and tests
  • zipfile.remove.2.patch: improved patch for zipfile.remove
  • tricky.zip
  • mywork.patch
  • zipfile_filter.patch: renamed, fixed comment, removed some debugging code
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2009-09-02.04:42:10.018>
    labels = ['type-feature', 'library']
    title = 'remove/delete method for zipfile/tarfile objects'
    updated_at = <Date 2020-04-04.10:40:28.415>
    user = 'https://bugs.python.org/rossmclendon'

    bugs.python.org fields:

    activity = <Date 2020-04-04.10:40:28.415>
    actor = 'yudilevi'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2009-09-02.04:42:10.018>
    creator = 'rossmclendon'
    dependencies = []
    files = ['15199', '21063', '21188', '38231', '39061', '39063']
    hgrepos = ['305']
    issue_num = 6818
    keywords = ['patch']
    message_count = 36.0
    messages = ['92154', '92155', '92156', '92158', '92164', '92172', '92177', '94475', '94478', '109158', '109160', '130299', '130388', '130463', '130938', '140680', '159418', '160533', '160534', '192574', '229801', '229893', '236565', '236568', '240336', '240345', '240394', '240422', '241181', '241192', '241201', '241346', '266374', '365755', '365756', '365757']
    nosy_count = 17.0
    nosy_names = ['loewis', 'rhettinger', 'terry.reedy', 'lars.gustaebel', 'christian.heimes', 'rossmclendon', 'eric.araujo', 'ubershmekel', 'victorlee129', 'sandro.tosi', 'chroipahtz', 'serhiy.storchaka', 'Arthur.Darcet', 'Dave Sawyer', 'gambl', 'denfromufa', 'yudilevi']
    pr_nums = ['19336', '19358']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue6818'
    versions = ['Python 3.5']

    Linked PRs

    @rossmclendon
    Copy link
    Mannequin Author

    rossmclendon mannequin commented Sep 2, 2009

    It would be most helpful if a method could be included in the TarFile
    class of the tarfile module and the ZipFile class of the zipfile module
    that would remove a particular file (either given by a name or a
    TarInfo/ZipInfo object) from the archive.

    Usage to remove a single file from an archive would be as follows:

    import zipfile
    zipFileObject = zipfile.ZipFile(archiveName,'a')
    zipFileObject.remove(fileToRemove)
    zipFileObject.close()

    Such a method should probably only apply to archives that are in append
    mode as write mode would erase the original archive and read mode should
    render the archive immutable.

    One possible extra to be included is to allow a list of file names or
    ZipInfo/TarInfo objects to be passed into the remove method, such that
    all items in the list would be removed from the archive.

    @rossmclendon rossmclendon mannequin added stdlib Python modules in the Lib dir type-feature A feature request or enhancement topic-IO labels Sep 2, 2009
    @rhettinger
    Copy link
    Contributor

    +1

    @rossmclendon
    Copy link
    Mannequin Author

    rossmclendon mannequin commented Sep 2, 2009

    Slight change to:

    "Such a method should probably only apply to archives that are in append
    mode as write mode would erase the original archive and read mode should
    render the archive immutable."

    The method should probably still apply to an archive in write mode. It
    is conceivable that one may need to delete a file from the archive after
    it has been written but before the archive object has been closed.

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Sep 2, 2009

    -1. I don't think this can be implemented in a reasonable way, assuming
    that you want the file to become smaller as a consequence of removal.

    @gustaebel
    Copy link
    Mannequin

    gustaebel mannequin commented Sep 2, 2009

    -1, although I can only speak for tarfile. Removing members from a tar
    archive sounds obvious and easy but it is *not*. A file in an archive is
    stored as a header block (that contains the metadata) followed by a
    number of data blocks (that contain the file's data). New files are
    simply appended to the archive file. There is no central table of
    contents whatsoever. To make things worse, a compressed archive is
    compressed in one go from the beginning right up to the end, it is not
    possible to access a member in the middle of an archive without having
    to decompress all data before it.
    Deleting files from an uncompressed archive is rather straightforward
    implementation-wise but IO intensive and risky. In contrast, there is no
    other way to delete files from a *compressed* tarfile than to make a
    copy of it omitting the unwanted files.

    @rossmclendon
    Copy link
    Mannequin Author

    rossmclendon mannequin commented Sep 2, 2009

    In light of Lars's comment on the matter, perhaps this functionality
    could be added to zip files only. Surely it can be done, considering
    that numerous utilities and even Windows Explorer provide such
    functionality. I must confess that I am unfamiliar with the inner
    workings of file archives and compression, but seeing as it is
    implemented in a number of places already, it seems logical that it
    could be implemented in ZipFile as well. I'll spend some time the next
    few days educating myself about zip files and how this might be
    accomplished.

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Sep 2, 2009

    In light of Lars's comment on the matter, perhaps this functionality
    could be added to zip files only. Surely it can be done, considering
    that numerous utilities and even Windows Explorer provide such
    functionality.

    Are you sure they are not creating a new file in order to delete
    content? I recall that early zip tools (e.g. pkzip) had a mode
    where they would merely delete the entry from the directory, but
    leave the actual data in the file. Would you consider that a correct
    implementation? If so, *that* can be done, for zipfiles, AFAIU.

    I'll spend some time the next
    few days educating myself about zip files and how this might be
    accomplished.

    Please do - you'll find that deletion from zipfiles comes in a can
    full of worms.

    @victorlee129
    Copy link
    Mannequin

    victorlee129 mannequin commented Oct 26, 2009

    I done it In a very *violent* way.
    Is it ok for you thought?
    if so, would anybody please fix it into the lib?

    @victorlee129 victorlee129 mannequin removed the topic-IO label Oct 26, 2009
    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Oct 26, 2009

    I done it In a very *violent* way.
    Is it ok for you thought?

    In the form in which you have done it, it is clearly
    unacceptable for inclusion in the library: we don't
    want to add two modules "delete" and "classtools".

    In addition, notice that code is for tarfile, whereas
    the OP was asking for a similar feature for zipfile.

    if so, would anybody please fix it into the lib?

    This is not how this works. If you want us to take
    action, please submit a complete and correct patch.

    @chroipahtz
    Copy link
    Mannequin

    chroipahtz mannequin commented Jul 3, 2010

    I have attempted to implement a ZipFile.remove function. It seems to work fine. I have submitted a patch.

    The method of implementation is: find the file's index in the file list, then sum the lengths of the file entries before it to find its location in the archive. Then simply read in all the bytes after it, write them out at that location, and truncate the file x bytes shorter, where x is the length of the record. This works because the directory listing is created when the file is closed, so there's no harm in truncating.

    I've also made it truncate the zip file after reading in the existing files upon creation, because the directory information is not used after this point.

    This could use some testing on large files.

    This is my first patch, so let me know if I've done anything wrong.

    @chroipahtz
    Copy link
    Mannequin

    chroipahtz mannequin commented Jul 3, 2010

    My patch had some bugs, I'll need to do some more testing. Sorry about that.

    @ubershmekel
    Copy link
    Mannequin

    ubershmekel mannequin commented Mar 8, 2011

    What's the status with this patch? If nobody's looking at it I can try to see if it works and write the test and documentation for it.

    @terryjreedy
    Copy link
    Member

    Please feel free to test, revise, and write. Though 'removed', the file is still accessible via the history list. (Click 'zipfile_remove.patch' and then 'download'.)

    @ubershmekel
    Copy link
    Mannequin

    ubershmekel mannequin commented Mar 9, 2011

    I fixed the bugs I found, added tests and documentation. What do you guys think?

    @ubershmekel
    Copy link
    Mannequin

    ubershmekel mannequin commented Mar 15, 2011

    Fixed the bugs Martin pointed out and added the relevant tests. Sadly I had to move some stuff around, but I think the changes are all for the better. I wasn't sure about the right convention for the 2 constants I added btw.

    @merwok
    Copy link
    Member

    merwok commented Jul 19, 2011

    Martin did a review of the newer patch; maybe you didn’t get the mail (there’s a Rietveld bug when a user name without email is given to the Cc field).

    @ubershmekel
    Copy link
    Mannequin

    ubershmekel mannequin commented Apr 26, 2012

    I'm not sure I understand how http://bugs.python.org/review/6818/show works. I've looked all over and only found remarks for "zipfile.remove.patch" and not for "zipfile.remove.2.patch" which addressed all the aforementioned issues.

    Also, I don't understand how to add myself to the CC of this issue's review page.

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented May 13, 2012

    Yuval, can you please submit a contributor agreement? See

    http://www.python.org/psf/contrib/

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented May 13, 2012

    As for adding yourself to the CC list: notice the string "ubershmekel" appearing in the "CC" field of http://bugs.python.org/review/6818/show. It means that you are already on the CC list.

    @tiran
    Copy link
    Member

    tiran commented Jul 7, 2013

    Yuval has submitted a CLA. I'm moving the proposal to 3.4 as 3.3 is in feature freeze mode.

    @ubershmekel
    Copy link
    Mannequin

    ubershmekel mannequin commented Oct 22, 2014

    Ping. Has this been postponed?

    @serhiy-storchaka
    Copy link
    Member

    I agree with Martin and Lars, this issue is not so easy at looks at first glance.

    For ZIP files we should distinct two different operations.

    1. Remove the entry from the central directory (and may be mark local file header as invalid if it is possible). This is easy, fast and safe, but it doesn't change the size of ZIP file.

    2. Physical remove the content of the file from ZIP file. This is so easy as remove a line from the text file. In worst case it has linear complexity from the size of ZIP file.

    2a. The safer way is to create temporary file in the same directory, copy the content of original ZIP file excluding deleted file, and then replace original ZIP file by modified copy. Be aware about file and parent directory permissions, owners, and disk space.

    2b. The faster but less safe way is to "shift" the content of the ZIP file after deleted file by reading it and writing back in the same ZIP file at different position. This way is not safe because when something bad happen at writing, we can lost all data. And of course there are crafty ZIP files in which the order of files doesn't match the order in central directory or even files data overlap.

    For performance may be we should implement (2) not as a method to remove single file, but as a method which takes the filter function and then left in the ZIP file only files for which it returns true.

    Or may be implement (1) and then add a method which cleans up the ZIP archive be removing all files removed from the central directory. We should discuss alternatives.

    And as for concrete patch, zipfile.remove.2.patch can read the content of all ZIP file in the memory. This is not appropriate, because ZIP file can be very large.

    @DaveSawyer
    Copy link
    Mannequin

    DaveSawyer mannequin commented Feb 25, 2015

    I'd be interested in taking up the zip portion at Pycon 2015 this year. I recently had need to delete file(s) from a zipfile.

    To do it today with the existing API requires you to unpack the zip and repack it. The unpack is slow and you need enough free disk space for the uncompressed files.

    My strategy is essentially exactly what msg229893 2a said: copy binary blobs to a tempfile, then overwrite the original when complete. I would use a name filter function to decide what to delete and optional parameter for the temp file (falling back to tempfile.tempfile if None). IIRC, this is the same strategy used in the dotNet zip library.

    @serhiy-storchaka
    Copy link
    Member

    I think it would be better if new method would copy filtered content to other ZIP file. Not alway you want to modifying the original file.

    The question is what to do with the data before the start of the ZIP file or between files in the ZIP file, or if files in the ZIP file are overlapped. Here is a sample of such file.

    @gambl
    Copy link
    Mannequin

    gambl mannequin commented Apr 9, 2015

    Hi,

    I've recently been working on a Python module for the Adobe universal container format (UCF) which extends the zip specification - as part of this I wanted to be able to remove and rename files in an archive.

    I discovered this issue when writing the module so realised there wasn't currently a solution - so I went down the rabbit hole.

    I've attached a patch which supports the removal and renaming of files in a zip archive. You can also look at this python module in a git-repo which is a the same code but separated out into a class that extends ZipFile: https://github.com/gambl/zipextended.

    The patch provides 4 main new "public" functions for the zipfile library:

    • remove(self, zinfo_or_arcname):
    • rename(self, zinfo_or_arcname, filename):
    • commit(self):
    • clone(self, file, filenames_or_infolist=None, ignore_hidden_files=False)

    The patch is in part modelled on the rubyzip solution. Remove and rename will initially only update the ZipFile's infolist. Changes are then persisted via a commit function which can be called manually - or will be called automatically upon close. Commit will then clone the zipfile with the necessary changes to a temporary file and replace the original file when that operation has completed successfully.

    An alternative to remove files without modifying the original is via the clone method directly. This is in the spirit of Serhiy's suggestion of filtering the content and not modifying the original. You can pass a list of filenames or fileinfos of the files to be included in the clone.
    So that clone can be performed without decompressing and then recompressing the files in the archive I have added two functions write_compressed and read_compressed.

    I have also attempted to address Serhiy's concern with respect to the tricky.zip - "hidden files" in between members of the archive. The clone method will by default retain any hidden files and maintain the same relative order in the archive. You can also elect to ignore the hidden files, and clone with just the files listed in the central directory.

    I did have to modify the tricky.zip attached to this issue manually as the CRC of file two (with file three embedded) was incorrect - and would therefore fail testzip(). I'm not actually sure how one would create such an archive - but I think that it's valid according to the zip spec. I've actually included the modified version in the patch for a few of the tests.

    I appreciate that this is a large-ish patch and may take some time to review - but as suggested in the comments - this wasn't as straight forward as is seems!

    Look forward to your comments.

    The signatures of the main functions are described below:

    remove(self, zinfo_or_arcname):

    Remove a member from the archive.
    
    Args:
      zinfo_or_arcname (ZipInfo, str) ZipInfo object or filename of the
        member.
    
    Raises:
      RuntimeError: If attempting to modify an Zip archive that is closed.
    

    ---

    rename(self, zinfo_or_arcname, filename):

    Rename a member in the archive.
    
    Args:
      zinfo_or_arcname (ZipInfo, str): ZipInfo object or filename of the
        member.
      filename (str): the new name for the member.
    
    Raises:
      RuntimeError: If attempting to modify an Zip archive that is closed.
    

    clone(self, file, filenames_or_infolist=None, ignore_hidden_files=False):

    Clone the a zip file using the given file (filename or filepointer).
    
    Args:
      file (File, str): file-like object or filename of file to write the
        new zip file to.
      filenames_or_infolist (list(str), list(ZipInfo), optional): list of
        members from this zip file to include in the new zip file.
      ignore_hidden_files (boolean): flag to indicate wether hidden files
        (data inbetween managed memebers of the archive) should be included.
    
    Returns:
        A new ZipFile object of the cloned zipfile open in append mode.
    
        If copying hidden files then clone will attempt to maintain the
        relative order between the files and members in the archive
    

    commit(self):
    Commit any inline modifications (removal and rename) to the zip archive.

     This makes use of a temporary file to create a new zip archive with the
     required modifications and then replaces the original.
    
     This therefore requires write access to either the directory where the
     original zipfile lives, or to python's default tempfile location.
    

    @terryjreedy
    Copy link
    Member

    All of you who have or might submit patches -- Victorlee, Troy, Mathew, or anyone else, please sign a PSF contributor agreement. We should not even look at a patch from you before you do.

    Info: https://www.python.org/psf/contrib/
    Form: https://www.python.org/psf/contrib/contrib-form/

    Receipt is official when a * appears after your name. This usually takes about a week.

    @DaveSawyer
    Copy link
    Mannequin

    DaveSawyer mannequin commented Apr 17, 2015

    Maybe it takes a little longer than a week. I have a final signed agreement from Ewa (https://secure.echosign.com/public/viewAgreement?aid=X88L4EVP5IXC289&eid=X88M6DGQ93J5K38&)

    signed on
    04/17/2014 6:48 PM
    Wow, exactly one year ago!

    @denfromufa
    Copy link
    Mannequin

    denfromufa mannequin commented May 25, 2016

    has this been merged?

    @yudilevi2
    Copy link
    Mannequin

    yudilevi2 mannequin commented Apr 4, 2020

    I wasn't aware of this issue and opened another one which I just closed as a duplicate -
    https://bugs.python.org/issue40175

    I also submitted a PR for the other one -
    #19336

    The implementation does involve shifting the bytes of the files and updating the offset of each file in the central directory.
    I do handle the case in which the central directory order and the order of the actual files in the archive isn't the same simply by sorting the central directory in memory before execution.

    The only issue I see here is the fact that if something happens during the process, the archive might get corrupted - but IMO it's worth it.
    Many tools provide that functionality (Windows, Total Commander) and as far as I know they do it in place.
    In the worst case, it's possible to implement it by creating a temp archive like Serhiy suggested - not too difficult to implement either since it simply requires duplicating the file and operating on it.

    @yudilevi2
    Copy link
    Mannequin

    yudilevi2 mannequin commented Apr 4, 2020

    Closed my initial PR and opened this one instead -
    #19336

    @yudilevi2
    Copy link
    Mannequin

    yudilevi2 mannequin commented Apr 4, 2020

    Sorry, typo, here's the new PR -
    #19358

    @buhtz
    Copy link

    buhtz commented Dec 15, 2022

    I'm a bit overwhelmed about all the comments and linked Issues and PR's.
    Does anyone has an overview about the current status of that Issue?

    @merwok
    Copy link
    Member

    merwok commented Dec 15, 2022

    Read the four longest from recent comments and ignore the small ones about process.

    issues with possible approaches: #51067 (comment) (which summarizes previous comments)
    one approach: #51067 (comment)
    another opnion: #51067 (comment)
    another approach with a noted flaw: #51067 (comment)

    The one PR that’s linked and open is from the author of the latest comment: #19358

    So the status is to convince core devs that the approach is worth it, then ensure the PR contains complete docs and tests and applies to the current main branch, and get reviews.

    danny0838 added a commit to danny0838/cpython that referenced this issue Mar 24, 2023
    danny0838 added a commit to danny0838/cpython that referenced this issue Mar 24, 2023
    danny0838 added a commit to danny0838/cpython that referenced this issue Mar 24, 2023
    danny0838 added a commit to danny0838/cpython that referenced this issue Mar 25, 2023
    @danny0838
    Copy link

    danny0838 commented Mar 25, 2023

    I have updated the code based on yudilevi2's version and added tests and docs. Should I discuss in that thread or open a new PR?

    https://github.com/danny0838/cpython/commits/gh-51067

    @merwok
    Copy link
    Member

    merwok commented Mar 25, 2023

    Discussions about the bug itself and the general idea of how to fix it should be here.
    PR discussions can focus on whether the code is doing everything right to implement the solution agreed on 🙂

    danny0838 pushed a commit to danny0838/cpython that referenced this issue Mar 25, 2023
    danny0838 pushed a commit to danny0838/cpython that referenced this issue Mar 25, 2023
    danny0838 added a commit to danny0838/cpython that referenced this issue Mar 25, 2023
    danny0838 added a commit to danny0838/cpython that referenced this issue Mar 25, 2023
    @danny0838
    Copy link

    danny0838 commented Mar 25, 2023

    Some folks are concerned about that, if a remove implementation handles file shrinking (which implies moving the data of internal file entries), and something bad happens during the moving, will lead to file corruption, as @serhiy-storchaka, @DaveSawyer, and @yudilevi2's comment have mentioned.

    I will argue that this will also happen in most in-place file modification scenarios. A good similar example is appending a file to a ZIP, in which case the appended file data will first overwrite the central directory, and the ZIP will be corrupted if there is an interruption before the final new central directory data is written. In other words: if one isn't so concerned that appending a file may corrupt the ZIP, he shouldn't be so concerned about removing, either.

    Although there is always a more sophisticated (and likely more indirect) technique to make an "in-place" file modification more secure, such as cloning data to an intermediate temporary file, it should be reasonable enough for a basic official implementation to work in this way.

    danny0838 added a commit to danny0838/cpython that referenced this issue Mar 25, 2023
    danny0838 added a commit to danny0838/cpython that referenced this issue Mar 26, 2023
    @danny0838
    Copy link

    danny0838 commented Apr 1, 2023

    Pardon. Is there still something that prevents the patch from implemented? It seems that lacking tests and docs is the only thing for the previous one?

    @distantvale
    Copy link

    can someone share the lastest PR for this? I also have a copy applied to the latest version of zipfile that I can create a new PR for if we don't have anything concrete

    @merwok
    Copy link
    Member

    merwok commented May 9, 2023

    See the top post for PR link

    @distantvale
    Copy link

    @merwok yup, sorry - so indeed what else is missing? I can work on that

    @narugo1992
    Copy link

    narugo1992 commented May 13, 2023

    This code not work for me on python3.8.1 ubuntu20.04

    When you use mode a to remove the file inside, you'll find the FFF.md which should be delete is still in the f.filelist.

    import zipfile
    
    
    # just copied from current commit(f3450f1)'s change
    class ZFile(zipfile.ZipFile):
        def remove(self, zinfo_or_arcname):
            """Remove a member from the archive."""
    
            if self.mode not in ('w', 'x', 'a'):
                raise ValueError("remove() requires mode 'w', 'x', or 'a'")
            if not self.fp:
                raise ValueError(
                    "Attempt to write to ZIP archive that was already closed")
            if self._writing:
                raise ValueError(
                    "Can't write to ZIP archive while an open writing handle exists"
                )
    
            # Make sure we have an existing info object
            if isinstance(zinfo_or_arcname, zipfile.ZipInfo):
                zinfo = zinfo_or_arcname
                # make sure zinfo exists
                if zinfo not in self.filelist:
                    raise KeyError(
                        'There is no item %r in the archive' % zinfo_or_arcname)
            else:
                # get the info object
                zinfo = self.getinfo(zinfo_or_arcname)
    
            return self._remove_members({zinfo})
    
        def _remove_members(self, members, *, remove_physical=True, chunk_size=2 ** 20):
            """Remove members in a zip file.
            All members (as zinfo) should exist in the zip; otherwise the zip file
            will erroneously end in an inconsistent state.
            """
            fp = self.fp
            entry_offset = 0
            member_seen = False
    
            # get a sorted filelist by header offset, in case the dir order
            # doesn't match the actual entry order
            filelist = sorted(self.filelist, key=lambda x: x.header_offset)
            for i in range(len(filelist)):
                info = filelist[i]
                is_member = info in members
    
                if not (member_seen or is_member):
                    continue
    
                # get the total size of the entry
                try:
                    offset = filelist[i + 1].header_offset
                except IndexError:
                    offset = self.start_dir
                entry_size = offset - info.header_offset
    
                if is_member:
                    member_seen = True
                    entry_offset += entry_size
    
                    # update caches
                    self.filelist.remove(info)
                    try:
                        del self.NameToInfo[info.filename]
                    except KeyError:
                        pass
                    continue
    
                # update the header and move entry data to the new position
                if remove_physical:
                    old_header_offset = info.header_offset
                    info.header_offset -= entry_offset
                    read_size = 0
                    while read_size < entry_size:
                        fp.seek(old_header_offset + read_size)
                        data = fp.read(min(entry_size - read_size, chunk_size))
                        fp.seek(info.header_offset + read_size)
                        fp.write(data)
                        fp.flush()
                        read_size += len(data)
    
            # Avoid missing entry if entries have a duplicated name.
            # Reverse the order as NameToInfo normally stores the last added one.
            for info in reversed(self.filelist):
                self.NameToInfo.setdefault(info.filename, info)
    
            # update state
            if remove_physical:
                self.start_dir -= entry_offset
            self._didModify = True
    
            # seek to the start of the central dir
            fp.seek(self.start_dir)
    
    
    with ZFile('test_zip.zip', 'w') as f:
        f.write('README.md', '1/R.md')
        f.write('README.md', 'FFF.md')
    
    with ZFile('test_zip.zip', 'a') as f:
        f.remove('FFF.md')
    
    with ZFile('test_zip.zip', 'r') as f:
        print(f.getinfo('1/R.md'))
        print(f.getinfo('FFF.md'))

    It should raise a KeyError when calling f.getinfo('FFF.md'), but it actually not.

    But when I remove a file in the first with block, it's okay.

    with ZFile('test_zip.zip', 'w') as f:
        f.write('README.md', '1/R.md')
        f.write('README.md', 'FFF.md')
        f.remove('FFF.md')
    
    with ZFile('test_zip.zip', 'r') as f:
        print(f.filelist)
        print(f.getinfo('1/R.md'))
        print(f.getinfo('FFF.md'))  # FFF.md is no longer here this time

    @danny0838
    Copy link

    danny0838 commented May 13, 2023

    @narugo1992 This is due to a bug in Python < 3.8.7 that zipfile.ZipFile is not truncated in 'a' mode when closed. You have to patch ZipFile._write_end_record to use in an affected Python version like this:

    if sys.version_info < (3, 8, 7):
        # Fix an issue causing zip not truncated
        class ZipFile(zipfile.ZipFile):
            def _write_end_record(self):
                super()._write_end_record()
                if self.mode == "a":
                    self.fp.truncate()
                self.fp.flush()
        zipfile.ZipFile = ZipFile

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    Status: No status
    Status: No status
    Development

    No branches or pull requests

    9 participants