zipfile can't extract file #51088

Open
NewerCookie mannequin opened this issue Sep 4, 2009 · 61 comments
Labels
3.7 (EOL) end of life OS-windows stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@NewerCookie
Mannequin

NewerCookie mannequin commented Sep 4, 2009

BPO 6839
Nosy @birkenfeld, @terryjreedy, @gpshead, @ronaldoussoren, @amauryfa, @ncoghlan, @berkerpeksag, @JimJJewett, @serhiy-storchaka, @MorganRamsay
PRs
  • bpo-6839: removed unnecessary file name encoding test from ZipFile.open() #14212
  • Files
  • test.zip: mildly corrupt zipfile to test error handling
  • zlib_forward_slash.patch
  • zipfile_276_filename_mismatch_v2.patch: patch with warnings against 2.7.6
  • zipfile_340_filename_mismatch_v3.patch: patch with warnings against 3.4.0
  • zipfile_276_filename_mismatch_v3.patch: patch with print against 2.7.6
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2009-09-04.19:56:49.966>
    labels = ['3.7', 'type-bug', 'library', 'OS-windows']
    title = "zipfile can't extract file"
    updated_at = <Date 2019-06-18.21:36:54.889>
    user = 'https://bugs.python.org/NewerCookie'

    bugs.python.org fields:

    activity = <Date 2019-06-18.21:36:54.889>
    actor = 'python-dev'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)', 'Windows']
    creation = <Date 2009-09-04.19:56:49.966>
    creator = 'NewerCookie'
    dependencies = []
    files = ['14832', '14934', '35115', '35117', '35120']
    hgrepos = []
    issue_num = 6839
    keywords = ['patch']
    message_count = 61.0
    messages = ['92265', '92297', '92309', '92326', '92330', '92335', '92516', '92874', '116384', '116385', '200165', '201842', '208970', '208973', '208975', '208982', '208983', '208985', '208987', '209023', '211562', '217533', '217546', '217551', '217554', '217556', '217558', '217561', '217569', '217570', '217571', '217572', '217573', '217574', '217575', '217576', '217578', '217616', '217624', '217625', '217627', '217634', '217635', '217636', '217638', '217641', '217642', '217643', '217647', '217648', '217659', '217660', '217661', '217743', '217753', '217754', '217756', '217778', '217932', '217933', '345999']
    nosy_count = 15.0
    nosy_names = ['georg.brandl', 'terry.reedy', 'gregory.p.smith', 'ronaldoussoren', 'amaury.forgeotdarc', 'ncoghlan', 'NewerCookie', 'chuck', 'francismb', 'berker.peksag', 'Jim.Jewett', 'serhiy.storchaka', 'apolkosnik', 'Sean Goodwin', 'MorganRamsay']
    pr_nums = ['14212']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue6839'
    versions = ['Python 2.7', 'Python 3.4', 'Python 3.5', 'Python 3.6', 'Python 3.7']

    @NewerCookie NewerCookie mannequin added stdlib Python modules in the Lib dir OS-windows type-bug An unexpected behavior, bug, or error labels Sep 4, 2009
    @NewerCookie
    Mannequin Author

    NewerCookie mannequin commented Sep 4, 2009

    The following exception occurred when I tried to extract on Windows.

    "zipfile.BadZipfile: File name in directory "test\test2.txt" and header
    "test/test2.txt" differ."

    It seems to be a problem with the slashes.
    I tested with zipfile revision 72893.

    @NewerCookie
    Mannequin Author

    NewerCookie mannequin commented Sep 6, 2009

    P.S
    I tested extraction by using 7-zip.
    It works fine.

    @ronaldoussoren
    Contributor

    The zipfile is technically incorrect: the zipfile specification prescribes
    that all filenames use '/' as the directory separator.

    Even without that caveat the file is corrupt because the zipfile directory
    header and the per-file header don't agree on the name of the file.

    That said: IMHO the current code in zipfile.ZipFile.open is too strict, it
    shouldn't raise an error when the two names aren't exactly the same
    because there are valid reasons for them to be different (such as renaming
    a file in the zipfile without rewriting the entire zipfile).
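To make the mismatch concrete, here is a minimal sketch that builds a small archive in memory, rewrites only the central directory copy of the member name, and then reads both copies back. The helper names (`make_mismatched_zip`, `read_both_names`) are invented for illustration, and the byte patch assumes the member name occurs exactly twice in the archive (once per header).

```python
import io
import struct
import zipfile

# Fixed 30-byte part of a local file header (layout from the ZIP appnote).
LOCAL_HEADER = struct.Struct("<4s5H3L2H")


def make_mismatched_zip():
    """Return archive bytes whose central directory name uses a backslash."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("test/test2.txt", b"hello")
    data = bytearray(buf.getvalue())
    # The name is stored twice; the last occurrence is the central directory copy.
    old, new = b"test/test2.txt", b"test\\test2.txt"
    pos = data.rfind(old)
    data[pos:pos + len(old)] = new  # same length, so no offsets change
    return bytes(data)


def read_both_names(data):
    """Return (central_directory_name, local_header_name) for the first member."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        info = zf.infolist()[0]
    fields = LOCAL_HEADER.unpack_from(data, info.header_offset)
    name_len = fields[9]  # file name length field
    start = info.header_offset + LOCAL_HEADER.size
    local_name = data[start:start + name_len].decode("cp437")
    return info.orig_filename, local_name


central, local = read_both_names(make_mismatched_zip())
print("central directory:", central)
print("local header:     ", local)
```

Listing the archive only needs the central directory, so it succeeds; the strict comparison in `ZipFile.open()` is the step that objects to the two names.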

    @alanmcintyre
    Mannequin

    alanmcintyre mannequin commented Sep 6, 2009

    FileRoller doesn't complain about the mismatched slashes either. Where
    did the ZIP come from, by the way? I seem to recall that there have
    been other instances in which ZIP applications were more "forgiving"
    than the zipfile module. How far should zipfile go in bending the
    interpretation of the ZIP standard?

    As far as the renaming goes, it seems the standard says the header name
    should be used if the two names are different. If nobody else has time
    to make a patch and tests I can take a stab at it in the next few days.

    @ronaldoussoren
    Contributor

    alan: I don't quite understand which filename you want to use when the
    name in the per-file header and the central directory don't match.

    Where in the standard is this prescribed? I couldn't find anything in
    the PKWare zipfile appnote [1]

    My preference would be to use the central directory as the canonical
    value because scanning the entire zipfile to read the per-file header
    would give a significant overhead. This might not be very noticeable with
    small zipfiles, but I regularly use zipfiles with over 100K files in
    them; for those files a scan of the zipfile is prohibitively expensive.

    Furthermore, when the two are different the most reasonable explanation
    is that an in-place edit of the zipfile changed the directory without
    rewriting the entire zipfile (just like you can "delete" files from a
    zipfile by dumping them from the directory rather than completely
    rewriting the entire archive)

    [1]
    APPNOTE.TXT - .ZIP File Format Specification Version: 6.3.2
    Revised: September 28, 2007
    Copyright (c) 1989 - 2007 PKWARE Inc., All Rights Reserved.

    @alanmcintyre
    Mannequin

    alanmcintyre mannequin commented Sep 6, 2009

    Sorry about the confusion--I think I confused myself by looking at the
    bit about CRC checksums in the "Info-ZIP Unicode Path Extra Field"
    section before I posted. I meant to say that the central directory name
    looks preferred over the per-file header.

    In section J, under "file name (Variable)" there's a bit that says:

    "If input came from standard input, there is no file name field. If
    encrypting the central directory and general purpose bit flag 13 is set
    indicating masking, the file name stored in the Local Header will not be
    the actual file name. A masking value consisting of a unique
    hexadecimal value will be stored."

    So in these cases the central directory name has to be used. And, as
    you pointed out, some operations like "deleting" a member from the
    archive are implemented by editing the central directory, so it would
    seem that the central directory should be used if there's a conflict.

    @terryjreedy
    Member

    In the case at issue, the file name is the same (contrary to the error
    message). The two representations of the *path* are different, but
    equivalent. There is no ambiguity: the file should be put in directory
    'test' and named 'test2.txt'. So I think zipfile should do what 7zip
    does and do just that.

    An actual filename difference might be argued differently.

    @chuck
    Mannequin

    chuck mannequin commented Sep 19, 2009

    I added a patch to replace back slashes by forward slashes in three
    places, only one of them actually relevant to the errors in the attached
    .zip file.

    I kept the exception for mismatching filenames, but if you think it is
    appropriate to remove it I could do that as well.

    @tjguk tjguk self-assigned this Aug 6, 2010
    @amauryfa
    Member

    I agree with the change, but the code should be factored out into a function (normalize_filename for example)
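A helper along those lines might look like the sketch below. Only the name `normalize_filename` comes from the comment above; its body and the `same_member` wrapper are assumptions for illustration, not the code from the attached patch.

```python
def normalize_filename(name):
    """Replace backslashes with the forward slashes the ZIP spec prescribes."""
    return name.replace("\\", "/")


def same_member(directory_name, header_name):
    """Hypothetical comparison that ignores the slash style only."""
    return normalize_filename(directory_name) == normalize_filename(header_name)


print(same_member("test\\test2.txt", "test/test2.txt"))
print(same_member("test\\test2.txt", "test/other.txt"))
```

Keeping the replacement in one function means the three call sites touched by the patch cannot drift apart.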

    @ronaldoussoren
    Contributor

    I'd prefer if the code no longer checked if the filename in the directory matches the name in the per-file header.

    The reason for that is that the two don't have to match: it is relatively cheap to rename a file in the zipfile by rewriting the directory while rewriting the entire zipfile can be pretty expensive when zipfiles get large.

    It's probably worthwhile to test what other zipfile tools do in this respect (e.g., create a zipfile where the filename in the header doesn't match the name in the directory and extract that zip using a number of popular tools).

    (I have a slightly odd perspective on this because I regularly deal with zipfiles containing over 100K files and over 10GByte of data).

    @tjguk tjguk removed their assignment Sep 28, 2012
    @apolkosnik
    Mannequin

    apolkosnik mannequin commented Oct 17, 2013

    I got bitten by a different variation of this bug.

    In my case the issue can be summarized by:
    zipfile.BadZipfile: File name in directory "Windows\TEMP\test.tmp" and header "C:\Windows\TEMP\test.tmp" differ.

    Attached is a patch for Python27/lib/zipfile.py. I understand that it might not be the best approach, but at least we just compare the filenames without caring much about those pesky paths preceding them.

    @apolkosnik
    Mannequin

    apolkosnik mannequin commented Oct 31, 2013

    Just tested my patch on mac, and it appears that it didn't work on OSX (and likely on other unix platforms too).

    Conclusion... os.path.basename() will not do anything to windows paths when running on unix.
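The platform dependence can be seen without switching machines, because `os.path` is `posixpath` on Unix and `ntpath` on Windows, and both modules can be imported anywhere. The snippet below is only an illustration of that difference, not part of the attached patch; the variable names are made up.

```python
import ntpath
import posixpath

windows_name = "C:\\Windows\\TEMP\\test.tmp"

# On Unix, os.path is posixpath: a backslash is an ordinary character,
# so the whole string is treated as the base name.
unix_view = posixpath.basename(windows_name)

# ntpath applies Windows rules on any platform.
windows_view = ntpath.basename(windows_name)

print(repr(unix_view))
print(repr(windows_view))
```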

    I'm thinking that instead of bailing at 'File name in directory "%s" and header "%s" differ.', the library should just print a warning, and continue.

    @apolkosnik
    Mannequin

    apolkosnik mannequin commented Jan 23, 2014

    I'm in a similar situation, my test file raises this:

    File name in directory "windows\TEMP\\test123.txt" and header "C:\windows\TEMP\\test123.txt" differ.

    It turns out that I can't find any cross-platform procedure for processing paths that come from a different platform. Doing it in a portable way is a problem too: neither os.path.split() nor os.path.basename() will touch Windows paths on un*x, etc...

    So, I'd like to propose an easy way: just allow the process to extract the files (and print a warning message) rather than raising an exception (raise BadZipfile,...) and stopping the extraction altogether.

    @apolkosnik
    Mannequin

    apolkosnik mannequin commented Jan 23, 2014

    This one has the parentheses for print, so that it works in python 3.x. Also, the default fallback behavior in this case is to use the filename from the zip's directory (the first path in the warning).

    @ronaldoussoren
    Contributor

    As I wrote in msg116385 I'd prefer to drop the consistency check completely because updating data like the filename in the central directory is a cheap way to rename files without completely rewriting the zip file.

    @apolkosnik
    Mannequin

    apolkosnik mannequin commented Jan 23, 2014

    Can we get this simple "fix" implemented in time for the next 2.7.x release?!

    Thank you!

    @birkenfeld
    Member

    print() is not a good way to emit the warning; please use the warnings module.

    @serhiy-storchaka
    Member

    As I wrote in msg116385 I'd prefer to drop the consistency check completely
    because updating data like the filename in the central directory is a cheap
    way to rename files without completely rewriting the zip file.

    It should at least be left as a debugging print.

    It can't be a warning, because it depends not on user's actions, but on
    external data. But user still should be able to investigate uncommon zipfiles
    by setting the debug attribute.

    @apolkosnik
    Mannequin

    apolkosnik mannequin commented Jan 23, 2014

    Excellent, please see my third attempt.

    @terryjreedy
    Member

    Adam this is not a security issue (2.6, 3.1, 3.2), nor a future issue that must wait for 3.5.

    @apolkosnik
    Mannequin

    apolkosnik mannequin commented Feb 18, 2014

    It might not be a regular "security" issue, but it is not extracting some files that it should. There's a possible scenario, where it can be a security issue.

    @apolkosnik
    Mannequin

    apolkosnik mannequin commented Apr 29, 2014

    Gentlemen,

    Is there any way this fix can be included in any version?
    Currently, the fact that the exception is thrown makes extracting some zip files impossible with this library, and rolling your own is a bit painful. (either using a wrapper around 7zip to handle those or just provide cloned/patched versions for every major python version).

    This ridiculous behavior is really not consistent with other ZIP implementations (7zip just ignores the mismatch).

    Thank you for your time and effort.

    @terryjreedy
    Member

    Adam P, please don't screw around with the version headers. If you want to claim that this is a security issue of the type we care about (threats to the public internet) for patching old releases, and severe enough that we should do anything about it, send a detailed explanation with links to evidence to the security response team. Simply writing 'a possible scenario' is insufficient.

    @apolkosnik
    Mannequin

    apolkosnik mannequin commented Apr 29, 2014

    For the version headers, I've added the versions featuring the broken behavior. That's all.

    I'm not saying that this is

    I'm extracting malware from the Central Quarantine files, and the vendor's implementation is broken and is causing this issue for me on every single file inside the archive.

    Let's say, I've got a wrapper script that feeds the contents of a zip file to be scanned with this, because of this behavior, the wrapper will error out... Customers will say your product sucks, etc.

    Does this really take an act of god to fix this?

    @apolkosnik
    Mannequin

    apolkosnik mannequin commented Apr 29, 2014

    Also, this behavior is present on all platforms and all versions of Python (zipfile Library), so maybe the headers should be adjusted there too.

    I'm not saying that this is necessarily a big freaking hole, but by using this, one can prevent files from being extracted using this simple trick.

    @apolkosnik
    Mannequin

    apolkosnik mannequin commented Apr 30, 2014

    Patch against zipfile 3.4.0 attached.

    @apolkosnik
    Mannequin

    apolkosnik mannequin commented Apr 30, 2014

    update

    @apolkosnik
    Mannequin

    apolkosnik mannequin commented Apr 30, 2014

    Once again patch against 2.7.6

    @gpshead
    Member

    gpshead commented Apr 30, 2014

    Don't use print (to stdout) or sys.stderr directly. There are already many other uses of warnings.warn within the zipfile module. Be consistent with those.

    Existing zipfile warnings seem to favor lazily importing warnings when it's needed rather than a top level 'import warnings'. While I find that annoying, there are sometimes reasons to do it and the minimally invasive change that is consistent with the rest of the existing code is to do the same thing here.

    something similar to:

    +            if self.debug and fname != zinfo.orig_filename:
    +                import warnings
    +                warnings.warn(
    +                    'Warning: Filename in directory "%s" and header "%s" differ.' % (
    +                        zinfo.orig_filename, fname))

    @ncoghlan
    Contributor

    As Greg suggested, the important thing is to follow the precedent set by
    other debug messages in the module.

    @apolkosnik
    Mannequin

    apolkosnik mannequin commented Apr 30, 2014

    Attached is a patch with warnings against 2.7.6

    @apolkosnik
    Mannequin

    apolkosnik mannequin commented Apr 30, 2014

    Attached is a patch with warnings against 3.4.0

    @apolkosnik
    Mannequin

    apolkosnik mannequin commented Apr 30, 2014

    Attached is a patch with warnings against 2.7.6 (this one should be good to go)

    @berkerpeksag
    Member

    --- a/zipfile.py	Wed Apr 30 11:27:16 2014
    +++ b/zipfile.py	Wed Apr 30 11:27:01 2014
    @@ -1174,8 +1174,9 @@
                 else:
                     fname_str = fname.decode("cp437")
     
    -            if fname_str != zinfo.orig_filename:
    -                raise BadZipFile(
    +            if self.debug and fname_str != zinfo.orig_filename:
    +                import warnings
    +                warnings.warn(
                         'File name in directory %r and header %r differ.'
                         % (zinfo.orig_filename, fname))

    Also, you need to add stacklevel=2 to warnings.warn().
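As a rough, standalone illustration of what `stacklevel=2` changes: the warning is attributed to the code that called the checking function rather than to the line inside it. The function names `check_names`, `caller` and `collect` are hypothetical stand-ins (in the patch the check lives inside `ZipFile.open()`), and the import is at module level here only to keep the sketch short.

```python
import warnings


def check_names(directory_name, header_name, debug=1):
    """Warn, instead of raising, when the two stored names disagree."""
    if debug and header_name != directory_name:
        warnings.warn(
            "File name in directory %r and header %r differ."
            % (directory_name, header_name),
            stacklevel=2,  # report the caller's line, not this one
        )


def caller():
    check_names("test\\test2.txt", "test/test2.txt")


def collect():
    """Run caller() and return the warnings it produced."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        caller()
    return caught


for item in collect():
    print(item.category.__name__, item.lineno, item.message)
```

The recorded line number points into `caller()`, which is the location a user would need when tracking down the archive that triggered the message.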

    @apolkosnik
    Mannequin

    apolkosnik mannequin commented Apr 30, 2014

    3.4.0 patch with stacklevel=2

    @jimjjewett
    Mannequin

    jimjjewett mannequin commented Apr 30, 2014

    I'm leaving it as "needs patch" because it isn't clear exactly what a committer should do.

    I think the current intent is to make the changes listed in zipfile_???_filename_mismatch_v2.patch (which are not listed as reviewable -- but the changes are indeed sufficiently straightforward that the files -- if need be -- could be edited by hand as if they were made originally by the committer.)

    This change is small enough (warning instead of raise) that a test case is probably not strictly required, but it would be helpful.

    test.zip would presumably be useful data for a test case.

    There is dispute over whether this would be an enhancement (more generous with what to accept), a bug fix, or a security *regression* because it still allows old vulnerable files to stick around unreplaced (or to hide from a malware scanner), but no longer raises an Exception to get attention. (warnings are often ignored)

    zlib_forward_slash.patch would also be good (and might even be a security fix, by allowing the new versions to be installed), but is not ready to be committed, as
    (A) it repeats the logic inline instead of using the newly defined helper method
    (B) it doesn't have a test case (test1.zip should help when creating one)
    (C) it has neither a doc change nor an explicit (and dubious) statement that this is just a bug fix and wouldn't need to be listed in the versionchanged.

    There is also a question of how general the filename correction should be, particularly with respect to windows drives and capitalization. The one in this patch seems to be the minimal change, and is explicitly supported by the zip spec.

    @jimjjewett
    Mannequin

    jimjjewett mannequin commented Apr 30, 2014

    Presumably the stacklevel applies to all versions; verifying that it warns about the right code location is important enough to require a test case.

    @apolkosnik
    Mannequin

    apolkosnik mannequin commented Apr 30, 2014

    I just looked through the 2.7.6 version of zipfile, and the error handling there is done either with raise or with print(). So, in line with the guidance provided for 2.7.6, perhaps we should stick with print() instead of warnings.warn(). I'll post that a bit later.

    test.zip up there is the test case for this change. Is there any other test case needed?

    @ethanfurman
    Member

    Adam, please stop deleting the files. It makes for a lot of noise to those on the nosy list, and is unnecessary.

    Just make sure you increment the version number on the files you upload and it will be fine.

    Thanks.

    @apolkosnik
    Mannequin

    apolkosnik mannequin commented Apr 30, 2014

    Jim,

    I've got some test cases where the zlib_forward_slash.patch doesn't cut it. That was the reason for trying a broader approach with filename_mismatch patches.

    @francismb
    Mannequin

    francismb mannequin commented Apr 30, 2014

    A small question related to: "zipfile_276_filename_mismatch_v3.patch"

    --- a/zipfile.py	Wed Apr 30 11:44:38 2014
    +++ b/zipfile.py	Wed Apr 30 15:10:38 2014
    @@ -970,10 +970,10 @@
                 if fheader[_FH_EXTRA_FIELD_LENGTH]:
                     zef_file.read(fheader[_FH_EXTRA_FIELD_LENGTH])
     
    -            if fname != zinfo.orig_filename:
    -                raise BadZipfile, \
    +            if self.debug and fname != zinfo.orig_filename:
    +                print( \
                             'File name in directory "%s" and header "%s" differ.' % (
    -                            zinfo.orig_filename, fname)
    +                            zinfo.orig_filename, fname))

    Shouldn't a change from raising an exception to a print be somewhere documented?

    Thanks

    @gpshead
    Member

    gpshead commented Apr 30, 2014

    The bug was that BadZipFile was being raised when it shouldn't be so I wouldn't worry about documenting the behavior change other than in the Misc/NEWS entry that the ultimate committer writes up.

    @apolkosnik
    Mannequin

    apolkosnik mannequin commented Apr 30, 2014

    Is there anything else that you need me to provide?

    @jimjjewett
    Mannequin

    jimjjewett mannequin commented Apr 30, 2014

    On Wed, Apr 30, 2014 at 3:05 PM, Adam Polkosnik wrote:

    test.zip up there is the test case for this change. Is there any other test case needed?

    ah; I see the confusion. test.zip is test *data*. When I asked for a
    test *case*, I meant something that ensures the data will be used to
    actually run the test automatically.

    Typically, that would involve adding something to
    Lib/test/test_zipfile.py. I'm guessing it would be easiest to add a
    new class inheriting from unittest.TestCase and opening test.zip in
    the setUp, then using a bunch of assert* methods to verify that the
    file was read and interpreted correctly.
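A test case of that shape might look roughly like the sketch below. It is an assumption-laden outline rather than a ready patch: the class name is invented, the archive is generated in memory instead of loading test.zip, and the second test accepts either outcome (data returned, or `BadZipFile` mentioning the mismatch) because which one occurs depends on whether the strict check is still present in the zipfile version under test.

```python
import io
import unittest
import zipfile


class MismatchedNameTest(unittest.TestCase):
    """Hypothetical test case; in CPython it would live in Lib/test/test_zipfile.py."""

    def setUp(self):
        buf = io.BytesIO()
        with zipfile.ZipFile(buf, "w") as zf:
            zf.writestr("test/test2.txt", b"hello")
        data = bytearray(buf.getvalue())
        # Rewrite only the central directory copy of the name (last occurrence).
        old, new = b"test/test2.txt", b"test\\test2.txt"
        pos = data.rfind(old)
        data[pos:pos + len(old)] = new
        self.data = bytes(data)

    def test_listing_uses_central_directory(self):
        with zipfile.ZipFile(io.BytesIO(self.data)) as zf:
            names = [name.replace("\\", "/") for name in zf.namelist()]
        self.assertEqual(names, ["test/test2.txt"])

    def test_read_returns_data_or_reports_mismatch(self):
        with zipfile.ZipFile(io.BytesIO(self.data)) as zf:
            name = zf.namelist()[0]
            try:
                payload = zf.read(name)
            except zipfile.BadZipFile as exc:
                # Strict behavior: the exception text names the mismatch.
                self.assertIn("differ", str(exc))
            else:
                # Lenient behavior: the member is readable.
                self.assertEqual(payload, b"hello")
```

Saved to a file, this can be run with `python -m unittest`; once the desired behavior is settled, the `try`/`except` would collapse into a single assertion.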

    -jJ

    @jimjjewett
    Mannequin

    jimjjewett mannequin commented Apr 30, 2014

    On Wed, Apr 30, 2014 at 3:11 PM, Adam Polkosnik

    I've got some test cases where the zlib_forward_slash.patch doesn't cut it.

    My recommendation (and I could be convinced otherwise) would be to replace

        if fname_str != zinfo.orig_filename:
            raise ...

    with something more like

        self.filename_check(fname_str,  zinfo.orig_filename)

    and a default implementation of filename_check that does nothing if
    they're equal; calls the slash replace (since the standard supports
    that correction); does nothing else if they're now equal; emits a
    warning (or prints, in 2.7.6) otherwise.

    In 2.7.6, you would have to keep the new methods private, but in 3.5,
    users could override filename_check to handle the windows path
    normalization, or whatever other problems you have documented.
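The flow described above could be sketched as follows. Everything here is hypothetical: `FilenameChecker` is a stand-in class rather than a zipfile API, returning a boolean is an addition for easier testing, and the drive and case handling in the subclass is just one example of what an override might do.

```python
import warnings


class FilenameChecker:
    """Hypothetical stand-in for the proposed filename_check hook."""

    def normalize(self, name):
        # The slash replacement is the correction the ZIP spec supports.
        return name.replace("\\", "/")

    def filename_check(self, header_name, directory_name):
        """Return True if the names agree, warning when they do not."""
        if header_name == directory_name:
            return True
        if self.normalize(header_name) == self.normalize(directory_name):
            return True
        warnings.warn(
            "File name in directory %r and header %r differ."
            % (directory_name, header_name),
            stacklevel=2,
        )
        return False


class DriveInsensitiveChecker(FilenameChecker):
    """Example override: also ignore a leading drive letter and letter case."""

    def normalize(self, name):
        name = super().normalize(name)
        if len(name) >= 2 and name[1] == ":":
            name = name[2:]  # drop a drive prefix such as "C:"
        return name.lstrip("/").lower()


default_ok = FilenameChecker().filename_check("test/test2.txt", "test\\test2.txt")
custom_ok = DriveInsensitiveChecker().filename_check(
    "C:\\windows\\TEMP\\test123.txt", "windows\\TEMP\\test123.txt")
print(default_ok, custom_ok)
```

With the default checker the drive-letter case still produces a warning, which keeps the unusual archive visible while leaving the stricter or looser policy to subclasses.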

    @apolkosnik
    Mannequin

    apolkosnik mannequin commented May 2, 2014

    Jim,

    The problems documented here are related to two cases (both apparently arriving from world of windows):

    1. two relative paths with inverted slash in one of them (test\test2.txt vs test/test2.txt)
    2. relative path vs absolute path (windows\temp\test.txt vs c:\windows\temp\test.txt)

    The extraction part seems to be doing a good job at writing the files into sane locations.

    IMHO, there's no point in trying to replace slashes or otherwise "normalize", as this would fix the cases where the presence of an inverted slashes should be noted in debug output.
    By the same token stripping the drive letter from the absolute path part would just deprive us from noticing such intricacies in these special zip files.

    @jimjjewett
    Mannequin

    jimjjewett mannequin commented May 2, 2014

    On Fri, May 2, 2014 at 1:14 AM, Adam Polkosnik

    The problems documented here are related to two cases (both apparently arriving from world of windows):

    Good! I had thought you had even more!

    1. two relative paths with inverted slash in one of them (test\test2.txt vs test/test2.txt)

    My understanding from earlier -- and I may have been reading too much
    into some of the comments -- is that the standard defined \filename as
    an inferior alias for /filename and supported the fix.

    Notably, if you're extracting on windows with windows conventions,
    then windows will treat them identically anyhow.

    If you're extracting a windows file to a unix environment, then \t
    really should be translated to /t.

    2. relative path vs absolute path (windows\temp\test.txt vs c:\windows\temp\test.txt)

    These really are different, as leaving off the "C:" should mean
    "current drive", which will often (but not always) be C:

    This (and differing capitalization) are among the reasons to do the
    filename fix in a separate method, so that subclasses with more local
    knowledge can more easily do the right thing.

    Note that for python 3.4 and newer, pathlib <URL:
    https://docs.python.org/3/library/pathlib.html> may be helpful. It
    would probably even be possible to backport the essential parts as an
    implementation detail. But I'm not sure if that could be done
    compatibly with maintenance releases, or how much work it would take.
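For illustration, `pathlib.PureWindowsPath` applies Windows path rules on any platform, so it can express both cases from this thread. This is only a sketch of how the comparison could be phrased, not a proposal for what zipfile should do; the variable names are made up.

```python
from pathlib import PureWindowsPath

relative = PureWindowsPath("windows\\temp\\test.txt")
absolute = PureWindowsPath("c:\\windows\\temp\\test.txt")

# Case 1: both slash styles parse to the same path.
same_slashes = PureWindowsPath("test\\test2.txt") == PureWindowsPath("test/test2.txt")

# Case 2: the drive is what separates the two names.
tail = absolute.relative_to(absolute.anchor)

print(same_slashes)
print(repr(absolute.drive), repr(relative.drive))
print(tail == relative)
```

The two paths in case 2 compare unequal as written, which matches the point above that they really are different names until the drive is deliberately set aside.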

    The extraction part seems to be doing a good job at writing the files into sane locations.
    IMHO, there's no point in trying to replace slashes or otherwise "normalize", as this would fix the cases where the presence of an inverted slashes should be noted in debug output.

    My understanding had been that it was failing to extract entirely. So
    exactly what is the problem?

    @apolkosnik
    Mannequin

    apolkosnik mannequin commented May 2, 2014

    Extraction works fine, the issue was that raise() was creating an exception, and stopping the whole extraction process. When replaced with a warning, everything works fine.

    @ethanfurman
    Member

    Adam Polkosnik said:
    --------------------

    Extraction works fine, the issue was that raise() was creating an exception, and
    stopping the whole extraction process.

    That doesn't make sense. If an exception was "stopping the whole extraction process" then extraction was not working fine.

    Questions:

    • Are the names with '\' in them in the central directory, or the per-file header?

    • If in the central directory (which is the name we are going to use, yes?) how do
      we tell if the '\' should be a '/' or an escape? (such as '\t')

    @apolkosnik
    Mannequin

    apolkosnik mannequin commented May 2, 2014

    Ethan,
    I'd refer you to msg92309...

    And
    When testing with WinZip it looks like this:
    No errors detected in compressed data of C:\Downloads\test.zip.
    Testing ...
    Testing test\ OK
    Testing test\test2.txt OK
    Testing test1.txt OK

    Then in python:
    Python 3.4.0 (v3.4.0:04f714765c13, Mar 16 2014, 19:25:23) [MSC v.1600 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import zipfile
    >>> zf =  zipfile.ZipFile('test.zip')
    >>> namelist = zf.namelist()
    >>> namelist
    ['test/', 'test/test2.txt', 'test1.txt']
    >>> for af in namelist:
    ...     zf.read(af)
    ...
    Traceback (most recent call last):
      File "<stdin>", line 2, in <module>
      File "c:\Python34\lib\zipfile.py", line 1117, in read
        with self.open(name, "r", pwd) as fp:
      File "c:\Python34\lib\zipfile.py", line 1180, in open
        % (zinfo.orig_filename, fname))
    zipfile.BadZipFile: File name in directory 'test\\' and header b'test/' differ.

    So, based on that everything is already converted to forward slashes for the extraction.

    @ethanfurman
    Member

    Ah, so when you (Adam) said "extraction works fine", what you meant was "extraction works fine *in other programs*". Okay.

    @apolkosnik
    Mannequin

    apolkosnik mannequin commented May 5, 2014

    Both. Other programs, and in python scripts when raise() is removed in zipfile.py. Unless your results are different.

    @MorganRamsay
    Mannequin

    MorganRamsay mannequin commented Jun 18, 2019

    The encoding test in ZipFile.open() is highly opinionated and has no purpose beyond itself. Testing for encoding issues should be done outside this library in the user's own code.

    Using the 3.7.2 version of ZipFile, this is my proposal:

    https://gist.github.com/MorganRamsay/696e89450e0f172c16ac8dfc016eb79f/revisions?diff=unified

    Currently, I'm subclassing ZipFile with this patch and I've had no issues with extracting thousands of different ZIP files on Windows. I can't attest to this solution's applicability on other platforms.

    @MorganRamsay MorganRamsay mannequin added the 3.7 (EOL) end of life label Jun 18, 2019
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022