Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The zipfile module does not check files' CRCs, including in ZipFile.testzip #51716

Closed
dougturk mannequin opened this issue Dec 10, 2009 · 12 comments
Closed

The zipfile module does not check files' CRCs, including in ZipFile.testzip #51716

dougturk mannequin opened this issue Dec 10, 2009 · 12 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@dougturk
Copy link
Mannequin

dougturk mannequin commented Dec 10, 2009

BPO 7467
Nosy @warsaw, @pitrou
Files
  • test.zip: Example corrupted zip file
  • fix_zipfile_crc_issue_7467.patch: Patch against trunk
  • zipcrc.patch
  • zipcrc2.patch
  • zipcrc.patch: 2.6 patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/pitrou'
    closed_at = <Date 2010-08-13.19:13:33.141>
    created_at = <Date 2009-12-10.06:59:30.207>
    labels = ['type-bug', 'library']
    title = "The zipfile module does not check files' CRCs, including in ZipFile.testzip"
    updated_at = <Date 2010-08-13.19:13:33.122>
    user = 'https://bugs.python.org/dougturk'

    bugs.python.org fields:

    activity = <Date 2010-08-13.19:13:33.122>
    actor = 'barry'
    assignee = 'pitrou'
    closed = True
    closed_date = <Date 2010-08-13.19:13:33.141>
    closer = 'barry'
    components = ['Library (Lib)']
    creation = <Date 2009-12-10.06:59:30.207>
    creator = 'dougturk'
    dependencies = []
    files = ['15515', '17984', '18461', '18484', '18492']
    hgrepos = []
    issue_num = 7467
    keywords = ['patch']
    message_count = 12.0
    messages = ['96194', '109989', '113428', '113477', '113493', '113494', '113669', '113680', '113682', '113684', '113719', '113817']
    nosy_count = 5.0
    nosy_names = ['barry', 'pitrou', 'nirai', 'dougturk', 'BreamoreBoy']
    pr_nums = []
    priority = None
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue7467'
    versions = ['Python 2.6']

    @dougturk
    Copy link
    Mannequin Author

    dougturk mannequin commented Dec 10, 2009

    The zipfile module does not calculate the CRC of files in a zipfile as
    they are read as of Python 2.6. The old ZipFile.read function in Python
    2.5 used to do this, and ZipFile.testzip appears to rely on ZipFile.read
    to check the CRC. This means that ZipFile.testzip does not check the CRC
    as it is documented to.

    It would be useful if ZipExtFile could check the CRC once it had read
    all the data, because this would mean that e.g. ZipFile.extract would
    also automatically check the CRC. Perhaps ZipExtFile.read could raise a
    BadZipFile exception if it reaches EOF and the CRC is wrong.

    Steps to reproduce:
    -Create a zip file, then change a byte in a file's contents (easiest if
    files are stored, not deflated).
    -Run ZipFile.testzip on the edited file. Note that it does not report a
    CRC failure, but other zip tools do.

    I can provide a patch in a couple of days if that's useful.

    @dougturk dougturk mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Dec 10, 2009
    @BreamoreBoy
    Copy link
    Mannequin

    BreamoreBoy mannequin commented Jul 11, 2010

    Douglas could you please provide a patch.

    @pitrou
    Copy link
    Member

    pitrou commented Aug 9, 2010

    Patch looks good to me.

    @pitrou pitrou self-assigned this Aug 9, 2010
    @pitrou
    Copy link
    Member

    pitrou commented Aug 9, 2010

    Actually, there are two places where the internal buffer is trimmed from consumed data:

                self._readbuffer = self._readbuffer[self._offset:] + data
                self._offset = 0

    At this point, it seems self._crc_offset should also be reset to zero, otherwise some data will be forgotten on the next read() call.

    @pitrou
    Copy link
    Member

    pitrou commented Aug 9, 2010

    This is an updated patch with fixed (hopefully) CRC logic in the face of buffering, and additional tests.

    @pitrou
    Copy link
    Member

    pitrou commented Aug 9, 2010

    Nir, want to take a look?

    @nirai
    Copy link
    Mannequin

    nirai mannequin commented Aug 12, 2010

    I think patch may be simplified. Instead of keeping track of CRC offset, invoke it directly on the 'data' variable being added to _readbuffer.

    Also the call to _update_crc() before the return from read1() looks redundant.

    Finally, is it possible to determine end of file if length of 'data' (computed for crc) is 0?

    @pitrou
    Copy link
    Member

    pitrou commented Aug 12, 2010

    Ok, here is a new patch with a simpler logic. I've also added tests with a deflated zipfile entry with a bad CRC (in addition to the one with a stored zipfile entry).

    Finally, is it possible to determine end of file if length of 'data'
    (computed for crc) is 0?

    I'm not sure I understand the question.

    @nirai
    Copy link
    Mannequin

    nirai mannequin commented Aug 12, 2010

    But you answered my question with code :) self._file_size is now unused and may be removed.

    @pitrou
    Copy link
    Member

    pitrou commented Aug 12, 2010

    Ah, indeed. I've committed an updated patch in r83959 (py3k), r83962 (3.1) and r83961 (2.7). Thank you!

    @pitrou
    Copy link
    Member

    pitrou commented Aug 12, 2010

    At Barry's request, here is a patch for potential backport to 2.6.

    @warsaw
    Copy link
    Member

    warsaw commented Aug 13, 2010

    After chatting with __ap__ on irc, i'm going to reject this patch for 2.6.6.

    @warsaw warsaw closed this as completed Aug 13, 2010
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants