Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-21417: Compression level for zipfile #5385

Merged
merged 13 commits into from Jan 30, 2018

Conversation

Projects
None yet
6 participants
@bbayles
Copy link
Contributor

commented Jan 28, 2018

This PR adds a compresslevel parameter to zipfile.ZipFile(), similar to the
one that exists for tarfile.open().

A few things to point out:

  1. The default compression type and level remain unchanged. In the issue someone thought compresslevel=None might make it look like no compression is being done - this is in fact the case with the default ZIP_STORED.
  2. The ZIP_DEFLATED and ZIP_BZIP2 compression types support different levels - 0 through 9 for the former; 1 through 9 for the latter. Specifying mode='w:bz2' in tarfile.open() leads to a ValueError; the same thing happens here.
  3. You can set compresslevel on mode='r', but it doesn't do anything. This matches tarfile.open()'s behavior - there you can do stuff like compresslevel=9000 and compresslevel='kittens' with no error.
  4. The ZIP_LZMA compression type doesn't support "levels." It has a similar concept called a preset, however we can't set that here because we're using custom filters. These are mutually exclusive.
    • Because of this, if you try to set mode='w:xz' and compresslevel=9 (or whatever) in tarfile.open, you get an obscure TypeError. I will file a new bug for that.
    • Because of this, I've made compresslevel for ZIP_LZMA and ZIP_STORED have no effect for zipfile.ZipFile. I could raise an exception instead, but given that tarfile.open() ignores parameters that don't make sense (e.g. mode='r:gz' and compresslevel=9000) I thought this was appropriate.
  5. I've added tests to exercise this new functionality to the test_zipfile module. In my own testing I used this script, which shows things in action.
  6. I did not touch PyZipFile() for simplicity's sake; I can be persuaded to try it out.

I will add a few more notes inline.

https://bugs.python.org/issue21417


Open a ZIP file, where *file* can be a path to a file (a string), a
file-like object or a :term:`path-like object`.

This comment has been minimized.

Copy link
@bbayles

bbayles Jan 28, 2018

Author Contributor

Per the style guide, I took the liberty of adding line breaks here; the constructor explanation was already quite long.

@@ -295,6 +295,7 @@ class ZipInfo (object):
'filename',
'date_time',
'compress_type',
'_compress_level',

This comment has been minimized.

Copy link
@bbayles

bbayles Jan 28, 2018

Author Contributor

I've prefixed this with an underscore because it's not exposed when reading back. ZipInfo has another "private" attribute, _raw_time below.

writing files to the archive. When using :const:`ZIP_STORED` or
:const:`ZIP_LZMA` it has no effect. When using :const:`ZIP_DEFLATED`
integers ``0`` through ``9`` are accepted. When using :const:`ZIP_BZIP2`
integers ``1`` through ``9`` are accepted.

This comment has been minimized.

Copy link
@vadmium

vadmium Jan 29, 2018

Member

Maybe it is worth linking this to the compresslevel parameters of the :class:`gzip.GzipFile` and :class:`bz2.Bz2File` constructors, which explain the acceptable values.

This comment has been minimized.

Copy link
@bbayles

bbayles Jan 29, 2018

Author Contributor

Hmm, I wonder if zlib.compressobj and bz2.BZ2Compressor would be more appropriate? I don't want to give the false impression that GzipFile or Bz2File are actually used at all.

This comment has been minimized.

Copy link
@bbayles

bbayles Jan 29, 2018

Author Contributor

I went with your suggestion; the link to the zlib.compressobj() function looks weird when rendered.

This comment has been minimized.

Copy link
@vadmium

vadmium Jan 29, 2018

Member

Sorry, yeah zlib.compressobj would have been more appropriate (I was confused; the gzip format is not even used in zip files, just the underlying Deflate algorithm). My point was that you should explain what the different values mean, compared to None. How do the different values control the compression level relative to the default of None?

This comment has been minimized.

Copy link
@bbayles

bbayles Jan 29, 2018

Author Contributor

Right, yeah. I think links to the places where this is already described make sense, as you suggest. (we could add that to tarfile in another PR also; it's missing the explanation)

Current version looks reasonable when rendered:

image


Write the file named *filename* to the archive, giving it the archive name
*arcname* (by default, this will be the same as *filename*, but without a drive
letter and with leading path separators removed). If given, *compress_type*
overrides the value given for the *compression* parameter to the constructor for
the new entry.
the new entry. Similarly, if given, *compress_level* overrides the *compresslevel*
parameter.

This comment has been minimized.

Copy link
@vadmium

vadmium Jan 29, 2018

Member

Spelling is too similar yet inconsistent. Wouldn’t it be better to remove the underscore from all parameters, and differentiate them as method parameters versus the constructor parameter?

This comment has been minimized.

Copy link
@bbayles

bbayles Jan 29, 2018

Author Contributor

I was following the example of compression (constructor's argument) to compress_type (method's argument). It's not clear to me why those are different, but if it's OK for compresslevel to be the same throughout that's fine by me?

self.assertEqual(a_info.compress_type, self.compression)
self.assertEqual(a_info._compress_level, 1)

# Compression level is overriden.

This comment has been minimized.

Copy link
@vadmium

vadmium Jan 29, 2018

Member

overridden

bbayles added some commits Jan 29, 2018

@KNareshseo

This comment has been minimized.

Copy link

commented Jan 29, 2018

The only way is to recompress the zip file with different levels until you find the one that matches the lengths. You could just recompress one of the entries to find the level, on the assumption that the entire zip file used the same level.

Even that only works if you know the tool, and the version of the tool that was used. E.g. 7z, WinZip, Info-ZIP.

@gpshead
Copy link
Member

left a comment

thank you for this!

@gpshead gpshead merged commit ce237c7 into python:master Jan 30, 2018

4 checks passed

bedevere/issue-number Issue number 21417 found
Details
bedevere/news News entry found in Misc/NEWS.d
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.