Expand zipimport to include other compression methods #61206

Open
rhettinger opened this issue Jan 20, 2013 · 11 comments
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

@rhettinger
Contributor

BPO 17004
Nosy @rhettinger, @gpshead, @pitrou, @briancurtin, @ericsnowcurrently, @serhiy-storchaka

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2013-01-20.18:41:42.824>
labels = ['type-feature', 'library', '3.11']
title = 'Expand zipimport to include other compression methods'
updated_at = <Date 2022-04-06.03:00:34.782>
user = 'https://github.com/rhettinger'

bugs.python.org fields:

activity = <Date 2022-04-06.03:00:34.782>
actor = 'yan12125'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2013-01-20.18:41:42.824>
creator = 'rhettinger'
dependencies = []
files = []
hgrepos = []
issue_num = 17004
keywords = []
message_count = 11.0
messages = ['180307', '180310', '180311', '180313', '180314', '180323', '180324', '180347', '220589', '267527', '325729']
nosy_count = 8.0
nosy_names = ['rhettinger', 'gregory.p.smith', 'pitrou', 'nadeem.vawda', 'brian.curtin', 'eric.snow', 'serhiy.storchaka', 'superluser']
pr_nums = []
priority = 'normal'
resolution = None
stage = 'needs patch'
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue17004'
versions = ['Python 3.11']

@rhettinger
Contributor Author

Only a little of the existing logic is tied to the zipfile format. Consider adding support for xz, tar, tar.gz, tar.bz2, etc.

In particular, xz has better compression, resulting in both space savings and faster load times.

@rhettinger rhettinger added the type-feature A feature request or enhancement label Jan 20, 2013
@briancurtin briancurtin added the stdlib Python modules in the Lib dir label Jan 20, 2013
@serhiy-storchaka
Member

tar.* is not a good choice because it doesn't allow random access. Bare tar is better than zip only when you need to save additional file attributes (Unix file access mode, times, owner, group, links). The ZIP format supports all of this too, but the zipfile module does not yet.
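The random-access point can be illustrated with a small experiment. This sketch (CountingReader and compare are made-up names) builds a stored ZIP and a tar.gz holding the same incompressible members in memory, then counts how many bytes zipfile and tarfile pull from the underlying file to extract only the last member. zipfile can jump to the member using the central directory at the end of the archive, while tarfile has to decompress the single gzip stream from the beginning (and scans the whole archive to index its members).

```python
import io
import random
import tarfile
import zipfile


class CountingReader(io.BytesIO):
    """In-memory file that records how many bytes callers read from it."""

    def __init__(self, data: bytes):
        super().__init__(data)
        self.bytes_read = 0

    def read(self, size=-1):
        chunk = super().read(size)
        self.bytes_read += len(chunk)
        return chunk


def compare(count: int = 20, size: int = 10_000) -> tuple[float, float]:
    """Fraction of each archive read to extract only its last member."""
    # Random payloads are incompressible, so compression cannot hide the cost.
    members = {f"mod{i:02d}.py": random.Random(i).randbytes(size) for i in range(count)}
    last = f"mod{count - 1:02d}.py"

    zbuf, tbuf = io.BytesIO(), io.BytesIO()
    with zipfile.ZipFile(zbuf, "w", zipfile.ZIP_STORED) as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    with tarfile.open(fileobj=tbuf, mode="w:gz") as tf:
        for name, data in members.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))

    # ZIP: the central directory gives the member's offset directly.
    zreader = CountingReader(zbuf.getvalue())
    with zipfile.ZipFile(zreader) as zf:
        zf.read(last)

    # tar.gz: one gzip stream, so reaching a member means decompressing
    # everything before it; tarfile also indexes the whole archive first.
    treader = CountingReader(tbuf.getvalue())
    with tarfile.open(fileobj=treader, mode="r:gz") as tf:
        tf.extractfile(last).read()

    return (zreader.bytes_read / len(zbuf.getvalue()),
            treader.bytes_read / len(tbuf.getvalue()))
```

With the defaults the ZIP side should touch only a small fraction of its archive, while the tar.gz side reads essentially all of its archive.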

Adding bz2 or lzma compression to ZIP files shouldn't be too hard.
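At the archive level this already works: since Python 3.3 the zipfile module can write and read ZIP_BZIP2 and ZIP_LZMA members (provided the bz2 and lzma modules are available). A minimal sketch, with archive_sizes as a made-up helper name, that stores the same source with each method and checks that it round-trips:

```python
import io
import zipfile

# zipfile's compression constants; ZIP_BZIP2 and ZIP_LZMA need bz2/lzma.
METHODS = {
    "store": zipfile.ZIP_STORED,
    "deflate": zipfile.ZIP_DEFLATED,
    "bzip2": zipfile.ZIP_BZIP2,
    "lzma": zipfile.ZIP_LZMA,
}


def archive_sizes(source: bytes, name: str = "example.py") -> dict[str, int]:
    """Write `source` into a one-member ZIP per method; return each archive's size."""
    sizes = {}
    for label, method in METHODS.items():
        buf = io.BytesIO()
        with zipfile.ZipFile(buf, "w", compression=method) as zf:
            zf.writestr(name, source)
        # Read it back to confirm the method was recorded and the data round-trips.
        with zipfile.ZipFile(buf) as zf:
            if zf.getinfo(name).compress_type != method or zf.read(name) != source:
                raise ValueError(f"{label}: round-trip failed")
        sizes[label] = len(buf.getvalue())
    return sizes
```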

@serhiy-storchaka
Member

Here are some tests.

time 7z a -tzip -mx=0 python-0.zip $(find Lib -type f -name '*.py') >/dev/null
time 7z a -tzip python.zip $(find Lib -type f -name '*.py') >/dev/null
time 7z a -tzip -mx=9 python-9.zip $(find Lib -type f -name '*.py') >/dev/null
time 7z a -tzip -mm=bzip2 python-bzip2.zip $(find Lib -type f -name '*.py') >/dev/null
time 7z a -tzip -mm=bzip2 -mx=9 python-bzip2-9.zip $(find Lib -type f -name '*.py') >/dev/null
time 7z a -tzip -mm=lzma python-lzma.zip $(find Lib -type f -name '*.py') >/dev/null
time 7z a -tzip -mm=lzma -mx=9 python-lzma-9.zip $(find Lib -type f -name '*.py') >/dev/null
time 7z t python-0.zip >/dev/null
time 7z t python.zip >/dev/null
time 7z t python-9.zip >/dev/null
time 7z t python-bzip2.zip >/dev/null
time 7z t python-bzip2-9.zip >/dev/null
time 7z t python-lzma.zip >/dev/null
time 7z t python-lzma-9.zip >/dev/null
wc -c python*.zip

Results:

method       pack time* (s)   unpack time (s)   size (MB)
store                   0.5               0.2       19.42
deflate                   6               0.4        4.59
deflate-max              40               0.4        4.52
bzip2                     6               2.1        4.45
bzip2-max                79               2.0        4.39
lzma                     37               0.7        4.42
lzma-max                 62               0.7        4.39

*) For pack time I report user time, because 7-Zip parallelizes deflate and bzip2 compression well.

As you can see, at maximal compression the size difference between the methods is only about 3%. lzma decompresses almost twice as slowly as deflate, and bzip2 decompresses 5 times more slowly. Python files are too small to benefit from advanced compression.
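A rough in-process analogue of the 7-Zip measurements can be written with zipfile alone. This is a sketch under assumptions: it times zipfile decompression in one process rather than `7z t`, it has no separate lzma-max row because zipfile offers no LZMA level setting, and benchmark/collect_sources are hypothetical helpers. Like the ZIP archives in the table, it compresses each file as a separate member, which is consistent with small files gaining little from stronger codecs.

```python
import io
import time
import zipfile
from pathlib import Path

# Rows roughly mirroring the 7-Zip table. zipfile ignores compresslevel for
# ZIP_LZMA, so only the default LZMA preset is available here.
METHODS = {
    "store": (zipfile.ZIP_STORED, None),
    "deflate": (zipfile.ZIP_DEFLATED, None),
    "deflate-max": (zipfile.ZIP_DEFLATED, 9),
    "bzip2-max": (zipfile.ZIP_BZIP2, 9),
    "lzma": (zipfile.ZIP_LZMA, None),
}


def benchmark(files: dict[str, bytes]) -> dict[str, tuple[int, float]]:
    """Map each method to (archive size in bytes, seconds to read every member)."""
    results = {}
    for label, (method, level) in METHODS.items():
        buf = io.BytesIO()
        with zipfile.ZipFile(buf, "w", compression=method, compresslevel=level) as zf:
            for name, data in files.items():
                zf.writestr(name, data)
        start = time.perf_counter()
        with zipfile.ZipFile(buf) as zf:
            for name in zf.namelist():
                zf.read(name)  # each member is decompressed independently
        results[label] = (len(buf.getvalue()), time.perf_counter() - start)
    return results


def collect_sources(root: Path) -> dict[str, bytes]:
    """Read every *.py file under root, keyed by its POSIX-style relative path."""
    return {p.relative_to(root).as_posix(): p.read_bytes()
            for p in sorted(root.rglob("*.py"))}
```

Calling `benchmark(collect_sources(Path("Lib")))` from a CPython checkout approximates the first file set above; absolute numbers will differ from 7-Zip's.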

@pitrou
Member

pitrou commented Jan 20, 2013

> Here are some tests.

I think you want to put pyc files in the zip file as well.

@rhettinger
Contributor Author

xz will likely be the best win: it is purported to compress better than bz2 while retaining the decompression speed of zip.

As Antoine says, the usual practice is to add py, pyc, and pyo files to the compressed library; otherwise, there is an added cost when Python tries to write a missing pyc/pyo file.
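That practice can be sketched with zipfile.PyZipFile, whose writepy() byte-compiles a .py file and stores only the resulting .pyc in the archive, where zipimport can load it without compiling. Since PEP 488 (Python 3.5) there are no separate .pyo files, so the sketch writes .pyc only. build_bytecode_zip and demo are hypothetical helper names.

```python
import importlib
import sys
import tempfile
import zipfile
from pathlib import Path


def build_bytecode_zip(source_dir: Path, archive: Path) -> list[str]:
    """Compile each top-level module in source_dir into archive as a .pyc file."""
    with zipfile.PyZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as pzf:
        for src in sorted(source_dir.glob("*.py")):
            # writepy() byte-compiles src and adds only "<name>.pyc" to the archive.
            pzf.writepy(str(src))
        return pzf.namelist()


def demo() -> tuple[list[str], str]:
    """Build a bytecode-only archive and import a module from it via zipimport."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        src_dir = tmp / "src"
        src_dir.mkdir()
        (src_dir / "pyzip_demo_mod.py").write_text("VALUE = 'from bytecode'\n", encoding="utf-8")
        archive = tmp / "lib.zip"
        names = build_bytecode_zip(src_dir, archive)
        sys.path.insert(0, str(archive))  # zipimport handles .zip entries on sys.path
        try:
            importlib.invalidate_caches()
            return names, importlib.import_module("pyzip_demo_mod").VALUE
        finally:
            sys.path.remove(str(archive))
            sys.modules.pop("pyzip_demo_mod", None)
```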

@serhiy-storchaka
Member

Well.

./python -m compileall $(find Lib -type f -name '*.py')
./python -O -m compileall $(find Lib -type f -name '*.py')

Tests:

FILES="$(find Lib -name '*.py' -o -name '*.py[co]')"
time 7z a -tzip -mx=0 python-0.zip $FILES >/dev/null
time 7z a -tzip python.zip $FILES >/dev/null
time 7z a -tzip -mx=9 python-9.zip $FILES >/dev/null
time 7z a -tzip -mm=bzip2 python-bzip2.zip $FILES >/dev/null
time 7z a -tzip -mm=bzip2 -mx=9 python-bzip2-9.zip $FILES >/dev/null
time 7z a -tzip -mm=lzma python-lzma.zip $FILES >/dev/null
time 7z a -tzip -mm=lzma -mx=9 python-lzma-9.zip $FILES >/dev/null
time 7z t python-0.zip >/dev/null
time 7z t python.zip >/dev/null
time 7z t python-9.zip >/dev/null
time 7z t python-bzip2.zip >/dev/null
time 7z t python-bzip2-9.zip >/dev/null
time 7z t python-lzma.zip >/dev/null
time 7z t python-lzma-9.zip >/dev/null
wc -c python*.zip

Results:

method       pack time (s)   unpack time (s)   size (MB)
store                  1.6               0.5        65.4
deflate                 19               0.9        17.5
deflate-max            134               0.9        17.2
bzip2                   21               4.2        16.5
bzip2-max              294               4.1        16.3
lzma                   120               2.3        15.9
lzma-max               204               2.3        15.8

All numbers are about 3x larger than before. lzma-max is 8% smaller than deflate-max but decompresses 2.5 times more slowly. bzip2 is out of the game.

@pitrou
Member

pitrou commented Jan 20, 2013

Agreed it doesn't look very promising.

@brettcannon
Member

So this seems like a confluence of two things: supporting compressed files for loading source code, and supporting new archive formats (e.g. xz vs. tar); zip just happens to do both implicitly. There is also the question of whether you plan to do this in C or in pure Python, since I plan to introduce a pure Python version of zipimport into importlib for 3.4 so that it can use zipfile directly and thus get its full support for zipfile features.

And there doesn't have to be any performance cost from trying to write bytecode files; it's easy to have a loader that skips that step entirely.
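Such a loader can be sketched with the public importlib.machinery.SourceFileLoader. In CPython, SourceLoader.get_code() compiles the source and then hands the bytecode to set_data() for caching, so a subclass whose set_data() does nothing never writes a .pyc. NoBytecodeSourceLoader, load_without_bytecode and demo are hypothetical names; setting sys.dont_write_bytecode = True is the process-wide switch for the same behaviour.

```python
import importlib.machinery
import importlib.util
import os
import tempfile


class NoBytecodeSourceLoader(importlib.machinery.SourceFileLoader):
    """Hypothetical source loader that never writes .pyc files.

    In CPython, get_code() compiles the source and then passes the bytecode
    to set_data() for caching, so a no-op set_data() skips the write.
    """

    def set_data(self, path, data, **kwargs):
        # Discard the bytecode instead of writing it under __pycache__.
        pass


def load_without_bytecode(name: str, path: str):
    loader = NoBytecodeSourceLoader(name, path)
    spec = importlib.util.spec_from_file_location(name, path, loader=loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)
    return module


def demo() -> tuple[int, bool]:
    """Load a module from a temp dir and report whether __pycache__ appeared."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "nobc_demo_mod.py")
        with open(path, "w", encoding="utf-8") as f:
            f.write("ANSWER = 42\n")
        module = load_without_bytecode("nobc_demo_mod", path)
        return module.ANSWER, os.path.isdir(os.path.join(tmp, "__pycache__"))
```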

@ericsnowcurrently
Member

Related: bpo-17630 and bpo-5950.

@yan12125
Mannequin

yan12125 mannequin commented Jun 6, 2016

+1 for that. I would like XZ support so that our application size can be reduced.

@serhiy-storchaka
Member

zipimport has been rewritten in pure Python (bpo-25711). Now it is easier to add support for other compression methods. However, I don't think that reducing the size by 3-8% is worth complicating the code.

If you still need this, I think the simplest way is to import the zipfile module and monkey-patch the simple ZIP file implementation in the zipimport module with a zipfile-based implementation. This can only be done after zipfile itself has been imported, i.e. when zipping the stdlib, the zipfile module and its dependencies must be stored uncompressed or with deflate compression.
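zipimport's internal ZIP reader is private and varies between versions, so instead of monkey-patching it, this sketch swaps in a different technique: a standalone sys.meta_path finder built on zipfile and public importlib APIs, which can import a module from an LZMA-compressed archive. ZipfileFinder, ZipfileSourceLoader and demo are hypothetical names; the finder handles only plain .py modules (no packages, no bytecode), and the same caveat applies: zipfile and lzma must already be importable without it.

```python
import importlib
import importlib.abc
import importlib.util
import os
import sys
import tempfile
import zipfile


class ZipfileSourceLoader(importlib.abc.Loader):
    """Hypothetical loader that reads a module's source through zipfile."""

    def __init__(self, archive: str, member: str):
        self.archive = archive
        self.member = member

    def exec_module(self, module):
        # zipfile decompresses stored, deflate, bzip2 and lzma members alike.
        with zipfile.ZipFile(self.archive) as zf:
            source = zf.read(self.member)
        code = compile(source, f"{self.archive}/{self.member}", "exec")
        exec(code, module.__dict__)


class ZipfileFinder(importlib.abc.MetaPathFinder):
    """Hypothetical finder for plain .py modules in one archive (no packages)."""

    def __init__(self, archive: str):
        self.archive = archive
        with zipfile.ZipFile(archive) as zf:
            self.members = set(zf.namelist())

    def find_spec(self, fullname, path=None, target=None):
        member = fullname.replace(".", "/") + ".py"
        if member not in self.members:
            return None  # let the other finders try
        loader = ZipfileSourceLoader(self.archive, member)
        return importlib.util.spec_from_loader(
            fullname, loader, origin=f"{self.archive}/{member}")


def demo() -> str:
    """Import a module from an LZMA-compressed archive via the finder above."""
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, "lzma_lib.zip")
        with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_LZMA) as zf:
            zf.writestr("lzma_demo_mod.py", "GREETING = 'hello from lzma'\n")
        finder = ZipfileFinder(archive)
        sys.meta_path.insert(0, finder)
        try:
            return importlib.import_module("lzma_demo_mod").GREETING
        finally:
            sys.meta_path.remove(finder)
            sys.modules.pop("lzma_demo_mod", None)
```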

@serhiy-storchaka serhiy-storchaka added the 3.8 only security fixes label Sep 19, 2018
@tiran tiran added 3.11 only security fixes and removed 3.8 only security fixes labels Apr 5, 2022
@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
@erlend-aasland erlend-aasland removed the 3.11 only security fixes label Mar 3, 2024