New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expand zipimport to include other compression methods #61206
Comments
Only a little of the existing logic is tied to the zipfile format. Consider adding support for xz, tar, tar.gz, tar.bz2, etc. In particular, xz has better compression, resulting in both space savings and faster load times. |
tar.* is not a good choice because it doesn't allow random access. Bare tar better than zip only in case when you need to save additional file attributes (Unix file access mode, times, owner, group, links). ZIP format supports all this too, but not zipfile module yet. Adding bz2 or lzma compression to ZIP file shouldn't be too hard. |
Here are some tests. time 7z a -tzip -mx=0 python-0.zip $(find Lib -type f -name '*.py') >/dev/null Results:
store 0.5 0.2 19.42 *) For pack time I take user time because 7-zip well parallelize deflate and bzip2 compression. As you can see, a size difference between maximal compression with different methods only 3%. lzma decompress almost twice slower then deflate, and bzip2 decompress 5 times slower. Python files are too small to get benefit from advanced compression. |
I think you want to put pyc files in the zip file as well. |
xz will likely be the best win -- it is purported to compress smaller than bz2 while retaining the decompression speed of zip. As Antoine says, the usual practice is to add py, pyc, and pyo files to the compressed library; otherwise, there is an added cost with Python tries to write a missing pyc/pyo file. |
Well. ./python -m compileall $(find Lib -type f -name '*.py') Tests: FILES="$(find Lib -name '*.py' -o -name '*.py[co]')" Results:
store 1.6 0.5 65.4 All numbers are about 3x larger. lzma-max is 8% less than deflate-max but 2.5 times slower. Bzip2 is out of the game. |
Agreed it doesn't look very promising. |
So this seems like a confluence of both supporting compressed files for loading source code as well as supporting new archive formats (e.g. xz vs. tar); zip just happens to do both implicitly. And there is also the question of if you explicitly plan to do this in C code or in pure Python as I plan to introduce a pure Python version of zipimport into importlib for 3.4 so that it can use zipfile directly and thus all of its full support of zipfile abilities. And there doesn't have to be any performance cost in trying to write bytecode files; it's very simple to have a loader which simply skips that step entirely. |
+1 for that. I like XZ support so that our application size can be reduced. |
zipimport has been rewritten in pure Python (bpo-25711). Now it is easier to add support of other compression methods. Although I don't think that reducing the size by 3-8% is worth complicating the code. If you still need this, I think that the simplest way is importing the zipfile module and monkey patching the simple ZIP file implementation in the zipimport module with zipfile-based implementation. This can be made only after importing zipfile itself, i.e. in case of zipping the stdlib, the zipfile module and its dependencies should be stored uncompressed or with the deflate compression. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: