gzip.compress(..., mtime=0) in cpython 3.11+ unexpectedly sets OS byte in gzip header #112346
Comments
In itself I think it may be a good thing that the … The problem is just that the change is not explicitly documented, as far as I know.
Hi @dennisvang. This is my fault: I delegated gzip.compress(mtime=0) to zlib.compress, incorrectly assuming the two produce identical output. The reason is that zlib.compress is faster. But if it leads to behavioral changes, that is not acceptable. I believe this can easily be remedied by removing that codepath.
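Removing the zlib-only codepath means the gzip header is always written in Python. The following is a minimal sketch of what a single codepath could look like. It assumes the pre-3.11 header layout with the OS byte fixed at 255, and it takes its XFL values from RFC 1952. The function name compress_single_path is hypothetical, and this is not the code from the actual PR:

```python
import gzip
import struct
import time
import zlib


def compress_single_path(data: bytes, compresslevel: int = 9, *, mtime=None) -> bytes:
    """Hypothetical single-codepath gzip.compress (illustration only):
    the header is always built in Python, so OS is 255 for every mtime."""
    if mtime is None:
        mtime = time.time()
    # XFL per RFC 1952: 2 = maximum compression, 4 = fastest, otherwise 0.
    xfl = 2 if compresslevel == 9 else 4 if compresslevel == 1 else 0
    header = struct.pack(
        "<BBBBLBB",
        0x1F, 0x8B,                # ID1, ID2: gzip magic number
        8,                         # CM: deflate
        0,                         # FLG: no FNAME/FEXTRA/FCOMMENT/FHCRC
        int(mtime) & 0xFFFFFFFF,   # MTIME, little-endian
        xfl,                       # XFL
        255,                       # OS: 255 = "unknown", on every platform
    )
    # Negative wbits makes zlib emit a raw deflate stream (no zlib/gzip
    # wrapper), so the header above is the only header in the output.
    compressor = zlib.compressobj(compresslevel, zlib.DEFLATED, -zlib.MAX_WBITS)
    body = compressor.compress(data) + compressor.flush()
    # Trailer: CRC-32 and size of the uncompressed input, both modulo 2**32.
    trailer = struct.pack("<LL", zlib.crc32(data) & 0xFFFFFFFF, len(data) & 0xFFFFFFFF)
    return header + body + trailer


blob = compress_single_path(b"example payload", mtime=0)
assert gzip.decompress(blob) == b"example payload"
print(blob[9])  # 255
```

Because the header never comes from zlib, the OS byte no longer depends on which branch was taken or on the platform zlib was built for.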
I have just made a PR and put "Bugfix" in the name. Now I hope it will get attention.
@rhpvorderman Thanks for picking this up. I wonder if this is the only side effect, and whether the performance gain from using …
Well, as mentioned in the PR, keeping two separate codepaths caused issues before. It is best to keep one codepath. The documentation mentions zlib.compress, so users who need the performance can use it themselves.
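For users who want zlib's speed and can accept its header, the following is a minimal sketch of producing gzip output with zlib directly. It uses zlib.compressobj with wbits = 16 + MAX_WBITS (31), which asks zlib to write a gzip wrapper. On Python 3.11+, zlib.compress(data, wbits=31) is the one-shot equivalent. The helper name zlib_gzip is hypothetical:

```python
import gzip
import zlib


def zlib_gzip(data: bytes, level: int = 9) -> bytes:
    """Hypothetical helper: gzip-wrapped output produced entirely by zlib."""
    # wbits = 16 + 15 selects the gzip container with a 32 KiB window.
    compressor = zlib.compressobj(level, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
    return compressor.compress(data) + compressor.flush()


payload = b"hello world " * 100
fast = zlib_gzip(payload)

# The stream is valid gzip, so the gzip module can read it back ...
assert gzip.decompress(fast) == payload
# ... but zlib writes its own header: MTIME is 0 and the OS byte (index 9)
# is the OS code zlib was compiled with (this issue reports 3, "Unix", on
# Ubuntu), not the 255 written by older gzip.compress.
print(fast[4:8], fast[9])
```

This is exactly the header difference described in this issue. Callers who choose zlib for speed also take on a platform-dependent OS byte.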
@rhpvorderman You're right, that makes sense.
ping
For reference, this feature was added in bpo-43613 (gh-87779). That change included more optimizations; the only issue is with delegating the whole compression to zlib when mtime is 0. The fix looks correct, and it still preserves some of the speed-up. An alternate solution could be to call …
Bug report
description
Using gzip.compress() with mtime=0 in 3.8 <= cpython <= 3.10, the OS byte, i.e. the 10th byte in the GZIP header, is set to 255 "unknown" (also see e.g. #83302):
cpython/Lib/gzip.py
Line 599 in dc0adb4
However, in cpython 3.11 and 3.12, the OS byte is suddenly set to a "known" value, e.g. 3 ("Unix") on Ubuntu. This is not mentioned in the changelog for Python 3.11.
This may lead to problems in the context of reproducible builds. In our case, hash checking fails after decompressing and re-compressing a gzipped archive.
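Until the behavior is settled, one possible workaround for reproducible builds is to normalize the OS byte after compressing. This is a minimal sketch that assumes a single-member stream without the FHCRC header checksum; neither gzip.compress nor zlib sets that flag. The function name normalize_os_byte is hypothetical. Patching byte index 9 keeps the stream valid because the trailer's CRC-32 covers only the uncompressed data. However, the patch only removes the OS-byte difference: the compressed body can still differ, for example across zlib versions.

```python
import gzip
import hashlib


def normalize_os_byte(blob: bytes, os_code: int = 255) -> bytes:
    """Hypothetical helper: rewrite the OS byte in the first gzip header."""
    if blob[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    flags = blob[3]
    # FHCRC (bit 1) adds a CRC-16 over the header; changing a header byte
    # would invalidate it, so refuse rather than emit a corrupt stream.
    if flags & 0x02:
        raise ValueError("header CRC present; cannot patch OS byte safely")
    # Index 9 (the 10th byte) is OS; the trailer CRC-32 covers only the
    # uncompressed data, so decompression is unaffected.
    return blob[:9] + bytes([os_code]) + blob[10:]


data = b"archive contents"
blob = normalize_os_byte(gzip.compress(data, mtime=0))
assert blob[9] == 255
assert gzip.decompress(blob) == data
print(hashlib.sha256(blob).hexdigest())
```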
how to reproduce
Here's an example, where byte 10 is \xff in Python 3.10 and \x03 in Python 3.11:
cause
I guess this is caused by Python 3.11 delegating the gzip.compress() call to zlib if mtime=0, as mentioned in the docs and source:
cpython/Lib/gzip.py
Lines 609 to 612 in 89ddea4
Apparently zlib does set the OS byte.
CPython versions tested on:
3.8, 3.9, 3.10, 3.11, 3.12
Operating systems tested on:
Linux, macOS, Windows
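The following small script checks which header a particular CPython version and operating system produce for mtime=0. It unpacks the fixed 10-byte header defined by RFC 1952 and prints the Python version together with the OS byte. According to this report, OS is 255 on 3.8 to 3.10 and a platform-specific value such as 3 on 3.11 and 3.12. This report does not state the value on releases that include the fix.

```python
import gzip
import struct
import sys

data = b"reproducible builds"
blob = gzip.compress(data, mtime=0)

# RFC 1952 fixed header: ID1, ID2, CM, FLG, MTIME (4 bytes, little-endian), XFL, OS.
id1, id2, cm, flg, mtime, xfl, os_byte = struct.unpack("<BBBBLBB", blob[:10])

print(sys.version.split()[0], "OS byte:", os_byte)
assert (id1, id2, cm) == (0x1F, 0x8B, 8)
assert mtime == 0
```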
Linked PRs