-
-
Notifications
You must be signed in to change notification settings - Fork 31.7k
zipfile.ZipFile() unable to open zip File with a short extra header #58523
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
zipfile.ZipFile("bla.apk") crashes, but should not since other tools work perfectly with this file. Python 2.7.2+ (default, Oct 4 2011, 20:06:09)
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import zipfile
>>> zipfile.ZipFile("bla.apk")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/zipfile.py", line 710, in __init__
self._GetContents()
File "/usr/lib/python2.7/zipfile.py", line 744, in _GetContents
self._RealGetContents()
File "/usr/lib/python2.7/zipfile.py", line 803, in _RealGetContents
x._decodeExtra()
File "/usr/lib/python2.7/zipfile.py", line 369, in _decodeExtra
tp, ln = unpack('<HH', extra[:4])
struct.error: unpack requires a string argument of length 4 |
.apk is primarily used for Android Package files A 'crash' is a segfault or equvalent. A traceback is a graceful shutdown and not a crash. I reproduced the traceback on 3.3.0a1. It appears to say that there are not enough bytes. import struct
struct.unpack('<HH', b'abc')
#produces the same error
struct.error: unpack requires a bytes object of length 4 The 3.x message is clearer in that the second arg is a bytestring, not a char string. It would not hurt it it said how many bytes it did get. However, the behavior that leads to the message baffles me. The _decodeExtra code looks like this, with some prints that I added. def _decodeExtra(self):
# Try to decode the extra field.
extra = self.extra
unpack = struct.unpack
while extra:
print(extra)
tp, ln = unpack('<HH', extra[:4])
print(tp,ln)
if tp == 1:
pass # substitute for actual code not called
extra = extra[ln+4:]
print(extra) The output is
b'\xfe\xca\x00\x00'
51966 0
b''
b'\x00'
Traceback (most recent call last): ... So it looks like extra was properly formatted and properly snipped to 0 bytes, but somehow gained a null byte before being tested again. 1 byte is not 4 bytes and hence the error, but how? |
This is print from *other* _decodeExtra call. Add print('start') at start of _decodeExtra and you will see it. start |
Thanks. There must be multiple extra fields allowed. This output suggests that a) zipfile mis-parses the extra field in some peculiar case (which I doubt) or b) the file either gives a bad field length or third extra field is mal-formed and that other tools catch the error and ignore the bad extra field whereas zipfile does not. |
This is padding. Other implimentations just ignore it and Python must. |
Anyone can look at the patch? This same problem reported in http://bugs.python.org/msg73317 . |
If we're modifying zipfile, please consider adding the reviewed patch for ZipFile.remove at http://bugs.python.org/issue6818 |
Serhiy: can you please provide a unit test? The OP's test case is unsuitable because of both size and copyright. As for the actual issue: the extra data (type 0xcafe) is apparently added by jar tools to make the jar file executable: ISTM that the extra 0 byte is in clear violation of the ZIP specification, which says that there always be type+length headers, followed by length data. So if the length is 0, there ought not to be any data. I don't buy the "padding" argument, since a) the spec is silent on any padding, and b) for alignment reasons, it's not plausible to add any padding, since it is aligned fine without this padding, and unaligned with the padding. I can sympathize with the desire to accept the zipfile, anyway (i.e. despite it being broken). At the same time, I also think that Python should not let errors pass silently. So as a way out, I propose that the ZipFile class gains a "strict" attribute, indicating whether "acceptable" violations of the spec are ignored or reported as exceptions. So -1 on the patch as it stands (but thanks for the analysis). |
I provided the test several minutes ago. |
I do not know other implementation of ZIP, which output an error or a
It is a not easy task (and unnecessary, I suppose). Now zipfile ignores |
Why do you consider this difficult? Just add a "strict" flag, make I'm not asking that you implement a complete validity check for all |
If the module does not actually checks the correctness of all Note that the parameter "strict" in htmlparser is considered to be |
This is definitely *not* a padding issue. $ zipinfo -v bla.apk ... Central directory entry #3: There are an extra 16 bytes preceding this file. res/raw/ice.png offset of local header from start of archive: 1190 (000004A6h) bytes The central-directory extra field contains: There is no file comment. Our implementation can't deal with that because of the line that Terry
As Martin pointed out, the standard says that things must be in |
FWIW, I think it will OK to just ignore extra fields that we can't |
This is definitely a padding issue. All uncompressed files are located
More precisely, the extra field must have at least 4-bytes length to fit
De jure the record is non-portable. De facto the record is portable |
On Mon, May 14, 2012 at 12:31 PM, Serhiy Storchaka
So? Someone may be using the extra fields to pad things, but for the purpose Did you look at the decoding I sent? The extra length field length is clearly
How is it insufficiently defined at all? It says [1]:
The ellipsis is just a standard convention for indicating a repeating
Like I said before, I am all for dropping extra fields we can not |
That can't possibly be the reason. mmap requires 4k (4096) alignment (on Perhaps there is a reason for 4-alignment - but it can't be mmap- |
I would still like to see the zipfile module to be able to detect |
This may be the reason to mmap the entire file and then read aligned |
Agreed. We already do produce some informational messages via the I like the idea of reusing what we already have. Although, when it |
Any chance to commit the patch before final feature freeze? |
Strictly speaking, the feature freeze has passed already, so you'll need the time machine. However, this is not the *final* feature freeze - there will be a 3.4 release of Python also. |
This issue is currently marked as a bugfix for current releases also. Since the proposal seems to be to introduce a new flag, should it be re-classified as enhancement (for 3.4)? |
I think this is a bugfix. I also don't think that "strict" attribute |
I still don't see the bug; the module is behaving correctly - it is the zipfile that is buggy. Supporting this specific kind of buggy zipfiles is a new feature. For reasons discussed, I also disagree with the proposed patch (file24902). If this helps to advance this issue, I can formally reject it if desired. |
Given that the overall trend in the stdlib seems to be to be more lenient in parsing, as with htmlparser, and that the not-really-strict 'strict' parameter for that is being removed, (bpo-15114) I do not understand the objection to accepting one more malformation in zip files and the desire to add a not-really-strict 'strict' parameter here. Perhaps this philosophical issue should be discussed on pydev. |
I agree. The module is behaving correctly. |
Yes, please.
. . . [;pause;] Totally agree. But at the very least it should check the size and raise a proper exception (e.g. BadZipFile). FWIW, the code is already avoiding proper bounds checking using the built-in behavior of slicing --
-- which would otherwise throw exceptions if slice didn't innately and silently fail (due to an index being out-of-bounds).
You realize that you are calling user-controlled data buggy, right? |
New changeset 6dd5e9556a60 by Gregory P. Smith in branch '2.7': New changeset 33843896ce4e by Gregory P. Smith in branch '3.4': New changeset 89be1419472c by Gregory P. Smith in branch 'default': |
This was never an enhancement. zipfile was failing to properly deal with real world data that other zip file tools on the planet were perfectly happy to deal with. That's a bug. Fixed. Practicality beats purity. The zipfile module is not meant to be a zip "standard" validation tool. The other discussions in this bug about adding actual features such as a "strict" mode flag could be done but should really go in feature requests of their own. The zipfile module is... not a wonderful body of code (understatement). Example: It is still quite possible to get non zipfile.BadZipFile exceptions such as struct.error based on various arrangements of input data. |
Just for information: the padding in APK files is added by the zipalign utility [1]. [1] https://developer.android.com/studio/command-line/zipalign |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: