-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
you should probably mention this library is not actually zip standards compliant #7
Comments
sunzip does read and use the central directory. It does not finalize the extraction of the files until it has found them in the central directory. sunzip is in fact compliant with the zip standard. It simply changes the order of operations so as to be able to read the zip file as a stream. If you have a bug to report, it would be polite to provide an example file manifesting the bug. |
Here you go. Unzips with all the normal TOC reading zip utils but not sunzip |
Thank you! |
That is not a compliant zip file. There are two end-of-central-directory records. Per the appnote: 'A ZIP file MUST have only one "end of central directory record".' |
It is compliant. It was made with Info-Zip, It is read correctly even by pkzip2.04g in a dosbox from 1993. MacOS's built in zip support in finder reads it correctly. Windows built in zip support in explorer reads it correct. Winzip reads it correct. WinRaR reads it correctly. But in any case here's a another without the middle central directory, works all the readers above, fails in sunzip. |
Thank you for the example. However you overwrote the central directory header instead of the end-of-central-directory record. You don't need to provide another one. I'll overwrite the correct thing. How do you make these with Info-ZIP's zip? Note that software being able to read it has nothing to do with compliance with the document. By the way, that's not why the central directory is at the end. You don't need to consider adding to an existing zip file. Just consider making a new zip file. You don't want to put the central directory at the start because you don't know how big to make it until you have written all the entries. To put it at the beginning, you would need to know how many entries there will be and what all their file name lengths will be, needing two scans of the file system you are compressing from. (Not much memory back then.) Also interestingly, there is nothing in the appnote that says that the central directory needs to be at the end, and in fact there are zip files with the central directory elsewhere. Only the end-of-central-directory record is expected to be at the end by the software you listed, though oddly the appnote doesn't require that either. |
From the app note
From Info-Zip
What's in that self-extractor is entirely arbitrary. |
If you're curious I contacted pkware, unfortunately their answers were wishy-washy. They did seem to confirm that the end-of-central-directory-record is supposed to be at the end of the file
So if you saw a zip file with the end-of-central-directory-record not at the end of the file then it was not a valid zip file They confirmed that a self extracting zip is a valid zip file
They didn't confirm whether or not having anything in the self extracting part that appears to be any of the zip records like the end-of-central-directory-record is a valid or not.I asked for clarification of that point but have not heard back. But, it should be obvious that just writing code like
in the self extractor will possibly end up with that value in the self extractor and a forward scanner is likely to misinterpret it. As a test I just built Info-Zip from source on linux, created a zip, prepended the info-zip self extractor, ran |
Thanks for the investigation. I will be updating sunzip to address many zip file variants and ill-formed zip files, as noted by your report. In the meantime, consider this beauty that I just constructed: There are no constraints on the contents of the zip file comment in the end-of-central-directory record. In the linked file, I put a zip file in the comment. If that zip file is searched from the end, like unzip does, it finds the zip file in the comment, with the entry named "1". There are 100 extra bytes at the start that could be considered to be a self-extractor, making it a valid zip file. If that zip file is read from the beginning, as sunzip does, it finds that the entire file is a valid zip file, with the single entry named "2". Its end record has a 100-byte comment at the end, ending that record exactly at the end of the file. Looked at either way, the zip file is entirely valid. So does that zip file contain the entry "1" or the entry "2"? |
I agree the APPNOTE is poorly written. This is just speculation. I assume a comment is a string and back when PKZIP was invented it's likely the Phil Katz only thought about readable ASCII in that area. But I agree with you it's problematic in that the spec doesn't' cover the case that the data inside the comment happens to look like a zip file. 0x50 0x4b 0x05 0x06 are valid ASCII ¯\(ツ)/¯ Another issue is can there be gaps between records. Clearly since a self extracting chunk fits in front there is certainly precedent. For example, following PKWare's claim that the diagram of chunks in the APPNOTE show the end-of-central-directory-record is at the end of the file, the same claim can be made that that diagram shows no self extracting portion in front. There's this from my correspondence which doesn't help either
That confirms to me that you can just update the TOC. But it doesn't answer if there can be gaps between records or unused records. It does answer that pkunzip would not care if there are gaps between records or unused records. Some more poorly thought out spec
Saying "Files MAY be added or replaced within a ZIP file, or deleted" has no place in the spec unless the replaced or deleted data is still in the file. Otherwise there is no such thing. There are just new zip files. There is just file FOO with ABC in it, and file FOO2 with ABCD (D added) and file FOO3 with BCD (A deleted). Where as that sentence does make sense IF you have to read the TOC and there can be records in the file the TOC doesn't reference (files deleted from the TOC, files not referenced by the TOC because a new version of the same file was appended and only that version is referenced by the TOC) Sigh......... 😭 |
The reason other libraries read the TOC is because it's the source of truth about the contents of the file. Just like a hard drive has tables of contents but scanning every sector you might find lost files, so is zip. The entire reason the TOC is at the end of the file is so you can easily update small files by just appending data to the end of the zip file and writing a new TOC, no need to read the entire file and write a one. That is by design since zip was designed in the era of floppy disks when re-writing files, especially across multiple disks, could take 20-60 minutes.
It would be polite to tell people this library will fail on some percentage of zip files because it's not correctly reading the TOC.
The text was updated successfully, but these errors were encountered: