Wrong value in general purpose bit flag #4

Closed
marhop opened this issue Sep 27, 2018 · 10 comments

@marhop
Contributor

marhop commented Sep 27, 2018

Problem Description

I zipped a non-empty text file with the zip64 command line tool. When I unzipped it with another tool (UnZip 6.00 of 20 April 2009, by Debian. Original by Info-ZIP.), the extracted text file was empty. The same problem occurs with files created by SIARD Suite, so it is not limited to the command line tool; the bug must be somewhere in the library.

Analysis

(See example file below.)

I think there is a problem with the general purpose bit flag in the local file header and in the central directory file header. If I am not mistaken (which may well be the case, because little-endian byte order makes my head hurt), the bytes 0408 resolve to 0x0804 = 0000 1000 0000 0100, so bits 2 and 11 are set. (Counting starts at 0.) According to [APPNOTE section 4.4.4], bit 2 is relevant only for certain compression methods, none of which is used here (compression method 0, stored), so it counts as undefined.

Maybe it was intended to set bit 3 (0x0808) instead of 2, announcing the data descriptor that follows down below? (Bit 11 denotes UTF-8 file names and comments, so that's OK.)

If bit 3 is set instead of bit 2 the zipped file is extracted correctly.
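The byte-order arithmetic above can be checked mechanically. A minimal Java sketch, assuming only that ZIP stores the 2-byte flag little-endian (low byte first); the class and method names are hypothetical and not part of Zip64File:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: decodes the general purpose bit flag as stored on disk. */
final class GpFlagDecoder {
    /** ZIP integers are little-endian, so the first byte on disk is the low byte. */
    static int decode(int firstByte, int secondByte) {
        return (firstByte & 0xFF) | ((secondByte & 0xFF) << 8);
    }

    /** Returns the indices (counting from 0) of all bits set in a 16-bit flag. */
    static List<Integer> setBits(int flag) {
        List<Integer> bits = new ArrayList<>();
        for (int i = 0; i < 16; i++) {
            if ((flag & (1 << i)) != 0) {
                bits.add(i);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        int written = decode(0x04, 0x08);  // bytes "04 08" as written by Zip64File
        int expected = decode(0x08, 0x08); // bytes "08 08" with bit 3 instead of bit 2
        System.out.printf("0x%04X -> bits %s%n", written, setBits(written));
        System.out.printf("0x%04X -> bits %s%n", expected, setBits(expected));
    }
}
```

This prints bits [2, 11] for the written value and [3, 11] for the corrected one.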

Example File

This is the hex representation of a ZIP file created with Zip64File 2.0.60 (looks the same with 2.1.34) using the following command, where hello.txt contains the string "hello world" followed by a newline (12 bytes):

$ zip64 n a.zip hello.txt

If you'd like to play with the hex data (like changing the general purpose bit flags), a binary ZIP file can be compiled from it using this tool.

Local File Header

Note the general purpose bit flag!

504b 0304              # local file header signature
2d00                   # version needed to extract
0408                   # general purpose bit flag → change to 0808!
0000                   # compression method
a97e                   # last mod file time
264d                   # last mod file date
0000 0000              # crc-32
ffff ffff              # compressed size
ffff ffff              # uncompressed size
0900                   # file name length
1400                   # extra field length
6865 6c6c 6f2e 7478 74 # file name "hello.txt"

0100                   # header ID "Zip64 extended information extra field"
1000                   # data size
0000 0000 0000 0000    # uncompressed size
0000 0000 0000 0000    # compressed size
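To see where the flag and sizes sit, here is a sketch that parses the local file header above with java.nio.ByteBuffer. The class, its Header type and the hex helper are hypothetical illustrations, not Zip64File code; the zip64 handling assumes the APPNOTE rule that a 64-bit value appears in extra field 0x0001 only when the matching 32-bit field is 0xFFFFFFFF.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: reads the fields of a ZIP local file header. */
final class LocalHeaderDump {
    static final int LOCAL_HEADER_SIG = 0x04034b50; // bytes "50 4b 03 04" read little-endian
    static final long ZIP32_OVERFLOW = 0xFFFFFFFFL; // "look in the zip64 extra field"
    static final int ZIP64_EXTRA_ID = 0x0001;       // Zip64 extended information extra field

    static final class Header {
        int flag, method;
        String name;
        long compressedSize, uncompressedSize;
    }

    static final String A_ZIP_LOCAL_HEADER =
        "504b 0304 2d00 0408 0000 a97e 264d 0000 0000 ffff ffff ffff ffff 0900 1400"
        + "6865 6c6c 6f2e 7478 74 0100 1000 0000 0000 0000 0000 0000 0000 0000 0000";

    /** Converts a hex dump such as "504b 0304" into bytes, ignoring whitespace. */
    static byte[] hex(String s) {
        String clean = s.replaceAll("\\s+", "");
        byte[] out = new byte[clean.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(clean.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    static Header parse(byte[] data) {
        ByteBuffer b = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        if (b.getInt(0) != LOCAL_HEADER_SIG) {
            throw new IllegalArgumentException("not a local file header");
        }
        Header h = new Header();
        h.flag = b.getShort(6) & 0xFFFF;   // general purpose bit flag (offset 6)
        h.method = b.getShort(8) & 0xFFFF; // 0 = stored
        h.compressedSize = b.getInt(18) & 0xFFFFFFFFL;
        h.uncompressedSize = b.getInt(22) & 0xFFFFFFFFL;
        int nameLen = b.getShort(26) & 0xFFFF;
        int extraLen = b.getShort(28) & 0xFFFF;
        h.name = new String(data, 30, nameLen, StandardCharsets.UTF_8);
        // Walk the extra field records: 2-byte ID, 2-byte data size, payload.
        int p = 30 + nameLen;
        int end = p + extraLen;
        while (p + 4 <= end) {
            int id = b.getShort(p) & 0xFFFF;
            int size = b.getShort(p + 2) & 0xFFFF;
            if (id == ZIP64_EXTRA_ID) {
                int q = p + 4;
                // Uncompressed (original) size comes first, then compressed size.
                if (h.uncompressedSize == ZIP32_OVERFLOW) { h.uncompressedSize = b.getLong(q); q += 8; }
                if (h.compressedSize == ZIP32_OVERFLOW) { h.compressedSize = b.getLong(q); }
            }
            p += 4 + size;
        }
        return h;
    }

    public static void main(String[] args) {
        Header h = parse(hex(A_ZIP_LOCAL_HEADER));
        System.out.printf("%s flag=0x%04X method=%d sizes=%d/%d%n",
            h.name, h.flag, h.method, h.compressedSize, h.uncompressedSize);
    }
}
```

For a.zip this reports flag 0x0804 and sizes 0/0: the real values are deferred, but with bit 3 clear a reader that honors only bit 3 has no reason to look for a data descriptor.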

File Data

ASCII string "hello world" plus a trailing newline (0a), uncompressed. This looks fine.

6865 6c6c 6f20 776f 726c 640a

Data Descriptor

Looks OK.

504b 0708              # data descriptor signature
2d3b 08af              # crc-32
0c00 0000 0000 0000    # compressed size
0c00 0000 0000 0000    # uncompressed size
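The data descriptor values can be cross-checked against the file data with java.util.zip.CRC32. A sketch, assuming 8-byte size fields (used here because the entry carries a Zip64 extra field) and treating the descriptor signature as optional, as APPNOTE allows; class and field names are hypothetical:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Hypothetical sketch: parses the ZIP64 data descriptor and re-checks its CRC-32. */
final class DataDescriptorCheck {
    static final int DATA_DESCRIPTOR_SIG = 0x08074b50; // bytes "50 4b 07 08" read little-endian

    static final byte[] A_ZIP_DESCRIPTOR = {
        0x50, 0x4b, 0x07, 0x08,        // signature
        0x2d, 0x3b, 0x08, (byte) 0xaf, // crc-32
        0x0c, 0, 0, 0, 0, 0, 0, 0,     // compressed size (8 bytes, ZIP64)
        0x0c, 0, 0, 0, 0, 0, 0, 0      // uncompressed size (8 bytes, ZIP64)
    };
    static final byte[] FILE_DATA = "hello world\n".getBytes(StandardCharsets.US_ASCII);

    static final class Descriptor {
        final long crc, compressedSize, uncompressedSize;
        Descriptor(long crc, long compressedSize, long uncompressedSize) {
            this.crc = crc;
            this.compressedSize = compressedSize;
            this.uncompressedSize = uncompressedSize;
        }
    }

    static Descriptor parse(byte[] data) {
        ByteBuffer b = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        // The signature is optional, so skip it only if it is present.
        int p = b.getInt(0) == DATA_DESCRIPTOR_SIG ? 4 : 0;
        return new Descriptor(b.getInt(p) & 0xFFFFFFFFL, b.getLong(p + 4), b.getLong(p + 12));
    }

    static long crc32(byte[] content) {
        CRC32 crc = new CRC32();
        crc.update(content, 0, content.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        Descriptor d = parse(A_ZIP_DESCRIPTOR);
        System.out.printf("stored crc=%08x computed=%08x sizes=%d/%d%n",
            d.crc, crc32(FILE_DATA), d.compressedSize, d.uncompressedSize);
    }
}
```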

Central Directory Header

Note the general purpose bit flag!

504b 0102              # central file header signature
2d00                   # version made by
2d00                   # version needed to extract
0408                   # general purpose bit flag → change to 0808!
0000                   # compression method
a97e                   # last mod file time
264d                   # last mod file date
2d3b 08af              # crc-32
0c00 0000              # compressed size
0c00 0000              # uncompressed size
0900                   # file name length
0000                   # extra field length
0000                   # file comment length
0000                   # disk number start
0000                   # internal file attributes
0000 0000              # external file attributes
0000 0000              # relative offset of local header
6865 6c6c 6f2e 7478 74 # file name "hello.txt"

End Of Central Directory Record

Looks OK.

504b 0506              # end of central directory signature
0000                   # number of this disk
0000                   # number of the disk with start of central directory
0100                   # number of entries in central directory on this disk
0100                   # number of entries in central directory
3700 0000              # size of central directory
5f00 0000              # offset of start of central directory
0000                   # ZIP file comment length
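The End Of Central Directory values can be verified with a little arithmetic: the central directory starts right after the local header (30 fixed bytes + 9-byte name + 20-byte extra field), the 12 data bytes and the 24-byte ZIP64 data descriptor, and its size is the 46 fixed bytes of one central header plus the 9-byte name. A sketch with hypothetical names:

```java
/** Hypothetical sketch: recomputes the layout of the single-entry a.zip. */
final class ZipLayoutCheck {
    static final int LOCAL_HEADER_FIXED = 30;              // local header without name/extra
    static final int CENTRAL_HEADER_FIXED = 46;            // central header without name/extra/comment
    static final int ZIP64_DESCRIPTOR_LEN = 4 + 4 + 8 + 8; // signature, CRC-32, two 8-byte sizes

    /** Offset of the central directory, i.e. the end of the single local entry. */
    static long centralDirectoryOffset(int nameLen, int localExtraLen, long dataLen) {
        return LOCAL_HEADER_FIXED + nameLen + localExtraLen + dataLen + ZIP64_DESCRIPTOR_LEN;
    }

    /** Size of a central directory holding exactly one header. */
    static long centralDirectorySize(int nameLen, int extraLen, int commentLen) {
        return CENTRAL_HEADER_FIXED + nameLen + extraLen + commentLen;
    }

    public static void main(String[] args) {
        // Values from the hex dump: 9-byte name, 0x14-byte local extra field, 12 data bytes.
        System.out.printf("offset = 0x%x, size = 0x%x%n",
            centralDirectoryOffset(9, 0x14, 12), centralDirectorySize(9, 0, 0));
    }
}
```

This yields 0x5f and 0x37, matching the record above, so the overall layout is consistent and only the flag is off.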
@HartwigThomas
Collaborator

As mentioned in the documentation, zip64file only writes ZIP files in the ZIP64 format.
(It does, however, read ZIP32 as well as ZIP64 files.)
If I remember correctly, Info-ZIP still could not handle the ZIP64 format correctly.
The only trustworthy benchmark is PKZIP from PKWARE.

@marhop
Contributor Author

marhop commented Sep 27, 2018

Info-ZIP UnZip has supported ZIP64 since version 6.0, see their website. BTW, I updated my issue text, which originally had the version number for zip, not unzip.

I do not own a copy of pkzip, so I cannot check this. Could you?

But anyway, even without access to pkzip, I'm pretty sure that according to the ZIP specification by PKWARE section 4.4.4, the general purpose bit flag should have bit 3 set, not bit 2 (NB, counting these bits starts at 0).

@HartwigThomas
Collaborator

I know they (Info-ZIP) claim it, but it is not true.
Because PKWARE software is not free, I have put zip64file into the public domain, so there is at least one free tool that can be used for unzipping ZIP64 files.
The command-line syntax is described in the manual.

Do you have a sample file that I could use to test with PKZIP?

I shall compare section 4.4.4, but I must insist that I have run literally thousands of tests against PKZIP.
(java.util.zip also claims ZIP64 compatibility but does not deliver. Otherwise I would have dropped zip64file from my activities.)

@marhop
Contributor Author

marhop commented Sep 28, 2018

Here are some sample files, thanks for checking!

  • a.zip - a ZIP file made with the zip64 command line tool (the one represented as hex above)
  • b.zip - the same, but with the general purpose bit flags in the local file header and in the central directory header adapted as described (0808 instead of 0408)

Can you point out any specific problems with Info-ZIP's ZIP64 support? I found nothing suspicious in the files created with their tool yet.

@HartwigThomas
Collaborator

General Purpose Flag (local file header, offset 6 in file)

zip64file: (big-endian 0808 = little-endian 0808)
Bit 0: 0
Bit 1: 0
Bit 2: 0
Bit 3: 1
Bit 4: 0
Bit 5: 0
Bit 6: 0
Bit 7: 0
Bit 8: 0
Bit 9: 0
Bit 10: 0
Bit 11: 1
Bit 12: 0
Bit 13: 0
Bit 14: 0
Bit 15: 0

According to APPNOTE.txt this means:
bit 2: normal deflate (the only compression supported by Zip64File) or uncompressed
bit 3: deferred crc, size, compressed size:
"If this bit is set, the fields crc-32, compressed
size and uncompressed size are set to zero in the
local header. The correct values are put in the
data descriptor immediately following the compressed
data. (Note: PKZIP version 2.04g for DOS only
recognizes this bit for method 8 compression, newer
versions of PKZIP recognize this bit for any compression method.)" (from APPNOTE.txt)
Zip64File prefers to write the file size AFTER compressing the file.
(Rereading a stream is often difficult, and using the stream interface has massive advantages:
we avoid storing the compressed file locally, which could cause problems when the disk
runs low on free space.)
bit 11: EFS flag: Unicode (UTF-8) is used for text fields (file name and comment)

Info-ZIP: 0408 big-endian = 0804 little-endian
Bit 0: 0
Bit 1: 0
Bit 2: 1
Bit 3: 0
Bit 4: 0
Bit 5: 0
Bit 6: 0
Bit 7: 0
Bit 8: 0
Bit 9: 0
Bit 10: 0
Bit 11: 1
Bit 12: 0
Bit 13: 0
Bit 14: 0
Bit 15: 0

Bit 2: fast compression
Bit 3: Info-ZIP writes file size before file data (which is OK)
Bit 11: Info-ZIP also opts for Unicode

Both codings appear to be valid. Both ZIP files can be read by PKZIP.
Both ZIP files can be read by Zip64File, but Zip64File issues a spurious warning
because it handles the deferred bit incorrectly.

Info-ZIP, however, cannot handle the deferred (zero) file size in the local header and fails to read a.zip.

This, among others, documents my problems with Info-ZIP.

Conclusion: when reading, Zip64File should respect the flag and not issue a warning.

@HartwigThomas
Collaborator

In the above comment I confused a.zip and b.zip. You are completely right: in FileEntry.java I had
/** flag for deferred data (crc, size, compressed size) in local header: Bit 3 */
public static final int iFLAG_DEFERRED = 0x00000004;

(The comment is OK, the value is wrong.) It should be:
/** flag for deferred data (crc, size, compressed size) in local header: Bit 3 */
public static final int iFLAG_DEFERRED = 0x00000008;

It is interesting that PKWARE did not have any problems with this. Apparently they just always read the extended values after the file data without worrying about the flag.
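The effect of the fix on the bytes written can be sketched as follows. Only iFLAG_DEFERRED comes from FileEntry.java; OLD_FLAG_DEFERRED, FLAG_EFS_UTF8 and the helper methods are hypothetical names for illustration. Writing the mask as 1 << 3 ties the value to the bit number in the comment, which makes this kind of slip easier to spot:

```java
/** Hypothetical sketch: the corrected constant next to the buggy one. */
final class DeferredFlag {
    /** flag for deferred data (crc, size, compressed size) in local header: Bit 3 */
    static final int iFLAG_DEFERRED = 1 << 3;        // == 0x00000008
    static final int OLD_FLAG_DEFERRED = 0x00000004; // the old value, which is bit 2
    static final int FLAG_EFS_UTF8 = 1 << 11;        // == 0x0800, UTF-8 names and comments

    /** Index of the single bit set in a one-bit mask. */
    static int bitIndex(int mask) {
        return Integer.numberOfTrailingZeros(mask);
    }

    /** The two flag bytes as they appear on disk (little-endian). */
    static byte[] toDisk(int flag) {
        return new byte[] { (byte) (flag & 0xFF), (byte) (flag >>> 8) };
    }

    public static void main(String[] args) {
        int fixed = iFLAG_DEFERRED | FLAG_EFS_UTF8;
        int old = OLD_FLAG_DEFERRED | FLAG_EFS_UTF8;
        System.out.printf("fixed: 0x%04X, bit %d, on disk %02x %02x%n",
            fixed, bitIndex(iFLAG_DEFERRED), toDisk(fixed)[0], toDisk(fixed)[1]);
        System.out.printf("old:   0x%04X, bit %d, on disk %02x %02x%n",
            old, bitIndex(OLD_FLAG_DEFERRED), toDisk(old)[0], toDisk(old)[1]);
    }
}
```

The fixed combination is written as "08 08", the old one as "04 08", matching b.zip and a.zip respectively.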

@marhop
Contributor Author

marhop commented Oct 1, 2018

OK, cool! Will you fix this yourself or should I prepare a pull request?

"It is interesting that PKWARE did not have any problems with this. Apparently they just always read the extended values after the file data without worrying about the flag."

Yeah, that's weird. Or maybe they read the size from the central directory header instead? Given the amount of redundancy in a ZIP file there are lots of possibilities ...

@HartwigThomas
Collaborator

I had to change getDataDescriptor() so that it reads old SIARD files with the flag error as well as new SIARD files with the correct flag set. This should make future SIARD files accessible to Info-ZIP.
The new version will be available in the dev/enter branch tonight.
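One way a reader could accept both old and new files is sketched below. This is a hypothetical heuristic, not the actual getDataDescriptor() change: it expects a data descriptor when bit 3 is set, or when the legacy bit 2 is set and the local header's CRC-32 and sizes (after zip64 resolution) are all zero. The extra zero check is an assumption meant to avoid misreading bit 2 where it carries a real meaning, such as the deflate compression option.

```java
/** Hypothetical sketch, not Zip64File code: decides whether a data descriptor follows. */
final class DescriptorPolicy {
    static final int FLAG_DEFERRED = 0x0008;        // bit 3, per APPNOTE 4.4.4
    static final int LEGACY_FLAG_DEFERRED = 0x0004; // bit 2, as written by older Zip64File

    static boolean expectsDataDescriptor(int flags, long localCrc,
                                         long localCompressedSize, long localUncompressedSize) {
        if ((flags & FLAG_DEFERRED) != 0) {
            return true; // spec-conforming files
        }
        // Assumed legacy rule: bit 2 plus all-zero CRC and sizes in the local header.
        boolean legacyBit = (flags & LEGACY_FLAG_DEFERRED) != 0;
        boolean fieldsDeferred = localCrc == 0 && localCompressedSize == 0 && localUncompressedSize == 0;
        return legacyBit && fieldsDeferred;
    }

    public static void main(String[] args) {
        System.out.println(expectsDataDescriptor(0x0808, 0, 0, 0)); // b.zip style
        System.out.println(expectsDataDescriptor(0x0804, 0, 0, 0)); // a.zip style (legacy)
    }
}
```

A writer, in contrast, should only ever set bit 3, so new files need no heuristic at all.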

@marhop
Copy link
Contributor Author

marhop commented Oct 1, 2018

Ah, good idea! So it's still possible to open and edit "old" SIARD files with SIARD Suite and when saving them the correct flag will be written? That's nice.

@HartwigThomas
Collaborator

Fixed and tested in branch dev/enter. I will merge it into the master branch after handling other issues.
