
8313765: Invalid CEN header (invalid zip64 extra data field size) #15273

Closed

Conversation

LanceAndersen
Contributor

@LanceAndersen LanceAndersen commented Aug 14, 2023

This PR updates the extra field validation added as part of JDK-8302483 to deal with issues seen with 3rd-party tools/libraries where a ZipException may be encountered when opening select APK, ZIP, or JAR files. Please refer to the links provided at the end of the description for more information:

ZipException: Invalid CEN header (invalid zip64 extra data field size)

  1. Extra field includes padding:
----------------#1--------------------
[Central Directory Header]
  0x3374: Signature    : 0x02014b50
  0x3378: Created Zip Spec :    0xa [1.0]
  0x3379: Created OS   :    0x0 [MS-DOS]
  0x337a: VerMadeby    :    0xa [0, 1.0]
  0x337b: VerExtract   :    0xa [1.0]
  0x337c: Flag      :   0x800
  0x337e: Method     :    0x0 [STORED]
  0x3380: Last Mod Time  : 0x385ca437 [Thu Feb 28 20:33:46 EST 2008]
  0x3384: CRC       : 0x694c6952
  0x3388: Compressed Size :   0x624
  0x338c: Uncompressed Size:   0x624
  0x3390: Name Length   :   0x1b
  0x3392: Extra Length  :    0x7
		[tag=0xcafe, sz=0, data= ]
				->[tag=cafe, size=0]
  0x3394: Comment Length :    0x0
  0x3396: Disk Start   :    0x0
  0x3398: Attrs      :    0x0
  0x339a: AttrsEx     :    0x0
  0x339e: Loc Header Offset:    0x0
  0x33a2: File Name    : res/drawable/size_48x48.jpg

The extra field tag 0xcafe has its data size set to 0, while the extra length is 7. It is expected that the tag's data size can be used to traverse the extra fields.
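That traversal can be sketched as follows (a hypothetical, self-contained walker, not the JDK implementation): it reads tag/size pairs and tolerates trailing bytes shorter than a four-byte block header, which covers the seven-byte field above (a 4-byte header with sz=0 followed by 3 bytes of padding).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ExtraFieldWalker {
    /**
     * Walks the tag/size pairs in a ZIP extra field and returns the number
     * of complete blocks found. Trailing bytes shorter than a 4-byte block
     * header (e.g. alignment padding) are tolerated and skipped.
     */
    static int countBlocks(byte[] extra) {
        ByteBuffer buf = ByteBuffer.wrap(extra).order(ByteOrder.LITTLE_ENDIAN);
        int blocks = 0;
        while (buf.remaining() >= 4) {            // need room for tag + size
            int tag = buf.getShort() & 0xffff;
            int sz  = buf.getShort() & 0xffff;
            if (sz > buf.remaining()) {
                // the block claims more data than the extra field holds
                // (the BND case below); a validator would reject this
                throw new IllegalArgumentException(
                    "block size " + sz + " exceeds remaining " + buf.remaining());
            }
            buf.position(buf.position() + sz);    // skip the block's data
            blocks++;
        }
        return blocks;
    }

    public static void main(String[] args) {
        // The padded case above: extra length 7, tag=0xcafe (little-endian
        // bytes fe ca), sz=0, then 3 bytes of padding
        byte[] padded = { (byte) 0xfe, (byte) 0xca, 0, 0, 0, 0, 0 };
        System.out.println(countBlocks(padded)); // prints 1; padding is ignored
    }
}
```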

  2. The BND tool added problematic data to the extra field:
----------------#359--------------------
[Central Directory Header]
   0x600b4: Signature        : 0x02014b50
   0x600b8: Created Zip Spec :       0x14 [2.0]
   0x600b9: Created OS       :        0x0 [MS-DOS]
   0x600ba: VerMadeby        :       0x14 [0, 2.0]
   0x600bb: VerExtract       :       0x14 [2.0]
   0x600bc: Flag             :      0x808
   0x600be: Method           :        0x8 [DEFLATED]
   0x600c0: Last Mod Time    : 0x2e418983 [Sat Feb 01 17:12:06 EST 2003]
   0x600c4: CRC              : 0xd8f689cb
   0x600c8: Compressed Size  :      0x23e
   0x600cc: Uncompressed Size:      0x392
   0x600d0: Name Length      :       0x20
   0x600d2: Extra Length     :        0x8
		[tag=0xbfef, sz=61373, data=        
  0x600d4: Comment Length   :        0x0
   0x600d6: Disk Start       :        0x0
   0x600d8: Attrs            :        0x0
   0x600da: AttrsEx          :        0x0
   0x600de: Loc Header Offset:    0x4f2fe
   0x600e2: File Name        : net/n3/nanoxml/CDATAReader.class

In the above example, the extra length is 0x8 but the tag's data size is 61373, which exceeds the extra length.

zip -T would also report an error:

zip -T foo.jar
net/n3/nanoxml/CDATAReader.class bad extra-field entry:
EF block length (61373 bytes) exceeds remaining EF data (4 bytes)
test of foo.jar FAILED

  3. Some releases of Ant and commons-compress create CEN Zip64 extra headers with a size of 0 when Zip64 mode is required:
----------------#63--------------------
[Central Directory Header]
  0x2fded9: Signature        : 0x02014b50
  0x2fdedd: Created Zip Spec :       0x2d [4.5]
  0x2fdede: Created OS       :        0x3 [UNIX]
  0x2fdedf: VerMadeby        :      0x32d [3, 4.5]
  0x2fdee0: VerExtract       :       0x2d [4.5]
  0x2fdee1: Flag             :      0x800
  0x2fdee3: Method           :        0x8 [DEFLATED]
  0x2fdee5: Last Mod Time    : 0x43516617 [Thu Oct 17 12:48:46 EDT 2013]
  0x2fdee9: CRC              :        0x0
  0x2fdeed: Compressed Size  :        0x2
  0x2fdef1: Uncompressed Size:        0x0
  0x2fdef5: Name Length      :       0x22
  0x2fdef7: Extra Length     :        0x4
       [tag=0x0001, sz=0, data= ]
         ->ZIP64: 
  0x2fdef9: Comment Length   :        0x0
  0x2fdefb: Disk Start       :        0x0
  0x2fdefd: Attrs            :        0x0
  0x2fdeff: AttrsEx          : 0x81a40000
  0x2fdf03: Loc Header Offset:     0x1440
  0x2fdf07: File Name        : .xdk_version_12.1.0.2.0_production

[Local File Header]
    0x1440: Signature   :   0x04034b50
    0x1444: Version     :         0x2d    [4.5]
    0x1446: Flag        :        0x800
    0x1448: Method      :          0x8    [DEFLATED]
    0x144a: LastMTime   :   0x43516617    [Thu Oct 17 12:48:46 EDT 2013]
    0x144e: CRC         :          0x0
    0x1452: CSize       :   0xffffffff
    0x1456: Size        :   0xffffffff
    0x145a: Name Length :         0x22    [.xdk_version_12.1.0.2.0_production]
    0x145c: ExtraLength :         0x14
       [tag=0x0001, sz=16, data= 00 00 00 00 00 00 00 00 02 00 00 00 00 00 00 00 ]
           ->ZIP64: size *0x0 csize *0x2 *0x2d04034b500003 
    0x145e: File Name  : [.xdk_version_12.1.0.2.0_production]

Notice that the CEN extra length for this tag differs from the one in the LOC.

When validating the Zip64 extra fields, we were not expecting the data size to be 0.
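The rule being applied for this case can be sketched like this (a hypothetical helper with made-up names, not the actual patch): a zero-length Zip64 block is tolerated only when none of the CEN fields actually deferred their value to it, i.e. none carry the 0xFFFFFFFF marker.

```java
public class Zip64ZeroBlockCheck {
    // CEN fields set to this marker defer their real value to the Zip64 block
    static final long ZIP64_MAGIC = 0xFFFFFFFFL;

    /**
     * A zero-length Zip64 extra block is acceptable only when none of the
     * compressed size, uncompressed size, or LOC offset fields carry the
     * 0xFFFFFFFF marker; otherwise a real value is missing and the header
     * is invalid.
     */
    static boolean zeroSizeBlockOk(long csize, long size, long locoff) {
        return csize != ZIP64_MAGIC && size != ZIP64_MAGIC && locoff != ZIP64_MAGIC;
    }

    public static void main(String[] args) {
        // The Ant case above: csize=0x2, size=0x0, locoff=0x1440, sz=0 -> accepted
        System.out.println(zeroSizeBlockOk(0x2, 0x0, 0x1440));         // true
        // A CEN record that actually needs a Zip64 value -> must be rejected
        System.out.println(zeroSizeBlockOk(ZIP64_MAGIC, 0x0, 0x1440)); // false
    }
}
```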

Mach5 tiers 1-6 and the relevant JCK tests continue to pass with the above changes.

The following 3rd party tools have (or have pending) fixes to address the issues highlighted above:


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8313765: Invalid CEN header (invalid zip64 extra data field size) (Bug - P2)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/15273/head:pull/15273
$ git checkout pull/15273

Update a local copy of the PR:
$ git checkout pull/15273
$ git pull https://git.openjdk.org/jdk.git pull/15273/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 15273

View PR using the GUI difftool:
$ git pr show -t 15273

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/15273.diff

Webrev

Link to Webrev Comment

@LanceAndersen LanceAndersen changed the title Fix for JDK-8313765 8313765: Invalid CEN header (invalid zip64 extra data field size) Aug 14, 2023
@bridgekeeper

bridgekeeper bot commented Aug 14, 2023

👋 Welcome back lancea! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Aug 14, 2023

@LanceAndersen The following labels will be automatically applied to this pull request:

  • core-libs
  • nio

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added nio nio-dev@openjdk.org core-libs core-libs-dev@openjdk.org labels Aug 14, 2023
@LanceAndersen LanceAndersen marked this pull request as ready for review August 14, 2023 15:52
@openjdk openjdk bot added the rfr Pull request is ready for review label Aug 14, 2023
@mlbridge

mlbridge bot commented Aug 14, 2023

@shipilev
Member

Please merge from master to get clean GHA runs.

@AlanBateman
Copy link
Contributor

It's unfortunate that there are tools and plugins in the ecosystem that have these issues. I think you've got the right balance here, meaning tolerating a zip64 extra block with a block size of 0 and rejecting corrupted extra blocks added by older versions of the BND plugin.

Member

@simonis simonis left a comment


Hi Lance,
In general it looks good, but I have some suggestions that I think could slightly improve the patch.

@@ -1342,14 +1361,15 @@ private static boolean isZip64ExtBlockSizeValid(int blockSize) {
/*
* As the fields must appear in order, the block size indicates which
* fields to expect:
* 0 - May be written out by Ant and Apache Commons Compress Library
Member

I don't like that isZip64ExtBlockSizeValid() still accepts 0 as valid input. I think we should fully handle the zero case in checkZip64ExtraFieldValues() (also see my comments there).

Contributor Author

Hi Volker,

I understand your point, and I had done that previously, but I decided I did not like the flow of the code that way, which is why I moved the check. I prefer to leave it as is.

Member

I don't think this is a question of "taste" because isZip64ExtBlockSizeValid() suggests that the method will check for valid sizes and to my understanding 0 is not a valid input. This method might also be called from other places in the future which do not handle the zero case appropriately.

In any case, I'm ready to accept this as a case of "Disagree and Commit" :) but in that case please update at least the comment below to something like "..Note we do not need to check blockSize is >= 8 as we know its length is at least 8 by now" because "..from the call to isZip64ExtBlockSizeValid()" isn't true any more.

Contributor

I think I agree with Volker that it would be better if isZip64ExtBlockSizeValid continued to return false for block size 0.

Contributor Author

OK, I have made the suggested change that you both prefer.

Thank you for your input

Contributor

I'm also happy to see isZip64ExtBlockSizeValid rejecting 0. This logic could be useful when implementing support for valid Zip64 fields for small entries in ZipInputStream, like #12524 attempted to do. (The PR was closed by the bots in the end).

I guess this method could be moved to ZipUtils if JDK-8303866 is ever implemented.

@@ -1307,6 +1317,15 @@ private void checkZip64ExtraFieldValues(int off, int blockSize, long csize,
if (!isZip64ExtBlockSizeValid(blockSize)) {
zerror("Invalid CEN header (invalid zip64 extra data field size)");
}
// if ZIP64_EXTID blocksize == 0, validate csize and size
Member

If you put this block in front of the call to isZip64ExtBlockSizeValid() we don't have to handle the blockSize == 0 case in isZip64ExtBlockSizeValid().

This will also make the following comment true again:

            // Note we do not need to check blockSize is >= 8 as
            // we know its length is at least 8 from the call to
            // isZip64ExtBlockSizeValid()

}
switch (tag) {
case EXTID_ZIP64 :
// Check to see if we have a valid block size
if (!isZip64ExtBlockSizeValid(sz)) {
throw new ZipException("Invalid CEN header (invalid zip64 extra data field size)");
}
// if ZIP64_EXTID blocksize == 0, validate csize, size and
Member

Same here. Just put this block before the call to isZip64ExtBlockSizeValid(), then you don't have to handle the sz == 0 case there.

* 8 - uncompressed size
* 16 - uncompressed size, compressed size
* 24 - uncompressed size, compressed size, LOC Header offset
* 28 - uncompressed size, compressed size, LOC Header offset,
* and Disk start number
*/
return switch(blockSize) {
case 8, 16, 24, 28 -> true;
case 0, 8, 16, 24, 28 -> true;
Member

Don't need to handle the zero case here if you rearrange the code in readExtra() as suggested above.

@mrserb
Member

mrserb commented Aug 14, 2023

It's unfortunate that there are tools and plugins in the ecosystem that have these issues. I think you've got the right balance here, meaning tolerating a zip64 extra block with a block size of 0 and rejecting corrupted extra blocks added by older versions of the BND plugin.

I think I already asked this question, but it disappeared in the latest PR: why does our code assume that the extended block is limited to certain sizes, like 8, 16, 24, 28? There are no such limitations in the zip specification:
https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT

     4.5.3 -Zip64 Extended Information Extra Field (0x0001):

      The following is the layout of the zip64 extended 
      information "extra" block. If one of the size or
      offset fields in the Local or Central directory
      record is too small to hold the required data,
      a Zip64 extended information record is created.
      The order of the fields in the zip64 extended 
      information record is fixed, but the fields MUST
      only appear if the corresponding Local or Central
      directory record field is set to 0xFFFF or 0xFFFFFFFF.

      Note: all fields stored in Intel low-byte/high-byte order.

        Value      Size       Description
        -----      ----       -----------
(ZIP64) 0x0001     2 bytes    Tag for this "extra" block type
        Size       2 bytes    Size of this "extra" block
        Original 
        Size       8 bytes    Original uncompressed file size
        Compressed
        Size       8 bytes    Size of compressed data
        Relative Header
        Offset     8 bytes    Offset of local header record
        Disk Start
        Number     4 bytes    Number of the disk on which
                              this file starts 

      This entry in the Local header MUST include BOTH original
      and compressed file size fields. If encrypting the 

It probably comes from the Wiki page: https://en.wikipedia.org/wiki/ZIP_(file_format) but it is not a spec.

Note that the spec also says an extended block should be created at least in this case:

     " size or
      offset fields in the Local or Central directory
      record is too small to hold the required data,
      a Zip64 extended information record is created."

It does not say that the block cannot be empty or have any other size if all fields in the body of the zip file are correct/valid.

For example, take a look at the code in ZipEntry where we accept any size of that block and just check that it has the required data in it.

throw new ZipException("Invalid CEN header (invalid zip64 extra data field size)");
}
break;
}
if (size == ZIP64_MINVAL) {
Member

Note that we always increase "pos" only in case of "_MINVAL". If the values of size and csize are correct/valid in the "body" of the zip file and only locoff is negative then we should skip two fields in the extra block and read the third one. Otherwise, we may read some random values and throw an exception.

Contributor Author

I am not sure I completely understand your question.

How ZipFS::readExtra navigates these fields has not changed.

If you have a tool that creates a zip/jar that demonstrates an issue that might need further examination, please provide a test case and the tool that created the zip/jar in question, and open a new bug.

Member

JDK-8302483 changed this code to throw an exception, which is why I am looking into it.
You can compare the code in this file with the same code in ZipFile in the checkZip64ExtraFieldValues method, or the code in ZipEntry#setExtra0, where we do not increase the "off" but instead check for "off + 8" or "off + 16". So if we need to read only the third field we should read "pos + 16", but with the current implementation we will read it at "pos + 0", since pos was not bumped by the code for the two other fields.
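The offset arithmetic being described can be illustrated with a small sketch (a hypothetical helper, not JDK code): the position of the LOC-offset field inside a Zip64 extra block depends on which of the two preceding fields are actually present.

```java
public class Zip64FieldOffset {
    static final long ZIP64_MAGIC = 0xFFFFFFFFL;

    /**
     * Offset of the LOC-header-offset field within a Zip64 extra block's
     * data. The uncompressed-size and compressed-size fields each occupy
     * 8 bytes, but only when their CEN counterpart carries the 0xFFFFFFFF
     * marker; skipping a fixed 16 bytes would be wrong when they are absent.
     */
    static int locOffFieldOffset(long size, long csize) {
        int off = 0;
        if (size == ZIP64_MAGIC)  off += 8;  // "Original Size" field present
        if (csize == ZIP64_MAGIC) off += 8;  // "Compressed Size" field present
        return off;
    }

    public static void main(String[] args) {
        System.out.println(locOffFieldOffset(ZIP64_MAGIC, ZIP64_MAGIC)); // 16: both present
        System.out.println(locOffFieldOffset(5, 3));                     // 0: only locoff deferred
    }
}
```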

fmt.format("%n };%n");
return sb.toString();
}
}
Member

No newline at end of the file.

@LanceAndersen
Contributor Author

It's unfortunate that there are tools and plugins in the ecosystem that have these issues. I think you've got the right balance here, meaning tolerating a zip64 extra block with a block size of 0 and rejecting corrupted extra blocks added by older versions of the BND plugin.

I think I already asked this question, but it disappeared in the latest PR: why does our code assume that the extended block is limited to certain sizes, like 8, 16, 24, 28? There are no such limitations in the zip specification: https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT

     4.5.3 -Zip64 Extended Information Extra Field (0x0001):

      The following is the layout of the zip64 extended 
      information "extra" block. If one of the size or
      offset fields in the Local or Central directory
      record is too small to hold the required data,
      a Zip64 extended information record is created.
      The order of the fields in the zip64 extended 
      information record is fixed, but the fields MUST
      only appear if the corresponding Local or Central
      directory record field is set to 0xFFFF or 0xFFFFFFFF.

      Note: all fields stored in Intel low-byte/high-byte order.

        Value      Size       Description
        -----      ----       -----------
(ZIP64) 0x0001     2 bytes    Tag for this "extra" block type
        Size       2 bytes    Size of this "extra" block
        Original 
        Size       8 bytes    Original uncompressed file size
        Compressed
        Size       8 bytes    Size of compressed data
        Relative Header
        Offset     8 bytes    Offset of local header record
        Disk Start
        Number     4 bytes    Number of the disk on which
                              this file starts 

      This entry in the Local header MUST include BOTH original
      and compressed file size fields. If encrypting the 

It probably comes from the Wiki page: https://en.wikipedia.org/wiki/ZIP_(file_format) but it is not a spec.

Note the spec also says that an extended block should be created at least in this case

     " size or
      offset fields in the Local or Central directory
      record is too small to hold the required data,
      a Zip64 extended information record is created."

It does not say that the block cannot be empty or have any other size if all fields in the body of the zip file are correct/valid.

I am not understanding your point. There is a specific order for the Zip64 fields based on which fields have the magic value. The spec also does not suggest that an empty Zip64 extra field can be written to the CEN when there is a Zip64 field with data written to the LOC.

If you have a zip which demonstrates an issue not addressed, please provide a test case along with the tool that created the zip, and it can be looked at.

@mrserb
Member

mrserb commented Aug 14, 2023

I am not understanding your point. There is a specific order for the Zip64 fields based on which fields have the magic value. The spec also does not suggest that an empty Zip64 extra field can be written to the CEN when there is a Zip64 field with data written to the LOC.

Yes, there is a specific order of fields that should be stored in the extended block if some of the data in the "body" is negative. But as you pointed out, in this case an empty block, or a block bigger than necessary to store the size/csize/locoff, is not prohibited by the spec. For example, take a look at the code in ZipEntry where we accept any size of that block and just check that it has the required data in it.

If you disagree, then point to the part of the spec which prohibits such sizes.


This is how it is implemented by "unzip":
https://github.com/madler/zlib/blob/04f42ceca40f73e2978b50e93806c2a18c1281fc/contrib/minizip/unzip.c#L1035C68-L1035C76 ; the dataSize is accepted as-is.

@simonis
Member

simonis commented Aug 14, 2023

There's one final thing I want to mention. Your current test cases (i.e. the corrupted zip files in ReadNonStandardExtraHeadersTest) only reproduce the problem with ZipFile but not with ZipFileSystem. That's because of slightly different logic in ZipFileSystem$Entry::readExtra(), due to which the error will only be triggered if the ZIP64_EXTID extra block is followed by another extra block. So we need a zip file with a 0-length ZIP64_EXTID extra block followed by another extra block in order to trigger the issue.

I've therefore attached a zip file (ZeroLengthZIP64EXTID.zip) which was created by Ant 1.10.7 from a directory containing a single, zero-length file and the following build.xml file:

<project name="ZipFiles" default="zip">
    <target name="zip">
        <zip destfile="/tmp/output.zip" zip64Mode="always" createUnicodeExtraFields="always">
            <fileset dir="/tmp/testfiles" includes="**/*" />
        </zip>
    </target>
</project>
  • zip64Mode="always" is required in order to produce a ZIP64_EXTID Extra Field with zero length (which triggers the problem).
  • createUnicodeExtraFields="always" is required in order to produce a second Extra Field after ZIP64_EXTID.

I think it would be good if you could add that as a test case as well.
ZeroLengthZIP64EXTID.zip

@LanceAndersen
Contributor Author

I am not understanding your point. There is a specific order for the Zip64 fields based on which fields have the magic value. The spec also does not suggest that an empty Zip64 extra field can be written to the CEN when there is a Zip64 field with data written to the LOC.

Yes, there is a specific order of fields that should be stored in the extended block if some of the data in the "body" is negative. But as you pointed out, in this case an empty block, or a block bigger than necessary to store the size/csize/locoff, is not prohibited by the spec. For example, take a look at the code in ZipEntry where we accept any size of that block and just check that it has the required data in it.
If you disagree, then point to the part of the spec which prohibits such sizes.

This is how it is implemented by the "unzip" https://github.com/madler/zlib/blob/04f42ceca40f73e2978b50e93806c2a18c1281fc/contrib/minizip/unzip.c#L1035C68-L1035C76 , the dataSize is accepted as is.

4.6.2 Third-party Extra Fields MUST include a Header ID using
the format defined in the section of this document
titled Extensible Data Fields (section 4.5).

The Data Size field indicates the size of the following
data block. Programs can use this value to skip to the
next header block, passing over any data blocks that are
not of interest.

zip -T would also report errors with a BND-modified jar:

zip -T bad.jar

net/n3/nanoxml/CDATAReader.class bad extra-field entry:
EF block length (61373 bytes) exceeds remaining EF data (4 bytes)
test of bad.jar FAILED

zip error: Zip file invalid, could not spawn unzip, or wrong unzip (original files unmodified)

zipdetails would also fail with the above jar

@mrserb
Member

mrserb commented Aug 14, 2023

net/n3/nanoxml/CDATAReader.class bad extra-field entry:
EF block length (61373 bytes) exceeds remaining EF data (4 bytes)
test of bad.jar FAILED
zip error: Zip file invalid, could not spawn unzip, or wrong unzip (original files unmodified)

zipdetails would also fail with the above jar

It seems that the error "EF block length (30837 bytes) exceeds remaining EF data" is caused by the fact that the size was too big for the actual zip file, which I think is a different issue; but you can try to unzip that file and you will get a result without errors. The unzip implementation is linked above.

// Create the Zip file to read
Files.write(VALID_APK, VALID_APK_FILE);
Files.write(VALID_APACHE_COMPRESS_JAR, COMMONS_COMPRESS_JAR);
Files.write(VALID_ANT_JAR, ANT_ZIP64_UNICODE_EXTRA_JAR);
Files.write(VALID_ANT_JAR, ANT_ZIP64_UNICODE_EXTRA_ZIP);
Member

This should probably read VALID_ANT_ZIP instead of VALID_ANT_JAR.

}

/**
* Zip and Jars files to validate we can open
*/
private static Stream<Path> zipFilesToTest() {
return Stream.of(VALID_APK, VALID_APACHE_COMPRESS_JAR);
return Stream.of(VALID_APK, VALID_APACHE_COMPRESS_JAR, VALID_ANT_JAR);
Member

And here you probably want to add VALID_ANT_ZIP in addition to VALID_ANT_JAR.

Contributor Author

Yep, already caught that typo, forgot to save before I committed :-)

@mrserb
Member

mrserb commented Aug 15, 2023

TEST.zip

Try this example: zip -T passes, unzip works fine, but OpenJDK rejects it.

Member

@simonis simonis left a comment

Thanks for doing the additional changes. This looks good to me now.

@openjdk

openjdk bot commented Aug 15, 2023

@LanceAndersen This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8313765: Invalid CEN header (invalid zip64 extra data field size)

Reviewed-by: simonis, alanb, coffeys

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 33 new commits pushed to the master branch:

  • b80001d: 8314209: Wrong @since tag for RandomGenerator::equiDoubles
  • ef6db5c: 8314211: Add NativeLibraryUnload event
  • 49ddb19: 8313760: [REDO] Enhance AES performance
  • d46f0fb: 8313720: C2 SuperWord: wrong result with -XX:+UseVectorCmov -XX:+UseCMoveUnconditionally
  • 38687f1: 8314262: GHA: Cut down cross-compilation sysroots deeper
  • a602624: 8314020: Print instruction blocks in byte units
  • 0b12480: 8314233: C2: assert(assertion_predicate_has_loop_opaque_node(iff)) failed: unexpected
  • e1fdef5: 8314324: "8311557: [JVMCI] deadlock with JVMTI thread suspension" causes various failures
  • 2bd2fae: 4346610: Adding JSeparator to JToolBar "pushes" buttons added after separator to edge
  • 6a15860: 8314163: os::print_hex_dump prints incorrectly for big endian platforms and unit sizes larger than 1
  • ... and 23 more: https://git.openjdk.org/jdk/compare/4b2703ad39f8160264eb30c797824cc93a6b56e2...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Aug 15, 2023
@simonis
Member

simonis commented Aug 15, 2023

Other than that, there are no limitations on the size of the extended block; it could be 0, 20, 100, etc. But it should contain correct data if necessary and should not be larger than the surrounding "chunk".

This seems to be a very "free" interpretation of the specification to me. According to my understanding, the valid sizes of 8, 16, 24 or 28 as described in the Wikipedia article are a direct consequence of the specification, which only allows for a fixed set of entries in the ZIP64 extra field. Even the zero-length case is questionable, because a ZIP64 extra field should only be created if required; however, we have to handle it here for backward compatibility reasons.
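The argument above can be sketched as a size check (a sketch based on the snippets quoted in this thread; the zero case is handled separately before this check, as the review settled on): since the field order is fixed, only cumulative prefixes of the field sizes are legal block sizes.

```java
public class Zip64BlockSizeCheck {
    /**
     * Because the Zip64 fields must appear in a fixed order, the only legal
     * CEN block sizes are the cumulative prefixes:
     *   8  - uncompressed size
     *   16 - uncompressed size, compressed size
     *   24 - uncompressed size, compressed size, LOC header offset
     *   28 - all of the above plus the disk start number
     * Zero-length blocks (the Ant/commons-compress case) are tolerated
     * elsewhere, before this check is reached.
     */
    static boolean isZip64ExtBlockSizeValid(int blockSize) {
        return switch (blockSize) {
            case 8, 16, 24, 28 -> true;
            default -> false;
        };
    }

    public static void main(String[] args) {
        System.out.println(isZip64ExtBlockSizeValid(16)); // true
        System.out.println(isZip64ExtBlockSizeValid(12)); // false: would split a field
    }
}
```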

// size, and locoff to make sure the fields != ZIP64_MAGICVAL
if (sz == 0) {
if ( csize == ZIP64_MINVAL || size == ZIP64_MINVAL ||
locoff == ZIP64_MINVAL) {
Contributor

Minor nit but you can drop the space in "( csize)" and put the third condition on L3099 to make it easier to read.

For the comment, it looks like it is missing a comma after "== 0". Either that or change it to start with "Some older version of Apache Ant and Apache Commons ...".

Contributor Author

Addressed in the latest update. Thank you!

@mrserb
Member

mrserb commented Aug 15, 2023

This seems to be a very "free" interpretation of the specification to me. According to my understanding, the valid sizes of 8, 16, 24 or 28 as described in the Wikipedia article are a direct consequence of the specification.

I have provided a test.zip file above which passes the zip integrity test via "zip -T" and can be unzipped without errors, but is rejected by OpenJDK. That zip was created based on the actual specification, and not on the wiki.

@simonis
Member

simonis commented Aug 15, 2023

I have provided a test.zip file above which passes the zip integrity test via "zip -T" and can be unzipped without errors, but is rejected by OpenJDK. That zip was created based on the actual specification, and not on the wiki.

Did you create that zip file manually, or was it created by a tool, and if by a tool, then which one? I think we must differentiate here between functional compatibility with a tool like "zip", compatibility with the specification, and compatibility with existing zip files and zip files created by common tools.

The latter is important and required in order to avoid regressions (and I think that's exactly what we're fixing with this PR). Compatibility with the specification is great as long as it doesn't collide with the previous point. Behavioral compatibility with a tool like "zip" is the least important in this list, and I think as long as the file in question is not an artifact commonly created by popular tools, it is fine to behave differently for edge cases.

@mrserb
Member

mrserb commented Aug 15, 2023

Did you create that zip file manually, or was it created by a tool, and if by a tool, then which one? I think we must differentiate here between functional compatibility with a tool like "zip", compatibility with a specification, and compatibility with existing zip files and zip files created by common tools.

That was created manually and then repacked by zip.

The latter is important and required in order to avoid regressions (and I think that's exactly what we're fixing with this PR). Compatibility with a specification is great as long as it doesn't collide with the previous point. Behavioral compatibility with a tool like "zip" is the least important in this list, and I think that as long as the file in question is not an artifact commonly created by popular tools, it is fine to behave differently for edge cases.

That file is accepted by zip, by the latest JDK 8u382, and by the JDK 20 GA, but rejected by 20.0.2. That is a regression in the latest updates of JDK 11+, which we are trying to solve here.

@simonis
Member

simonis commented Aug 15, 2023

That was created manually and then repacked by zip.

That file is accepted by zip, by the latest JDK 8u382, and by the JDK 20 GA, but rejected by 20.0.2. That is a regression in the latest updates of JDK 11+, which we are trying to solve here.

In my opinion we should resolve the regression for existing zip files and zip files which are commonly created by popular tools.

As far as I understand, you can manually create "artificial" zip files which can be processed by the zip tool and previous versions of the JDK but not by new ones. As long as these kinds of files aren't automatically generated by common tools, I don't see that as a real regression. I'm not even sure we should fix that at all, because hardly anybody manually creates such zip files, except maybe attackers who intend to break the JDK.

I recommend that we instead fix the real problem as quickly as possible and open a new issue for potential additional improvements if you think that's necessary.

@mrserb
Member

mrserb commented Aug 16, 2023

As far as I understand, you can manually create "artificial" zip files which can be processed by the zip tool and previous versions of the JDK but not by new ones.

It can be processed by the new/latest version of JDK8.

As long as these kinds of files aren't automatically generated by common tools, I don't see that as a real regression.

It is clearly a regression. All of these new checks should be shown to be based on some statement in the specification; otherwise, such checks should be changed or deleted. As of now, the strict size check is based neither on the spec nor on the behavior of the zip command.

@mrserb
Member

mrserb commented Aug 16, 2023

My overall point is that it would be unfortunate if users were able to open some files on Linux/macOS/Windows using the default programs but unable to do so using Java.

Contributor

@AlanBateman AlanBateman left a comment

Latest changes look okay.

@AlanBateman
Contributor

AlanBateman commented Aug 16, 2023

That file is accepted by zip, by the latest JDK 8u382, and by the JDK 20 GA, but rejected by 20.0.2. That is a regression in the latest updates of JDK 11+, which we are trying to solve here.

@mrserb Have you tested your ZIP file with -Djdk.util.zip.disableZip64ExtraFieldValidation=true? That's the system property to disable the additional checking and is the "get out of jail card" for anyone running into issues. As always with changes like this, or other changes that tighten up checking, there is a risk that it will break something, hence the system property to give existing deployments a workaround to continue. In this case, the original change exposed an issue with a number of Apache projects (see the linked bugs in their issue trackers) and a bad bug in the BND tool that was fixed a few years ago. The system property is the temporary workaround until the deployment has versions of the libraries produced with updated versions of these tools, or a JDK update that tolerates a 0 block size.

I think the main lesson in all this is that the JDK doesn't have enough "interop" tests in this area. There are dozens of tools and plugins that generate their own ZIP or JAR files. The addition of the ZIP64 extensions a number of years ago ushered in a lot of interop issues due to different interpretations of the spec. The changes in this PR expand the tests a bit, but I think follow-on work will be required to massively expand the number of sample ZIP and JAR files that the JDK is tested with.
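As a baseline for the kind of interop testing described above, the simplest case is a self-contained round trip through the JDK's own java.util.zip: write a ZIP with ZipOutputStream, then reopen it with ZipFile, whose constructor is where the CEN validation (and any ZipException) would surface. This is only a sketch of the idea, not the actual tests added in this PR:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class ZipRoundTrip {
    // Writes a one-entry ZIP to a temp file and reopens it, returning
    // the entry count (or -1 on I/O failure). A ZIP whose CEN fails the
    // stricter extra-field validation would instead throw ZipException
    // from the ZipFile constructor, unless the JVM was started with
    // -Djdk.util.zip.disableZip64ExtraFieldValidation=true (the
    // temporary workaround mentioned earlier in this thread).
    static int roundTrip() {
        try {
            Path p = Files.createTempFile("sample", ".zip");
            try {
                try (ZipOutputStream zos =
                         new ZipOutputStream(Files.newOutputStream(p))) {
                    zos.putNextEntry(new ZipEntry("hello.txt"));
                    zos.write("hello".getBytes());
                    zos.closeEntry();
                }
                try (ZipFile zf = new ZipFile(p.toFile())) {
                    return zf.size();
                }
            } finally {
                Files.deleteIfExists(p);
            }
        } catch (IOException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println("entries=" + roundTrip());
    }
}
```

Interop tests of the sort proposed above would extend this pattern to ZIP and JAR samples produced by third-party tools rather than by the JDK itself.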

@coffeys
Contributor

coffeys commented Aug 16, 2023

Nice work, Lance. Thanks also for the comprehensive write-up.

@mrserb
Member

mrserb commented Aug 16, 2023

@mrserb Have you tested your ZIP file with -Djdk.util.zip.disableZip64ExtraFieldValidation=true? That's the system property to disable the additional checking and is the "get out of jail card" for anyone running into issues. As always with changes like this, or other changes that tighten up checking, there is a risk that it will break something, hence the system property to give existing deployments a workaround to continue. In this case, the original change exposed an issue with a number of Apache projects (see the linked bugs in their issue trackers) and a bad bug in the BND tool that was fixed a few years ago. The system property is the temporary workaround until the deployment has versions of the libraries produced with updated versions of these tools, or a JDK update that tolerates a 0 block size.

I disagree for a few reasons: using that property will completely disable the patch from the CPU fix, and it will become possible to accept some malicious zip files which may trigger unfortunate behavior. That is not what we would like to recommend. Validation of the negative values is much more important.

  • The bug fixed by BND was clearly a bug: some "random" value was used as the size of the component, unrelated to the size of the chunk or the size of the zip file.
  • The bug we discussed here relates to the size of a block that is properly set. For some reason additional validation was added for it, and it is still not clear where that validation came from: there is no such thing in the spec, nor in the behavior of common tools such as zip/unzip, Windows Explorer, or the macOS Archive Utility, and the file passed the integrity test. So why are these checks enforced so strictly?

@AlanBateman
Contributor

I disagree for a few reasons: using that property will completely disable the patch from the CPU fix, and it will become possible to accept some malicious zip files which may trigger unfortunate behavior. That is not what we would like to recommend. Validation of the negative values is much more important.

Changes that introduce new checks or dial up validation are often risky. The JDK has a long history of introducing such changes with a system property or some other means to temporarily disable the stricter checking, at least when the spec allows it. You may disagree with this long-standing practice, but it is a necessary evil to give a temporary workaround to environments that might need a bit of time to fix something after a JDK upgrade. There is of course risk in that, but I don't think we can get into that discussion here.

As I think has already been said, we can't engage with you in this PR on the reasons why additional checking was added in a security update.

@LanceAndersen
Contributor Author

/integrate

@openjdk

openjdk bot commented Aug 16, 2023

Going to push as commit 13f6450.
Since your change was applied there have been 35 commits pushed to the master branch:

  • 24e896d: 8310275: Bug in assignment operator of ReservedMemoryRegion
  • 1925508: 8314144: gc/g1/ihop/TestIHOPStatic.java fails due to extra concurrent mark with -Xcomp
  • b80001d: 8314209: Wrong @since tag for RandomGenerator::equiDoubles
  • ef6db5c: 8314211: Add NativeLibraryUnload event
  • 49ddb19: 8313760: [REDO] Enhance AES performance
  • d46f0fb: 8313720: C2 SuperWord: wrong result with -XX:+UseVectorCmov -XX:+UseCMoveUnconditionally
  • 38687f1: 8314262: GHA: Cut down cross-compilation sysroots deeper
  • a602624: 8314020: Print instruction blocks in byte units
  • 0b12480: 8314233: C2: assert(assertion_predicate_has_loop_opaque_node(iff)) failed: unexpected
  • e1fdef5: 8314324: "8311557: [JVMCI] deadlock with JVMTI thread suspension" causes various failures
  • ... and 25 more: https://git.openjdk.org/jdk/compare/4b2703ad39f8160264eb30c797824cc93a6b56e2...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Aug 16, 2023
@openjdk openjdk bot closed this Aug 16, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Aug 16, 2023
@openjdk

openjdk bot commented Aug 16, 2023

@LanceAndersen Pushed as commit 13f6450.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@mrserb
Member

mrserb commented Aug 16, 2023

As I think has already been said, we can't engage with you in this PR on the reasons why additional checking was added in a security update.

I think you are assuming that this check for exact sizes (8/16/24 bytes) is related to the change fixed by the security update; I am pretty sure that assumption is wrong.

Labels
core-libs core-libs-dev@openjdk.org integrated Pull request has been integrated nio nio-dev@openjdk.org
7 participants