8313765: Invalid CEN header (invalid zip64 extra data field size) #15273
Conversation
👋 Welcome back lancea! A progress list of the required criteria for merging this PR will be added to the body of your pull request.
@LanceAndersen The following labels will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.
Webrevs
Please merge from master to get clean GHA runs.
It's unfortunate that there are tools and plugins in the ecosystem that have these issues. I think you've got the right balance here, meaning tolerating a zip64 extra block with a block size of 0 and rejecting corrupted extra blocks added by older versions of the BND plugin.
Hi Lance,
In general it looks good, but I have some suggestions that I think could slightly improve the patch.
@@ -1342,14 +1361,15 @@ private static boolean isZip64ExtBlockSizeValid(int blockSize) {
/*
 * As the fields must appear in order, the block size indicates which
 * fields to expect:
 * 0 - May be written out by Ant and Apache Commons Compress Library
I don't like that isZip64ExtBlockSizeValid() still accepts 0 as valid input. I think we should fully handle the zero case in checkZip64ExtraFieldValues() (also see my comments there).
Hi Volker,
I understand your point, and I had done that previously, but I decided I did not like the flow of the code that way, which is why I moved the check. I prefer to leave it as is.
I don't think this is a question of "taste", because isZip64ExtBlockSizeValid() suggests that the method will check for valid sizes, and to my understanding 0 is not a valid input. This method might also be called from other places in the future which do not handle the zero case appropriately.
In any case, I'm ready to accept this as a case of "Disagree and Commit" :) but in that case please update at least the comment below to something like "..Note we do not need to check blockSize is >= 8 as we know its length is at least 8 by now", because "..from the call to isZip64ExtBlockSizeValid()" isn't true any more.
I think I agree with Volker that it would be better if isZip64ExtBlockSizeValid continued to return false for block size 0.
OK, I have made the suggested change that you both prefer.
Thank you for your input
I'm also happy to see isZip64ExtBlockSizeValid rejecting 0. This logic could be useful when implementing support for valid Zip64 fields for small entries in ZipInputStream, like #12524 attempted to do. (The PR was closed by the bots in the end.)
I guess this method could be moved to ZipUtils if JDK-8303866 is ever implemented.
@@ -1307,6 +1317,15 @@ private void checkZip64ExtraFieldValues(int off, int blockSize, long csize,
if (!isZip64ExtBlockSizeValid(blockSize)) {
    zerror("Invalid CEN header (invalid zip64 extra data field size)");
}
// if ZIP64_EXTID blocksize == 0, validate csize and size
If you put this block in front of the call to isZip64ExtBlockSizeValid(), we don't have to handle the blockSize == 0 case in isZip64ExtBlockSizeValid(). This will also make the following comment true again:
// Note we do not need to check blockSize is >= 8 as
// we know its length is at least 8 from the call to
// isZip64ExtBlockSizeValid()
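The reordering discussed here (tolerate an empty zip64 block in checkZip64ExtraFieldValues() while the size validator keeps rejecting 0) can be sketched as a stand-alone class. Method and constant names mirror the JDK sources, but the simplified signatures and bodies are assumptions of this sketch, not the actual ZipFile code:

```java
import java.util.zip.ZipException;

// Illustrative sketch only: handle blockSize == 0 before delegating to
// the size check, so isZip64ExtBlockSizeValid() can keep rejecting 0.
public class Zip64ExtraCheck {
    static final long ZIP64_MINVAL = 0xFFFFFFFFL;

    // Valid zip64 extra block sizes; 0 stays invalid here
    static boolean isZip64ExtBlockSizeValid(int blockSize) {
        return switch (blockSize) {
            case 8, 16, 24, 28 -> true;
            default -> false;
        };
    }

    static void checkZip64ExtraFieldValues(int blockSize, long csize,
                                           long size, long locoff)
            throws ZipException {
        if (blockSize == 0) {
            // Tolerate an empty block (written by older Ant and
            // commons-compress versions) only when no CEN field
            // actually needs a zip64 value
            if (csize == ZIP64_MINVAL || size == ZIP64_MINVAL
                    || locoff == ZIP64_MINVAL) {
                throw new ZipException(
                        "Invalid CEN header (invalid zip64 extra data field size)");
            }
            return;
        }
        if (!isZip64ExtBlockSizeValid(blockSize)) {
            throw new ZipException(
                    "Invalid CEN header (invalid zip64 extra data field size)");
        }
        // ... read the fields implied by blockSize ...
    }
}
```

With this ordering, the comment "we know its length is at least 8 from the call to isZip64ExtBlockSizeValid()" stays accurate, since the validator is only reached for non-zero blocks.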
}
switch (tag) {
    case EXTID_ZIP64 :
        // Check to see if we have a valid block size
        if (!isZip64ExtBlockSizeValid(sz)) {
            throw new ZipException("Invalid CEN header (invalid zip64 extra data field size)");
        }
        // if ZIP64_EXTID blocksize == 0, validate csize, size and
Same here. Just put this block before the call to isZip64ExtBlockSizeValid(), then you don't have to handle the sz == 0 case there.
 * 8 - uncompressed size
 * 16 - uncompressed size, compressed size
 * 24 - uncompressed size, compressed size, LOC Header offset
 * 28 - uncompressed size, compressed size, LOC Header offset,
 *      and Disk start number
 */
return switch(blockSize) {
    case 8, 16, 24, 28 -> true;
    case 0, 8, 16, 24, 28 -> true;
Don't need to handle the zero case here if you rearrange the code in readExtra() as suggested above.
I think I already asked this question, but it disappeared in the latest PR: why does our code assume that the extended block is limited to certain sizes, like 8, 16, 24, 28? There are no such limitations in the zip specification:
It probably comes from the Wiki page: https://en.wikipedia.org/wiki/ZIP_(file_format) but it is not a spec. Note the spec also says that an extended block should be created at least in this case
It does not say that the block cannot be empty or have any other size if all fields in the body of the zip file are correct/valid. For example, take a look at the code in ZipEntry, where we accept any size of that block and just check that it has the required data in it.
            throw new ZipException("Invalid CEN header (invalid zip64 extra data field size)");
        }
        break;
}
if (size == ZIP64_MINVAL) {
Note that we always increase "pos" only in case of "_MINVAL". If the values of size and csize are correct/valid in the "body" of the zip file and only locoff is negative then we should skip two fields in the extra block and read the third one. Otherwise, we may read some random values and throw an exception.
I am not quite sure I understand your question completely.
How ZipFS::readExtra navigates these fields has not changed.
If you have a tool that creates a zip/jar that demonstrates an issue that might need further examination, please provide a test case and the tool that created the zip/jar in question, and open a new bug.
JDK-8302483 changed this code to throw an exception, which is why I am looking into it.
You can compare the code in this file with the same code in ZipFile's checkZip64ExtraFieldValues method, or the code in ZipEntry#setExtra0, where we do not increase the "off" but instead check "off + 8" or "off + 16". So if we need to read only the third field, we should read it at "pos + 16", but with the current implementation we will read it at "pos + 0", since pos was not bumped by the code for the two other fields.
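The fixed-offset style of reading described above for ZipEntry#setExtra0 can be sketched as follows. This is an illustrative stand-alone class with assumed names and signatures, not the actual JDK code: each zip64 field is read at its fixed slot within the block (size at 0, csize at 8, locoff at 16), so an entry whose only zip64 value is the LOC offset is still read at offset 16 rather than at the start of the block.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch of fixed-offset zip64 extra block reading (assumed names).
// Fields are consulted only when the corresponding CEN value carries
// the 0xFFFFFFFF marker, and each field has a fixed position in the
// block, guarded by a length check.
public class Zip64FieldReader {
    static final long ZIP64_MINVAL = 0xFFFFFFFFL;

    static long[] readZip64Fields(byte[] block, long size, long csize,
                                  long locoff) {
        ByteBuffer b = ByteBuffer.wrap(block).order(ByteOrder.LITTLE_ENDIAN);
        if (size == ZIP64_MINVAL && block.length >= 8) {
            size = b.getLong(0);        // uncompressed size at offset 0
        }
        if (csize == ZIP64_MINVAL && block.length >= 16) {
            csize = b.getLong(8);       // compressed size at offset 8
        }
        if (locoff == ZIP64_MINVAL && block.length >= 24) {
            locoff = b.getLong(16);     // LOC header offset at offset 16
        }
        return new long[] { size, csize, locoff };
    }
}
```

Under this interpretation, a cursor that advances only for marker-valued fields would read the wrong slot whenever an earlier field is present in the block but not needed, which is exactly the discrepancy being pointed out.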
fmt.format("%n };%n");
return sb.toString();
}
}
No newline at end of the file.
I am not understanding your point. There is a specific order for the Zip64 fields based on which fields have the magic value. The spec also does not suggest that an empty Zip64 extra field can be written to the CEN when there is a Zip64 field with data written to the LOC. If you have a zip which demonstrates an issue not addressed, please provide a test case along with the tool that created the zip, and it can be looked at.
Yes, there is a specific order of fields that should be stored in the extended block if some of the data in the "body" is negative. But as you pointed out, in this case an empty block, or a block bigger than necessary to store the size/csize/locoff, is not prohibited by the spec. For example, take a look at the code in ZipEntry, where we accept any size of that block and just check that it has the required data in it. If you disagree, then point to the part of the spec which blocks such sizes.
This is how it is implemented by "unzip".
There's one final thing I want to mention. Your current test cases (i.e. the corrupted zip-files in I've therefore attached a zip file (
I think it would be good if you could add that as a test case as well.
Zip -T would also report errors with a BND modified jar: zip -T bad.jar
zipdetails would also fail with the above jar
It seems that the error "EF block length (30837 bytes) exceeds remaining EF data" is caused by the fact that the size was too big for the actual zipfile, which I think is a different issue, but you can try to unzip that file and you will get a result w/o errors. The unzip implementation is linked above.
// Create the Zip file to read
Files.write(VALID_APK, VALID_APK_FILE);
Files.write(VALID_APACHE_COMPRESS_JAR, COMMONS_COMPRESS_JAR);
Files.write(VALID_ANT_JAR, ANT_ZIP64_UNICODE_EXTRA_JAR);
Files.write(VALID_ANT_JAR, ANT_ZIP64_UNICODE_EXTRA_ZIP);
This should probably read VALID_ANT_ZIP instead of VALID_ANT_JAR.
}

/**
 * Zip and Jars files to validate we can open
 */
private static Stream<Path> zipFilesToTest() {
    return Stream.of(VALID_APK, VALID_APACHE_COMPRESS_JAR);
    return Stream.of(VALID_APK, VALID_APACHE_COMPRESS_JAR, VALID_ANT_JAR);
And here you probably want to add VALID_ANT_ZIP in addition to VALID_ANT_JAR.
Yep, already caught that typo, forgot to save before I committed :-)
Try this example: zip -T passes, unzip works fine, but OpenJDK rejects it.
Thanks for doing the additional changes. This looks good to me now.
@LanceAndersen This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 33 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the
This seems to be a very "free" interpretation of the specification to me. According to my understanding, the valid sizes of 8, 16, 24 or 28 as described in the Wikipedia article are a direct consequence of the specification, which only allows for a fixed set of entries in the ZIP64 Extra Field. Already the zero-length case is questionable, because a ZIP64 Extra Field should only be created if required; however, we have to handle it here for backward compatibility reasons.
// size, and locoff to make sure the fields != ZIP64_MAGICVAL
if (sz == 0) {
    if ( csize == ZIP64_MINVAL || size == ZIP64_MINVAL ||
        locoff == ZIP64_MINVAL) {
Minor nit but you can drop the space in "( csize)" and put the third condition on L3099 to make it easier to read.
For the comment, it looks like it is missing a comma after "== 0". Either that or change it to start with "Some older version of Apache Ant and Apache Commons ...".
Addressed in the latest update. Thank you!
I have provided a test.zip file above which passed the zip integrity test via "zip -T" and can be unzipped w/o errors, but is rejected by OpenJDK. That zip was created based on the actual specification, and not on the wiki.
Did you create that zip file manually, or was it created by a tool, and if by a tool, then which one? I think we must differentiate here between functional compatibility with a tool like "zip", compatibility with a specification, and compatibility with existing zip files and zip files created by common tools. The latter is important and required in order to avoid regressions (and I think that's exactly what we're fixing with this PR). Compatibility with a specification is great as long as it doesn't collide with the previous point. Behavioral compatibility with a tool like "zip" is the least important in this list, and I think as long as the file in question is not an artifact commonly created by popular tools, it is fine to behave differently for edge cases.
That was created manually and then repacked by zip.
That file is accepted by zip, by the latest JDK 8u382, by the JDK 20 GA, and rejected by 20.0.2. That is a regression in the latest update of JDK 11+, which we are trying to solve here.
In my opinion we should resolve the regression for existing zip files and zip files which are commonly created by popular tools. As far as I understand, you can manually create "artificial" zip files which can be processed by the zip tool and previous versions of the JDK but not by new ones. As long as these kinds of files aren't automatically generated by common tools, I don't see that as a real regression. I'm not even sure if we should fix that at all, because hardly anybody is manually creating such zip files except maybe for attackers who intend to break the JDK. I recommend we should instead fix the real problem as quickly as possible and create a new issue for potential additional improvements if you think that's necessary.
It can be processed by the new/latest version of JDK8.
It is clearly a regression. All these new checks should be proven to be based on some statement from the specification; otherwise such checks should be changed or deleted. As of now, the strict check of the size is not based on the spec nor on the behavior of the zip cmd.
My overall point is that it will be unfortunate if users are able to open some files on Linux/macOS/Windows using default programs but are not able to do that using Java.
Latest changes look okay.
@mrserb Have you tested your ZIP file with -Djdk.util.zip.disableZip64ExtraFieldValidation=true? That's the system property to disable the additional checking and is the "get out of jail card" for anyone running into issues. As always with changes like this, or other changes that tighten up checking, there is a risk that it will break something, hence the system property to give existing deployments a workaround to continue. In this case, the original change exposed an issue with a number of Apache projects (see the linked bugs in their issue trackers) and a bad bug in the BND tool that was fixed a few years ago. The system property is the temporary workaround until the deployment has versions of the libraries produced with updated versions of these tools, or a JDK update that tolerates a 0 block size. I think the main lesson with all this is that the JDK doesn't have enough "interop" tests in this area. There are dozens of tools and plugins that generate their own ZIP or JAR files. The addition of the ZIP64 extensions a few years ago ushered in a lot of interop issues due to different interpretations of the spec. The changes in this PR expand the tests a bit, but I think follow-on work will be required to massively expand the number of sample ZIP and JAR files that the JDK is tested with.
Nice work Lance, thanks for the comprehensive write-up also.
I disagree for a few reasons, using that property will completely disable the appropriate patch for a fix in the CPU, and it will be possible to have/accept some malicious zip files which may trigger some unfortunate behavior. That is not what we would like to recommend doing. Validation of the negative values is much more important.
Changes that introduce new checks or dial up validation are often risky changes. The JDK has a long history of introducing such changes with a system property or some means to temporarily disable the stricter checking, at least when the spec allows it. You may disagree with this long standing practice, but it is a necessary evil to give a temporary workaround for environments that might need a bit of time to fix something after a JDK upgrade. There is of course risk in that, but I don't think we can get into that discussion here. As I think has already been said, we can't engage with you in this PR on the reasons why additional checking was added in a security update.
/integrate
Going to push as commit 13f6450.
Your commit was automatically rebased without conflicts.
@LanceAndersen Pushed as commit 13f6450. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
I think you have an assumption that this check for an exact size (8/16/24 bytes) is related to the change fixed by the security update; I am pretty sure that's the wrong assumption.
This PR updates the extra field validation added as part of JDK-8302483 to deal with issues seen with 3rd party tools/libraries where a ZipException may be encountered when opening select APK, ZIP or JAR files. Please refer to the links provided at the end of the description for more information:
The extra field tag of 0xcafe has its data size set to 0, and the extra length is 7. It is expected that you can use the tag's data size to traverse the extra fields. In the above example, the extra length is 0x8 and the tag size is 61373, which exceeds the extra length. zip -T would also report an error:
Notice the CEN Extra length differs for the same tag in the LOC.
As we are validating the Zip64 extra fields, we are not expecting the data size to be 0.
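The tag/size traversal the description refers to can be sketched with a hypothetical helper (not JDK code): each extra-field record is a 2-byte tag plus a 2-byte little-endian data size followed by that many data bytes, and a declared size that overruns the remaining extra length, as in the examples above, makes the field area untraversable.

```java
// Hypothetical helper illustrating extra-field traversal. Returns false
// when a record's declared data size exceeds the remaining extra length,
// or when trailing bytes cannot form a complete 4-byte tag/size header.
public class ExtraFieldWalker {
    static boolean extraFieldsTraversable(byte[] extra) {
        int off = 0;
        while (off + 4 <= extra.length) {
            int tag  = (extra[off] & 0xff) | ((extra[off + 1] & 0xff) << 8);
            int size = (extra[off + 2] & 0xff) | ((extra[off + 3] & 0xff) << 8);
            off += 4;
            if (off + size > extra.length) {
                return false; // declared size exceeds remaining extra data
            }
            off += size; // skip this record's data, move to the next tag
        }
        return off == extra.length; // no trailing partial header
    }
}
```

For instance, a 0xcafe record with data size 0 inside a 7-byte extra area leaves 3 trailing bytes that cannot form a header, which is the kind of corruption the validation flags.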
Mach5 tiers 1-6 and the relevant JCK tests continue to pass with the above changes.
The following 3rd party tools have (or have pending) fixes to address the issues highlighted above:
Apache Commons-compress fix for Empty CEN Zip64 Extra Headers fixed in Commons-compress 1.11 (2016)
Ant fix for Empty CEN Zip64 Extra Headers in process is available in Ant 1.10.14 (2023).
BND issue with writing invalid Extra Headers and is fixed in BND 5.3 (2021)
The maven-bundle-plugin 5.1.5 includes the BND 5.3 patch.
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/15273/head:pull/15273
$ git checkout pull/15273
Update a local copy of the PR:
$ git checkout pull/15273
$ git pull https://git.openjdk.org/jdk.git pull/15273/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 15273
View PR using the GUI difftool:
$ git pr show -t 15273
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/15273.diff
Webrev
Link to Webrev Comment