-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8316141: Improve CEN header validation checking #16570
Conversation
👋 Welcome back lancea! A progress list of the required criteria for merging this PR into |
@LanceAndersen The following labels will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command. |
Perhaps the PR/issue title could be more specific in describing what is being validated? Something like "Validate the combined length of CEN header fields"? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, although I'm not sure I follow the underlying motivation of this stricter validation.
What problem is being solved here? APPNOTE.TXT uses the phrase SHOULD NOT
. Even if the spec is not an RFC, RFC2119 defines SHOULD NOT
as:
there may exist valid reasons in particular circumstances when the
particular behavior is acceptable or even useful, but the full
implications should be understood and the case carefully weighed
before implementing any behavior described with this label.
I would expect our producer ZipOutputStream
to be stricter than our consumers in this case, honoring Postel's law. From a implementation robustness perspective, the individual lengths are already validated, it's just the combined clause that is now enforced in this PR.
That said, here are some comments inline:
@@ -1222,16 +1222,17 @@ private int checkAndAddEntry(int pos, int index) | |||
int nlen = CENNAM(cen, pos); | |||
int elen = CENEXT(cen, pos); | |||
int clen = CENCOM(cen, pos); | |||
if (entryPos + nlen > cen.length - ENDHDR) { | |||
long headerSize = (long)CENHDR + nlen + clen + elen; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since CENHDR is 46 and nlen, clen, elen are all unsigned shorts, this sum cannot possibly overflow an int. Is the long conversion necessary?
The specification uses the term "combined length", would it be better to name the local variable combinedLength
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can remove the cast as that was a holdover. I chose to make this a long knowing that it would not overflow but an overflow while unlikely could occur depending on the value of pos
in the statement below
if (headerSize > 0xFFFF || pos + headerSize > cen.length - ENDHDR) {
zerror("invalid CEN header (bad header size)");
}
I could keep headerSize an int and then move the cast to the if statement:
if (headerSize > 0xFFFF || (long)pos + headerSize > cen.length - ENDHDR) {
zerror("invalid CEN header (bad header size)");
}
I decided making headerSize a long might be clearer but do not have a strong preference and will go with the consensus
As far as the name, I don't have a strong preference, but not sure combinedLength is any better
zerror("invalid CEN header (bad extra offset)"); | ||
} | ||
checkExtraFields(pos, (int)extraStartingOffset, elen); | ||
checkExtraFields(pos, entryPos + nlen, elen); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The naming of entryPos
confused me here. The name indicates it is the position where the CEN header starts, but we already have pos
for that. (It actually contains the position where the encoded name starts)
So perhaps it should be renamed to namePos
or npos
to make future maintenance less confusing?
Also, I actually liked the extraStartingOffset
local variable, having a name makes the code easier to follow than just entryPos + nlen
. But perhaps extraPos
is shorter and more consistent with other uses of pos
?
So perhaps: long extraPos = pos + CENHDR + nlen
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
entryPos
was the name of the field from a previous PR so I did not see a need to change it and decided there was no need to keep extraStartingOffset given the change in validation above.
@@ -1593,8 +1593,13 @@ private byte[] initCEN() throws IOException { | |||
if (method != METHOD_STORED && method != METHOD_DEFLATED) { | |||
throw new ZipException("invalid CEN header (unsupported compression method: " + method + ")"); | |||
} | |||
if (pos + CENHDR + nlen > limit) { | |||
throw new ZipException("invalid CEN header (bad header size)"); | |||
long headerSize = (long)CENHDR + nlen + clen + elen; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the corresponding ZipFile local variable is renamed, this should also be updated.
@@ -1660,7 +1665,7 @@ private void checkExtraFields( byte[] cen, int cenPos, long size, long csize, | |||
|
|||
int tagBlockSize = SH(cen, currentOffset); | |||
currentOffset += Short.BYTES; | |||
int tagBlockEndingOffset = currentOffset + tagBlockSize; | |||
long tagBlockEndingOffset = (long)currentOffset + tagBlockSize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my ZipFile comment also applies here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the comments. See my replies below.
Regarding you comment about checking whether or not to check if the combined length of the CEN header + name length + comment length + extra length > 65K bytes, I chose to add this given the strong wording given this is a really old spec. That being said, I do not object to removing the validation if that is the overall preference.
zerror("invalid CEN header (bad extra offset)"); | ||
} | ||
checkExtraFields(pos, (int)extraStartingOffset, elen); | ||
checkExtraFields(pos, entryPos + nlen, elen); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
entryPos
was the name of the field from a previous PR so I did not see a need to change it and decided there was no need to keep extraStartingOffset given the change in validation above.
@@ -1222,16 +1222,17 @@ private int checkAndAddEntry(int pos, int index) | |||
int nlen = CENNAM(cen, pos); | |||
int elen = CENEXT(cen, pos); | |||
int clen = CENCOM(cen, pos); | |||
if (entryPos + nlen > cen.length - ENDHDR) { | |||
long headerSize = (long)CENHDR + nlen + clen + elen; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can remove the cast as that was a holdover. I chose to make this a long knowing that it would not overflow but an overflow while unlikely could occur depending on the value of pos
in the statement below
if (headerSize > 0xFFFF || pos + headerSize > cen.length - ENDHDR) {
zerror("invalid CEN header (bad header size)");
}
I could keep headerSize an int and then move the cast to the if statement:
if (headerSize > 0xFFFF || (long)pos + headerSize > cen.length - ENDHDR) {
zerror("invalid CEN header (bad header size)");
}
I decided making headerSize a long might be clearer but do not have a strong preference and will go with the consensus
As far as the name, I don't have a strong preference, but not sure combinedLength is any better
I can't claim to have a particularly strong opinion on this, the following is mostly me thinking aloud:
|
Yes we have a corpus search available and have exercised this patch (along with your ZipInputStream patch) without any regressions. Given where we are in the JDK 22 cycle, going to hold off on finalizing the PR until we fork for JDK 23 and look to move this forward early on allowing for additional time to bake |
Tightening validation always comes with risk. Doing it early in JDK 23 to allow time for course correction if needed seems a good plan. |
Another benefit is that if we should decide to validate LOC headers similarly in |
While investigating an unrelated issue, I noticed that Android's A page size will typically be 4K, so this will probably not push the 0XFFFF combined limit clause. But interesting to note that there are tools out there taking advantage of the extra field in non-obvious ways. I also noticed there are Java ports of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the zip changes are okay. As per our discussion here, the compatibility impact can be evaluated later in JDK 23 to gauge whether there it is too strict.
@LanceAndersen This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 436 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
/integrate |
Going to push as commit 0eb299a.
Your commit was automatically rebased without conflicts. |
@LanceAndersen Pushed as commit 0eb299a. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
Please review this PR which enhances the existing CEN header validation checking to ensure that the
size of the CEN Header + name length + comment length + extra length do not exceed 65,535 bytes per the PKWare APP.NOTE 4.4.10, 4.4.11, & 4.4.12. Also check that current CEN header will not exceed the length of the CEN array.
Mach 5 tiers 1-3 are clean with this change.
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/16570/head:pull/16570
$ git checkout pull/16570
Update a local copy of the PR:
$ git checkout pull/16570
$ git pull https://git.openjdk.org/jdk.git pull/16570/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 16570
View PR using the GUI difftool:
$ git pr show -t 16570
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16570.diff
Webrev
Link to Webrev Comment