
8316141: Improve CEN header validation checking #16570

Closed
wants to merge 1 commit into from


LanceAndersen
Contributor

@LanceAndersen LanceAndersen commented Nov 8, 2023

Please review this PR, which enhances the existing CEN header validation to ensure that the combined length of the fixed-size CEN header plus the file name, extra field, and file comment does not exceed 65,535 bytes, per PKWARE APPNOTE.TXT sections 4.4.10, 4.4.11, and 4.4.12. It also checks that the current CEN header does not extend past the end of the CEN array.

Mach 5 tiers 1-3 are clean with this change.
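A minimal sketch of the two checks described above, assuming a raw little-endian CEN byte array that (as the `cen.length - ENDHDR` bound in the patch suggests) also holds the trailing END record. The class `CenHeaderCheck` and its helper `u16` are hypothetical; the constants follow the APPNOTE.TXT layout (a 46-byte fixed CEN header with the name, extra, and comment lengths at offsets 28, 30, and 32, and a 22-byte END record). This illustrates the idea only and is not the code in the patch.

```java
import java.util.zip.ZipException;

// Hypothetical sketch; not the JDK's ZipFile implementation.
final class CenHeaderCheck {
    static final int CENHDR = 46; // fixed-size part of a CEN header
    static final int ENDHDR = 22; // fixed-size part of the END record
    static final int CENNAM = 28; // offset of the file name length
    static final int CENEXT = 30; // offset of the extra field length
    static final int CENCOM = 32; // offset of the file comment length

    // Reads an unsigned 16-bit little-endian value.
    static int u16(byte[] b, int off) {
        return (b[off] & 0xFF) | ((b[off + 1] & 0xFF) << 8);
    }

    // Validates the CEN header at pos and returns the offset of the next header.
    static int checkHeader(byte[] cen, int pos) throws ZipException {
        int nlen = u16(cen, pos + CENNAM);
        int elen = u16(cen, pos + CENEXT);
        int clen = u16(cen, pos + CENCOM);
        long headerSize = (long) CENHDR + nlen + elen + clen;
        // Combined length must not exceed 65,535 bytes, and the header
        // must end before the END record that trails the CEN.
        if (headerSize > 0xFFFF || pos + headerSize > cen.length - ENDHDR) {
            throw new ZipException("invalid CEN header (bad header size)");
        }
        return (int) (pos + headerSize);
    }

    public static void main(String[] args) throws ZipException {
        byte[] cen = new byte[CENHDR + 10 + ENDHDR];
        cen[CENNAM] = 10; // a 10-byte name, no extra field, no comment
        System.out.println(checkHeader(cen, 0)); // 56
    }
}
```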


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8316141: Improve CEN header validation checking (Bug - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/16570/head:pull/16570
$ git checkout pull/16570

Update a local copy of the PR:
$ git checkout pull/16570
$ git pull https://git.openjdk.org/jdk.git pull/16570/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 16570

View PR using the GUI difftool:
$ git pr show -t 16570

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16570.diff

Webrev


@bridgekeeper

bridgekeeper bot commented Nov 8, 2023

👋 Welcome back lancea! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Nov 8, 2023
@openjdk

openjdk bot commented Nov 8, 2023

@LanceAndersen The following labels will be automatically applied to this pull request:

  • core-libs
  • nio

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added nio nio-dev@openjdk.org core-libs core-libs-dev@openjdk.org labels Nov 8, 2023
@mlbridge

mlbridge bot commented Nov 8, 2023

Webrevs

@eirbjo
Contributor

eirbjo commented Nov 8, 2023

Perhaps the PR/issue title could be more specific in describing what is being validated? Something like "Validate the combined length of CEN header fields"?

Contributor

@eirbjo eirbjo left a comment


LGTM, although I'm not sure I follow the underlying motivation of this stricter validation.

What problem is being solved here? APPNOTE.TXT uses the phrase SHOULD NOT. Even if the spec is not an RFC, RFC2119 defines SHOULD NOT as:

   there may exist valid reasons in particular circumstances when the
   particular behavior is acceptable or even useful, but the full
   implications should be understood and the case carefully weighed
   before implementing any behavior described with this label.

I would expect our producer ZipOutputStream to be stricter than our consumers in this case, honoring Postel's law. From an implementation robustness perspective, the individual lengths are already validated; it's just the combined-length clause that is now enforced in this PR.

That said, here are some comments inline:

@@ -1222,16 +1222,17 @@ private int checkAndAddEntry(int pos, int index)
int nlen = CENNAM(cen, pos);
int elen = CENEXT(cen, pos);
int clen = CENCOM(cen, pos);
if (entryPos + nlen > cen.length - ENDHDR) {
long headerSize = (long)CENHDR + nlen + clen + elen;
Contributor


Since CENHDR is 46 and nlen, clen, elen are all unsigned shorts, this sum cannot possibly overflow an int. Is the long conversion necessary?

The specification uses the term "combined length", would it be better to name the local variable combinedLength instead?
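The arithmetic behind this point can be checked directly: with a 46-byte fixed header and three unsigned 16-bit lengths, the largest possible sum is 46 + 3 × 65,535 = 196,651, far below `Integer.MAX_VALUE`. The class and method names below are illustrative only.

```java
// Illustrative only: computes the worst-case combined CEN header length.
final class HeaderSumBound {
    static final int CENHDR = 46;      // fixed-size part of a CEN header
    static final int MAX_U16 = 0xFFFF; // largest value of an unsigned 16-bit length field

    static int maxCombinedLength() {
        // nlen, elen and clen are each at most 0xFFFF, so int arithmetic cannot overflow.
        return CENHDR + MAX_U16 + MAX_U16 + MAX_U16;
    }

    public static void main(String[] args) {
        int max = maxCombinedLength();
        System.out.println(max);                     // 196651
        System.out.println(max < Integer.MAX_VALUE); // true
    }
}
```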

Contributor Author


I can remove the cast, as that was a holdover. I made this a long knowing that the sum itself would not overflow, but an overflow, while unlikely, could occur depending on the value of pos in the statement below:

if (headerSize > 0xFFFF || pos + headerSize > cen.length - ENDHDR) {
    zerror("invalid CEN header (bad header size)");
}

I could keep headerSize an int and then move the cast to the if statement:

if (headerSize > 0xFFFF || (long)pos + headerSize > cen.length - ENDHDR) {
    zerror("invalid CEN header (bad header size)");
}

I decided that making headerSize a long might be clearer, but I do not have a strong preference and will go with the consensus.

As for the name, I don't have a strong preference, but I'm not sure combinedLength is any better.
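To illustrate the concern about `pos`, the sketch below compares the variants discussed: plain `int` addition, a `long` `headerSize` (the form in the patch), and an `int` `headerSize` with the cast moved into the condition. The method names are hypothetical and the values are contrived (a CEN array this close to `Integer.MAX_VALUE` is unrealistic); they only show that `int` addition can wrap to a negative number and slip past the bound, while both `long` forms behave the same.

```java
// Illustrative sketch of the bounds check discussed above; names are hypothetical.
final class OffsetOverflow {
    // Unsafe: pos + headerSize is evaluated in int and may wrap around.
    static boolean exceedsIntMath(int pos, int headerSize, int limit) {
        return pos + headerSize > limit;
    }

    // Variant 1: headerSize declared as long, so the addition is done in long.
    static boolean exceedsLongVar(int pos, long headerSize, int limit) {
        return pos + headerSize > limit;
    }

    // Variant 2: headerSize kept as int, cast moved into the condition.
    static boolean exceedsCastInIf(int pos, int headerSize, int limit) {
        return (long) pos + headerSize > limit;
    }

    public static void main(String[] args) {
        int pos = Integer.MAX_VALUE - 10;
        int headerSize = 100;
        int limit = Integer.MAX_VALUE - 22; // stands in for cen.length - ENDHDR
        System.out.println(exceedsIntMath(pos, headerSize, limit));  // false (wrapped, check missed)
        System.out.println(exceedsLongVar(pos, headerSize, limit));  // true
        System.out.println(exceedsCastInIf(pos, headerSize, limit)); // true
    }
}
```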

zerror("invalid CEN header (bad extra offset)");
}
checkExtraFields(pos, (int)extraStartingOffset, elen);
checkExtraFields(pos, entryPos + nlen, elen);
Contributor


The naming of entryPos confused me here. The name indicates it is the position where the CEN header starts, but we already have pos for that. (It actually contains the position where the encoded name starts.)

So perhaps it should be renamed to namePos or npos to make future maintenance less confusing?

Also, I actually liked the extraStartingOffset local variable, having a name makes the code easier to follow than just entryPos + nlen. But perhaps extraPos is shorter and more consistent with other uses of pos?

So perhaps: long extraPos = pos + CENHDR + nlen ?
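A small sketch of the layout behind this naming suggestion: within a CEN record the file name, extra field, and comment follow the 46-byte fixed header in that order, so each start position is derived from the previous one. The record `Offsets` and its components (`namePos`, `extraPos`, `commentPos`, `nextPos`) are hypothetical and simply follow the proposed naming; per the comment above, the patch's `entryPos` corresponds to `namePos` here.

```java
// Hypothetical helper illustrating the positions inside one CEN record.
final class CenOffsets {
    static final int CENHDR = 46; // fixed-size part of a CEN header

    // Start offsets of the variable-length fields, plus the start of the next header.
    record Offsets(long namePos, long extraPos, long commentPos, long nextPos) {}

    static Offsets of(int pos, int nlen, int elen, int clen) {
        long namePos = (long) pos + CENHDR; // where the encoded name starts
        long extraPos = namePos + nlen;     // where the extra field starts
        long commentPos = extraPos + elen;  // where the comment starts
        return new Offsets(namePos, extraPos, commentPos, commentPos + clen);
    }

    public static void main(String[] args) {
        Offsets o = of(0, 10, 4, 2);
        System.out.println(o.extraPos()); // 56
    }
}
```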

Contributor Author


entryPos was the name of the field from a previous PR, so I did not see a need to change it, and I decided there was no need to keep extraStartingOffset given the change in validation above.

@@ -1593,8 +1593,13 @@ private byte[] initCEN() throws IOException {
if (method != METHOD_STORED && method != METHOD_DEFLATED) {
throw new ZipException("invalid CEN header (unsupported compression method: " + method + ")");
}
if (pos + CENHDR + nlen > limit) {
throw new ZipException("invalid CEN header (bad header size)");
long headerSize = (long)CENHDR + nlen + clen + elen;
Contributor


If the corresponding ZipFile local variable is renamed, this should also be updated.

@@ -1660,7 +1665,7 @@ private void checkExtraFields( byte[] cen, int cenPos, long size, long csize,

int tagBlockSize = SH(cen, currentOffset);
currentOffset += Short.BYTES;
int tagBlockEndingOffset = currentOffset + tagBlockSize;
long tagBlockEndingOffset = (long)currentOffset + tagBlockSize;
Contributor


I think my ZipFile comment also applies here.
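The same widening matters when walking the extra field. Below is a hedged sketch of such a walk over header-ID/data-size pairs, in which the end of each block is computed in `long` before it is compared with the end of the extra field. `ExtraFieldWalk`, `countBlocks`, and the error messages are hypothetical; the JDK's `checkExtraFields` may also validate specific tags (such as Zip64), which this sketch does not attempt.

```java
import java.util.zip.ZipException;

// Hypothetical sketch of bounds-checking the blocks of an extra field.
final class ExtraFieldWalk {
    // Reads an unsigned 16-bit little-endian value.
    static int u16(byte[] b, int off) {
        return (b[off] & 0xFF) | ((b[off + 1] & 0xFF) << 8);
    }

    // Counts the tag blocks in buf[extraPos, extraPos + elen), rejecting any block
    // whose declared data size runs past the end of the extra field.
    static int countBlocks(byte[] buf, int extraPos, int elen) throws ZipException {
        long extraEndOffset = (long) extraPos + elen;
        if (extraEndOffset > buf.length) {
            throw new ZipException("extra field extends past end of buffer");
        }
        int currentOffset = extraPos;
        int count = 0;
        // Each block starts with a 2-byte header ID and a 2-byte data size.
        while ((long) currentOffset + 2 * Short.BYTES <= extraEndOffset) {
            int tag = u16(buf, currentOffset);
            currentOffset += Short.BYTES;
            int tagBlockSize = u16(buf, currentOffset);
            currentOffset += Short.BYTES;
            // Computed in long so the comparison is not affected by int wrap-around.
            long tagBlockEndingOffset = (long) currentOffset + tagBlockSize;
            if (tagBlockEndingOffset > extraEndOffset) {
                throw new ZipException("bad extra field block size for tag 0x"
                        + Integer.toHexString(tag));
            }
            currentOffset = (int) tagBlockEndingOffset;
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws ZipException {
        byte[] extra = {
            0x42, 0x42, 4, 0, 1, 2, 3, 4,  // tag 0x4242 with 4 bytes of data
            (byte) 0xFE, (byte) 0xCA, 0, 0 // tag 0xCAFE with no data
        };
        System.out.println(countBlocks(extra, 0, extra.length)); // 2
    }
}
```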

Contributor Author

@LanceAndersen LanceAndersen left a comment


Thank you for the comments. See my replies below.

Regarding your comment about whether to check that the combined length of the CEN header + name length + comment length + extra length does not exceed 65,535 bytes: I chose to add this given the strong wording, and given that this is a really old spec. That being said, I do not object to removing the validation if that is the overall preference.


@eirbjo
Contributor

eirbjo commented Nov 16, 2023


I can't claim to have a particularly strong opinion on this, the following is mostly me thinking aloud:

  • Given Hyrum's Law, it is conceivable that someone is currently using the extra or comment fields to attach up to 65535+65535 bytes of metadata for entries. The proposed validation will break such schemes. Does Oracle have a ZIP file corpus which could be used to identify files currently exceeding the combined length clause, just to get a sense of how rare or common this is?
  • The actual benefits of adding this validation after all these years are not quite clear to me. I don't see how this improves security, robustness, compatibility, maintainability or other ilities (apart from strictly-following-the-spec-ility :-)
  • I created a ZIP file with an entry with an extra field of the maximal expressible length of 0xFFFF. Info-ZIP's unzip command on macOS did not output any warning or error when processing this file.
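Some back-of-the-envelope arithmetic for the experiment in the last bullet: because the fixed CEN header alone is 46 bytes, an extra field of the maximal 0xFFFF bytes pushes the combined length over 65,535 for any name, so such an entry would fail a check like the one in this PR. The helper names below are hypothetical.

```java
// Hypothetical helpers for the combined-length budget of one CEN record.
final class CombinedLengthBudget {
    static final int CENHDR = 46;          // fixed-size part of a CEN header
    static final int MAX_COMBINED = 0xFFFF; // APPNOTE combined-length limit

    // True if CENHDR + nlen + elen + clen exceeds 65,535 bytes.
    static boolean exceedsLimit(int nlen, int elen, int clen) {
        return (long) CENHDR + nlen + elen + clen > MAX_COMBINED;
    }

    // Largest extra field length that still fits, given the name and comment lengths.
    static int maxExtraLength(int nlen, int clen) {
        return Math.max(0, MAX_COMBINED - CENHDR - nlen - clen);
    }

    public static void main(String[] args) {
        System.out.println(exceedsLimit(1, 0xFFFF, 0)); // true
        System.out.println(maxExtraLength(1, 0));       // 65488
    }
}
```

The extra field and the comment draw from the same budget, which is why a scheme that stores up to 65,535 bytes in each would no longer pass.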

@LanceAndersen
Contributor Author


Yes, we have a corpus search available and have exercised this patch (along with your ZipInputStream patch) without any regressions.

Given where we are in the JDK 22 cycle, I am going to hold off on finalizing the PR until we fork for JDK 23, and will look to move it forward early in that cycle to allow additional time for it to bake.

@AlanBateman
Contributor


Tightening validation always comes with risk. Doing it early in JDK 23 to allow time for course correction if needed seems a good plan.

@eirbjo
Contributor

eirbjo commented Nov 28, 2023


Another benefit is that if we should decide to validate LOC headers similarly in ZipInputStream, delaying until 23 will allow us to introduce these very similar changes in the same release.

@eirbjo
Contributor

eirbjo commented Dec 1, 2023

While investigating an unrelated issue, I noticed that Android's zipalign tool processes zip files and injects data into the extra field to make the beginning of the file data word-aligned or page-aligned. This is done in order to memory-map data directly from within the ZIP file, load shared libraries without unpacking, etc.

A page size will typically be 4K, so this will probably not approach the 0xFFFF combined-length limit. But it is interesting to note that there are tools out there taking advantage of the extra field in non-obvious ways.

I also noticed there are Java ports of zipalign in various repositories on GitHub, so this technique might also be in use outside of Android.
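For context, the padding such a tool has to insert can be estimated from the local header layout: entry data starts after the 30-byte LOC header, the file name, and the extra field, so the padding is whatever brings that offset up to the next multiple of the alignment. This is a generic sketch under that assumption, not zipalign's actual algorithm; `AlignPadding` and `paddingFor` are hypothetical names.

```java
// Generic sketch: padding needed so that an entry's data starts on an aligned offset.
final class AlignPadding {
    static final int LOCHDR = 30; // fixed-size part of a local file header

    // locOffset: where the LOC header starts; nlen/elen: current name and extra lengths.
    static int paddingFor(long locOffset, int nlen, int elen, int alignment) {
        long dataStart = locOffset + LOCHDR + nlen + elen;
        // Bytes to add to the extra field so that dataStart becomes a multiple of alignment.
        return (int) ((alignment - dataStart % alignment) % alignment);
    }

    public static void main(String[] args) {
        System.out.println(paddingFor(0, 9, 0, 4));    // 1 (data moves from 39 to 40)
        System.out.println(paddingFor(0, 9, 0, 4096)); // 4057 (data moves to 4096)
    }
}
```

The padding is always smaller than the alignment itself, i.e. under 4,096 bytes for 4K pages, which matches the expectation that it stays well clear of the 0xFFFF limit.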

Contributor

@AlanBateman AlanBateman left a comment


I think the zip changes are okay. As per our discussion here, the compatibility impact can be evaluated later in JDK 23 to gauge whether it is too strict.

@openjdk

openjdk bot commented Dec 8, 2023

@LanceAndersen This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8316141: Improve CEN header validation checking

Reviewed-by: alanb

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 436 new commits pushed to the master branch:

  • 86623aa: 8320786: Remove ThreadGroup.stop
  • af5c492: 8320532: Remove Thread/ThreadGroup suspend/resume
  • cb7e3d2: 8321560: [BACKOUT] 8299426: Heap dump does not contain virtual Thread stack references
  • 25dc476: 8286827: BogusColorSpace methods return wrong array
  • 11e4a92: 8320597: RSA signature verification fails on signed data that does not encode params correctly
  • 354ea4c: 8299426: Heap dump does not contain virtual Thread stack references
  • 959a443: 8288712: Typo in javadoc in javax.imageio.ImageReader.java
  • 4ed38f5: 8321409: Console read line with zero out should zero out underlying buffer in JLine (redux)
  • fe4c0a2: 8302790: Set FileMapRegion::mapped_base() to null if mapping fails
  • 519ecd3: 8319413: Start of release updates for JDK 23
  • ... and 426 more: https://git.openjdk.org/jdk/compare/e9eb8b98f4dd949c8a0f501189471e11b837d936...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Dec 8, 2023
@LanceAndersen
Contributor Author

/integrate

@openjdk

openjdk bot commented Dec 8, 2023

Going to push as commit 0eb299a.
Since your change was applied there have been 442 commits pushed to the master branch:

  • b893a2b: 8321597: Use .template consistently for files treated as templates by the build
  • 05f9509: 8321374: Add a configure option to explicitly set CompanyName property in VersionInfo resource for Windows exe/dll
  • 701bc3b: 8295166: IGV: dump graph at more locations
  • 9e48b90: 8310524: C2: record parser-generated LoadN nodes for IGVN
  • bad5edf: 8320959: jdk/jfr/event/runtime/TestShutdownEvent.java crash with CONF=fastdebug -Xcomp
  • f577385: 8316738: java/net/httpclient/HttpClientLocalAddrTest.java failed in timeout
  • 86623aa: 8320786: Remove ThreadGroup.stop
  • af5c492: 8320532: Remove Thread/ThreadGroup suspend/resume
  • cb7e3d2: 8321560: [BACKOUT] 8299426: Heap dump does not contain virtual Thread stack references
  • 25dc476: 8286827: BogusColorSpace methods return wrong array
  • ... and 432 more: https://git.openjdk.org/jdk/compare/e9eb8b98f4dd949c8a0f501189471e11b837d936...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Dec 8, 2023
@openjdk openjdk bot closed this Dec 8, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Dec 8, 2023
@openjdk

openjdk bot commented Dec 8, 2023

@LanceAndersen Pushed as commit 0eb299a.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels
core-libs core-libs-dev@openjdk.org integrated Pull request has been integrated nio nio-dev@openjdk.org

3 participants