Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8303972: (zipfs) Make test/jdk/jdk/nio/zipfs/TestLocOffsetFromZip64EF.java independent of the zip command line #12979

Closed
wants to merge 13 commits into from

Conversation

eirbjo
Copy link
Contributor

@eirbjo eirbjo commented Mar 10, 2023

Please review this PR which brings the currenly problem-listed test TestLocOffsetFromZip64EF back to life.

Instead of calling out to the zip command line, we produce a small-sized Zip64 file in the test itself. This file has the features required to reproduce the ZipFileSystem issue, namely that the 'INFO-ZIP extended timestamp' field must come before the 'Zip64 extended information' field. (This would casue ZipFileSystem to read the LOC at position 0xFFFFFFFF).

This speed up the test (from 50s to 3s on my Macbook Pro), saves 4 GB disk space during builds and removes a brittle dependency on the zip command in OS/distros.

Seee JDK-8301183 for details on the problem-listing.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issues

  • JDK-8303972: (zipfs) Make test/jdk/jdk/nio/zipfs/TestLocOffsetFromZip64EF.java independent of the zip command line (Enhancement - P4)
  • JDK-8301183: (zipfs) jdk/jdk/nio/zipfs/TestLocOffsetFromZip64EF.java failing with ZipException:R0 on OL9 (Bug - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/12979/head:pull/12979
$ git checkout pull/12979

Update a local copy of the PR:
$ git checkout pull/12979
$ git pull https://git.openjdk.org/jdk.git pull/12979/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 12979

View PR using the GUI difftool:
$ git pr show -t 12979

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/12979.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Mar 10, 2023

👋 Welcome back eirbjo! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Mar 10, 2023

@eirbjo The following label will be automatically applied to this pull request:

  • nio

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the nio nio-dev@openjdk.org label Mar 10, 2023
@eirbjo eirbjo marked this pull request as ready for review March 10, 2023 15:32
@openjdk openjdk bot added the rfr Pull request is ready for review label Mar 10, 2023
@mlbridge
Copy link

mlbridge bot commented Mar 10, 2023

Webrevs

@eirbjo eirbjo marked this pull request as draft March 10, 2023 15:44
@eirbjo
Copy link
Contributor Author

eirbjo commented Mar 10, 2023

Converting to draft while I investigate why the updated test does not actually catch the regression

@openjdk openjdk bot removed the rfr Pull request is ready for review label Mar 10, 2023
public void createZipWithZip64Ext() throws IOException {
// Make a ZIP with two entries
try (FileOutputStream fileOutputStream = new FileOutputStream(new File(ZIP_FILE_NAME));
ZipOutputStream zo = new ZipOutputStream(new SparseOutputStream(fileOutputStream))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use FileChannel.open and specify SPARSE in the set of open options, I think that would make it clearer that the file is sparse and remove the need for SparseOutputStream.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand. I could use FileChannel.open with SPARSE (which btw is ignored by the implementation:), but that doesn't give me an OutputStream I can pass to ZipOutputStream?

I could use Channels.newOutputStream, but that would not create holes in the sparse files?

My current thinking is that we need the SparseOutputStream to detect that ZIpOutputStream is writing empty bytes and replace that with channel.position(channel.position() + len)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand. I could use FileChannel.open with SPARSE (which btw is ignored by the implementation:)

That is true on platforms where files can be sparse by default. On Windows you'll see this translates to create the file as sparse, it needs to be opt-in when creating the file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opted to create a small-sized Zip64 file instead (see below). This file does not need to be sparse.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is true on platforms where files can be sparse by default. On Windows you'll see this translates to create the file as sparse, it needs to be opt-in when creating the file.

Thanks for this info, Alan!

While this PR no longer makes use of sparse files, I have applied your comments about FileChannel.open for NTFS in another PR (which does make use of sparse files):

#12948

Copy link
Contributor

@LanceAndersen LanceAndersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test was leveraging a file created by info-zip intentionally so care needs to be taken to ensure that the zip that is being created is similar LOC and CEN output

@eirbjo
Copy link
Contributor Author

eirbjo commented Mar 10, 2023

This test was leveraging a file created by info-zip intentionally so care needs to be taken to ensure that the zip that is being created is similar LOC and CEN output

Yes, I just noticed that and currently looking into it. I think in particular it is sensitive to the ordering of the extended time stamp and the Zip64 entries, which is not the same as ZipOutputStream.

@eirbjo
Copy link
Contributor Author

eirbjo commented Mar 10, 2023

While it is possible to use a sparse file here, there is a problem that when ZipFileSystem reads inside a 'hole', it will skip ahead to the next data which turned out to be a LOC header. We could carefully select an entry size such that 0XFFFFFFFF does not not fall into in a hole, but that feels a bit too magic.

Instead, I opted for creating a small ZIP file with a Zip64-formatted CEN. Since this issue is sensitive to ordering of extra fields, the 'extended timestamp' needs to come before the 'Zip64 extended information field'.

The way I solve this is to add an opaque extra field, right sized for 'Zip64'. This uses a tag ID unknown to ZipEntry.setExtra, such that it will not be parsed as Zip64 but simply passed through. ZipOutputStream will output this after the extended timestamp.

After the ZIP is produced, the Zip64 entry is updated with the correct tag.

I have verified that this causes ZipFileSystem (before JDK-8255380) to attempt to read the LOC from 0xFFFFFFFF. With a small file size, this causes a slightly different error message because of the attempt to read beyond the end of the file.

----------System.out:(160/10450)----------
config TestLocOffsetFromZip64EF.setUp(): success
test TestLocOffsetFromZip64EF.walkZipFSTest({}): failure
java.util.zip.ZipException: loc: reading failed
	at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.readExtra(ZipFileSystem.java:3127)
	at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.readCEN(ZipFileSystem.java:2831)
	at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.<init>(ZipFileSystem.java:2767)
	at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem.getFileAttributes(ZipFileSystem.java:550)

While the symptom is slightly different, the cause is the same: ZipFileSystem tries to read the LOC at offset 0xFFFFFFFF because it has not read the Zip64 entry yet.

@eirbjo eirbjo changed the title 8303972: TestLocOffsetFromZip64EF could use a sparse file 8303972: Make test/jdk/jdk/nio/zipfs/TestLocOffsetFromZip64EF.java independent of the zip command line Mar 10, 2023
@eirbjo eirbjo marked this pull request as ready for review March 10, 2023 21:17
@openjdk openjdk bot added the rfr Pull request is ready for review label Mar 10, 2023
Copy link
Contributor

@LanceAndersen LanceAndersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your work here Eirik, I think we are close.

I think I would include an additional entry in the zip just as an extra sanity check that we can navigate an additional entry

It is also worth noting that the LOC will still contain an extra field with the unknown tag of 0x9902 while the CEN is being changed to use Zip64, 0x1

A few more minor comments below.

I ran the current version of the test and it runs quickly on all platforms via mach5

Best
Lance

buffer.position(extraOff + elen - ZIP64_SIZE);

// Update the Zip64 field with the real tag
buffer.putShort((short) 0x1); // Tag for Zip64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would probably make 0x1 a constant like the unknown tag.

It would be good to beef up the comment as the byte array written by ZipEntry::setExtra is set to the unknown tag for the Tag and the length is set to 28 which includes all of the Zip64 extra fields

Copy link
Contributor Author

@eirbjo eirbjo Mar 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would probably make 0x1 a constant like the unknown tag.

Extracted a constant with a comment.

It would be good to beef up the comment as the byte array written by ZipEntry::setExtra is set to the unknown tag for the Tag and the length is set to 28 which includes all of the Zip64 extra fields

Opted to extract a method to create the opaque extra field since that allows for more comments and declutters the calling method.

I dropped the 'disk start' field since it is not relevant and we don't set the corresponding magic value. So the Zip64 data size is now 24 instead of 28.

byte[] zip64 = new byte[ZIP64_SIZE];
ByteBuffer buffer = ByteBuffer.wrap(zip64).order(ByteOrder.LITTLE_ENDIAN);
buffer.putShort(UNKNOWN_TAG); // Opaque tag makes ZipEntry.setExtra ignore it
buffer.putShort((short) (zip64.length - 2 * Short.BYTES)); // Data size
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The length of the zip64 extra field is going to be 28. I would suggest making this a bit clearer and adding a field/constant which is set to 28 for clarity to make it easier for those who might look at this code later

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The length of the zip64 extra field is going to be 28. I would suggest making this a bit clearer and adding a field/constant which is set to 28 for clarity to make it easier for those who might look at this code later

I believe this should be well-described now between the constants and the makeOpaqueExtraField method.

…nd extract method makeOpaqueExtraField to allow some more in-depth commenting. Move extra field processing until after the LOC headers are written, such that this information only occurs in the CEN
@eirbjo
Copy link
Contributor Author

eirbjo commented Mar 17, 2023

I think I would include an additional entry in the zip just as an extra sanity check that we can navigate an additional entry

Added an additional entry as suggested and checked that is was navigated.

It is also worth noting that the LOC will still contain an extra field with the unknown tag of 0x9902 while the CEN is being changed to use Zip64, 0x1

This stray opaque extra field should not be any issue, but just in case, I moved the ZipFIle.setExtra until after the LOC is written, so now only the CEN will have the Zip64 extra field.

@eirbjo
Copy link
Contributor Author

eirbjo commented Mar 17, 2023

I ran the latest version of the test against a reverted ZipFileSystem to confirmed that it still catches the regression it was made for.

@eirbjo
Copy link
Contributor Author

eirbjo commented Mar 17, 2023

Something worth mentioning:

This test currently does not check that the last modified time stamp is actually the one read from the LOC and not from the CEN. And there is a discrepancey here, ZipFileSystem reads it from the LOC, while ZipFile reads it from the CEN.

We could test this by setting different last modified timestamps in the LOC and CEN and use assertEquals to verify that the expected time stamp is parsed.

@LanceAndersen Do you think this is worth pursuing in this PR? I have a sketch ready, but I fear will clutter the test with too many concerns. Perhaps better to test in a separate test which could be independent from Zip64 and local header offset concerns?

@mlbridge
Copy link

mlbridge bot commented Mar 17, 2023

Mailing list message from Lance Andersen on nio-dev:

On Mar 17, 2023, at 3:32 PM, Eirik Bjorsnos <duke at openjdk.org<mailto:duke at openjdk.org>> wrote:

On Fri, 17 Mar 2023 18:36:02 GMT, Eirik Bjorsnos <duke at openjdk.org<mailto:duke at openjdk.org>> wrote:

Please review this PR which brings the currenly problem-listed test TestLocOffsetFromZip64EF back to life.

Instead of calling out to the `zip` command line, we produce a small-sized Zip64 file in the test itself. This file has the features required to reproduce the ZipFileSystem issue, namely that the 'INFO-ZIP extended timestamp' field must come before the 'Zip64 extended information' field. (This would casue ZipFileSystem to read the LOC at position 0xFFFFFFFF).

This speed up the test (from 50s to 3s on my Macbook Pro), saves 4GB disk space during builds removes a dependency on the `zip` command in OS/distros.

Seee [JDK-8301183](https://bugs.openjdk.org/browse/JDK-8301183) for details on the problem-listing.

Eirik Bjorsnos has updated the pull request incrementally with one additional commit since the last revision:

Add extra entry as a sanity check. Extract some constants for sizes and extract method makeOpaqueExtraField to allow some more in-depth commenting. Move extra field processing until after the LOC headers are written, such that this information only occurs in the CEN

Something worth mentioning:

This test currently does not check that the last modified time stamp is actually the one read from the LOC and not from the CEN. And there is a discrepancey here, `ZipFileSystem` reads it from the LOC, while `ZipFile` reads it from the CEN.

We could test this by setting different last modified timestamps in the LOC and CEN and use assertEquals to verify that the expected time stamp is parsed.

@LanceAndersen Do you think this is worth pursuing in this PR? I have a sketch ready, but I fear will clutter the test with too many concerns. Perhaps better to test in a separate test which could be independent from Zip64 and local header offset concerns?

Hi Eirik,

Please treat the timestamp issue above separately so we are not morphing issues.

Best
Lance

-------------

PR: https://git.openjdk.org/jdk/pull/12979

[cid:E1C4E2F0-ECD0-4C9D-ADB4-B16CA7BCB7FC at home]

Lance Andersen | Principal Member of Technical Staff | +1.781.442.2037
Oracle Java Engineering
1 Network Drive
Burlington, MA 01803
Lance.Andersen at oracle.com<mailto:Lance.Andersen at oracle.com>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/nio-dev/attachments/20230317/90f48398/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: oracle_sig_logo.gif
Type: image/gif
Size: 658 bytes
Desc: oracle_sig_logo.gif
URL: <https://mail.openjdk.org/pipermail/nio-dev/attachments/20230317/90f48398/oracle_sig_logo.gif>

Copy link
Contributor

@LanceAndersen LanceAndersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good overall. Please see the comment below and then I can run the test one more time and you should be good to go

byte[] zip64 = makeOpaqueExtraField();
e.setExtra(zip64);

zo.finish(); // Write out CEN and END records
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might add one more entry after this one, perhaps with a compression method of DEFLATED

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a third DEFLATED entry, added asserts that ZipFile and ZipFileSystem finds the expected entries. The LOC traversal no longer works, so the CEN offset is now looked up from the EOC header instead.

@openjdk
Copy link

openjdk bot commented Mar 23, 2023

@eirbjo This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8303972: (zipfs) Make test/jdk/jdk/nio/zipfs/TestLocOffsetFromZip64EF.java independent of the zip command line
8301183: (zipfs) jdk/jdk/nio/zipfs/TestLocOffsetFromZip64EF.java failing with ZipException:R0 on OL9

Reviewed-by: lancea

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 183 new commits pushed to the master branch:

  • c2d9fa2: 8326000: Remove obsolete comments for class sun.security.ssl.SunJSSE
  • f50df10: 8299023: TestPLABResize.java and TestPLABPromotion.java are failing intermittently
  • cf13086: 8317697: refactor-encapsulate x86 VM_Version::CpuidInfo
  • 3b76372: 8325687: SimpleJavaFileObject specification would benefit from implSpec
  • b5df2f4: 8323170: j2dbench is using outdated javac source/target to be able to build by itself
  • 267780b: 8324680: Replace NULL with nullptr in JVMTI generated code
  • f3073db: 8321408: Add Certainly roots R1 and E1
  • 4c7b313: 8325908: Finish removal of IntlTest and CollatorTest
  • 5310250: 8325941: Remove unused Space::block_size
  • 7a76252: 8325990: Remove use of snippet @replace annotation in java.base
  • ... and 173 more: https://git.openjdk.org/jdk/compare/692c9f8821e220560927dd6bbedfea9ddfe312f6...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Mar 23, 2023
…ND header instead of skipping LOC headers. Add asserts that verifies that the ZIP has the expected entries.
@bridgekeeper
Copy link

bridgekeeper bot commented May 1, 2023

@eirbjo This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@bridgekeeper
Copy link

bridgekeeper bot commented May 30, 2023

@eirbjo This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.

@bridgekeeper bridgekeeper bot closed this May 30, 2023
@eirbjo
Copy link
Contributor Author

eirbjo commented Feb 8, 2024

/open

@openjdk openjdk bot reopened this Feb 8, 2024
@openjdk
Copy link

openjdk bot commented Feb 8, 2024

@eirbjo This pull request is now open

@eirbjo
Copy link
Contributor Author

eirbjo commented Feb 8, 2024

This PR was approved by @LanceAndersen in March 2023, but for some reason it was never integrated. Most probably I was waiting for a final approval following the last review comments from Lance. The test was rewritten to JUnit in October 2023.

The test is useful in that it covers an aspect sensitive to the order of ZIP64 extended fields which is produced by Info-ZIP, but which ZipOutputStream cannot produce.

I think it would be good to bring this problem-listed test back to life and also make it platform independent.

@eirbjo eirbjo changed the title 8303972: Make test/jdk/jdk/nio/zipfs/TestLocOffsetFromZip64EF.java independent of the zip command line 8303972: (zipfs) Make test/jdk/jdk/nio/zipfs/TestLocOffsetFromZip64EF.java independent of the zip command line Feb 8, 2024
@eirbjo
Copy link
Contributor Author

eirbjo commented Feb 8, 2024

/issue add 8301183

@openjdk
Copy link

openjdk bot commented Feb 8, 2024

@eirbjo
Adding additional issue to issue list: 8301183: (zipfs) jdk/jdk/nio/zipfs/TestLocOffsetFromZip64EF.java failing with ZipException:R0 on OL9.

@jaikiran
Copy link
Member

Hello Eirik, before integrating this I'll run this against a specific version of OS to verify that the test that is being removed from the problem listing does indeed work fine on that OS. Given that we no longer depend on a OS specific tool to generate the zip, I think it should work fine, but I'll do one run to be sure.

@jaikiran
Copy link
Member

I ran this test-only change against our CI plus specifically against Oracle Linux 9, where this test was previously failing. All tests passed. So this looks good to me.

@eirbjo
Copy link
Contributor Author

eirbjo commented Feb 18, 2024

I ran this test-only change against our CI plus specifically against Oracle Linux 9, where this test was previously failing. All tests passed. So this looks good to me.

Thanks @jaikiran. This PR was approved by Lance March last year. There have been some minor updates after that. It would be good to see an approval of this PR in its current state before I integrate.

@eirbjo
Copy link
Contributor Author

eirbjo commented Feb 18, 2024

/integrate

@openjdk
Copy link

openjdk bot commented Feb 18, 2024

Going to push as commit 7004c27.
Since your change was applied there have been 183 commits pushed to the master branch:

  • c2d9fa2: 8326000: Remove obsolete comments for class sun.security.ssl.SunJSSE
  • f50df10: 8299023: TestPLABResize.java and TestPLABPromotion.java are failing intermittently
  • cf13086: 8317697: refactor-encapsulate x86 VM_Version::CpuidInfo
  • 3b76372: 8325687: SimpleJavaFileObject specification would benefit from implSpec
  • b5df2f4: 8323170: j2dbench is using outdated javac source/target to be able to build by itself
  • 267780b: 8324680: Replace NULL with nullptr in JVMTI generated code
  • f3073db: 8321408: Add Certainly roots R1 and E1
  • 4c7b313: 8325908: Finish removal of IntlTest and CollatorTest
  • 5310250: 8325941: Remove unused Space::block_size
  • 7a76252: 8325990: Remove use of snippet @replace annotation in java.base
  • ... and 173 more: https://git.openjdk.org/jdk/compare/692c9f8821e220560927dd6bbedfea9ddfe312f6...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Feb 18, 2024
@openjdk openjdk bot closed this Feb 18, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Feb 18, 2024
@openjdk
Copy link

openjdk bot commented Feb 18, 2024

@eirbjo Pushed as commit 7004c27.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
integrated Pull request has been integrated nio nio-dev@openjdk.org
Development

Successfully merging this pull request may close these issues.

4 participants