-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8303972: (zipfs) Make test/jdk/jdk/nio/zipfs/TestLocOffsetFromZip64EF.java independent of the zip command line #12979
Conversation
…utputStream instead of calling out to the ZIP command line
👋 Welcome back eirbjo! A progress list of the required criteria for merging this PR into |
Webrevs
|
Converting to draft while I investigate why the updated test does not actually catch the regression |
public void createZipWithZip64Ext() throws IOException { | ||
// Make a ZIP with two entries | ||
try (FileOutputStream fileOutputStream = new FileOutputStream(new File(ZIP_FILE_NAME)); | ||
ZipOutputStream zo = new ZipOutputStream(new SparseOutputStream(fileOutputStream))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use FileChannel.open and specify SPARSE in the set of open options, I think that would make it clearer that the file is sparse and remove the need for SparseOutputStream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand. I could use FileChannel.open with SPARSE (which btw is ignored by the implementation:), but that doesn't give me an OutputStream I can pass to ZipOutputStream?
I could use Channels.newOutputStream, but that would not create holes in the sparse files?
My current thinking is that we need the SparseOutputStream to detect that ZIpOutputStream is writing empty bytes and replace that with channel.position(channel.position() + len)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand. I could use FileChannel.open with SPARSE (which btw is ignored by the implementation:)
That is true on platforms where files can be sparse by default. On Windows you'll see this translates to create the file as sparse, it needs to be opt-in when creating the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opted to create a small-sized Zip64 file instead (see below). This file does not need to be sparse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is true on platforms where files can be sparse by default. On Windows you'll see this translates to create the file as sparse, it needs to be opt-in when creating the file.
Thanks for this info, Alan!
While this PR no longer makes use of sparse files, I have applied your comments about FileChannel.open for NTFS in another PR (which does make use of sparse files):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test was leveraging a file created by info-zip intentionally so care needs to be taken to ensure that the zip that is being created is similar LOC and CEN output
Yes, I just noticed that and currently looking into it. I think in particular it is sensitive to the ordering of the extended time stamp and the Zip64 entries, which is not the same as ZipOutputStream. |
… Zip64-formatted CEN header.
While it is possible to use a sparse file here, there is a problem that when ZipFileSystem reads inside a 'hole', it will skip ahead to the next data which turned out to be a LOC header. We could carefully select an entry size such that 0XFFFFFFFF does not not fall into in a hole, but that feels a bit too magic. Instead, I opted for creating a small ZIP file with a Zip64-formatted CEN. Since this issue is sensitive to ordering of extra fields, the 'extended timestamp' needs to come before the 'Zip64 extended information field'. The way I solve this is to add an opaque extra field, right sized for 'Zip64'. This uses a tag ID unknown to ZipEntry.setExtra, such that it will not be parsed as Zip64 but simply passed through. ZipOutputStream will output this after the extended timestamp. After the ZIP is produced, the Zip64 entry is updated with the correct tag. I have verified that this causes ZipFileSystem (before JDK-8255380) to attempt to read the LOC from 0xFFFFFFFF. With a small file size, this causes a slightly different error message because of the attempt to read beyond the end of the file.
While the symptom is slightly different, the cause is the same: ZipFileSystem tries to read the LOC at offset 0xFFFFFFFF because it has not read the Zip64 entry yet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your work here Eirik, I think we are close.
I think I would include an additional entry in the zip just as an extra sanity check that we can navigate an additional entry
It is also worth noting that the LOC will still contain an extra field with the unknown tag of 0x9902 while the CEN is being changed to use Zip64, 0x1
A few more minor comments below.
I ran the current version of the test and it runs quickly on all platforms via mach5
Best
Lance
buffer.position(extraOff + elen - ZIP64_SIZE); | ||
|
||
// Update the Zip64 field with the real tag | ||
buffer.putShort((short) 0x1); // Tag for Zip64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would probably make 0x1 a constant like the unknown tag.
It would be good to beef up the comment as the byte array written by ZipEntry::setExtra is set to the unknown tag for the Tag and the length is set to 28 which includes all of the Zip64 extra fields
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would probably make 0x1 a constant like the unknown tag.
Extracted a constant with a comment.
It would be good to beef up the comment as the byte array written by ZipEntry::setExtra is set to the unknown tag for the Tag and the length is set to 28 which includes all of the Zip64 extra fields
Opted to extract a method to create the opaque extra field since that allows for more comments and declutters the calling method.
I dropped the 'disk start' field since it is not relevant and we don't set the corresponding magic value. So the Zip64 data size is now 24 instead of 28.
byte[] zip64 = new byte[ZIP64_SIZE]; | ||
ByteBuffer buffer = ByteBuffer.wrap(zip64).order(ByteOrder.LITTLE_ENDIAN); | ||
buffer.putShort(UNKNOWN_TAG); // Opaque tag makes ZipEntry.setExtra ignore it | ||
buffer.putShort((short) (zip64.length - 2 * Short.BYTES)); // Data size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The length of the zip64 extra field is going to be 28. I would suggest making this a bit clearer and adding a field/constant which is set to 28 for clarity to make it easier for those who might look at this code later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The length of the zip64 extra field is going to be 28. I would suggest making this a bit clearer and adding a field/constant which is set to 28 for clarity to make it easier for those who might look at this code later
I believe this should be well-described now between the constants and the makeOpaqueExtraField
method.
…nd extract method makeOpaqueExtraField to allow some more in-depth commenting. Move extra field processing until after the LOC headers are written, such that this information only occurs in the CEN
Added an additional entry as suggested and checked that is was navigated.
This stray opaque extra field should not be any issue, but just in case, I moved the ZipFIle.setExtra until after the LOC is written, so now only the CEN will have the Zip64 extra field. |
I ran the latest version of the test against a reverted ZipFileSystem to confirmed that it still catches the regression it was made for. |
Something worth mentioning: This test currently does not check that the last modified time stamp is actually the one read from the LOC and not from the CEN. And there is a discrepancey here, We could test this by setting different last modified timestamps in the LOC and CEN and use assertEquals to verify that the expected time stamp is parsed. @LanceAndersen Do you think this is worth pursuing in this PR? I have a sketch ready, but I fear will clutter the test with too many concerns. Perhaps better to test in a separate test which could be independent from Zip64 and local header offset concerns? |
Mailing list message from Lance Andersen on nio-dev: On Mar 17, 2023, at 3:32 PM, Eirik Bjorsnos <duke at openjdk.org<mailto:duke at openjdk.org>> wrote: On Fri, 17 Mar 2023 18:36:02 GMT, Eirik Bjorsnos <duke at openjdk.org<mailto:duke at openjdk.org>> wrote: Please review this PR which brings the currenly problem-listed test TestLocOffsetFromZip64EF back to life. Instead of calling out to the `zip` command line, we produce a small-sized Zip64 file in the test itself. This file has the features required to reproduce the ZipFileSystem issue, namely that the 'INFO-ZIP extended timestamp' field must come before the 'Zip64 extended information' field. (This would casue ZipFileSystem to read the LOC at position 0xFFFFFFFF). This speed up the test (from 50s to 3s on my Macbook Pro), saves 4GB disk space during builds removes a dependency on the `zip` command in OS/distros. Seee [JDK-8301183](https://bugs.openjdk.org/browse/JDK-8301183) for details on the problem-listing. Eirik Bjorsnos has updated the pull request incrementally with one additional commit since the last revision: Add extra entry as a sanity check. Extract some constants for sizes and extract method makeOpaqueExtraField to allow some more in-depth commenting. Move extra field processing until after the LOC headers are written, such that this information only occurs in the CEN Something worth mentioning: This test currently does not check that the last modified time stamp is actually the one read from the LOC and not from the CEN. And there is a discrepancey here, `ZipFileSystem` reads it from the LOC, while `ZipFile` reads it from the CEN. We could test this by setting different last modified timestamps in the LOC and CEN and use assertEquals to verify that the expected time stamp is parsed. @LanceAndersen Do you think this is worth pursuing in this PR? I have a sketch ready, but I fear will clutter the test with too many concerns. Perhaps better to test in a separate test which could be independent from Zip64 and local header offset concerns? Hi Eirik, Please treat the timestamp issue above separately so we are not morphing issues. Best ------------- PR: https://git.openjdk.org/jdk/pull/12979 [cid:E1C4E2F0-ECD0-4C9D-ADB4-B16CA7BCB7FC at home] Lance Andersen | Principal Member of Technical Staff | +1.781.442.2037 -------------- next part -------------- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good overall. Please see the comment below and then I can run the test one more time and you should be good to go
byte[] zip64 = makeOpaqueExtraField(); | ||
e.setExtra(zip64); | ||
|
||
zo.finish(); // Write out CEN and END records |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might add one more entry after this one, perhaps with a compression method of DEFLATED
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a third DEFLATED entry, added asserts that ZipFile and ZipFileSystem finds the expected entries. The LOC traversal no longer works, so the CEN offset is now looked up from the EOC header instead.
@eirbjo This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 183 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
…ND header instead of skipping LOC headers. Add asserts that verifies that the ZIP has the expected entries.
@eirbjo This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration! |
@eirbjo This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the |
/open |
@eirbjo This pull request is now open |
This PR was approved by @LanceAndersen in March 2023, but for some reason it was never integrated. Most probably I was waiting for a final approval following the last review comments from Lance. The test was rewritten to JUnit in October 2023. The test is useful in that it covers an aspect sensitive to the order of ZIP64 extended fields which is produced by Info-ZIP, but which ZipOutputStream cannot produce. I think it would be good to bring this problem-listed test back to life and also make it platform independent. |
/issue add 8301183 |
@eirbjo |
Hello Eirik, before integrating this I'll run this against a specific version of OS to verify that the test that is being removed from the problem listing does indeed work fine on that OS. Given that we no longer depend on a OS specific tool to generate the zip, I think it should work fine, but I'll do one run to be sure. |
I ran this test-only change against our CI plus specifically against Oracle Linux 9, where this test was previously failing. All tests passed. So this looks good to me. |
Thanks @jaikiran. This PR was approved by Lance March last year. There have been some minor updates after that. It would be good to see an approval of this PR in its current state before I integrate. |
/integrate |
Going to push as commit 7004c27.
Your commit was automatically rebased without conflicts. |
Please review this PR which brings the currenly problem-listed test TestLocOffsetFromZip64EF back to life.
Instead of calling out to the
zip
command line, we produce a small-sized Zip64 file in the test itself. This file has the features required to reproduce the ZipFileSystem issue, namely that the 'INFO-ZIP extended timestamp' field must come before the 'Zip64 extended information' field. (This would casue ZipFileSystem to read the LOC at position 0xFFFFFFFF).This speed up the test (from 50s to 3s on my Macbook Pro), saves 4 GB disk space during builds and removes a brittle dependency on the
zip
command in OS/distros.Seee JDK-8301183 for details on the problem-listing.
Progress
Issues
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/12979/head:pull/12979
$ git checkout pull/12979
Update a local copy of the PR:
$ git checkout pull/12979
$ git pull https://git.openjdk.org/jdk.git pull/12979/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 12979
View PR using the GUI difftool:
$ git pr show -t 12979
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/12979.diff
Webrev
Link to Webrev Comment