-
Notifications
You must be signed in to change notification settings - Fork 5.8k
8303923: ZipOutStream::putEntry should include an apiNote to indicate that the STORED compression method should be used when writing directory entries #12899
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…compression method and have size and crc set to 0.
👋 Welcome back eirbjo! A progress list of the required criteria for merging this PR into |
Suggested CSR: Compatibility kindnone Compatibility riskminimal Compatibility descriptionThis is a documentation-only change SummaryAdd an ProblemZipOutputStream currently writes directory entries using the default DEFLATE method. This causes file data for a two-byte 'final empty' DEFLATE block to be written, followed by a 16-byte data descriptor. This is in violation of the APPNOTE.txt specification, which mandates that directory entries
Additionally, benchmarks show that the writing of these empty DEFLATED directory entries are ~10X slower compared to an empty STORED entry. While the SolutionAdd an (As an alternative solution, Specification
|
/issue 8303923 |
/csr |
@eirbjo The primary solved issue for a PR is set through the PR title. Since the current title does not contain an issue reference, it will now be updated. |
@eirbjo this pull request will not be integrated until the CSR request JDK-8303925 for issue JDK-8303923 has been approved. |
Looking for reviewers for this CSR which adds an https://bugs.openjdk.org/browse/JDK-8303925 The CSR was initially written by me, then edited by Lance. |
Webrevs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Eirik,
Thank you for starting the review of your proposed clarification. A couple of minor comments. Also the copyright should be changed to 2023 with your next update.
* should be used and the size and CRC-32 values should be set to 0: | ||
* | ||
* {@snippet lang="java" : | ||
* ZipEntry e = ...; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make this an actual value as the snippet should be valid code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I initially used a directory as an example: ZipEntry e = new ZipEntry("dir/")
, but then @AlanBateman suggested to make it more generic:
https://mail.openjdk.org/pipermail/core-libs-dev/2023-March/101686.html
For the note then you might want to change it to "ZipEntry e = ..." because the reader see the trailing slash after dir so it is obviously a directory.
I agree valid code would be preferrable. Not sure how to make it valid though, since the isDirectory
part suggests the path is unknown. Could an undefined local variable work?
ZipEntry e = new ZipEntry(name);
Or perhaps with a comment:
ZipEntry e = new ZipEntry(name); // name could be a file or directory
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opted for ZipEntry e = new ZipEntry(entryName)
for now.
* } | ||
* | ||
* This ensures strict compliance with the ZIP specification and | ||
* allows optimal performance when processing directory entries. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove at least the first part of the sentence regarding "strict compliance" as "file data" can be interpreted as the contents of the file as 4.1.3 of the App.Note allows for the default compression method to be DEFAULT. The intent of the apiNote is to remind developers that the use of the STORED compression method is preferred and may be more optimal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
4.3.8 File data
Immediately following the local header for a file
SHOULD be placed the compressed or stored data for the file.
If the file is encrypted, the encryption header for the file
SHOULD be placed after the local header and before the file
data. The series of [local file header][encryption header]
[file data][data descriptor] repeats for each file in the
.ZIP archive.
Zero-byte files, directories, and other file types that
contain no content MUST NOT include file data.
My interpretation of section 4.3.8 is that 'file data' in the last sentence of this section refers to what is defined in the first sentence: 'Immediately following the local header for a file SHOULD be placed the compressed or stored data for the file.'
While I'm not sure I fully understand you interpretation, it seems you are saying that a DEFLATED entry with the two-byte 'empty final' DEFLATE blocks does not have file data? (Because it is just an encoding of 'no content')?
In any case, I'm happy to remove this since as it stands it is a bit vague and as we've seen open for interpretation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove at least the first part of the sentence regarding "strict compliance" as "file data" can be interpreted as the contents of the file as 4.1.3 of the App.Note allows for the default compression method to be DEFAULT.
4.1.3 Data compression MAY be used to reduce the size of files
placed into a ZIP file, but is not required. This format supports the
use of multiple data compression algorithms. When compression is used,
one of the documented compression algorithms MUST be used. Implementors
are advised to experiment with their data to determine which of the
available algorithms provides the best compression for their needs.
Compression method 8 (Deflate) is the method used by default by most
ZIP compatible application programs.
I think we read 4.1.3 slightly differently. I read it as saying that WHEN data compression is used, 8 (Deflate) is the method used by default by most ZIP compatible application programs. STORED entries are not using data compression, and as such do not have any 'default compression method'. 4.1.3 does not apply to them.
Imagine if jcheck could remind people of this, reviewers would be out of a job :-) |
…n strict compliance with APPNOTE.txt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for making the changes. I think this looks better.
I go back and forth as to whether to include the sentence regarding performance but I think it is OK. Let's see if anyone else has thoughts prior to finalizing the CSR with this change
I understand one can look at this differently. For me, winning friends and influencing people has always seemed easier when I provide an answer for 'What's in it for me?'. Here, I wanted to flip the reader's state of mind from 'hmm!?' to 'hah!'. |
The example code works without setting the compressed size on the entry? Looks like there is a check in ZipOutputStream::putNextEntry at
This must work because of the check at
|
Yes, this is the minimal code required and is also how the The current behaviour does feel a bit underspecified though. In the Perhaps a similar note should be addded to @LanceAndersen what do you think? |
More thought needs to be given to a clarification as any validation of the ZipEntry values once set are done in ZipOutPutStream. There is some validation done in ZipEntry::setSize and ZipEntry::setCRC but nothing to validate the note you point out in ZipEntry::getCompressedSize. So yes, we should probably add further clarification but lets address as a separate issue |
I moved the CSR for this PR into the Proposed state. The CSR is now looking for reviewers. |
@eirbjo This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 573 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@LanceAndersen, @AlanBateman) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type |
/integrate This PR is now approved and integrated and is looking for kind sponsors. |
/sponsor |
Going to push as commit 425ef06.
Your commit was automatically rebased without conflicts. |
@LanceAndersen @eirbjo Pushed as commit 425ef06. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
ZipOutputStream currently writes directory entries using the DEFLATED compression method. This does not strictly comply with the APPNOTE.TXT specification and is also about 10x slower than using the STORED compression method.
Because of these concerns,
ZipOutputStream.putNextEntry
should be updated with an@apiNote
recommendingthe use of the STORED compression method for directory entries.
Suggested CSR in the first comment.
Progress
Issues
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/12899/head:pull/12899
$ git checkout pull/12899
Update a local copy of the PR:
$ git checkout pull/12899
$ git pull https://git.openjdk.org/jdk.git pull/12899/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 12899
View PR using the GUI difftool:
$ git pr show -t 12899
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/12899.diff
Webrev
Link to Webrev Comment