7036144: GZIPInputStream readTrailer uses faulty available() test for end-of-stream #17113
Conversation
👋 Welcome back acobbs! A progress list of the required criteria for merging this PR into the target branch will be added to the body of your pull request.
@archiecobbs The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.
Webrevs
The test could benefit from a conversion to JUnit. Not sure I love final local variables; I see the split assignment inside the try/catch makes it useful, but perhaps if you rewrite countBytes as suggested, final will be less useful.
A few minor suggestions.
The test is shaping up nicely. Since it's a new test it should use JUnit 5.
Also included a couple suggestions, I'll stop now, promise! :)
No prob - they're all reasonable suggestions :)
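For readers following along, a minimal JUnit 5 shape for such a test might look roughly like the sketch below. The class and method names are made up here and this is not the PR's actual test; it merely illustrates the JUnit 5 style and the available()-returns-zero scenario under discussion (the wrapper also doles out one byte per read so that little data is buffered when the first member's trailer is reached).

    import static org.junit.jupiter.api.Assertions.assertArrayEquals;

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.FilterInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.charset.StandardCharsets;
    import java.util.zip.GZIPInputStream;
    import java.util.zip.GZIPOutputStream;
    import org.junit.jupiter.api.Test;

    class GZIPAvailableZeroTest {

        @Test
        void readsConcatenatedMembersWhenAvailableReportsZero() throws IOException {
            // Build two concatenated GZIP members ("hello " followed by "world").
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            for (String s : new String[] { "hello ", "world" }) {
                try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
                    gz.write(s.getBytes(StandardCharsets.UTF_8));
                }
            }

            // A spec-compliant stream: available() always reports zero and
            // each read() call returns at most one byte.
            InputStream in = new FilterInputStream(new ByteArrayInputStream(bytes.toByteArray())) {
                @Override public int available() { return 0; }
                @Override public int read(byte[] b, int off, int len) throws IOException {
                    return super.read(b, off, Math.min(len, 1));
                }
            };

            // Both members should be decoded as one logical stream.
            try (GZIPInputStream gzin = new GZIPInputStream(in)) {
                assertArrayEquals("hello world".getBytes(StandardCharsets.UTF_8),
                                  gzin.readAllBytes());
            }
        }
    }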
Misuse of available() is a pet peeve of mine, so I am happy to see that change and test. I wonder: if the stream no longer depends on this available() condition being true, does that mean it's no longer (indirectly) verified? That behavior might not be guaranteed, but it's still often relied upon. (I guess injecting a return 0 into the current implementation and running the tests would tell us if another test catches it?)
I'm not sure I understand ... what do you mean by "verified"? If what you're saying is "Previously we were implicitly verifying that the data reported by |
The current behavior of allowing/ignoring trailing malformed data seems to have a complicated history:
The current behavior of ignoring trailing malformed data does not seem to be specified in the API. On the contrary, the read methods are specified to throw ZipException for corrupt input data:
Not sure whether it is worthwhile to change this long-standing behavior of GZIPInputStream. But it could perhaps be noted somehow in the API documentation? (To be clear, that would be for a different PR/issue/CSR)
Thanks for researching all of that. I agree this should be cleaned up and have created JDK-8322256 to that end.
I mean it verified the non-zero behavior, not that the data length was correct. Not sure whether that is tested anywhere now.
Ok, gotcha. In any case, I don't think the code that was there before was providing much in the way of implicit testing of available(). For reference, here's the previous version of the concatenation check:

    // try concatenated case
    if (this.in.available() > 0 || n > 26) {
        int m = 8;                      // this.trailer
        try {
            m += readHeader(in);        // next.header
        } catch (IOException ze) {
            return true;                // ignore any malformed, do nothing
        }
        inf.reset();
        if (n > m)
            inf.setInput(buf, len - n + m, n - m);
        return false;
    }
    return true;

As you can see, in the previous version, when available() returned zero and n was 26 or less, we never even attempted to read the next header.
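For contrast, a rough sketch of the direction the fix takes, as described in the PR summary: drop the available() gate and simply attempt to read the next member's header, treating a failure as end of stream. This reuses the same locals (in, inf, buf, len, n, readHeader) as the snippet above and is not the exact code that was integrated:

    // Sketch only: no this.in.available() guard; just try to read the next header.
    int m = 8;                          // this.trailer
    try {
        m += readHeader(in);            // next.header
    } catch (IOException ze) {
        return true;                    // no (or malformed) next member: end of stream
    }
    inf.reset();
    if (n > m)
        inf.setInput(buf, len - n + m, n - m);
    return false;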
@archiecobbs This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
/pingbot |
@archiecobbs Unknown command pingbot.
Hello, I just wanted to check if there was any way I could help move this along.
Hello Archie, the proposal to not depend on the underlying stream's available() seems reasonable. What's being proposed here is that we proceed and read the underlying stream's few additional bytes to detect the presence or absence of a GZIP member header, and if that attempt fails (with an IOException) then we consider that we have reached the end of the GZIP stream and just return. For this change, I think we would also need to consider whether we should "unread" the read bytes from the underlying InputStream.
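As a general illustration of the "unread" idea (and not necessarily how GZIPInputStream manages its buffers internally), java.io.PushbackInputStream lets a caller push bytes back so that a later read sees them again; the class name and values below are just a made-up demo:

    import java.io.ByteArrayInputStream;
    import java.io.IOException;
    import java.io.PushbackInputStream;

    public class UnreadDemo {
        public static void main(String[] args) throws IOException {
            byte[] data = { 0x1f, (byte) 0x8b, 0x08 };   // start of a GZIP member header
            PushbackInputStream in =
                    new PushbackInputStream(new ByteArrayInputStream(data), 2);

            byte[] peek = new byte[2];
            int got = in.read(peek);             // peek at the first two bytes
            in.unread(peek, 0, got);             // push them back for the next reader

            System.out.println(in.read() == 0x1f);   // true: the bytes are visible again
        }
    }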
Hi @jaikiran, I agree with your comments. My only question is whether we should do all of this in one stage or two stages. My initial thought is to do this in two stages:
The reason I think this two-stage approach is appropriate is that there is no downside to doing it this way - that is, the problem you describe of reading beyond the end-of-stream is already a problem in the current code, with the exception of the one corner case where this bug fix applies, namely, when available() returns zero. Your thoughts? Edited to add: I said "already a problem in the current code" but should clarify: what I mean is, suppose some clever
Mailing list message from Archie Cobbs on core-libs-dev: On Thu, Mar 7, 2024 at 2:20 PM Louis Bergelson <duke at openjdk.org> wrote:
Hi Louis, Thanks for offering to help. The process is slow but moving forward. -Archie
@archiecobbs This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 47 new commits pushed to the target branch. As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project, an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@eirbjo, @jaikiran) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment.
The CSR for this has been approved. I will be running one final round of CI tests with this change and if those tests complete without related issues, I'll go ahead and approve this. |
Hello Archie, tier1, tier2, tier3 completed without any related failures. I also ran JCK tests related to this class and those too passed. I've also taken Lance's inputs on this PR and he agrees that this looks OK. I'll go ahead and approve this now. Thank you for fixing this, and for your patience during the review.
@jaikiran Only the author (@archiecobbs) is allowed to issue the integrate command.
/integrate |
@archiecobbs Your change is now ready to be sponsored by a Committer.
/sponsor |
Going to push as commit d3f3011.
Your commit was automatically rebased without conflicts.
@jaikiran @archiecobbs Pushed as commit d3f3011. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
Hello Archie, we forgot to create a release note for this one (there's still time). Would you be willing to create one, following the instructions here https://openjdk.org/guide/#release-notes? If you need any help, let us know. One of us will review that release note before you can Resolve it to Delivered. |
Hi @jaikiran, No problem - please see JDK-8330995 and let me know if anything else is needed.
Thank you Archie. With inputs from Lance, I've updated the text and the summary of the release note as per the guidelines. You can now mark it as "Resolved", "Delivered". |
Done - thanks!
Is there any chance of backporting this to Java 17 or 21? What's involved in doing that?
It's a straightforward process but I'm not sure I'm one to judge whether it would be appropriate. @jaikiran and/or @LanceAndersen - any opinions?
I am not convinced this is a must-have for a backport. Please see the OpenJDK Developers' Guide regarding backports. Also note this would require a CSR for each backport.
GZIPInputStream, when looking for a concatenated stream, relies on what the underlying InputStream says about how many bytes are available(). But this is inappropriate because InputStream.available() is just an estimate and is allowed (for example) to always return zero.
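For illustration (this sketch is not part of the patch or its tests, and the class name is made up), a stream can be perfectly well-behaved and still report zero, simply by inheriting the default InputStream.available():

    import java.io.InputStream;

    // Hypothetical example: a perfectly legal InputStream that can still
    // deliver plenty of bytes, yet inherits InputStream.available(),
    // which is documented to always return 0.
    class ByteArrayBackedStream extends InputStream {
        private final byte[] data;
        private int pos;

        ByteArrayBackedStream(byte[] data) {
            this.data = data;
        }

        @Override
        public int read() {
            return pos < data.length ? (data[pos++] & 0xff) : -1;
        }

        // available() is deliberately not overridden; the inherited
        // implementation returns 0 no matter how much data remains.
    }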
The fix is to ignore what available() reports and just proceed and see what happens. If fewer bytes are available than required, the attempt to extend to another stream is canceled just as it was before, e.g., when the next stream header couldn't be read.
Progress
Issues
Reviewers
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/17113/head:pull/17113
$ git checkout pull/17113
Update a local copy of the PR:
$ git checkout pull/17113
$ git pull https://git.openjdk.org/jdk.git pull/17113/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 17113
View PR using the GUI difftool:
$ git pr show -t 17113
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/17113.diff
Webrev
Link to Webrev Comment