Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

7036144: GZIPInputStream readTrailer uses faulty available() test for end-of-stream #17113

Closed
wants to merge 10 commits into from

Conversation

archiecobbs
Copy link
Contributor

@archiecobbs archiecobbs commented Dec 14, 2023

GZIPInputStream, when looking for a concatenated stream, relies on what the underlying InputStream says is how many bytes are available(). But this is inappropriate because InputStream.available() is just an estimate and is allowed (for example) to always return zero.

The fix is to ignore what's available() and just proceed and see what happens. If fewer bytes are available than required, the attempt to extend to another stream is canceled just as it was before, e.g., when the next stream header couldn't be read.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change requires CSR request JDK-8327489 to be approved

Issues

  • JDK-7036144: GZIPInputStream readTrailer uses faulty available() test for end-of-stream (Bug - P4)
  • JDK-8327489: Make GZIPInputStream no longer rely on available() for end-of-stream test (CSR)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/17113/head:pull/17113
$ git checkout pull/17113

Update a local copy of the PR:
$ git checkout pull/17113
$ git pull https://git.openjdk.org/jdk.git pull/17113/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 17113

View PR using the GUI difftool:
$ git pr show -t 17113

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/17113.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Dec 14, 2023

👋 Welcome back acobbs! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Dec 14, 2023

@archiecobbs The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the core-libs core-libs-dev@openjdk.org label Dec 14, 2023
@archiecobbs archiecobbs marked this pull request as ready for review December 14, 2023 20:41
@openjdk openjdk bot added the rfr Pull request is ready for review label Dec 14, 2023
@mlbridge
Copy link

mlbridge bot commented Dec 14, 2023

Copy link
Contributor

@eirbjo eirbjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test could benefit from a conversion to JUnit. Not sure I love final local variables, I see the split assignment inside the try/catch makes it useful, but perhaps if you rewrite countBytes as suggested, final will be less useful.

test/jdk/java/util/zip/GZIP/GZIPInputStreamAvailable.java Outdated Show resolved Hide resolved
test/jdk/java/util/zip/GZIP/GZIPInputStreamAvailable.java Outdated Show resolved Hide resolved
test/jdk/java/util/zip/GZIP/GZIPInputStreamAvailable.java Outdated Show resolved Hide resolved
Copy link
Contributor

@eirbjo eirbjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor suggestions.

test/jdk/java/util/zip/GZIP/GZIPInputStreamAvailable.java Outdated Show resolved Hide resolved
test/jdk/java/util/zip/GZIP/GZIPInputStreamAvailable.java Outdated Show resolved Hide resolved
test/jdk/java/util/zip/GZIP/GZIPInputStreamAvailable.java Outdated Show resolved Hide resolved
test/jdk/java/util/zip/GZIP/GZIPInputStreamAvailable.java Outdated Show resolved Hide resolved
Copy link
Contributor

@eirbjo eirbjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test is shaping up nicely. Since it's a new test it should use JUnit 5.

Also included a couple suggestions, I'll stop now, promise! :)

test/jdk/java/util/zip/GZIP/GZIPInputStreamAvailable.java Outdated Show resolved Hide resolved
test/jdk/java/util/zip/GZIP/GZIPInputStreamAvailable.java Outdated Show resolved Hide resolved
test/jdk/java/util/zip/GZIP/GZIPInputStreamAvailable.java Outdated Show resolved Hide resolved
test/jdk/java/util/zip/GZIP/GZIPInputStreamAvailable.java Outdated Show resolved Hide resolved
test/jdk/java/util/zip/GZIP/GZIPInputStreamAvailable.java Outdated Show resolved Hide resolved
test/jdk/java/util/zip/GZIP/GZIPInputStreamAvailable.java Outdated Show resolved Hide resolved
@archiecobbs
Copy link
Contributor Author

The test is shaping up nicely. Since it's a new test it should use JUnit 5.

Also included a couple suggestions, I'll stop now, promise! :)

No prob - they're all reasonable suggestions :)

@ecki
Copy link
Contributor

ecki commented Dec 16, 2023

Misuse of available() is a petpeave of mine, so I am happy to see that change and test. I wonder: if the stream does no longer depend on this available() condition to be true, does that mean it’s no longer (indirectly) verified? That behavior might not be guaranteed, but it’s still often relied upon. (I guess injecting a return 0 into the current implementation and running the tests would tell us if another test catches it?)

@archiecobbs
Copy link
Contributor Author

I wonder: if the stream does no longer depend on this available() condition to be true, does that mean it’s no longer (indirectly) verified?

I'm not sure I understand ... what do you mean by "verified"?

If what you're saying is "Previously we were implicitly verifying that the data reported by available() was actually there, and now we're no longer verifying that" then that's not correct. The code that was previously conditional on an available() check was just catching and discarding IOException, so even if available() was lying before, that would not have ever been detected.

@eirbjo
Copy link
Contributor

eirbjo commented Dec 17, 2023

The current behavior of allowing/ignoring trailing malformed data seems to have a complicated history:

  • GZipInputStream was updated to throw ZipExeption instead of IOException on malformed GZIP data in 4263582
  • The ability to read concatenated GZ files was added in JDK-4691425 This change interestingly also introduced the current behavior of ignoring any trailing malformed data in the stream.
  • 7021870 fixed a bug where GZipInputStream closed the underlying input stream. The change also introduced the test GZIPInZip which verified that reads from a wrapped ZipInputStream does not close the stream
  • Some months later GZIPInZip was updated in fix a test failure, but the change also modified the test to verifiy that malformed trailing data was ignored. The JBS issue is not available to me: JDK-8023431
  • Soon after this, GZIPInZip was again updated to fix test failure, this time removing the use of piped streams and threads. The JBS issue is not available to me: JDK-8026756

The current behavior of ignoring trailing malformed data does not seem to be specified in the API. On the contrary, the read methods are specified to throw ZipException for corrupt input data:

     * @throws    ZipException if the compressed input data is corrupt.
     * @throws    IOException if an I/O error has occurred.
     *
     */
    public int read(byte[] buf, int off, int len) throws IOException {

Not sure whether it is worthwhile to change this long-standing behavior of GZIpInputStream. But it could perhaps be noted somehow in the API documentation? (To be clear, that would be for a different PR/issue/CSR)

@archiecobbs
Copy link
Contributor Author

The current behavior of allowing/ignoring trailing malformed data seems to have a complicated history...

Thanks for researching all of that. I agree this should be cleaned up and have created JDK-8322256 to that end.

@ecki
Copy link
Contributor

ecki commented Dec 19, 2023

If what you're saying is "Previously we were implicitly verifying that the data reported by available() was actually there, and now we're no longer verifying that" then that's not correct.

I mean it verified the non-zero behavior, not that the data length was correct. Not sure if that is somewhere tested now.

@archiecobbs
Copy link
Contributor Author

If what you're saying is "Previously we were implicitly verifying that the data reported by available() was actually there, and now we're no longer verifying that" then that's not correct.

I mean it verified the non-zero behavior, not that the data length was correct. Not sure if that is somewhere tested now.

Ok gotcha.

The test GZIPInputStreamRead.java verifies that available() returns zero once all of the compressed data has been read out. So this verifies available() at the end of a stream. It doesn't appear there are any tests which verify available() in the middle of a stream. Adding such a test would be a great idea but is beyond the scope of this bug of course.

In any case, I don't think the code that was there before was providing much in the way of implicit testing of available() either:

// try concatenated case
if (this.in.available() > 0 || n > 26) {
    int m = 8;                  // this.trailer
    try {
        m += readHeader(in);    // next.header
    } catch (IOException ze) {
        return true;  // ignore any malformed, do nothing
    }
    inf.reset();
    if (n > m)
        inf.setInput(buf, len - n + m, n - m);
    return false;
}
return true;

As you can see, in the previous version, when available() > 0, there would be no noticeable side effect if there was actually less data than that ("do nothing").

@bridgekeeper
Copy link

bridgekeeper bot commented Jan 16, 2024

@archiecobbs This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@archiecobbs
Copy link
Contributor Author

/pingbot

@openjdk
Copy link

openjdk bot commented Jan 16, 2024

@archiecobbs Unknown command pingbot - for a list of valid commands use /help.

@lbergelson
Copy link

lbergelson commented Feb 23, 2024

Hello, I just wanted to check if there was anything way I could help move this along.

@jaikiran
Copy link
Member

Hello Archie, the proposal to not depend on the available() method of the underlying InputStream to decide whether to read additional bytes from the underlying stream to detect the "next" header seems reasonable.

What's being proposed here is that we proceed and read the underlying stream's few additional bytes to detect the presence or absence of a GZIP member header and if that attempt fails (with an IOException) then we consider that we have reached the end of GZIP stream and just return back.

For this change, I think we would also need to consider whether we should "unread" the read bytes from the InputStream if those don't correspond to a "next" GZIP member header. That way any underlying InputStream which was implemented in a way that it would return availability as 0 when it knew that the GZIP stream was done and yet had additional (non GZIP) data to read on the underlying stream, would still be able to read that data after this change. It's arguable whether we should have been doing that "unread" even when we were doing the available() > 0 check and the decision that comes out of https://bugs.openjdk.org/browse/JDK-8322256 might cover that.

@archiecobbs
Copy link
Contributor Author

archiecobbs commented Feb 26, 2024

Hi @jaikiran,

I agree with your comments. My only question is whether we should do all of this in one stage or two stages.

My initial thought is to do this in two stages:

  • A narrow fix for the bug described here as implemented by this PR
  • A larger change (requiring a separate bug, CSR, and PR) to:
    • More precisely define and specify the expected behavior, with support for concatenated streams
    • Eliminate situations where we read beyond the end-of-stream (i.e., "unreading" if/when necessary)

The reason I think this two stage approach is appropriate is because there is no downside to doing it this way - that is, the problem you describe of reading beyond the end-of-stream is already a problem in the current code, with the exception of the one corner case where this bug fix applies, namely, when in.available() returns zero and yet there actually is more data available.

Your thoughts?

Edited to add: I said "already a problem in the current code" but should clarify: what I mean is, suppose some clever InputStream were never letting available() return more than the number of compressed bytes remaining. That strategy would not be reliable, because not all reads from the underlying stream are gated with a check of what's available() (for example, see InflaterInputStream.fill()). So I don't think we need to worry about breaking such a case because it's already broken for other reasons.

@mlbridge
Copy link

mlbridge bot commented Mar 8, 2024

Mailing list message from Archie Cobbs on core-libs-dev:

On Thu, Mar 7, 2024 at 2:20?PM Louis Bergelson <duke at openjdk.org> wrote:

`GZIPInputStream`, when looking for a concatenated stream, relies on
what the underlying `InputStream` says is how many bytes are `available()`.
But this is inappropriate because `InputStream.available()` is just an
estimate and is allowed (for example) to always return zero.

Hello, I just wanted to check if there was anything way I could help move
this along.

Hi Louis,

Thanks for offering to help. The process is slow but moving forward and we
are getting close. Because there are implications to the specified behavior
it is being reviewed for backward compatibility, etc.

-Archie

--
Archie L. Cobbs
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/core-libs-dev/attachments/20240307/0a498948/attachment.htm>

@openjdk openjdk bot removed the csr Pull request needs approved CSR before integration label Mar 8, 2024
@openjdk
Copy link

openjdk bot commented Mar 13, 2024

@archiecobbs This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

7036144: GZIPInputStream readTrailer uses faulty available() test for end-of-stream

Reviewed-by: jpai

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 47 new commits pushed to the master branch:

  • e5e7cd2: 8328386: Convert java/awt/FileDialog/FileNameOverrideTest test to main
  • 1b68c73: 8328377: Convert java/awt/Cursor/MultiResolutionCursorTest test to main
  • e0373e0: 8328378: Convert java/awt/FileDialog/FileDialogForDirectories test to main
  • 03c25b1: 8328367: Convert java/awt/Component/UpdatingBootTime test to main
  • eebcc21: 8328524: [x86] StringRepeat.java failure on linux-x86: Could not reserve enough space for 2097152KB object heap
  • a68f5d7: 8327879: Convert javax/swing/border/Test4760089.java applet test to main
  • 269163d: 8328387: Convert java/awt/Frame/FrameStateTest/FrameStateTest.html applet test to main
  • 7231fd7: 8328350: G1: Remove DO_DISCOVERED_AND_DISCOVERY
  • 4f8f486: 8328012: Convert InputMethod (/java/awt/im) applet tests to main
  • 9ca4ae3: 8328264: AArch64: remove UseNeon condition in CRC32 intrinsic
  • ... and 37 more: https://git.openjdk.org/jdk/compare/d32ce65781c1d7815a69ceac720cdf3ae39caa9e...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@eirbjo, @jaikiran) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@jaikiran
Copy link
Member

The CSR for this has been approved. I will be running one final round of CI tests with this change and if those tests complete without related issues, I'll go ahead and approve this.

@jaikiran
Copy link
Member

jaikiran commented Mar 20, 2024

Hello Archie, tier1, tier2, tier3 completed without any related failures. I also ran JCK tests related to this class and those too passed. I've also taken Lance's inputs on this PR and he agrees that this look OK. I'll go ahead and approve this now. Thank you for fixing this, as well as your patience during the review.

@openjdk
Copy link

openjdk bot commented Mar 20, 2024

@jaikiran Only the author (@archiecobbs) is allowed to issue the integrate command.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Mar 20, 2024
@archiecobbs
Copy link
Contributor Author

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Mar 20, 2024
@openjdk
Copy link

openjdk bot commented Mar 20, 2024

@archiecobbs
Your change (at version d567cef) is now ready to be sponsored by a Committer.

@jaikiran
Copy link
Member

/sponsor

@openjdk
Copy link

openjdk bot commented Mar 20, 2024

Going to push as commit d3f3011.
Since your change was applied there have been 47 commits pushed to the master branch:

  • e5e7cd2: 8328386: Convert java/awt/FileDialog/FileNameOverrideTest test to main
  • 1b68c73: 8328377: Convert java/awt/Cursor/MultiResolutionCursorTest test to main
  • e0373e0: 8328378: Convert java/awt/FileDialog/FileDialogForDirectories test to main
  • 03c25b1: 8328367: Convert java/awt/Component/UpdatingBootTime test to main
  • eebcc21: 8328524: [x86] StringRepeat.java failure on linux-x86: Could not reserve enough space for 2097152KB object heap
  • a68f5d7: 8327879: Convert javax/swing/border/Test4760089.java applet test to main
  • 269163d: 8328387: Convert java/awt/Frame/FrameStateTest/FrameStateTest.html applet test to main
  • 7231fd7: 8328350: G1: Remove DO_DISCOVERED_AND_DISCOVERY
  • 4f8f486: 8328012: Convert InputMethod (/java/awt/im) applet tests to main
  • 9ca4ae3: 8328264: AArch64: remove UseNeon condition in CRC32 intrinsic
  • ... and 37 more: https://git.openjdk.org/jdk/compare/d32ce65781c1d7815a69ceac720cdf3ae39caa9e...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Mar 20, 2024
@openjdk openjdk bot closed this Mar 20, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Mar 20, 2024
@openjdk
Copy link

openjdk bot commented Mar 20, 2024

@jaikiran @archiecobbs Pushed as commit d3f3011.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@jaikiran
Copy link
Member

Hello Archie, we forgot to create a release note for this one (there's still time). Would you be willing to create one, following the instructions here https://openjdk.org/guide/#release-notes? If you need any help, let us know. One of us will review that release note before you can Resolve it to Delivered.

@archiecobbs
Copy link
Contributor Author

Hi @jaikiran,

No problem - please see JDK-8330995 and let me know if anything else is needed.

@jaikiran
Copy link
Member

Hi @jaikiran,

No problem - please see JDK-8330995 and let me know if anything else is needed.

Thank you Archie. With inputs from Lance, I've updated the text and the summary of the release note as per the guidelines. You can now mark it as "Resolved", "Delivered".

@archiecobbs
Copy link
Contributor Author

With inputs from Lance, I've updated the text and the summary of the release note as per the guidelines. You can now mark it as "Resolved", "Delivered".

Done - thanks!

@lbergelson
Copy link

Is there any chance of backporting this to java 17 or 21? What's involved in doing that?

@archiecobbs
Copy link
Contributor Author

Is there any chance of backporting this to java 17 or 21? What's involved in doing that?

It's a straightforward process but I'm not sure I'm one to judge whether it would be appropriate.

@jaikiran and/or @LanceAndersen - any opnions?

@LanceAndersen
Copy link
Contributor

Is there any chance of backporting this to java 17 or 21? What's involved in doing that?

It's a straightforward process but I'm not sure I'm one to judge whether it would be appropriate.

@jaikiran and/or @LanceAndersen - any opnions?

I am not convinced this is a must have for a backport. Please see the OpenJDK developers guide regarding bacports.. Also note this would require a CSR for each backport

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core-libs core-libs-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

6 participants