Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8282648: Weaken the InflaterInputStream specification in order to allow faster Zip implementations #7986

Closed

Conversation

simonis
Copy link
Member

@simonis simonis commented Mar 28, 2022

Add an API note to InflaterInputStream::read(byte[] b, int off, int len) to highlight that it might write more bytes than the returned number of inflated bytes into the buffer b.

The superclass java.io.InputStream specifies that read(byte[] b, int off, int len) will leave the content beyond the last read byte in the read buffer b unaffected. However, the overridden read method in InflaterInputStream passes the read buffer b to Inflater::inflate(byte[] b, int off, int len) which doesn't provide this guarantee. Depending on implementation details, Inflater::inflate might write more than the returned number of inflated bytes into the buffer b.

TL;DR

java.util.zip.Inflater is the Java wrapper class for zlib's inflater functionality. Inflater::inflate(byte[] output, int off, int len) currently calls zlib's native inflate(..) function and passes the address of output[off] and len to it via JNI.

The specification of zlib's inflate(..) function (i.e. the API documentation in the original zlib implementation) doesn't give any guarantees with regard to usage of the output buffer. It only states that upon completion the function will return the number of bytes that have been written (i.e. "inflated") into the output buffer.

The original zlib implementation only wrote as many bytes into the output buffer as it inflated. However, this is not a hard requirement and newer, more performant implementations of the zlib library like zlib-chromium or zlib-cloudflare can use more bytes of the output buffer than they actually inflate as a scratch buffer. See https://github.com/simonis/zlib-chromium for a more detailed description of their approach and its performance benefit.

These new zlib versions can still be used transparently from Java (e.g. by putting them into the LD_LIBRARY_PATH or by using LD_PRELOAD), because they still fully comply to specification of Inflater::inflate(..). However, we might run into problems when using the Inflater functionality from the InflaterInputStream class. InflaterInputStream is derived from from InputStream and as such, its read(byte[] b, int off, int len) method is quite constrained. It specifically specifies that if k bytes have been read, then "these bytes will be stored in elements b[off] through b[off+k-1], leaving elements b[off+k] through b[off+len-1] unaffected". But InflaterInputStream::read(byte[] b, int off, int len) (which is constrained by InputStream::read(..)'s specification) calls Inflater::inflate(byte[] b, int off, int len) and directly passes its output buffer down to the native zlib inflate(..) method which is free to change the bytes beyond b[off+k] (where k is the number of inflated bytes).

From a practical point of view, I don't see this as a big problem, because callers of InflaterInputStream::read(byte[] b, int off, int len) can never know how many bytes will be written into the output buffer b (and in fact its content can always be completely overwritten). It therefore makes no sense to depend on any data there being untouched after the call. Also, having used zlib-cloudflare productively for about two years, we haven't seen real-world issues because of this behavior yet. However, from a specification point of view it is easy to artificially construct a program which violates InflaterInputStream::read(..)'s postcondition if using one of the alterantive zlib implementations. A recently integrated JTreg test (test/jdk/jdk/nio/zipfs/ZipFSOutputStreamTest.java) "unintentionally" fails with zlib-chromium but can fixed easily to run with alternative implementations as well (see JDK-8283756).


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change requires a CSR request to be approved

Issues

  • JDK-8282648: Weaken the InflaterInputStream specification in order to allow faster Zip implementations
  • JDK-8283758: Weaken the InflaterInputStream specification in order to allow faster Zip implementations (CSR)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/7986/head:pull/7986
$ git checkout pull/7986

Update a local copy of the PR:
$ git checkout pull/7986
$ git pull https://git.openjdk.org/jdk pull/7986/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 7986

View PR using the GUI difftool:
$ git pr show -t 7986

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/7986.diff

@simonis
Copy link
Member Author

simonis commented Mar 28, 2022

/csr

@bridgekeeper
Copy link

bridgekeeper bot commented Mar 28, 2022

👋 Welcome back simonis! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added rfr Pull request is ready for review csr Pull request needs approved CSR before integration labels Mar 28, 2022
@openjdk
Copy link

openjdk bot commented Mar 28, 2022

@simonis has indicated that a compatibility and specification (CSR) request is needed for this pull request.
@simonis please create a CSR request for issue JDK-8283758. This pull request cannot be integrated until the CSR request is approved.

@openjdk
Copy link

openjdk bot commented Mar 28, 2022

@simonis The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the core-libs core-libs-dev@openjdk.org label Mar 28, 2022
@mlbridge
Copy link

mlbridge bot commented Mar 28, 2022

@LanceAndersen
Copy link
Contributor

Hi Volker,

I believe your PR should point to the JBS issue in the title, which references the CSR and not the CSR directly in the title.

@simonis simonis force-pushed the JDK-8282648-InflaterInputStream_read_doc branch from 128166f to b55fc33 Compare March 28, 2022 10:49
@simonis simonis changed the title 8283758: Problems due to conflicting specification of Inflater::inflate(..) and InflaterInputStream::read(..) 8282648: Problems due to conflicting specification of Inflater::inflate(..) and InflaterInputStream::read(..) Mar 28, 2022
@simonis
Copy link
Member Author

simonis commented Mar 28, 2022

Hi Volker,

I believe your PR should point to the JBS issue in the title, which references the CSR and not the CSR directly in the title.

Sorry, you're right of course :)

@jaikiran
Copy link
Member

Hello Volker,
An additional thing that we might have to consider here is whether adding this javadoc change to InflaterInputStream is ever going to "show up" to end user applications.
What I mean is, I think in many cases the end user applications won't even know they are dealing with an InflaterInputStream. For example, the following code:

ZipFile zf = ...
ZipEntry ze = zf.getEntry("some-file");
InputStream is = zf.getInputStream(ze);

As we see above, none of these APIs talk about InflaterInputStream (the return type of ZipFile.getInpustream(...) is an InputStream). So end users won't be able to view this spec change. Perhaps we should also add some note in the ZipFile.getInpustream(....) API to make a mention of this potential difference in behaviour of the returned stream?

@simonis
Copy link
Member Author

simonis commented Mar 29, 2022

Hello Volker, An additional thing that we might have to consider here is whether adding this javadoc change to InflaterInputStream is ever going to "show up" to end user applications. What I mean is, I think in many cases the end user applications won't even know they are dealing with an InflaterInputStream. For example, the following code:

ZipFile zf = ...
ZipEntry ze = zf.getEntry("some-file");
InputStream is = zf.getInputStream(ze);

As we see above, none of these APIs talk about InflaterInputStream (the return type of ZipFile.getInpustream(...) is an InputStream). So end users won't be able to view this spec change. Perhaps we should also add some note in the ZipFile.getInpustream(....) API to make a mention of this potential difference in behaviour of the returned stream?

You are right with your observation and I'll be happy to add a corresponding comment if @LanceAndersen and @AlanBateman agree. Please let me know what you think?

@RogerRiggs
Copy link
Contributor

One other way to communicate changes is in the release-note. I added release-note=yes to the bug.

@LanceAndersen
Copy link
Contributor

Hello Volker, An additional thing that we might have to consider here is whether adding this javadoc change to InflaterInputStream is ever going to "show up" to end user applications. What I mean is, I think in many cases the end user applications won't even know they are dealing with an InflaterInputStream. For example, the following code:

ZipFile zf = ...
ZipEntry ze = zf.getEntry("some-file");
InputStream is = zf.getInputStream(ze);

As we see above, none of these APIs talk about InflaterInputStream (the return type of ZipFile.getInpustream(...) is an InputStream). So end users won't be able to view this spec change. Perhaps we should also add some note in the ZipFile.getInpustream(....) API to make a mention of this potential difference in behaviour of the returned stream?

You are right with your observation and I'll be happy to add a corresponding comment if @LanceAndersen and @AlanBateman agree. Please let me know what you think?

Hi Volker,

I believe Jai raises a valid point given these javadocs probably have had limited updates if any since the API was originally added. We should look at ZipInputStream and GZipInputStream as well if we decide to update the ZipFile::getInputStream(where we could borrow some wording from the ZipInputStream class description as a start to some word smithing).

As Roger points out we will need a release note for this change as well.

@AlanBateman
Copy link
Contributor

A suggestion for the structure is to start the new paragraph by saying it reads n bytes of uncompressed data into b[off] to b[off+n-1]. Then follow to say that the contents of b[n] to b[off+len-1] after the read are are undefined, then say that an implementation is free to ... Finally just finish it by saying that the contents are also undefined when the method fails by throwing an exception.

@simonis
Copy link
Member Author

simonis commented Apr 6, 2022

Hi,

I've pushed a new version which:

  • improves the wording as suggested
  • adds the API amendments from InflaterInputStream::read() to GZIPInputStream::read()/ZipInputStream::read() as well be cause they both inherit from InflaterInputStream. Unfortunately it's not possible to simply inherit the API doc because of different parameter naming.
  • added an @implNote to ZipFile::getInputStream() to make it clear that this implementation really returns an InflaterInputStream.

Please let me know what you think?
Volker

@jaikiran
Copy link
Member

jaikiran commented Apr 8, 2022

The GZIPInputStream and ZipInputStream files will need a copyright year update.

Copy link
Contributor

@LanceAndersen LanceAndersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Volker,

I think this reads much better. Its too bad we cannot take advantage of @inheritedDoc

A couple of minor comments below

Copy link
Contributor

@AlanBateman AlanBateman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've addressed my points, the updated javadoc looks good.

Copy link
Contributor

@LanceAndersen LanceAndersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Volker,

The updates look good. Thank you. Next up is updating the CSR with the changes and craft the release-note as a subtask.

Thank you for your efforts here

@simonis
Copy link
Member Author

simonis commented Apr 13, 2022

Thanks @AlanBateman, @LanceAndersen. I've just updated the CSR with the latest wording from this PR. Please feel free to review the CSR as well so I can finalize it.

@simonis simonis changed the title 8282648: Problems due to conflicting specification of Inflater::inflate(..) and InflaterInputStream::read(..) 8282648: Weaken the InflaterInputStream specification in order to allow faster Zip implementations Jul 26, 2022
@simonis simonis force-pushed the JDK-8282648-InflaterInputStream_read_doc branch from d62cba4 to 62e25d4 Compare July 27, 2022 10:43
@simonis
Copy link
Member Author

simonis commented Jul 27, 2022

The latest push contains the following changes:

  • changed the summary of the Issue and CSR to "Weaken the InflaterInputStream specification in order to allow faster Zip implementations" as suggested
  • rebased the PR up to change 63ae357 on top of the leatest HEAD revision
  • Applied the same changes to the JavaDoc of JarInputStream::read() like to ones which were added to ZipInputStream::read(). JarInputStream which derives from ZipInputStream and overrides ZipInputStream::read() was just forgotten in the previous itetrations of this change.
  • Added an @apiNote to ZipFile::getInputStream(), JarFile::getInputStream() and URLConnection::getInputStream() to indicate that the returned InputStream might scribble the contents of the output array beyond the last inflated byte. I couldn't find any other public methods in the java.* packages which return InflaterInputStream instances (or derived subclasses) as a generic InputStream.

@openjdk-notifier
Copy link

@simonis Please do not rebase or force-push to an active PR as it invalidates existing review comments. All changes will be squashed into a single commit automatically when integrating. See OpenJDK Developers’ Guide for more information.

@bridgekeeper
Copy link

bridgekeeper bot commented Aug 24, 2022

@simonis This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

Copy link
Member

@mbreinhold mbreinhold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the CSR once you’ve finalized the specification changes.

Copy link
Contributor

@LanceAndersen LanceAndersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Volker,

Thank you for your patience while we worked through all of the nuances of this

The changes look good to me

@simonis
Copy link
Member Author

simonis commented Aug 31, 2022

@mbreinhold, @LanceAndersen thanks for your reviews. I'm waiting for the CSR approval before pushing.

Copy link
Contributor

@AlanBateman AlanBateman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The update in d82c752 looks good.

@openjdk openjdk bot removed the csr Pull request needs approved CSR before integration label Aug 31, 2022
@openjdk
Copy link

openjdk bot commented Aug 31, 2022

@simonis This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8282648: Weaken the InflaterInputStream specification in order to allow faster Zip implementations

Reviewed-by: lancea, alanb, jpai, mr, darcy

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 422 new commits pushed to the master branch:

  • 2d18dda: 8173605: Remove support for source and target 1.7 option in javac
  • 7c2f299: 8293202: Document how to edit doc/testing, doc/building
  • 07616de: 8175382: clhsdb pmap should print the end addresses of the load modules
  • 9444a08: 8290709: Incorrect dominance error for unconditional pattern vs. null
  • 6a1b0b5: 8293154: TemporalQueries java doc error
  • 5204528: 8293201: Library detection in runtime/ErrorHandling/TestDwarf.java fails on some systems
  • 2d10d4f: 8291651: CleanerTest.java fails with "Cleanable was cleaned"
  • bd674dc: 8293163: G1: Rename G1HeapRegionAttr::is_humongous
  • 479795b: 8293164: Remove unimplemented Generation::print_heap_change
  • 6e6202c: 8292407: Improve Weak CAS VarHandle/Unsafe tests resilience under spurious failures
  • ... and 412 more: https://git.openjdk.org/jdk/compare/330adc03a9314b188d05b3f8d06f97826b7a3847...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Aug 31, 2022
Copy link
Contributor

@LanceAndersen LanceAndersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo cleanup looks fine

@simonis
Copy link
Member Author

simonis commented Sep 5, 2022

/integrate

@openjdk
Copy link

openjdk bot commented Sep 5, 2022

Going to push as commit 2c61efe.
Since your change was applied there have been 462 commits pushed to the master branch:

  • e31c537: 8293224: Add link to openjdk.org/jtreg/ from doc/testing
  • 955baa3: 8267374: macOS: Option+Up/Down Arrow don't traverse to beginning/end of line in JTextArea
  • 8df671c: 8293355: JDK-8293167 included bad copyright header
  • 5bed9f7: 8293290: RISC-V: Explicitly pass a third temp register to MacroAssembler::store_heap_oop
  • 48b3ab0: 8293167: Memory leak in JfrThreadSampler if stackdepth is larger than default (64)
  • 4067321: 8291586: Broken links in JVMTI specification
  • 32f4dc8: 8293295: Add type check asserts to java_lang_ref_Reference accessors
  • e945619: 8293088: Fix compilation with the new Visual Studio preprocessor
  • 730ced9: 8292660: C2: blocks made unreachable by NeverBranch-to-Goto conversion are removed incorrectly
  • 3464019: 8292899: CustomTzIDCheckDST.java testcase failed on AIX platform
  • ... and 452 more: https://git.openjdk.org/jdk/compare/330adc03a9314b188d05b3f8d06f97826b7a3847...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Sep 5, 2022
@openjdk openjdk bot closed this Sep 5, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Sep 5, 2022
@openjdk
Copy link

openjdk bot commented Sep 5, 2022

@simonis Pushed as commit 2c61efe.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core-libs core-libs-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

7 participants