-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8281962: Avoid unnecessary native calls in InflaterInputStream #7492
Conversation
👋 Welcome back simonis! A progress list of the required criteria for merging this PR into |
Webrevs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me. Benchmark numbers look good.
@simonis This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 99 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
This change is probably okay but will require study to see if there are any side effects (sadly, this area has a history of side effects being reported months and years after a change). Would you mind holding off integrating this change until it has been reviewed by someone that works in the area? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code changes look good to me.
(Shouldn't the copyright header have a year?)
test/micro/org/openjdk/bench/java/util/zip/InflaterInputStreams.java
Outdated
Show resolved
Hide resolved
Thanks Christoph! |
Hi Alan, thanks for looking at this change.
Sure, no problem. But can you please be a little more specific on who you consider to qualify as "works in the area" :) |
The change looks innocuous so it is probably OK. I would like to kick of our Mach5 runs to see if it shakes out any potential issues. From reading the 3rd party problem reports, it appears that the problem is with the 3rd party zlib implementations. Hopefully this change will not mask other issues |
Thanks Lance! Much appreciated.
The problem arises from a difference in the specification of zlib's inflate function and Java's Input stream. Basically, both take a buffer and the length of that buffer as input and return the number of bytes (i.e. payload) written into that buffer (which can be smaller than length). However, zlib doesn't specify that bytes between the returned length and the the buffer length can't be modified while Java does. Mark Adler's original zlib version never wrote more bytes into the buffer than it returned as length value and users of his implementation started to more or less rely on this implementation detail. But newer and improved versions of zlib might write more bytes into the output buffer than they return as length value (e.g. because they use vector instructions for writes). According to zlib's inflate specification this is fine as long as they don't overwrite the end of the buffer and as long as they return the correct length of inflated bytes. In order to make this behavior more evident, Chromium's zlib version puts some extra padding bytes after the last inflated byte if there's enough space left in the buffer (and this happens even if zero bytes were inflated). This behavior becomes particularly harmful if Java's InflaterInputStream unnecessarily calls the native inflate() function just to find out that there's no data left to inflate. With Chromium's zlib this will still write the padding bytes to the beginning of the output buffer and potentially overwrite the inflated output from the last call to InflaterInputStream::read. So to cut a long story short, there's no problem with Chromium's zlib implementation. It behaves correctly according to the zlib specification. This change (besides the performance improvements) helps using other zlib implementations which behave correctly but slightly different from the original zlib implementation into Java. |
@LanceAndersen , did you manage to get any Mach5 results? Did you find any issues? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this change is okay. We also have to be super cautious when touching InflaterXXX and friends but I think one is okay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing the microbenchmark to not have hardcoded limitations!
Tests are still running so no final results yet (please note that today is also a US holiday) |
No problem. Just let me know once the tests have finished. And thanks @AlanBateman, @cl4es for the reviews! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mach5 runs did not raise any issues, so you should be OK to move forward.
Thanks @LanceAndersen! |
/integrate |
Going to push as commit 378fa50.
Your commit was automatically rebased without conflicts. |
Currently,
InflaterInputStream::read()
first does a native call to the underlying zlibinflate()
function and only afterwards checks if the inflater requires input (i.e.Inflater::needsInput()
) or has finished inflating (Inflater::finished()
). This leads to an unnecessary native call toinflate()
whenInflaterInputStream::read()
is invoked for the very first time becauseInflater::fill()
hasn't been called and another unnecessary native call to detect the end of the stream. For small streams/files which completely fit into the output buffer passed toInflaterInputStream::read()
we can therefore save two out of three native calls. The attached micro benchmark shows that this results in a 5%-10% performance improvement for zip files of sizes between 4096 to 256 bytes (notice that in common jars like Tomcat, Spring-Boot, Maven, Jackson, etc. about 60-80% of the classes are smaller than 4096 bytes).Tested with the JTreg zip/jar/zipfs and the JCK zip/jar tests.
As a side effect, this change also fixes an issue with alternative zlib implementations like zlib-Chromium or zlib-Cloudflare which pad the inflated bytes with a specif byte pattern at the end of the output for debugging purpose. This breaks code patterns like the following:
Even if the whole data fits into the
data
array during the first call toinflaterInputStream.read()
, we still need a second call toinflaterInputStream.read()
in order to detect the end of the inflater stream. If this second call calls into the nativeinflate()
function of Cloudflare/Chromium's zlib version, this will still write some padding bytes at the beginning of thedata
array and overwrite the previously read data. This issue has been reported in Spring [1] and ASM [2] when using these libraries with Cloudflares zlib version (by settingLD_LIBRARY_PATH
accordingly).[1] spring-projects/spring-framework#27429
[2] https://gitlab.ow2.org/asm/asm/-/issues/317955
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/7492/head:pull/7492
$ git checkout pull/7492
Update a local copy of the PR:
$ git checkout pull/7492
$ git pull https://git.openjdk.java.net/jdk pull/7492/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 7492
View PR using the GUI difftool:
$ git pr show -t 7492
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/7492.diff