New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid Exception-related performance issues in LZ4BlockInputStream when stopOnEmptyBlock == false #143
Avoid Exception-related performance issues in LZ4BlockInputStream when stopOnEmptyBlock == false #143
Conversation
@JoshRosen Good catch, LGTM @odaira WDYT? As we know, it looks too expensive to return the state to a caller by using an exception. |
I haven't thoroughly checked the code, but your statement makes sense. I'll try the code myself. |
private void readFully(byte[] b, int len) throws IOException { | ||
// Like readFully(), except it signals incomplete reads by returning | ||
// false instead of throwing EOFException. | ||
private boolean tryReadFully(byte[] b, int len) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: drop throws IOException
in the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered this, but it will not work: the underlying read()
call declares that it throws IOException
so we need to keep this here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, ok.
I checked the code and it looks quite reasonable to me. |
kindly ping |
@odaira gentle ping |
Sorry, I have some urgent things, but I'll review this by the end of August. |
Thanks for your contribution. I confirmed the microbenchmark was more than 2x faster with this change. Good optimization! |
Thanks! |
Hi @odaira, If you have time, would it be possible to make a new release of |
Yes, but I am working on a few more patches. The next release will be sometime in next month. |
Thanks for the v1.7 release! @odaira |
### What changes were proposed in this pull request? This pr intends to upgrade lz4-java from 1.6.0 to 1.7.0. ### Why are the changes needed? This release includes a performance bug (lz4/lz4-java#143) fixed by JoshRosen and some improvements (e.g., LZ4 binary update). You can see the link below for the changes; https://github.com/lz4/lz4-java/blob/master/CHANGES.md#170 ### Does this PR introduce any user-facing change? No ### How was this patch tested? Existing tests. Closes #26823 from maropu/LZ4_1_7_0. Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>
This PR improves the performance of
LZ4BlockInputStream
whenstopOnEmptyBlock = false
, a mode used in Apache Spark (see #105 and #76 for background).When
stopOnEmptyBlock = false
,LZ4BlockInputStream
will attempt to callrefill()
after it's reached the end of a compressed block (because end-of-block no longer implies end-of-stream). If we reach the end of the stream thenrefill()
's attempt to read the next block's magic header will fail and throw anEOFException
, which is then caught and ignored.Throwing and catching this exception has a significant performance penalty, especially in applications with deep call stacks: a significant amount of time is spent collecting exception stack traces. We could try to solve this problem by throwing a static exception, but that harms users' ability to debug unexpected exceptions.
This PR addresses this problem by adding a private
tryReadFully()
method, which is likereadFully()
except it signals success / failure by returning a boolean instead of throwing an exception. This allows us to skip the exception overhead in the case of "expected" exceptions, significantly improving performance.Benchmarks
Here's a quick-and-dirty microbenchmark that I ran on my Mac (written in Scala):
This took ~5 seconds before and ~600ms after. This isn't the most scientific benchmark (no warmup), but I think it's a good illustration of the performance problems in the original code. Note that the perf. issues are even more pronounced with deep stacktraces (the stack is very shallow in this benchmark).