
Avoid Exception-related performance issues in LZ4BlockInputStream when stopOnEmptyBlock == false #143

Merged
merged 1 commit into from Aug 30, 2019

Conversation

JoshRosen
Contributor

This PR improves the performance of LZ4BlockInputStream when stopOnEmptyBlock = false, a mode used in Apache Spark (see #105 and #76 for background).

When stopOnEmptyBlock = false, LZ4BlockInputStream will attempt to call refill() after it's reached the end of a compressed block (because end-of-block no longer implies end-of-stream). If we reach the end of the stream then refill()'s attempt to read the next block's magic header will fail and throw an EOFException, which is then caught and ignored.

Throwing and catching this exception has a significant performance penalty, especially in applications with deep call stacks: a significant amount of time is spent collecting exception stack traces. We could try to solve this problem by throwing a static exception, but that harms users' ability to debug unexpected exceptions.
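The static-exception alternative mentioned above can be illustrated with a small sketch (not lz4-java code; class and field names are hypothetical). A pre-allocated exception that overrides `fillInStackTrace()` with a no-op skips the stack walk, which is where most of the cost of throwing lives:

```java
import java.io.EOFException;

public class StaticEofDemo {
    // Hypothetical pre-allocated ("static") exception: the no-op
    // fillInStackTrace() means throwing it never captures a stack trace.
    static final EOFException STATIC_EOF = new EOFException("Stream ended") {
        @Override
        public synchronized Throwable fillInStackTrace() {
            return this; // skip the expensive stack walk
        }
    };

    public static void main(String[] args) {
        try {
            throw STATIC_EOF;
        } catch (EOFException e) {
            // No stack trace was ever captured for this instance.
            System.out.println(e.getStackTrace().length); // 0
        }
    }
}
```

The empty stack trace is exactly the debugging downside noted above: if the exception ever escapes unexpectedly, it points nowhere.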

This PR addresses this problem by adding a private tryReadFully() method, which is like readFully() except it signals success / failure by returning a boolean instead of throwing an exception. This allows us to skip the exception overhead in the case of "expected" exceptions, significantly improving performance.
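The pattern can be sketched as follows (a minimal standalone version over a plain `InputStream`, not the actual lz4-java method, which operates on the stream's own fields):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class TryReadFullyDemo {
    // Read exactly len bytes into b, returning false (instead of throwing
    // EOFException) if the stream ends before len bytes are available.
    static boolean tryReadFully(InputStream in, byte[] b, int len) throws IOException {
        int read = 0;
        while (read < len) {
            int r = in.read(b, read, len - read);
            if (r < 0) {
                return false; // EOF before len bytes: no exception constructed
            }
            read += r;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] buf = new byte[4];
        InputStream full = new ByteArrayInputStream(new byte[]{1, 2, 3, 4});
        InputStream truncated = new ByteArrayInputStream(new byte[]{1, 2});
        System.out.println(tryReadFully(full, buf, 4));      // true
        System.out.println(tryReadFully(truncated, buf, 4)); // false
    }
}
```

The caller can then treat `false` at a block boundary as normal end-of-stream, and reserve exceptions for genuinely unexpected truncation mid-block.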

Benchmarks

Here's a quick-and-dirty microbenchmark that I ran on my Mac (written in Scala):

import java.io._
import net.jpountz.lz4._

// Construct a test input consisting of two concatenated LZ4 streams
val testBytes = "Testing!".getBytes("UTF-8")
val bytes = new ByteArrayOutputStream()
val out = new LZ4BlockOutputStream(bytes)
out.write(testBytes)
out.close()
val concatInput = (bytes.toByteArray() ++ bytes.toByteArray())

// Measure repeated decompression speed
val start = System.currentTimeMillis()
var i = 0
while (i < 1000 * 1000) {
    i += 1
    val in = new LZ4BlockInputStream(new ByteArrayInputStream(concatInput), false)
    while (in.read() != -1) {}
    in.close()
}
val end = System.currentTimeMillis()
println(end - start)

This took ~5 seconds before this change and ~600 ms after. It isn't the most scientific benchmark (no JIT warmup), but I think it's a good illustration of the performance problem in the original code. Note that the performance issue is even more pronounced with deep stack traces (the stack is very shallow in this benchmark).

@JoshRosen
Contributor Author

/cc @maropu (who worked on the original implementation of this feature) and @kiszk (who recently upgraded lz4-java in Apache Spark)

@kiszk

kiszk commented Jun 24, 2019

@JoshRosen Good catch, LGTM

@odaira WDYT? As we know, returning state to the caller by throwing an exception looks too expensive.

@odaira
Member

odaira commented Jun 24, 2019

I haven't thoroughly checked the code, but your statement makes sense. I'll try the code myself.

private void readFully(byte[] b, int len) throws IOException {
// Like readFully(), except it signals incomplete reads by returning
// false instead of throwing EOFException.
private boolean tryReadFully(byte[] b, int len) throws IOException {
Contributor


nit: drop the throws IOException at the end.

Contributor Author


I considered this, but it won't work: the underlying read() call declares that it throws IOException, so we need to keep it here.

Contributor


ah, ok.
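For context on that exchange: `InputStream.read` is declared to throw the checked `IOException`, so any method that calls it without catching must keep the `throws` clause. A minimal illustration (the helper name is hypothetical):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class CheckedExceptionDemo {
    // InputStream.read(byte[], int, int) declares "throws IOException", so a
    // caller that does not catch it must propagate the checked exception.
    // Removing "throws IOException" here would be a compile-time error.
    static int readOnce(InputStream in, byte[] b) throws IOException {
        return in.read(b, 0, b.length);
    }

    public static void main(String[] args) throws IOException {
        byte[] buf = new byte[3];
        int n = readOnce(new ByteArrayInputStream(new byte[]{7, 8, 9}), buf);
        System.out.println(n); // 3
    }
}
```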

@maropu
Copy link
Contributor

maropu commented Jun 28, 2019

I checked the code and it looks quite reasonable to me.

@maropu
Copy link
Contributor

maropu commented Jul 15, 2019

kindly ping

@kiszk

kiszk commented Jul 30, 2019

@odaira gentle ping

@odaira
Member

odaira commented Jul 31, 2019

Sorry, I have some urgent things to take care of, but I'll review this by the end of August.

@odaira
Member

odaira commented Aug 30, 2019

Thanks for your contribution. I confirmed the microbenchmark was more than 2x faster with this change. Good optimization!

@maropu
Contributor

maropu commented Aug 30, 2019

Thanks!

@joshrosen-stripe

Hi @odaira,

If you have time, would it be possible to make a new release of lz4-java so Spark can pick up this performance optimization?

@odaira
Member

odaira commented Sep 10, 2019

Yes, but I am working on a few more patches. The next release will come sometime next month.

@maropu
Contributor

maropu commented Dec 9, 2019

Thanks for the v1.7 release! @odaira

HyukjinKwon pushed a commit to apache/spark that referenced this pull request Dec 10, 2019
### What changes were proposed in this pull request?

This PR upgrades lz4-java from 1.6.0 to 1.7.0.

### Why are the changes needed?

This release includes a fix for a performance issue (lz4/lz4-java#143) by JoshRosen and some improvements (e.g., an LZ4 binary update). See the link below for the full list of changes:
https://github.com/lz4/lz4-java/blob/master/CHANGES.md#170

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Existing tests.

Closes #26823 from maropu/LZ4_1_7_0.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>