
read concated stream #76

Closed · wants to merge 2 commits

Conversation

@davies commented Dec 17, 2015

In Apache Spark, we have a fast path to merge two compressed shuffle files by concatenating their bytes directly (without decompressing and recompressing). This relies on the decompressor being able to handle concatenated streams.

The BlockStream uses an empty block to mark the end of a stream; this PR changes the reader to skip empty blocks until the end of the underlying stream. (It also adds a flag for this behavior change, keeping the old behavior by default.)
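The idea can be sketched without any actual compression. Below is an illustrative model (not the lz4-java code itself): blocks are length-prefixed, a zero-length block marks the end of one member, and a `stopOnEmptyBlock` flag decides whether the reader stops at the first marker (old behavior) or skips it and keeps reading, so two concatenated streams decode as one (new behavior).

```java
import java.io.*;

// Illustrative sketch of the stopOnEmptyBlock behavior change, using
// plain length-prefixed blocks instead of real LZ4 compression.
public class ConcatDemo {
    // Encode one "member": a single data block followed by the
    // empty-block end-of-member marker.
    static byte[] member(String s) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        byte[] data = s.getBytes("UTF-8");
        out.writeInt(data.length);
        out.write(data);
        out.writeInt(0); // empty block = end-of-member marker
        return bos.toByteArray();
    }

    static String readAll(InputStream in, boolean stopOnEmptyBlock) throws IOException {
        DataInputStream din = new DataInputStream(in);
        StringBuilder sb = new StringBuilder();
        while (true) {
            int len;
            try {
                len = din.readInt();
            } catch (EOFException e) {
                break; // real end of the underlying stream
            }
            if (len == 0) {
                if (stopOnEmptyBlock) break; // old behavior: stop at first marker
                continue;                    // new behavior: skip marker, keep reading
            }
            byte[] buf = new byte[len];
            din.readFully(buf);
            sb.append(new String(buf, "UTF-8"));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream concat = new ByteArrayOutputStream();
        concat.write(member("hello "));
        concat.write(member("world"));
        byte[] bytes = concat.toByteArray();

        System.out.println(readAll(new ByteArrayInputStream(bytes), true));  // hello 
        System.out.println(readAll(new ByteArrayInputStream(bytes), false)); // hello world
    }
}
```

With the flag off, the two members read back as a single logical stream, which is exactly what Spark's concatenate-without-recompress fast path needs.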

stopOnEmptyBlock = true;
}

void setStopOnEmptyBlock(boolean stop) {
Member:
Shouldn't this be a public method?

Author:

yes.

@@ -90,6 +96,9 @@ public LZ4BlockInputStream(InputStream in) {

@Override
public int available() throws IOException {
if (o == originalLen) {
refill();
}
Member:

I think typically available() just returns the remaining bytes in the buffer, and it should not call a blocking method like refill(), though it is not strictly prohibited. Do you have any reason you want to call refill() in available()?

Author:

I don't remember the reason; maybe it's OK not to call refill() here.
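The reviewer's convention can be shown in isolation. In this sketch (the class is hypothetical; the field names `o` and `originalLen` mirror the diff above), `available()` reports only the bytes already decompressed into the current buffer and never triggers a blocking refill of the next block:

```java
// Hypothetical sketch: a non-blocking available(), per the usual
// InputStream contract that available() returns bytes readable
// without blocking.
public class AvailableSketch {
    private int o = 0;           // read offset into the current block
    private int originalLen = 0; // valid decompressed bytes in the block

    // Non-blocking: just the unread bytes of the current block.
    public int available() {
        return originalLen - o;
    }

    // Helpers to simulate the stream state.
    void setBlock(int len) { o = 0; originalLen = len; }
    void consume(int n) { o += n; }

    public static void main(String[] args) {
        AvailableSketch s = new AvailableSketch();
        s.setBlock(4);
        System.out.println(s.available()); // 4
        s.consume(3);
        System.out.println(s.available()); // 1: no refill, even if more input exists
    }
}
```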

} catch (EOFException e) {
finished = true;
return;
}
Member:

You should re-throw EOFException if stopOnEmptyBlock is true.

Author:

Yes.
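The fix the reviewer asks for can be sketched with the same toy block format (hypothetical helper, not lz4-java code): when `stopOnEmptyBlock` is true, the stream must be terminated by an empty block, so a raw EOF before that marker indicates truncation and the EOFException should propagate instead of being swallowed.

```java
import java.io.*;

// Sketch of the EOF-handling fix: re-throw EOFException when the
// end-of-stream marker is required but never seen.
public class EofHandling {
    // Reads length-prefixed blocks; returns the number of data blocks seen.
    static int readBlocks(InputStream raw, boolean stopOnEmptyBlock) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        int blocks = 0;
        while (true) {
            int len;
            try {
                len = in.readInt();
            } catch (EOFException e) {
                if (stopOnEmptyBlock) {
                    throw e; // marker block never arrived: truncated stream
                }
                return blocks; // concatenated mode: raw EOF is a clean end
            }
            if (len == 0) return blocks; // empty block marks end-of-stream
            in.skipBytes(len);
            blocks++;
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeInt(3);
        out.write(new byte[] {1, 2, 3}); // one data block, no end marker
        byte[] truncated = bos.toByteArray();

        System.out.println(readBlocks(new ByteArrayInputStream(truncated), false)); // 1
        try {
            readBlocks(new ByteArrayInputStream(truncated), true);
        } catch (EOFException e) {
            System.out.println("EOF re-thrown"); // old behavior flags truncation
        }
    }
}
```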

@odaira (Member) commented Jun 5, 2017

@davies If this feature is really needed in Spark, I'll merge it for the next release, but could you comment on my questions before that? Thanks.

@davies (Author) commented Jun 5, 2017

@odaira I don't have the bandwidth to work on this patch; could you fork it?

@davies davies closed this Jun 5, 2017
@odaira commented Jun 5, 2017

Thanks!
