-
Notifications
You must be signed in to change notification settings - Fork 246
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
read concated stream #76
Conversation
stopOnEmptyBlock = true; | ||
} | ||
|
||
void setStopOnEmptyBlock(boolean stop) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be a public method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes.
@@ -90,6 +96,9 @@ public LZ4BlockInputStream(InputStream in) { | |||
|
|||
@Override | |||
public int available() throws IOException { | |||
if (o == originalLen) { | |||
refill(); | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think typically available() just returns the remaining bytes in the buffer, and it should not call a blocking method like refill(), though it is not strictly prohibited. Do you have any reason you want to call refill() in available()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not remember the the reason, maybe it's OK to not call refill() here.
} catch (EOFException e) { | ||
finished = true; | ||
return; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should re-throw EOFException if stopOnEmptyBlock is true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
@davies If this feature is really needed in Spark, I'll merge it for the next release, but could you comment on my questions before that? Thanks. |
@odaira I do not have bandwidth to work on this patch, could you fork this one? |
Thanks! |
In Apache Spark, we have a fast path to merge two compressed shuffle files together by concat the bytes directly (without decompress and compress again), this rely on that the decompressor to handle concated streams.
The BlockStream uses an empty block as the end of a stream, this PR change to skip the empty block until the end of stream. (also added a flag for this behavior change, having old behavior by default).