forked from apache/orc
ORC-1087: [C++] Handle unloaded seek positions when seeking in an uncompressed chunk (apache#1008)

### What changes were proposed in this pull request?

This PR fixes an unhandled case when seeking in an uncompressed chunk.

### Why are the changes needed?

The bug causes a position overflow and fails the reader when the unhandled case is encountered.

Some background:

* Compressed streams are compressed in chunks. If the compressed size of a chunk is larger than the original size, the original (uncompressed) chunk is kept. The chunk header records the chunk length and whether the chunk is compressed.
* A seek position in a compressed stream is encoded as two numbers: the position in the input stream and the position in the chunk. The first number locates the chunk header. The second number locates the position in the decompressed chunk.
* Compressed chunks are decompressed as a whole, so the whole chunk is in the output buffer. Uncompressed chunks don't need decompression, so the input buffer is used directly. However, the chunk may be read in pieces depending on the block size of the input stream, so the seek position might not be loaded yet.

The unhandled case is: the seek position is in the current chunk but posInChunk is not loaded yet. We should skip the remaining bytes to seek to it.

### How was this patch tested?

Added a unit test in TestDecompression.cc. Verified that the issue described in the JIRA is resolved.

(cherry picked from commit d175525)

Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
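
The unhandled case described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the code from this patch: `UncompressedChunkCursor`, `loadNext`, `seekInChunk`, and `remainingInLoaded` are hypothetical names, and the model keeps every loaded piece addressable, which a real input stream may not do. It shows why a seek target inside the current chunk but beyond the loaded bytes has to trigger further loads: in this model, computing `loadedEnd - posInChunk` with unsigned integers before loading would wrap around.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical model of a reader positioned inside one uncompressed chunk.
// It tracks offsets only; no real bytes are read.
struct UncompressedChunkCursor {
  uint64_t chunkLength;    // chunk length taken from the chunk header
  uint64_t blockSize;      // max bytes the input stream returns per read
  uint64_t loadedEnd = 0;  // bytes [0, loadedEnd) of the chunk are loaded
  uint64_t pos = 0;        // current position within the chunk

  UncompressedChunkCursor(uint64_t length, uint64_t block)
      : chunkLength(length), blockSize(block) {
    assert(block > 0);
  }

  // Load the next piece of the chunk; returns false once the whole
  // chunk has been loaded.
  bool loadNext() {
    if (loadedEnd >= chunkLength) {
      return false;
    }
    loadedEnd += std::min(blockSize, chunkLength - loadedEnd);
    return true;
  }

  // Seek to posInChunk within the current chunk.
  void seekInChunk(uint64_t posInChunk) {
    if (posInChunk > chunkLength) {
      throw std::out_of_range("seek position is outside the chunk");
    }
    // The case this sketch focuses on: the target lies in the current
    // chunk but has not been loaded yet, so keep loading until it is.
    while (posInChunk > loadedEnd) {
      if (!loadNext()) {
        throw std::logic_error("chunk ended before the seek position");
      }
    }
    pos = posInChunk;
  }

  // Bytes available between pos and the end of the loaded region.
  // Safe only because seekInChunk guarantees pos <= loadedEnd.
  uint64_t remainingInLoaded() const { return loadedEnd - pos; }
};
```

For example, with a 100-byte chunk and a 32-byte block size, seeking to offset 70 after only the first piece has been loaded makes the cursor load two more pieces (up to offset 96) before settling at 70.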
1 parent 88a6fe6, commit 97db0f4
Showing 2 changed files with 99 additions and 20 deletions.