Support multiple concatenated streams #20

Merged (3 commits) on Sep 17, 2018


sfriesel (Collaborator)

Fixes #19

unbzip2-stream stops decoding as soon as it encounters an EOS marker, but bzip2 transparently supports concatenated streams like so:
cat <(echo -n a | bzip2) <(echo b | bzip2) | bunzip2

The example file in #19 is such a case of concatenation.
To support this feature, the implementation needs to consume exactly the right number of bits/bytes up to the end of the first stream, so that decoding of any following bzip2 stream picks up at the right position. If the input stream doesn't end after the first bzip2 stream, the decoder now continues and expects additional bzip2 streams until the input stream ends.
As an intermediate step, this PR also validates the stream checksum that directly follows each EOS marker, confirming that all blocks were present and in the right order.
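For illustration, the stream checksum check can be sketched as below. This is not the code from this PR: the function names `combineBlockCrcs` and `verifyStreamCrc` are hypothetical, and the rotate-and-XOR rule is the one used by the reference bzip2 implementation, assumed here rather than taken from the diff.

```javascript
// Sketch only: hypothetical helper names, not unbzip2-stream's API.
// Assumes the combining rule of the reference bzip2 implementation:
// rotate the running value left by one bit, then XOR in the block CRC.
function combineBlockCrcs(blockCrcs) {
  let combined = 0;
  for (const crc of blockCrcs) {
    combined = ((combined << 1) | (combined >>> 31)) ^ crc;
    combined = combined >>> 0; // keep it an unsigned 32-bit integer
  }
  return combined;
}

// Compare the value computed from the decoded blocks with the 32-bit
// checksum stored right after the EOS marker.
function verifyStreamCrc(blockCrcs, storedStreamCrc) {
  const expected = combineBlockCrcs(blockCrcs);
  if (expected !== (storedStreamCrc >>> 0)) {
    throw new Error('stream CRC mismatch');
  }
  return true;
}
```

Because of the rotation, the result depends on block order: swapping two block CRCs generally changes the combined value, which is why the check can catch reordered as well as missing blocks.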

Otherwise the number of bytes read is off by one, which later leads
to problems when consuming the input stream precisely to the last byte.

In addition to the per-block CRC sums, the stream ends with a checksum
that combines all CRC sums; check that one as well. Also align the bit reader
to the next byte boundary at the end to allow reading data that follows.
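A minimal sketch of the byte alignment step, assuming an MSB-first bit reader over an in-memory byte array. The `BitReader` class and its method names are hypothetical and are not the bit reader used by unbzip2-stream.

```javascript
// Sketch only: a hypothetical MSB-first bit reader, not the library's own.
class BitReader {
  constructor(bytes) {
    this.bytes = bytes; // Uint8Array or Buffer
    this.bitPos = 0;    // absolute read position, in bits
  }

  // Read n bits (most significant bit first) and return them as a number.
  readBits(n) {
    let value = 0;
    for (let i = 0; i < n; i++) {
      const byteIndex = this.bitPos >> 3;
      if (byteIndex >= this.bytes.length) {
        throw new RangeError('read past end of input');
      }
      const bit = (this.bytes[byteIndex] >> (7 - (this.bitPos & 7))) & 1;
      value = value * 2 + bit;
      this.bitPos++;
    }
    return value;
  }

  // Skip any padding bits so the next read starts on a byte boundary.
  // Does nothing if the reader is already aligned.
  alignToByte() {
    this.bitPos = (this.bitPos + 7) & ~7;
  }

  // Number of whole bytes consumed; meaningful once the reader is aligned.
  get byteOffset() {
    return this.bitPos >> 3;
  }
}
```

After reading the EOS marker and the stream checksum, calling `alignToByte()` leaves `byteOffset` pointing at the first byte of whatever follows, which is where a subsequent stream header would start.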

bzip2 can handle inputs that consist of multiple concatenated bzip2 streams.
To support this functionality, the decoder needs to continue after seeing an
EOS marker because it may be followed by another bzip2 stream.
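The control flow described above might look roughly like the following. This is a sketch under assumptions, not the merged implementation: `decodeOneStream` is a hypothetical callback that decodes a single stream starting at `offset` and reports the byte offset just past it, and the whole input is assumed to be available in memory rather than arriving in chunks.

```javascript
// Sketch only: hypothetical driver loop, not unbzip2-stream's API.

// A bzip2 stream starts with 'B', 'Z', 'h' and a block-size digit '1'..'9'.
function hasStreamHeader(input, offset) {
  return offset + 4 <= input.length &&
    input[offset] === 0x42 &&      // 'B'
    input[offset + 1] === 0x5a &&  // 'Z'
    input[offset + 2] === 0x68 &&  // 'h'
    input[offset + 3] >= 0x31 && input[offset + 3] <= 0x39; // '1'..'9'
}

// Keep decoding streams until the input is exhausted.
// decodeOneStream(input, offset) is an assumed callback that returns
// { output, nextOffset }, with nextOffset byte-aligned past the stream.
function decodeConcatenated(input, decodeOneStream) {
  const chunks = [];
  let offset = 0;
  while (offset < input.length) {
    if (!hasStreamHeader(input, offset)) {
      throw new Error('expected a bzip2 stream header at byte ' + offset);
    }
    const { output, nextOffset } = decodeOneStream(input, offset);
    if (nextOffset <= offset) {
      throw new Error('decoder made no progress');
    }
    chunks.push(output);
    offset = nextOffset;
  }
  return chunks;
}
```

The loop only works if `nextOffset` is exact, which is why the precise byte accounting and the byte alignment after the stream checksum matter for this feature.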

regular (Owner) commented Sep 17, 2018

Looks good!

regular merged commit e0e6a58 into regular:master on Sep 17, 2018

regular (Owner) commented Sep 17, 2018

Merged and published as unbzip2-stream@1.3.0

Thanks, @sfriesel !

sfriesel deleted the multistream branch on October 24, 2018 22:02