-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
decompress block type 00 - no compression #83
Conversation
f12df33
to
abc46a5
Compare
src/test/decompress_test.cpp
Outdated
constexpr auto compressed = std::array{ | ||
std::byte{0b10011111}, | ||
std::byte{5}, | ||
std::byte{0}, // len = 5 | ||
~std::byte{5}, | ||
~std::byte{0}, // nlen = 5 | ||
std::byte{'h'}, | ||
std::byte{'e'}, | ||
std::byte{'l'}, | ||
std::byte{'l'}, | ||
std::byte{'o'}}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about writing a test utility function to construct arrays with block 00?
struct block_type_00_t
{
static constexpr auto header_byte = {0b1001'1111};
};
template <std::size_t N>
consteval auto compressed(block_type_00_t, const std::array<std::byte, N>& data)
{
const auto len = some_function(N);
auto r = std::array<std::byte, N + 5>{
block_type_00_t::header_byte,
len[0],
len[1],
-len[0],
-len[1],
};
std::ranges::copy(data, r.begin() + 5);
return r;
}
...
const auto actual = starflate::decompress(compressed(block_type_00, {'h', 'e', 'l', 'l' 'o'});
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will keep this in mind if / when we need to do it more than once. If just once, I think having a helper for this is not worth it.
<< " but expected " << static_cast<int>(expected[i]); | ||
} | ||
}; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about tests for:
- 3 header bits are not byte aligned. We should be able to create a bit_span that's has a bit offset I believe?
- I assume BFINAL isn't really handled at the moment, but what about adding tests that fail for
01
,10
, and11
until those cases are handled. - error handling for len/nlen mismatch
- error handling for data.size() < len
src/decompress.hpp
Outdated
return std::unexpected{DecompressError::NonCompressedLenMismatch}; | ||
} | ||
|
||
std::copy_n(compressed.data(), len, std::back_inserter(decompressed)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::copy_n(compressed.data(), len, std::back_inserter(decompressed)); | |
decompressed.resize(len); | |
std::copy_n(compressed.data(), len, decompressed.begin()); |
Related to above, but we probably want to assert compressed.size() >= len before copy_n.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
compressed
-> compressed_bits
If we change the function argument to bit_span
, then we can get an implicit conversion from span
to bit_span
in the function parameter. Then compressed
will never exist in this function and we can't mix up the two.
if (len != static_cast<uint16_t>(~nlen)) { | ||
return std::unexpected{DecompressError::NonCompressedLenMismatch}; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we want to handle compressed.size() != len?
I assume we may run into that case with buffered reading of compressed data?
I think for now (final_bit + byte_00), maybe an assert is fine?
huffman/src/bit_span.hpp
Outdated
bit_offset_ += n % CHAR_BIT; | ||
if (bit_offset_ == CHAR_BIT) { | ||
bit_offset_ = 0; | ||
n += CHAR_BIT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't convinced myself this is correct yet, particularly for n > 8. Maybe some tests?
In addition, you could also add an assertion
// invariant
assert(bit_offset_ < CHAR_BIT)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some tests and the invariant.
abc46a5
to
058a766
Compare
I'll try to address your comments next time I work on this (maybe I anticipated some of them in the last version I just pushed, but I hadn't seen them yet, so don't assume I'm ignoring you) |
Codecov Report
@@ Coverage Diff @@
## master #83 +/- ##
==========================================
+ Coverage 61.12% 61.90% +0.77%
==========================================
Files 14 16 +2
Lines 939 1000 +61
==========================================
+ Hits 574 619 +45
- Misses 365 381 +16
... and 1 file with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
058a766
to
05cdb7f
Compare
05cdb7f
to
fc2fb2a
Compare
Didn't address all your comments yet. Will do so soon. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't have time to finish reviewing everything. I think it's worth splitting out the bit_span stuff into a separate PR and merging that first. Or #105
huffman/src/bit_span.hpp
Outdated
bit_offset_ += n % CHAR_BIT; | ||
if (bit_offset_ >= CHAR_BIT) { | ||
bit_offset_ -= CHAR_BIT; | ||
n += CHAR_BIT; | ||
} | ||
std::advance(data_, n / CHAR_BIT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this clearer since I don't need to follow state changes in order to see the correctness of the call to advance
bit_offset_ += n % CHAR_BIT; | |
if (bit_offset_ >= CHAR_BIT) { | |
bit_offset_ -= CHAR_BIT; | |
n += CHAR_BIT; | |
} | |
std::advance(data_, n / CHAR_BIT); | |
const auto distance = bit_offset_ + n; | |
std::advance(data_, distance / CHAR_BIT); | |
bit_offset_ = static_cast<std::uint8_t>(distance % CHAR_BIT); | |
// invariant | |
assert(bit_offset_ < CHAR_BIT); |
It's also more important to assert the invariant after breaking it temporarily as we can "hopefully" assume that other public member functions also maintain the invariant.
huffman/src/bit_span.hpp
Outdated
|
||
/// Consumes the given number of bits. Advances the start of the view. | ||
/// | ||
/// @pre n <= std::ranges::size(this) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// @pre n <= std::ranges::size(this) | |
/// @pre n <= std::ranges::size(*this) |
huffman/src/bit_span.hpp
Outdated
@@ -113,5 +116,59 @@ class bit_span : public std::ranges::view_interface<bit_span> | |||
{ | |||
return iterator{*this, bit_offset_ + bit_size_}; | |||
}; | |||
|
|||
template <class T> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not completely sure, but we should constrain T
to be a scalar type to prevent lifetime issues. Scalar types are included in implicit-lifetime types.
https://en.cppreference.com/w/cpp/string/byte/memcpy
If the objects are potentially-overlapping or not TriviallyCopyable, the behavior of memcpy is not specified and may be undefined.
huffman/src/bit_span.hpp
Outdated
std::advance(data_, sizeof(T)); | ||
bit_size_ -= sizeof(T) * CHAR_BIT; | ||
if constexpr (std::endian::native == std::endian::big) { | ||
res = std::byteswap(res); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should simply constrain T
to be integral otherwise use of std::byteswap
can result in a hard error.
https://en.cppreference.com/w/cpp/numeric/byteswap
hard error: causes compilation to fail
soft error: causes the compiler to discard a template from a set of candidates for overload resolution (without making compilation fail and enabling the well-known SFINAE idiom)
huffman/test/bit_span_test.cpp
Outdated
// NOLINTBEGIN(readability-magic-numbers) | ||
static constexpr std::array data{ | ||
std::byte{0b10101010}, std::byte{0b01010101}}; | ||
huffman::bit_span span{data.data(), data.size() * CHAR_BIT}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The constructor on bit_span.hpp:104 should allow this:
huffman::bit_span span{data.data(), data.size() * CHAR_BIT}; | |
huffman::bit_span span{data}; |
huffman/test/bit_span_test.cpp
Outdated
expect(*span.begin() == 0_b); | ||
expect(span.data() == data.data()); | ||
// should be a no-op now. | ||
span.consume_to_byte_boundary(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be tested in a separate test case.
huffman/test/bit_span_test.cpp
Outdated
span.consume_to_byte_boundary(); | ||
expect(*span.begin() == 0_b); | ||
expect(span.data() == data.data()); | ||
|
||
span.consume(1); | ||
expect(*span.begin() == 1_b); | ||
expect(span.data() == data.data()); | ||
|
||
span.consume(1); | ||
expect(*span.begin() == 0_b); | ||
expect(span.data() == data.data()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
two notes:
- I think having a lot of mutation in a test makes it slower and/or harder to understand due to the amount of history I need to track. I think this would be easier to read with a value parameterized test such as:
test("consume") = [](auto n) {
static constexpr std::array data{
std::byte{0b10101010}, std::byte{0b01010101}};
static constexpr auto nth_bit = [](auto m) {
const auto i = m < 8U;
return huffman::bit{
std::bitset<CHAR_BIT>{std::to_integer<int>(data[i])}[m]};
};
auto bits = huffman::bit_span{data};
bits.consume(n);
expect(nth_bit(n) == bits[0]);
expect(CHAR_BIT * data.size() - n == bits.size());
} | std::views::iota(0U, 16U);
- I don't think we should have
data()
return a pointer to a byte if there is a non-zero bit offset.data()
should always return the first element of the range. If not, the semantics of this type differ slightly from the semantics of the standard library which can cause all sort of headaches down the line.
huffman/test/bit_span_test.cpp
Outdated
expect(*span.begin() == 0_b); | ||
|
||
span.consume(1); | ||
// span is now empty. Accssing .begin() would be undefined behavior. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually fine and you can test it out with a constant expression.
constexpr auto bits = huffman::bit_span{nullptr, 0};
static_assert(bits.begin() == bits.end());
If it compiles, it's defined behavior.
However, dereferencing begin()
is UB.
fc2fb2a
to
fc91a38
Compare
fc91a38
to
515886f
Compare
362836f
to
e14287e
Compare
Moved bit_span changes to #107. |
515886f
to
750eb52
Compare
e14287e
to
f914181
Compare
76331ae
to
6790495
Compare
Change-Id: I5ceb11f5b6ba0ef63e250757747dab79c7958653
6790495
to
c3492e8
Compare
decompress block type 00 - no compression
Change-Id: I5ceb11f5b6ba0ef63e250757747dab79c7958653