Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

decompress block type 00 - no compression #83

Merged
merged 1 commit into from
Oct 30, 2023

Conversation

garymm
Copy link
Owner

@garymm garymm commented Sep 19, 2023

decompress block type 00 - no compression

Change-Id: I5ceb11f5b6ba0ef63e250757747dab79c7958653

@garymm garymm force-pushed the I5ceb11f5b6ba0ef63e250757747dab79c7958653 branch from f12df33 to abc46a5 Compare October 14, 2023 04:14
@garymm garymm changed the base branch from I65fd59b5163ca3cb128c830e9bd75f7ba7440940 to I916c03915e7319f42a4c8a0ca11106097d5b41db October 14, 2023 04:14
src/test/decompress_test.cpp Outdated Show resolved Hide resolved
src/test/decompress_test.cpp Outdated Show resolved Hide resolved
src/test/decompress_test.cpp Outdated Show resolved Hide resolved
Comment on lines 14 to 56
constexpr auto compressed = std::array{
std::byte{0b10011111},
std::byte{5},
std::byte{0}, // len = 5
~std::byte{5},
~std::byte{0}, // nlen = 5
std::byte{'h'},
std::byte{'e'},
std::byte{'l'},
std::byte{'l'},
std::byte{'o'}};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about writing a test utility function to construct arrays with block 00?

struct block_type_00_t
{
    static constexpr auto header_byte = {0b1001'1111};
};

template <std::size_t N>
consteval auto compressed(block_type_00_t, const std::array<std::byte, N>& data)
{
  const auto len = some_function(N);
  
  auto r =  std::array<std::byte, N + 5>{
    block_type_00_t::header_byte,
    len[0],
    len[1],
    -len[0],
    -len[1],
  };
  std::ranges::copy(data, r.begin() + 5);
  return r;
}

...

const auto actual = starflate::decompress(compressed(block_type_00, {'h', 'e', 'l', 'l' 'o'});

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will keep this in mind if / when we need to do it more than once. If just once, I think having a helper for this is not worth it.

<< " but expected " << static_cast<int>(expected[i]);
}
};
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about tests for:

  • 3 header bits are not byte aligned. We should be able to create a bit_span that's has a bit offset I believe?
  • I assume BFINAL isn't really handled at the moment, but what about adding tests that fail for 01 , 10, and 11 until those cases are handled.
  • error handling for len/nlen mismatch
  • error handling for data.size() < len

src/decompress.cpp Show resolved Hide resolved
return std::unexpected{DecompressError::NonCompressedLenMismatch};
}

std::copy_n(compressed.data(), len, std::back_inserter(decompressed));
Copy link
Collaborator

@oliverlee oliverlee Oct 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::copy_n(compressed.data(), len, std::back_inserter(decompressed));
decompressed.resize(len);
std::copy_n(compressed.data(), len, decompressed.begin());

Related to above, but we probably want to assert compressed.size() >= len before copy_n.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compressed -> compressed_bits

If we change the function argument to bit_span, then we can get an implicit conversion from span to bit_span in the function parameter. Then compressed will never exist in this function and we can't mix up the two.

if (len != static_cast<uint16_t>(~nlen)) {
return std::unexpected{DecompressError::NonCompressedLenMismatch};
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we want to handle compressed.size() != len?

I assume we may run into that case with buffered reading of compressed data?

I think for now (final_bit + byte_00), maybe an assert is fine?

Comment on lines 145 to 157
bit_offset_ += n % CHAR_BIT;
if (bit_offset_ == CHAR_BIT) {
bit_offset_ = 0;
n += CHAR_BIT;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't convinced myself this is correct yet, particularly for n > 8. Maybe some tests?

In addition, you could also add an assertion

// invariant
assert(bit_offset_ < CHAR_BIT)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some tests and the invariant.

huffman/src/bit_span.hpp Outdated Show resolved Hide resolved
huffman/src/bit_span.hpp Outdated Show resolved Hide resolved
@garymm garymm changed the title WIP decompress block type 00 decompress block type 00 - no compression Oct 14, 2023
@garymm garymm changed the base branch from I916c03915e7319f42a4c8a0ca11106097d5b41db to master October 14, 2023 22:07
@garymm garymm force-pushed the I5ceb11f5b6ba0ef63e250757747dab79c7958653 branch from abc46a5 to 058a766 Compare October 14, 2023 22:07
@garymm garymm marked this pull request as ready for review October 14, 2023 22:09
@garymm
Copy link
Owner Author

garymm commented Oct 14, 2023

I'll try to address your comments next time I work on this (maybe I anticipated some of them in the last version I just pushed, but I hadn't seen them yet, so don't assume I'm ignoring you)

@codecov
Copy link

codecov bot commented Oct 14, 2023

Codecov Report

Merging #83 (c3492e8) into master (217076e) will increase coverage by 0.77%.
The diff coverage is 91.42%.

@@            Coverage Diff             @@
##           master      #83      +/-   ##
==========================================
+ Coverage   61.12%   61.90%   +0.77%     
==========================================
  Files          14       16       +2     
  Lines         939     1000      +61     
==========================================
+ Hits          574      619      +45     
- Misses        365      381      +16     
Files Coverage Δ
huffman/src/bit_span.hpp 100.00% <100.00%> (ø)
src/decompress.cpp 100.00% <100.00%> (ø)
src/decompress.hpp 86.95% <84.21%> (ø)

... and 1 file with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@garymm garymm force-pushed the I5ceb11f5b6ba0ef63e250757747dab79c7958653 branch from 058a766 to 05cdb7f Compare October 16, 2023 17:30
@garymm garymm changed the base branch from master to I9f8a0329e1d37ce760ffb939ce329624872528db October 16, 2023 17:30
@garymm garymm force-pushed the I5ceb11f5b6ba0ef63e250757747dab79c7958653 branch from 05cdb7f to fc2fb2a Compare October 17, 2023 04:21
@garymm
Copy link
Owner Author

garymm commented Oct 17, 2023

Didn't address all your comments yet. Will do so soon.

Copy link
Collaborator

@oliverlee oliverlee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't have time to finish reviewing everything. I think it's worth splitting out the bit_span stuff into a separate PR and merging that first. Or #105

huffman/src/bit_span.hpp Outdated Show resolved Hide resolved
Comment on lines 151 to 159
bit_offset_ += n % CHAR_BIT;
if (bit_offset_ >= CHAR_BIT) {
bit_offset_ -= CHAR_BIT;
n += CHAR_BIT;
}
std::advance(data_, n / CHAR_BIT);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this clearer since I don't need to follow state changes in order to see the correctness of the call to advance

Suggested change
bit_offset_ += n % CHAR_BIT;
if (bit_offset_ >= CHAR_BIT) {
bit_offset_ -= CHAR_BIT;
n += CHAR_BIT;
}
std::advance(data_, n / CHAR_BIT);
const auto distance = bit_offset_ + n;
std::advance(data_, distance / CHAR_BIT);
bit_offset_ = static_cast<std::uint8_t>(distance % CHAR_BIT);
// invariant
assert(bit_offset_ < CHAR_BIT);

It's also more important to assert the invariant after breaking it temporarily as we can "hopefully" assume that other public member functions also maintain the invariant.


/// Consumes the given number of bits. Advances the start of the view.
///
/// @pre n <= std::ranges::size(this)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// @pre n <= std::ranges::size(this)
/// @pre n <= std::ranges::size(*this)

@@ -113,5 +116,59 @@ class bit_span : public std::ranges::view_interface<bit_span>
{
return iterator{*this, bit_offset_ + bit_size_};
};

template <class T>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not completely sure, but we should constrain T to be a scalar type to prevent lifetime issues. Scalar types are included in implicit-lifetime types.

https://en.cppreference.com/w/cpp/string/byte/memcpy

If the objects are potentially-overlapping or not TriviallyCopyable, the behavior of memcpy is not specified and may be undefined.

https://wg21.link/p2590

std::advance(data_, sizeof(T));
bit_size_ -= sizeof(T) * CHAR_BIT;
if constexpr (std::endian::native == std::endian::big) {
res = std::byteswap(res);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should simply constrain T to be integral otherwise use of std::byteswap can result in a hard error.

https://en.cppreference.com/w/cpp/numeric/byteswap

hard error: causes compilation to fail
soft error: causes the compiler to discard a template from a set of candidates for overload resolution (without making compilation fail and enabling the well-known SFINAE idiom)

https://stackoverflow.com/questions/15260685/what-exactly-is-the-immediate-context-mentioned-in-the-c11-standard-for-whic

// NOLINTBEGIN(readability-magic-numbers)
static constexpr std::array data{
std::byte{0b10101010}, std::byte{0b01010101}};
huffman::bit_span span{data.data(), data.size() * CHAR_BIT};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The constructor on bit_span.hpp:104 should allow this:

Suggested change
huffman::bit_span span{data.data(), data.size() * CHAR_BIT};
huffman::bit_span span{data};

expect(*span.begin() == 0_b);
expect(span.data() == data.data());
// should be a no-op now.
span.consume_to_byte_boundary();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be tested in a separate test case.

Comment on lines 104 to 114
span.consume_to_byte_boundary();
expect(*span.begin() == 0_b);
expect(span.data() == data.data());

span.consume(1);
expect(*span.begin() == 1_b);
expect(span.data() == data.data());

span.consume(1);
expect(*span.begin() == 0_b);
expect(span.data() == data.data());
Copy link
Collaborator

@oliverlee oliverlee Oct 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two notes:

  1. I think having a lot of mutation in a test makes it slower and/or harder to understand due to the amount of history I need to track. I think this would be easier to read with a value parameterized test such as:
   test("consume") = [](auto n) {
     static constexpr std::array data{
         std::byte{0b10101010}, std::byte{0b01010101}};

     static constexpr auto nth_bit = [](auto m) {
       const auto i = m < 8U;
       return huffman::bit{
           std::bitset<CHAR_BIT>{std::to_integer<int>(data[i])}[m]};
     };

     auto bits = huffman::bit_span{data};
     bits.consume(n);

     expect(nth_bit(n) == bits[0]);
     expect(CHAR_BIT * data.size() - n == bits.size());
   } | std::views::iota(0U, 16U);
  1. I don't think we should have data() return a pointer to a byte if there is a non-zero bit offset. data() should always return the first element of the range. If not, the semantics of this type differ slightly from the semantics of the standard library which can cause all sort of headaches down the line.

https://en.cppreference.com/w/cpp/ranges/data

expect(*span.begin() == 0_b);

span.consume(1);
// span is now empty. Accssing .begin() would be undefined behavior.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually fine and you can test it out with a constant expression.

constexpr auto bits = huffman::bit_span{nullptr, 0};
static_assert(bits.begin() == bits.end());

If it compiles, it's defined behavior.

However, dereferencing begin() is UB.

src/test/decompress_test.cpp Outdated Show resolved Hide resolved
Base automatically changed from I9f8a0329e1d37ce760ffb939ce329624872528db to master October 18, 2023 18:52
@garymm garymm force-pushed the I5ceb11f5b6ba0ef63e250757747dab79c7958653 branch from fc2fb2a to fc91a38 Compare October 18, 2023 20:21
@garymm garymm changed the base branch from master to I4ba8659230270f8e571fe70444d9f5c629a888a3 October 18, 2023 20:21
@garymm garymm force-pushed the I5ceb11f5b6ba0ef63e250757747dab79c7958653 branch from fc91a38 to 515886f Compare October 18, 2023 20:34
@garymm garymm force-pushed the I4ba8659230270f8e571fe70444d9f5c629a888a3 branch from 362836f to e14287e Compare October 18, 2023 20:34
@garymm
Copy link
Owner Author

garymm commented Oct 18, 2023

Moved bit_span changes to #107.
Putting this in draft mode for now.

@garymm garymm marked this pull request as draft October 18, 2023 20:34
@garymm garymm force-pushed the I5ceb11f5b6ba0ef63e250757747dab79c7958653 branch from 515886f to 750eb52 Compare October 20, 2023 00:00
@garymm garymm force-pushed the I4ba8659230270f8e571fe70444d9f5c629a888a3 branch from e14287e to f914181 Compare October 20, 2023 00:00
Base automatically changed from I4ba8659230270f8e571fe70444d9f5c629a888a3 to master October 20, 2023 00:20
@garymm garymm marked this pull request as ready for review October 22, 2023 03:40
@garymm garymm force-pushed the I5ceb11f5b6ba0ef63e250757747dab79c7958653 branch 2 times, most recently from 76331ae to 6790495 Compare October 30, 2023 02:38
Change-Id: I5ceb11f5b6ba0ef63e250757747dab79c7958653
@garymm garymm force-pushed the I5ceb11f5b6ba0ef63e250757747dab79c7958653 branch from 6790495 to c3492e8 Compare October 30, 2023 03:00
@garymm garymm enabled auto-merge (squash) October 30, 2023 03:00
@garymm garymm merged commit 61f7f2b into master Oct 30, 2023
28 checks passed
@garymm garymm deleted the I5ceb11f5b6ba0ef63e250757747dab79c7958653 branch October 30, 2023 03:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants