decompress block type 00 - no compression #83

garymm · 2023-09-19T16:21:36Z

decompress block type 00 - no compression

Change-Id: I5ceb11f5b6ba0ef63e250757747dab79c7958653

src/test/decompress_test.cpp

oliverlee · 2023-10-14T05:32:40Z

src/test/decompress_test.cpp

+    constexpr auto compressed = std::array{
+        std::byte{0b10011111},
+        std::byte{5},
+        std::byte{0},  // len = 5
+        ~std::byte{5},
+        ~std::byte{0},  // nlen = 5
+        std::byte{'h'},
+        std::byte{'e'},
+        std::byte{'l'},
+        std::byte{'l'},
+        std::byte{'o'}};


What about writing a test utility function to construct arrays with block 00?

    struct block_type_00_t {
      static constexpr auto header_byte = std::byte{0b1001'1111};
    };

    template <std::size_t N>
    consteval auto compressed(block_type_00_t, const std::array<std::byte, N>& data)
    {
      const auto len = some_function(N);  // placeholder from the original sketch

      auto r = std::array<std::byte, N + 5>{
          block_type_00_t::header_byte, len[0], len[1], ~len[0], ~len[1],
      };
      std::ranges::copy(data, r.begin() + 5);
      return r;
    }

    ...

    const auto actual = starflate::decompress(compressed(block_type_00, {'h', 'e', 'l', 'l', 'o'}));

Will keep this in mind if / when we need to do it more than once. If just once, I think having a helper for this is not worth it.
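For reference, a runtime version of such a helper could look roughly like the sketch below. The name `make_stored_block` is hypothetical and not part of the PR. The header byte is passed in by the caller, so the sketch makes no assumption about how the three header bits are packed; it only encodes LEN (low byte first) and its one's complement NLEN, matching the byte layout in the test above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical test helper (not from the PR): builds
// header byte + LEN + NLEN + payload for a block of type 00.
inline std::vector<std::byte> make_stored_block(std::byte header,
                                                std::string_view payload) {
  assert(payload.size() <= 0xFFFF);  // LEN is a 16-bit field

  const auto len = static_cast<std::uint16_t>(payload.size());
  const auto lo = static_cast<std::byte>(len & 0xFF);  // low byte of LEN
  const auto hi = static_cast<std::byte>(len >> 8);    // high byte of LEN

  // NLEN is the one's complement of LEN, stored in the same byte order.
  std::vector<std::byte> block{header, lo, hi, ~lo, ~hi};
  for (const char c : payload) {
    block.push_back(static_cast<std::byte>(c));
  }
  return block;
}
```

Calling `make_stored_block(std::byte{0b10011111}, "hello")` should yield the same ten bytes as the hand-written array in the test.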

oliverlee · 2023-10-14T05:41:49Z

src/test/decompress_test.cpp

+          << " but expected " << static_cast<int>(expected[i]);
+    }
+  };
+};


What about tests for:

3 header bits are not byte aligned. We should be able to create a bit_span that has a bit offset, I believe?

I assume BFINAL isn't really handled at the moment, but what about adding tests that fail for 01, 10, and 11 until those cases are handled?

error handling for len/nlen mismatch

error handling for data.size() < len

src/decompress.cpp

oliverlee · 2023-10-14T05:51:07Z

src/decompress.hpp

+        return std::unexpected{DecompressError::NonCompressedLenMismatch};
+      }
+
+      std::copy_n(compressed.data(), len, std::back_inserter(decompressed));


Suggested change

-      std::copy_n(compressed.data(), len, std::back_inserter(decompressed));
+      decompressed.resize(len);
+      std::copy_n(compressed.data(), len, decompressed.begin());

Related to above, but we probably want to assert compressed.size() >= len before copy_n.

compressed -> compressed_bits

If we change the function argument to bit_span, then we can get an implicit conversion from span to bit_span in the function parameter. Then compressed will never exist in this function and we can't mix up the two.

oliverlee · 2023-10-14T05:53:14Z

src/decompress.hpp

+      if (len != static_cast<uint16_t>(~nlen)) {
+        return std::unexpected{DecompressError::NonCompressedLenMismatch};
+      }
+


How do we want to handle compressed.size() != len?

I assume we may run into that case with buffered reading of compressed data?

I think for now (final_bit + byte_00), maybe an assert is fine?
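To make the two failure modes discussed here concrete (LEN/NLEN mismatch and fewer payload bytes than LEN), here is a minimal sketch. It assumes the input starts at the byte-aligned LEN field, after the header bits have been consumed. `read_stored_block`, `StoredBlockResult`, and `TruncatedInput` are hypothetical names; only `NonCompressedLenMismatch` appears in the PR. It returns an error for short input rather than asserting, which is one possible choice, not necessarily the one the PR makes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical error codes; only the first name is taken from the PR.
enum class StoredBlockError { NonCompressedLenMismatch, TruncatedInput };

// Hypothetical result type: either an error or the copied payload.
struct StoredBlockResult {
  std::optional<StoredBlockError> error;
  std::vector<std::byte> data;
};

// `body` is assumed to begin at the byte-aligned LEN field.
inline StoredBlockResult read_stored_block(const std::vector<std::byte>& body) {
  if (body.size() < 4) {
    return {StoredBlockError::TruncatedInput, {}};
  }

  // Reads a 16-bit value stored low byte first.
  const auto u16 = [&](std::size_t i) {
    return static_cast<std::uint16_t>(
        std::to_integer<unsigned>(body[i]) |
        (std::to_integer<unsigned>(body[i + 1]) << 8));
  };
  const std::uint16_t len = u16(0);
  const std::uint16_t nlen = u16(2);

  // NLEN must be the one's complement of LEN.
  if (len != static_cast<std::uint16_t>(~nlen)) {
    return {StoredBlockError::NonCompressedLenMismatch, {}};
  }
  // Check the available payload before copying anything.
  if (body.size() - 4 < len) {
    return {StoredBlockError::TruncatedInput, {}};
  }

  std::vector<std::byte> out(body.begin() + 4, body.begin() + 4 + len);
  return {std::nullopt, std::move(out)};
}
```

Extra bytes after the payload are simply ignored in this sketch, which leaves room for a following block or buffered input.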

oliverlee · 2023-10-14T17:29:49Z

huffman/src/bit_span.hpp

+    bit_offset_ += n % CHAR_BIT;
+    if (bit_offset_ == CHAR_BIT) {
+      bit_offset_ = 0;
+      n += CHAR_BIT;


I haven't convinced myself this is correct yet, particularly for n > 8. Maybe some tests?

In addition, you could also add an assertion

// invariant
assert(bit_offset_ < CHAR_BIT)

Added some tests and the invariant.

huffman/src/bit_span.hpp

garymm · 2023-10-14T22:11:22Z

I'll try to address your comments next time I work on this (maybe I anticipated some of them in the last version I just pushed, but I hadn't seen them yet, so don't assume I'm ignoring you)

codecov · 2023-10-14T22:14:19Z

Codecov Report

Merging #83 (c3492e8) into master (217076e) will increase coverage by 0.77%.
The diff coverage is 91.42%.

@@            Coverage Diff             @@
##           master      #83      +/-   ##
==========================================
+ Coverage   61.12%   61.90%   +0.77%     
==========================================
  Files          14       16       +2     
  Lines         939     1000      +61     
==========================================
+ Hits          574      619      +45     
- Misses        365      381      +16

Files	Coverage Δ
huffman/src/bit_span.hpp	`100.00% <100.00%> (ø)`
src/decompress.cpp	`100.00% <100.00%> (ø)`
src/decompress.hpp	`86.95% <84.21%> (ø)`

... and 1 file with indirect coverage changes


garymm · 2023-10-17T04:21:24Z

Didn't address all your comments yet. Will do so soon.

oliverlee

Didn't have time to finish reviewing everything. I think it's worth splitting out the bit_span stuff into a separate PR and merging that first. Or #105

huffman/src/bit_span.hpp

oliverlee · 2023-10-17T05:52:30Z

huffman/src/bit_span.hpp

+    bit_offset_ += n % CHAR_BIT;
+    if (bit_offset_ >= CHAR_BIT) {
+      bit_offset_ -= CHAR_BIT;
+      n += CHAR_BIT;
+    }
+    std::advance(data_, n / CHAR_BIT);


I find this clearer since I don't need to follow state changes in order to see the correctness of the call to advance

Suggested change

-    bit_offset_ += n % CHAR_BIT;
-    if (bit_offset_ >= CHAR_BIT) {
-      bit_offset_ -= CHAR_BIT;
-      n += CHAR_BIT;
-    }
-    std::advance(data_, n / CHAR_BIT);
+    const auto distance = bit_offset_ + n;
+    std::advance(data_, distance / CHAR_BIT);
+    bit_offset_ = static_cast<std::uint8_t>(distance % CHAR_BIT);
+
+    // invariant
+    assert(bit_offset_ < CHAR_BIT);

It's also more important to assert the invariant after breaking it temporarily as we can "hopefully" assume that other public member functions also maintain the invariant.
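The arithmetic in the suggestion can be checked in isolation. Below is a small sketch using a hypothetical `BitCursor` struct that stands in for the relevant state of `bit_span` (a byte position plus a bit offset); it is not the real class, and a byte index replaces the `data_` pointer.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for bit_span's position state.
struct BitCursor {
  std::size_t byte_index = 0;   // plays the role of data_
  std::uint8_t bit_offset = 0;  // invariant: bit_offset < CHAR_BIT

  void consume(std::size_t n) {
    // Total distance in bits from the start of the current byte.
    const std::size_t distance = bit_offset + n;
    byte_index += distance / CHAR_BIT;
    bit_offset = static_cast<std::uint8_t>(distance % CHAR_BIT);

    // Re-check the invariant after updating the state.
    assert(bit_offset < CHAR_BIT);
  }
};
```

For example, starting from offset 5 and consuming 13 bits gives a distance of 18, so the cursor advances two bytes and ends at offset 2, which also covers the `n > 8` case raised earlier.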

oliverlee · 2023-10-17T05:53:02Z

huffman/src/bit_span.hpp

+
+  /// Consumes the given number of bits. Advances the start of the view.
+  ///
+  /// @pre n <= std::ranges::size(this)


Suggested change

-  /// @pre n <= std::ranges::size(this)
+  /// @pre n <= std::ranges::size(*this)

oliverlee · 2023-10-17T06:12:48Z

huffman/src/bit_span.hpp

@@ -113,5 +116,59 @@ class bit_span : public std::ranges::view_interface<bit_span>
  {
    return iterator{*this, bit_offset_ + bit_size_};
  };
+
+  template <class T>


I'm not completely sure, but we should constrain T to be a scalar type to prevent lifetime issues. Scalar types are included in implicit-lifetime types.

https://en.cppreference.com/w/cpp/string/byte/memcpy

If the objects are potentially-overlapping or not TriviallyCopyable, the behavior of memcpy is not specified and may be undefined.

https://wg21.link/p2590

oliverlee · 2023-10-17T06:21:15Z

huffman/src/bit_span.hpp

+    std::advance(data_, sizeof(T));
+    bit_size_ -= sizeof(T) * CHAR_BIT;
+    if constexpr (std::endian::native == std::endian::big) {
+      res = std::byteswap(res);


We should simply constrain T to be integral; otherwise, use of std::byteswap can result in a hard error.

https://en.cppreference.com/w/cpp/numeric/byteswap

hard error: causes compilation to fail
soft error: causes the compiler to discard a template from a set of candidates for overload resolution (without making compilation fail and enabling the well-known SFINAE idiom)

https://stackoverflow.com/questions/15260685/what-exactly-is-the-immediate-context-mentioned-in-the-c11-standard-for-whic
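As an illustration of the soft-error behavior, here is a minimal sketch of a constrained reader. `read_le` and `can_read_le` are hypothetical names, not the PR's API. To keep the sketch independent of host byte order it assembles the value with shifts instead of `memcpy` plus `std::byteswap`, so it is a different technique from the code under review. The constraint uses `std::enable_if_t`; a C++20 `std::integral` constraint would serve the same purpose.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <utility>

// Hypothetical reader: assembles an integer from sizeof(T) little-endian bytes.
// The enable_if constraint removes this template from overload resolution
// for non-integral T instead of failing inside the function body.
template <class T, class = std::enable_if_t<std::is_integral_v<T>>>
T read_le(const std::byte* p) {
  std::uint64_t value = 0;
  for (std::size_t i = 0; i < sizeof(T); ++i) {
    value |= std::to_integer<std::uint64_t>(p[i]) << (i * CHAR_BIT);
  }
  return static_cast<T>(value);
}

// Detection helper: true if read_le<T>(const std::byte*) is well formed.
template <class T, class = void>
struct can_read_le : std::false_type {};

template <class T>
struct can_read_le<
    T, std::void_t<decltype(read_le<T>(std::declval<const std::byte*>()))>>
    : std::true_type {};

// Integral types are accepted; float is rejected without a hard error.
static_assert(can_read_le<std::uint16_t>::value);
static_assert(!can_read_le<float>::value);
```

Because the failure happens during template argument substitution, `can_read_le<float>` evaluates to false rather than breaking the build.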

oliverlee · 2023-10-17T06:34:23Z

huffman/test/bit_span_test.cpp

+    // NOLINTBEGIN(readability-magic-numbers)
+    static constexpr std::array data{
+        std::byte{0b10101010}, std::byte{0b01010101}};
+    huffman::bit_span span{data.data(), data.size() * CHAR_BIT};


The constructor on bit_span.hpp:104 should allow this:

Suggested change

-    huffman::bit_span span{data.data(), data.size() * CHAR_BIT};
+    huffman::bit_span span{data};

oliverlee · 2023-10-17T06:43:05Z

huffman/test/bit_span_test.cpp

+    expect(*span.begin() == 0_b);
+    expect(span.data() == data.data());
+    // should be a no-op now.
+    span.consume_to_byte_boundary();


I think this should be tested in a separate test case.

oliverlee · 2023-10-17T07:07:48Z

huffman/test/bit_span_test.cpp

+    span.consume_to_byte_boundary();
+    expect(*span.begin() == 0_b);
+    expect(span.data() == data.data());
+
+    span.consume(1);
+    expect(*span.begin() == 1_b);
+    expect(span.data() == data.data());
+
+    span.consume(1);
+    expect(*span.begin() == 0_b);
+    expect(span.data() == data.data());


two notes:

I think having a lot of mutation in a test makes it slower and/or harder to understand due to the amount of history I need to track. I think this would be easier to read with a value parameterized test such as:

    test("consume") = [](auto n) {
      static constexpr std::array data{
          std::byte{0b10101010}, std::byte{0b01010101}};

      static constexpr auto nth_bit = [](auto m) {
        const auto i = m / CHAR_BIT;  // index of the byte containing bit m
        return huffman::bit{
            std::bitset<CHAR_BIT>{std::to_integer<unsigned>(data[i])}[m % CHAR_BIT]};
      };

      auto bits = huffman::bit_span{data};
      bits.consume(n);

      expect(nth_bit(n) == bits[0]);
      expect(CHAR_BIT * data.size() - n == bits.size());
    } | std::views::iota(0U, 16U);

I don't think we should have data() return a pointer to a byte if there is a non-zero bit offset. data() should always return a pointer to the first element of the range. If not, the semantics of this type differ slightly from the semantics of the standard library, which can cause all sorts of headaches down the line.

https://en.cppreference.com/w/cpp/ranges/data

oliverlee · 2023-10-17T07:12:55Z

huffman/test/bit_span_test.cpp

+    expect(*span.begin() == 0_b);
+
+    span.consume(1);
+    // span is now empty. Accssing .begin() would be undefined behavior.


This is actually fine and you can test it out with a constant expression.

    constexpr auto bits = huffman::bit_span{nullptr, 0};
    static_assert(bits.begin() == bits.end());

Core-language undefined behavior is not permitted during constant evaluation, so if this compiles, the comparison is well defined.

However, dereferencing begin() is UB.

src/test/decompress_test.cpp

garymm · 2023-10-18T20:34:53Z

Moved bit_span changes to #107.
Putting this in draft mode for now.

Change-Id: I5ceb11f5b6ba0ef63e250757747dab79c7958653

garymm force-pushed the I5ceb11f5b6ba0ef63e250757747dab79c7958653 branch from f12df33 to abc46a5 Compare October 14, 2023 04:14

garymm changed the base branch from I65fd59b5163ca3cb128c830e9bd75f7ba7440940 to I916c03915e7319f42a4c8a0ca11106097d5b41db October 14, 2023 04:14

oliverlee reviewed Oct 14, 2023


garymm changed the title ~~WIP decompress block type 00~~ decompress block type 00 - no compression Oct 14, 2023

garymm changed the base branch from I916c03915e7319f42a4c8a0ca11106097d5b41db to master October 14, 2023 22:07

garymm force-pushed the I5ceb11f5b6ba0ef63e250757747dab79c7958653 branch from abc46a5 to 058a766 Compare October 14, 2023 22:07

garymm marked this pull request as ready for review October 14, 2023 22:09

garymm force-pushed the I5ceb11f5b6ba0ef63e250757747dab79c7958653 branch from 058a766 to 05cdb7f Compare October 16, 2023 17:30

garymm changed the base branch from master to I9f8a0329e1d37ce760ffb939ce329624872528db October 16, 2023 17:30

garymm force-pushed the I5ceb11f5b6ba0ef63e250757747dab79c7958653 branch from 05cdb7f to fc2fb2a Compare October 17, 2023 04:21

oliverlee reviewed Oct 17, 2023


Base automatically changed from I9f8a0329e1d37ce760ffb939ce329624872528db to master October 18, 2023 18:52

oliverlee mentioned this pull request Oct 18, 2023

I must have missed the notifications for this. I can try to write something that does this. It will speed up the debugging process, which we will probably encounter again. #106

Closed

garymm force-pushed the I5ceb11f5b6ba0ef63e250757747dab79c7958653 branch from fc2fb2a to fc91a38 Compare October 18, 2023 20:21

garymm changed the base branch from master to I4ba8659230270f8e571fe70444d9f5c629a888a3 October 18, 2023 20:21

garymm force-pushed the I5ceb11f5b6ba0ef63e250757747dab79c7958653 branch from fc91a38 to 515886f Compare October 18, 2023 20:34

garymm force-pushed the I4ba8659230270f8e571fe70444d9f5c629a888a3 branch from 362836f to e14287e Compare October 18, 2023 20:34

garymm marked this pull request as draft October 18, 2023 20:34

garymm force-pushed the I5ceb11f5b6ba0ef63e250757747dab79c7958653 branch from 515886f to 750eb52 Compare October 20, 2023 00:00

garymm force-pushed the I4ba8659230270f8e571fe70444d9f5c629a888a3 branch from e14287e to f914181 Compare October 20, 2023 00:00

Base automatically changed from I4ba8659230270f8e571fe70444d9f5c629a888a3 to master October 20, 2023 00:20

garymm marked this pull request as ready for review October 22, 2023 03:40

oliverlee approved these changes Oct 22, 2023


garymm force-pushed the I5ceb11f5b6ba0ef63e250757747dab79c7958653 branch 2 times, most recently from 76331ae to 6790495 Compare October 30, 2023 02:38

decompress block type 00 - no compression

c3492e8

Change-Id: I5ceb11f5b6ba0ef63e250757747dab79c7958653

garymm force-pushed the I5ceb11f5b6ba0ef63e250757747dab79c7958653 branch from 6790495 to c3492e8 Compare October 30, 2023 03:00

garymm enabled auto-merge (squash) October 30, 2023 03:00

garymm merged commit 61f7f2b into master Oct 30, 2023
28 checks passed

garymm deleted the I5ceb11f5b6ba0ef63e250757747dab79c7958653 branch October 30, 2023 03:09
