Update README.md

richgel999 · Apr 27, 2023 · ec3cce1
1 parent 70de2ab
Showing 1 changed file with 1 addition and 1 deletion.
diff --git a/README.md b/README.md
@@ -11,7 +11,7 @@ Disadvantages vs. rANS: less precise, slower encode (ultimately due to the post-
 
 - The vectorized decoder uses 16 interleaved streams (in 4 groups of 4 lanes). 24-bit integers are used to enable using fast precise integer vectorized divides with `_mm_div_ps`, which is crucial for performance. The performance and practicality of a vectorized range decoder like this is highly dependent (really, completely lives and dies!) on the availability and performance of fast hardware division. This implementation specifically uses 24-bit integers, otherwise the results from `_mm_div_ps` (with a subsequent conversion back to int with truncation) wouldn't be accurate. After many experiments, this is the only way I could find to make this decoder competitive. 
 - Using 24-bit ints sacrifices some small amount of coding efficiency (a small fraction of a percent), but compared to length-limited Huffman coding it's still more efficient. The test app displays the theoretical file entropy along with the # of bytes it would take to encode the input using [Huffman coding](https://en.wikipedia.org/wiki/Huffman_coding) with the [Package Merge algorithm](https://create.stephan-brumme.com/length-limited-prefix-codes/) at various maximum code lengths, for comparison purposes.
-- The encoder swizzles each individual range encoder's output bytes into the proper order right after compression. No special signaling or sideband information is needed between the encoder and decoder, because it's easy to predict how many bytes will be fetched from each stream during each coding/decoding step. This post-compression byte swizzling step is an annoying cost that rANS doesn't pay. I'm unsure if this step can be further optimized.
+- The encoder swizzles each individual range encoder's output bytes into the proper order right after compression. No special signaling or sideband information is needed between the encoder and decoder, because it's easy to predict how many bytes will be fetched from each stream during each coding/decoding step. (Notably, at each encode step you can record the number of bytes flushed to the output, which in this implementation is always between 0 and 2 bytes per step. The decoder always reads the same number of bytes from the stream as the encoder wrote for that step, but from a different offset.) This post-compression byte swizzling step is an annoying cost that rANS doesn't pay. I'm unsure if this step can be further optimized.
 - The decoder is safe against accidental or purposeful corruption, i.e. it shouldn't ever read past the end of the input buffer or crash on invalid/corrupt inputs. I am still testing this, however. 
 - The encoder is not optimized yet: just the vectorized decoder, which is my primary concern.
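
The byte swizzling described in the changed bullet can be illustrated with a small standalone sketch. This is not the repository's code: `StreamOutput`, `swizzle`, and `unswizzle` are hypothetical names, and the model is simplified. It assumes each encoder has recorded how many bytes (0 to 2) it flushed at every step, and it ignores the decoder's initial fill, which in a real range decoder shifts the read offsets. In a real decoder the per-step counts would also be derived from the decoder's own state rather than passed in; here they are passed in only so that the round trip can be checked.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical container: one independent encoder's output plus, for every
// coding step, the number of bytes that encoder flushed at that step.
struct StreamOutput {
    std::vector<uint8_t> bytes;             // bytes in the order the encoder wrote them
    std::vector<uint8_t> flushed_per_step;  // each entry is expected to be 0, 1 or 2
};

// Interleave the per-stream bytes into the order a lockstep decoder would
// fetch them: step 0 of every stream, then step 1 of every stream, and so on.
std::vector<uint8_t> swizzle(const std::vector<StreamOutput>& streams, size_t num_steps) {
    std::vector<size_t> cursor(streams.size(), 0);  // next unread byte per stream
    std::vector<uint8_t> out;
    for (size_t s = 0; s < num_steps; ++s) {
        for (size_t k = 0; k < streams.size(); ++k) {
            const size_t n = streams[k].flushed_per_step.at(s);
            assert(n <= 2);  // assumption taken from the text: 0 to 2 bytes per step
            for (size_t i = 0; i < n; ++i) {
                if (cursor[k] >= streams[k].bytes.size())
                    throw std::out_of_range("flush counts exceed stream length");
                out.push_back(streams[k].bytes[cursor[k]++]);
            }
        }
    }
    return out;
}

// Inverse of swizzle, used here only to verify the ordering. flushed[k][s] is
// the byte count for stream k at step s. The bounds check means a truncated
// input raises an error instead of reading past the end of the buffer.
std::vector<std::vector<uint8_t>> unswizzle(const std::vector<uint8_t>& interleaved,
                                            const std::vector<std::vector<uint8_t>>& flushed,
                                            size_t num_steps) {
    std::vector<std::vector<uint8_t>> streams(flushed.size());
    size_t pos = 0;
    for (size_t s = 0; s < num_steps; ++s) {
        for (size_t k = 0; k < flushed.size(); ++k) {
            const size_t n = flushed[k].at(s);
            for (size_t i = 0; i < n; ++i) {
                if (pos >= interleaved.size())
                    throw std::out_of_range("interleaved buffer is too short");
                streams[k].push_back(interleaved[pos++]);
            }
        }
    }
    return streams;
}
```

Calling `swizzle` on the per-stream outputs and then `unswizzle` with the same counts should return the original per-stream bytes. The extra pass over the whole output is the post-compression cost the bullet mentions.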