
Releases: PSeitz/lz4_flex

0.11

19 Jun 03:34

Documentation

  • Docs: add decompress block example

Fixes

  • Handle empty input in Frame Format #120
Empty input was ignored previously and didn't write anything. Now an empty Frame is written. This improves compatibility with the reference implementation and some corner cases.
  • Fix: Small dict leads to panic #133
compress_into_with_dict panicked when the dict passed was smaller than 4 bytes. Since a match has a minimum length of 4 bytes, smaller dicts are now ignored.
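A minimal sketch of why a very short dictionary is a problem. It assumes a match finder that indexes every 4-byte sequence in the dictionary. `MIN_MATCH`, `usable_dict`, `read_u32_le` and `dict_positions` are hypothetical names, not lz4_flex internals. Without the length guard, the range end `dict.len() - MIN_MATCH` would underflow for a dictionary shorter than 4 bytes, and the code would panic.

```rust
use std::convert::TryInto;

// Hypothetical sketch: a match needs at least 4 bytes, so shorter dicts are ignored.
const MIN_MATCH: usize = 4;

/// Returns the part of `dict` that can contribute matches (empty if too short).
fn usable_dict(dict: &[u8]) -> &[u8] {
    if dict.len() < MIN_MATCH {
        &dict[..0]
    } else {
        dict
    }
}

/// Reads the 4 bytes starting at `pos` as a little-endian u32, i.e. the value a
/// match finder would hash. Panics if fewer than 4 bytes remain.
fn read_u32_le(data: &[u8], pos: usize) -> u32 {
    u32::from_le_bytes(data[pos..pos + MIN_MATCH].try_into().unwrap())
}

/// Collects (position, 4-byte sequence) pairs for every dict position that could
/// start a match. The guard avoids `dict.len() - MIN_MATCH` underflowing.
fn dict_positions(dict: &[u8]) -> Vec<(usize, u32)> {
    let dict = usable_dict(dict);
    if dict.is_empty() {
        return Vec::new();
    }
    (0..=dict.len() - MIN_MATCH)
        .map(|pos| (pos, read_u32_le(dict, pos)))
        .collect()
}

fn main() {
    println!("{:?}", dict_positions(b"abc")); // too short: ignored
    println!("{:?}", dict_positions(b"abcde")); // two candidate positions
}
```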

Features

  • [breaking] invert checked-decode to unchecked-decode #134
Invert the `checked-decode` feature flag to `unchecked-decode`.
Previously, setting `default-features=false` removed the bounds checks provided by the
`checked-decode` feature flag. `unchecked-decode` inverts this, so the bounds checks now have to be
deliberately deactivated by enabling `unchecked-decode`.

To migrate, just remove the `checked-decode` feature flag.
  • Allow passing an output buffer larger than the decompressed size #78
This removes an unnecessary check in decompression that rejected output buffers larger than needed.
  • Add auto_finish to FrameEncoder #95 #100
Returns a wrapper around the FrameEncoder that finishes the frame when it is dropped, so the frame does not have to be finished explicitly.
  • Autodetect frame blocksize #81
The default block size of FrameInfo is now auto instead of 64 KB: the block size is detected
from the size of the first write call. This increases
compression ratio and speed for use cases where the data is larger than
64 KB.
This also adds fluent (builder-style) construction for FrameInfo, for example:

```rust
use lz4_flex::frame::{BlockSize, FrameInfo};

let info = FrameInfo::new()
    .block_size(BlockSize::Max1MB)
    .content_checksum(true);
```
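For #78, here is a sketch of a `decompress_into`-style contract. The caller passes an output buffer, the function returns how many bytes it wrote, and only a buffer that is too small is an error. To keep the sketch self-contained, it uses a toy run-length format rather than LZ4. `rle_decode_into` and `DecodeError` are hypothetical.

```rust
/// Errors for the toy decoder below.
#[derive(Debug, PartialEq)]
enum DecodeError {
    /// The output buffer cannot hold the decoded data.
    OutputTooSmall,
    /// The input ended in the middle of a (count, byte) pair.
    TruncatedInput,
}

/// Hypothetical toy decoder (run-length pairs, not LZ4) with a
/// `decompress_into`-style contract: it writes into `output` and returns the
/// number of bytes written. Only a too-small buffer is an error; any extra
/// space at the end of `output` is left untouched.
fn rle_decode_into(input: &[u8], output: &mut [u8]) -> Result<usize, DecodeError> {
    if input.len() % 2 != 0 {
        return Err(DecodeError::TruncatedInput);
    }
    let mut written = 0;
    for pair in input.chunks_exact(2) {
        let (count, byte) = (pair[0] as usize, pair[1]);
        let end = written + count;
        // The only size check: does the decoded data fit?
        if end > output.len() {
            return Err(DecodeError::OutputTooSmall);
        }
        output[written..end].fill(byte);
        written = end;
    }
    Ok(written)
}

fn main() {
    let input = [3, b'a', 2, b'b']; // decodes to "aaabb"
    let mut buf = [0u8; 16]; // deliberately larger than needed
    let n = rle_decode_into(&input, &mut buf).unwrap();
    println!("{}", String::from_utf8_lossy(&buf[..n]));
}
```

The caller slices the buffer to the returned length (`&buf[..n]`), so over-allocating is harmless.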
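For #95 and #100, here is a sketch of the finish-on-drop pattern behind `auto_finish`. It assumes a wrapper type that calls `finish` from its `Drop` implementation. `ToyEncoder` and `AutoFinish` are hypothetical stand-ins, not lz4_flex types. They pass data through unchanged and model only the 4-byte zero end mark of an LZ4 frame.

```rust
use std::io::{self, Write};

/// An LZ4 frame ends with a 4-byte end mark of zeros.
const END_MARK: [u8; 4] = [0; 4];

/// Hypothetical stand-in for a frame encoder: data passes through unchanged,
/// and only the "finish" step (writing the end mark) is modeled.
struct ToyEncoder<W: Write> {
    inner: W,
    finished: bool,
}

impl<W: Write> ToyEncoder<W> {
    fn new(inner: W) -> Self {
        ToyEncoder { inner, finished: false }
    }

    /// Writes the end mark once; later calls are no-ops.
    fn finish(&mut self) -> io::Result<()> {
        if !self.finished {
            self.inner.write_all(&END_MARK)?;
            self.inner.flush()?;
            self.finished = true;
        }
        Ok(())
    }

    /// Wraps the encoder so the frame is finished when the wrapper is dropped.
    fn auto_finish(self) -> AutoFinish<W> {
        AutoFinish(self)
    }
}

impl<W: Write> Write for ToyEncoder<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.inner.write(buf)
    }
    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}

/// Hypothetical wrapper that finishes the wrapped encoder on drop.
struct AutoFinish<W: Write>(ToyEncoder<W>);

impl<W: Write> Write for AutoFinish<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.0.write(buf)
    }
    fn flush(&mut self) -> io::Result<()> {
        self.0.flush()
    }
}

impl<W: Write> Drop for AutoFinish<W> {
    fn drop(&mut self) {
        // `drop` cannot return errors, so a failed finish is ignored here.
        let _ = self.0.finish();
    }
}

fn main() -> io::Result<()> {
    let mut out: Vec<u8> = Vec::new();
    {
        let mut enc = ToyEncoder::new(&mut out).auto_finish();
        enc.write_all(b"hello")?;
    } // `enc` is dropped here, which writes the end mark
    println!("{:?}", out);
    Ok(())
}
```

Because errors from `finish` cannot be reported out of `drop`, calling `finish` explicitly remains the better choice when the caller needs to handle I/O errors.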
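For #81, here is a sketch of resolving an auto block size on the first write. The selection rule is an assumption made for illustration: pick the smallest standard size that holds the first write, or the largest size for bigger writes. The release notes do not spell out lz4_flex's actual rule. This `BlockSize` enum is a hypothetical mirror, not the crate's type.

```rust
/// Hypothetical mirror of a frame block-size setting (not lz4_flex's type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum BlockSize {
    Auto,
    Max64KB,
    Max256KB,
    Max1MB,
    Max4MB,
}

const KB: usize = 1024;

/// Assumed rule: the smallest standard block size that holds the whole first
/// write, falling back to the largest size for bigger writes.
fn size_for_first_write(len: usize) -> BlockSize {
    if len <= 64 * KB {
        BlockSize::Max64KB
    } else if len <= 256 * KB {
        BlockSize::Max256KB
    } else if len <= 1024 * KB {
        BlockSize::Max1MB
    } else {
        BlockSize::Max4MB
    }
}

/// `Auto` is resolved once, from the first write; explicit sizes are kept.
fn resolve_block_size(configured: BlockSize, first_write_len: usize) -> BlockSize {
    match configured {
        BlockSize::Auto => size_for_first_write(first_write_len),
        explicit => explicit,
    }
}

fn main() {
    println!("{:?}", resolve_block_size(BlockSize::Auto, 10 * KB));
    println!("{:?}", resolve_block_size(BlockSize::Auto, 3 * 1024 * KB));
    println!("{:?}", resolve_block_size(BlockSize::Max1MB, 10 * KB));
}
```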

Performance

  • Replace calls to memcpy with a custom function
  • Perf: optimize wildcopy #109
The initial check in the 16-byte wild copy is unnecessary, since it is already done before the method is called.
  • Perf: faster duplicate_overlapping #114
Replaces the aggressive compiler unrolling, following the
failed attempt in #69 (which wrote out of bounds in some cases).

The aggressive unrolling is avoided by manually unrolling less aggressively.
Decompression performance improves slightly, by about 4%, except for the
smallest test case.

  • Perf: simplify extend_from_within_overlapping #72
extend_from_within_overlapping is used in safe decompression when
overlapping data has been detected. The previous version had unnecessary
assertions and safeguards, since this method is only used in safe code, where slice accesses are already bounds-checked.
Removing the temporary &mut slice also simplified the assembly output.

uiCA Code Analyzer (predicted cycles, lower is better)

| Version    | Skylake | Ice Lake | Tiger Lake | Rocket Lake |
|------------|---------|----------|------------|-------------|
| Previous   | 28.71   | 30.67    | 28.71      | 27.57       |
| Simplified | 13.00   | 15.00    | 13.00      | 11.00       |
  • Perf: remove unnecessary assertions
These assertions were only used in safe code, where bounds checks already apply, and were therefore unnecessary.
  • Perf: improve safe decompression performance 8-18% #73
Improve safe decompression speed by 8-18%

Reduce multiple slice fetches: every slice access, including nested ones,
carries some overhead. In the hot loop, a fixed &[u8; 16] is fetched to
operate on. This is done purely to pass the length information to the compiler.

Remove error handling that only carries overhead. Since this is the safe
mode, we can rely on the bounds checks wherever custom error handling would only add overhead.
In normal operation, no error should occur.

Improvements were identified by counting the lines of generated
assembly, a rough heuristic that nonetheless seems effective:
`cargo asm --release --example decompress_block decompress_block::main | wc -l`
  • Perf: improve safe frame compression performance 7-15% #74
Frame encoding uses a fixed-size hashtable.
By creating a dedicated hashtable of type Box<[u32; 4096]>,
in combination with a bit shift of 4 (which is also moved into a constant),
the compiler can remove the bounds checks.
For that to happen, the compiler also needs to recognize the `>> 48` right
shift in the hash algorithm (together with the shift of 4, u64 >> 52 < 4096), which it does. Yay!

It also means the unsafe version needs less `unsafe` code.
  • Perf: switch to use only 3 kinds of hashtable #77
Use only hashtables with fixed sizes and bit shifts, which allow the compiler to
remove bounds checks.
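For #109, here is a safe sketch of a 16-byte wild copy. It copies in whole 16-byte chunks, so it may write up to 15 bytes past the requested length. The caller therefore checks once, up front, that both buffers have that slack. `wild_copy_16` and `wild_copy_span` are hypothetical names. In this safe version, slice indexing would still panic if the precondition were violated.

```rust
use std::convert::TryInto;

const WILDCOPY_CHUNK: usize = 16;

/// Hypothetical safe sketch of a 16-byte "wild copy": copies at least `len`
/// bytes from `src` to `dst` in whole 16-byte chunks, so up to 15 bytes past
/// `len` may be copied too. It does not test for that slack itself; the caller
/// checks once beforehand (here, indexing would still panic on a violation).
fn wild_copy_16(src: &[u8], dst: &mut [u8], len: usize) {
    let mut offset = 0;
    while offset < len {
        // Copy one fixed-size 16-byte chunk.
        let chunk: [u8; WILDCOPY_CHUNK] = src[offset..offset + WILDCOPY_CHUNK].try_into().unwrap();
        dst[offset..offset + WILDCOPY_CHUNK].copy_from_slice(&chunk);
        offset += WILDCOPY_CHUNK;
    }
}

/// Number of bytes the wild copy actually touches: `len` rounded up to 16.
fn wild_copy_span(len: usize) -> usize {
    (len + WILDCOPY_CHUNK - 1) / WILDCOPY_CHUNK * WILDCOPY_CHUNK
}

fn main() {
    let src: Vec<u8> = (0..48).collect();
    let len = 20;
    let mut dst = vec![0u8; 48];
    // The single up-front check that the loop relies on.
    assert!(wild_copy_span(len) <= src.len() && wild_copy_span(len) <= dst.len());
    wild_copy_16(&src, &mut dst, len);
    println!("{:?}", &dst[..len]);
}
```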
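For #114, here is a sketch of an overlapping match copy with modest manual unrolling: 4 bytes per iteration, then a tail loop. When the match offset is smaller than the match length, source and destination overlap. The bytes must then be copied front to back, so that already-copied bytes are re-read and the pattern repeats. The function name follows the release note, but the signature and the unroll factor are assumptions.

```rust
/// Hypothetical sketch: copies `match_len` bytes to `pos` from `offset` bytes
/// earlier in the same buffer. Copying front to back means that with an overlap
/// (offset < match_len) freshly written bytes are read again, repeating the
/// pattern (offset 1 turns one byte into a run). Unrolled by 4, plus a tail.
fn duplicate_overlapping(buf: &mut [u8], pos: usize, offset: usize, match_len: usize) {
    assert!(offset > 0 && offset <= pos && pos + match_len <= buf.len());
    let end = pos + match_len;
    let mut dst = pos;
    // Unrolled part: 4 bytes per iteration, still strictly in order.
    while dst + 4 <= end {
        buf[dst] = buf[dst - offset];
        buf[dst + 1] = buf[dst + 1 - offset];
        buf[dst + 2] = buf[dst + 2 - offset];
        buf[dst + 3] = buf[dst + 3 - offset];
        dst += 4;
    }
    // Remaining 0 to 3 bytes.
    while dst < end {
        buf[dst] = buf[dst - offset];
        dst += 1;
    }
}

fn main() {
    let mut buf = *b"ab\0\0\0\0\0\0";
    duplicate_overlapping(&mut buf, 2, 2, 6);
    println!("{}", String::from_utf8_lossy(&buf)); // abababab
}
```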
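For #72, here is a sketch of the safe path for matches when decompressing into a `Vec<u8>`. It assumes that non-overlapping matches use the standard library's `Vec::extend_from_within` and that overlapping ones fall back to a byte-by-byte copy. `copy_match` is hypothetical. This `extend_from_within_overlapping` illustrates the idea and is not lz4_flex's implementation. It adds no extra assertions: in safe Rust an invalid offset simply panics.

```rust
/// Hypothetical safe sketch: appends `match_len` bytes to `out`, copying from
/// `offset` bytes before the current end, one byte at a time. This works even
/// when the match overlaps the bytes being appended (offset < match_len),
/// because each byte is read only after the previous one has been pushed.
fn extend_from_within_overlapping(out: &mut Vec<u8>, offset: usize, match_len: usize) {
    let start = out.len() - offset;
    out.reserve(match_len);
    for i in start..start + match_len {
        let byte = out[i];
        out.push(byte);
    }
}

/// Hypothetical dispatcher: non-overlapping matches can use
/// `Vec::extend_from_within`; overlapping ones need the loop above.
fn copy_match(out: &mut Vec<u8>, offset: usize, match_len: usize) {
    if offset >= match_len {
        let start = out.len() - offset;
        out.extend_from_within(start..start + match_len);
    } else {
        extend_from_within_overlapping(out, offset, match_len);
    }
}

fn main() {
    let mut out = b"abc".to_vec();
    copy_match(&mut out, 3, 3); // non-overlapping: "abcabc"
    copy_match(&mut out, 2, 5); // overlapping: appends "bcbcb"
    println!("{}", String::from_utf8_lossy(&out));
}
```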
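For #73, here is a sketch of the fixed-size window idea. A `&[u8; 16]` is fetched once, so its exact length is known at compile time, instead of slicing the input repeatedly. `copy_short_literal` is hypothetical. It only handles literals of at most 16 bytes with at least 16 input bytes remaining, and it returns `false` where a real decoder would take a slower path.

```rust
use std::convert::TryInto;

/// Hypothetical sketch: copies a short literal (at most 16 bytes) by fetching a
/// fixed-size `&[u8; 16]` window once. After the single length check in
/// `try_into`, the window's length is known to be exactly 16, so the later
/// `..len` slice only depends on `len <= 16`, which is checked up front.
fn copy_short_literal(input: &[u8], pos: usize, len: usize, out: &mut Vec<u8>) -> bool {
    if len > 16 {
        return false; // longer literals: slower path (not shown)
    }
    let window: &[u8; 16] = match input.get(pos..pos + 16) {
        Some(slice) => slice.try_into().unwrap(),
        None => return false, // fewer than 16 bytes left: slower path (not shown)
    };
    out.extend_from_slice(&window[..len]);
    true
}

fn main() {
    let input = b"literal bytes followed by more input";
    let mut out = Vec::new();
    if copy_short_literal(input, 0, 7, &mut out) {
        println!("{}", String::from_utf8_lossy(&out)); // literal
    }
}
```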
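For #74 and #77, here is a sketch of a fixed-size hashtable whose index is provably in range. The hash keeps the top 16 bits of a 64-bit product (`>> 48`). The table then shifts right by a further 4 bits, so every index is below 4096 and fits a `Box<[u32; 4096]>`. The optimizer can only drop the bounds check when it sees both steps together, for example after inlining. The multiplier constant and all names are assumptions made for illustration.

```rust
/// Hypothetical 4K-entry table; the index is provably < 4096.
const HASHTABLE_SIZE_4K: usize = 4096;
/// Extra shift on the 16-bit hash: 16 - 4 = 12 bits, i.e. 4096 slots.
const HASHTABLE_BIT_SHIFT_4K: usize = 4;

struct HashTable4K {
    table: Box<[u32; HASHTABLE_SIZE_4K]>,
}

/// Assumed hash: multiply by an arbitrary odd constant and keep the top 16 bits
/// via `>> 48`, so the result is always < 65536.
fn hash(sequence: u64) -> usize {
    (sequence.wrapping_mul(0x9E37_79B9_7F4A_7C15) >> 48) as usize
}

impl HashTable4K {
    fn new() -> Self {
        HashTable4K { table: Box::new([0; HASHTABLE_SIZE_4K]) }
    }

    /// For a hash < 2^16, `hash >> 4` < 2^12 = 4096, a valid index. The check
    /// can be elided only when the compiler sees that bound (e.g. after inlining).
    fn get_at(&self, hash: usize) -> u32 {
        self.table[hash >> HASHTABLE_BIT_SHIFT_4K]
    }

    fn put_at(&mut self, hash: usize, pos: u32) {
        self.table[hash >> HASHTABLE_BIT_SHIFT_4K] = pos;
    }
}

fn main() {
    let mut table = HashTable4K::new();
    let h = hash(u64::from_le_bytes(*b"abcdefgh"));
    table.put_at(h, 7);
    println!("slot {} -> {}", h >> HASHTABLE_BIT_SHIFT_4K, table.get_at(h));
}
```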

Refactor

  • Refactor: remove VecSink #71
Remove VecSink, since it can be fully replaced with a slice.
This reduces code bloat from generics.
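For #71, here is a sketch of a non-generic, slice-backed sink. It assumes the output is pre-allocated, for example as a `Vec<u8>` sized for the worst case, and truncated to the written length afterwards. `SliceSink` here is hypothetical. The point is that with a single concrete sink type, the encoding and decoding code is compiled once rather than once per generic sink.

```rust
/// Hypothetical non-generic output sink backed by a mutable slice.
struct SliceSink<'a> {
    output: &'a mut [u8],
    pos: usize,
}

impl<'a> SliceSink<'a> {
    fn new(output: &'a mut [u8]) -> Self {
        SliceSink { output, pos: 0 }
    }

    /// Appends `data`; panics through slice bounds checks if the slice is full.
    fn extend_from_slice(&mut self, data: &[u8]) {
        self.output[self.pos..self.pos + data.len()].copy_from_slice(data);
        self.pos += data.len();
    }

    /// Appends one byte; panics if the slice is full.
    fn push(&mut self, byte: u8) {
        self.output[self.pos] = byte;
        self.pos += 1;
    }

    /// The bytes written so far.
    fn filled(&self) -> &[u8] {
        &self.output[..self.pos]
    }
}

fn main() {
    // Pre-allocate, write through the slice sink, then shrink to what was written.
    let mut buf = vec![0u8; 16];
    let written = {
        let mut sink = SliceSink::new(&mut buf);
        sink.extend_from_slice(b"lz4");
        sink.push(b'!');
        sink.pos
    };
    buf.truncate(written);
    println!("{}", String::from_utf8_lossy(&buf)); // lz4!
}
```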

Testing

  • Tests: add proptest roundtrip #69

New Contributors

Full Changelog: 0.10...0.11

lz4_flex 0.10

30 Jan 10:11

What's Changed

  • Add support of decoding legacy frames by @yestyle in #66

New Contributors