
Question regarding Lz4net archived repo #83

Closed
Tarcontar opened this issue Jun 7, 2023 · 6 comments
Labels
question Further information is requested

Comments

@Tarcontar

Hi,

Some legacy software I need to stay compatible with uses the archived version of lz4net.
I am now writing a Rust program that compresses data with the lz4 crate; that data will then be decompressed by the legacy software.
There is no way to update the dependency on lz4net, so I have to adapt my Rust code accordingly.
Is there documentation on the exact settings lz4net uses for compression and decompression, so that I can match them?

Best regards and thanks in advance
Tarcontar

@MiloszKrajewski
Owner

MiloszKrajewski commented Jun 7, 2023

This touches on a relatively common problem.

Let me start with an analogy. Imagine two systems that need to exchange a lot of numbers. The problem is that one produces a text file with the numbers separated by commas, while the other expects them to be separated by tabs.
The numbers are encoded exactly the same way, but the separators are different.

There is a BLOCK format and a STREAM format.
BLOCKs are the data, the meat, the numbers, while the STREAM wraps all the blocks together with all the headers, commas, and/or tabs.

A STREAM contains many BLOCKs: a stream header, then block header, BLOCK, block header, BLOCK, block header, BLOCK, etc.

Both stream and block headers are quite small (let's say 8 bytes), while blocks carry the data and are large (64K - 4MB). Makes sense?

SH BH BLOCK BH BLOCK BH BLOCK BH BLOCK BH BLOCK

The stream header carries information such as the overall length of the stream, which compression method was used, the default block size, etc. Each block header says something about its block: how many bytes it had before compression, how many bytes after compression, etc. And then there is a BLOCK of compressed LZ4 data.
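The layout above can be sketched as a tiny header parser. Note the field layout below (two little-endian u32 lengths) is purely illustrative; the actual lz4net field layout has to be taken from LZ4Stream.cs:

```rust
// Hypothetical block-header layout, for illustration only: two little-endian
// u32 fields (compressed length, then original length). The real lz4net
// stream/block header layout must be read from LZ4Stream.cs.
struct BlockHeader {
    compressed_len: u32,
    original_len: u32,
}

fn parse_block_header(bytes: &[u8]) -> Option<BlockHeader> {
    if bytes.len() < 8 {
        return None; // not enough bytes for an 8-byte header
    }
    let compressed_len = u32::from_le_bytes(bytes[0..4].try_into().ok()?);
    let original_len = u32::from_le_bytes(bytes[4..8].try_into().ok()?);
    Some(BlockHeader { compressed_len, original_len })
}

fn main() {
    let mut buf = Vec::new();
    buf.extend_from_slice(&123u32.to_le_bytes());
    buf.extend_from_slice(&456u32.to_le_bytes());
    let h = parse_block_header(&buf).unwrap();
    println!("{} {}", h.compressed_len, h.original_len); // prints "123 456"
}
```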

The BLOCK format in lz4net is exactly the same as in any other implementation of LZ4, but the STREAM format is not. It is a custom wrapping of LZ4 blocks.

You did not give me enough information about how lz4net is used, and lz4net has two APIs: BLOCK and (custom) STREAM. If it expects just BLOCKs, then it is almost trivial. If it expects a STREAM, you would need to implement your own code for the headers (~8 bytes), but you can copy the blocks without any modification.

You will need to implement this file in Rust:
https://github.com/MiloszKrajewski/lz4net/blob/master/src/LZ4/LZ4Stream.cs

(If you need to implement more it means you are doing something wrong)

It will be even less work if the Rust side only needs to write, or only needs to read, since then you only have to implement half of this custom stream handling.

@MiloszKrajewski MiloszKrajewski added the question Further information is requested label Jun 7, 2023
@MiloszKrajewski
Owner

I assume there are no further questions.

@Michaelschnabel-DM

Hi,
sorry for not responding in time. I was at a festival and had no internet connection for the last few days.

Thank you very much for your response!

This is the code to compress and decompress my byte arrays with the legacy lz4net NuGet package:
(screenshot: C# compress/decompress code, 2023-06-11)

Is the STREAM format involved here, or should it only be blocks with block headers?

Please let me know if you need any further information.

@Tarcontar
Author

Ah, I used the wrong account to respond, sorry.
And this is how I encode it in Rust using the lz4 crate:
(screenshot: Rust encode code, 2023-06-12)

When I try to decode this with the C# code above I get "LZ4 block is corrupted or invalid length has been given", but the length is correct; I just checked that.

Regards
Tarcontar

@MiloszKrajewski
Owner

MiloszKrajewski commented Jun 12, 2023

  • Good news first: your C# code uses only BLOCK mode (the one with raw data), which is identical across implementations, so it should not be a problem.
  • Bad news: can you change the C# code? Your Encode is killing all potential performance benefits with Concat. Are you able to modify this code, or has it been shipped and forgotten?
  • Bad news: you are using STREAM mode (the one with headers) in Rust.
  • Good news: although I don't know the details, BLOCK mode must be available in Rust (because it is used by STREAM mode anyway); you just used the wrong set of methods.

Anyway, it should be simple and not too much code. If you care about the performance of the C# Encode (Decode is fine), you should tweak the C# code a little, and in Rust you need to find the BLOCK mode API (not STREAM).
If you need help I can definitely improve it, guessing roughly 10x (byte array allocation with GetBytes and Concat is an absolute killer).

Looking at this:
https://docs.rs/crate/lz4/latest/source/src/block/mod.rs

it seems you need: compress_to_buffer and decompress_to_buffer

DO NOT use prepend_size: true, because it is not part of the BLOCK mode specification; it seems to be an extension specific to this library.
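To see why prepend_size matters: with prepend_size: true the crate appears to write an extra 4-byte little-endian length in front of the raw block, which a reader expecting a bare block would consume as if it were compressed data. A stdlib-only sketch (dummy bytes stand in for a real LZ4 block):

```rust
// Demonstrates, with dummy bytes standing in for a real LZ4 block, how a
// prepended 4-byte length shifts the payload: a reader expecting the bare
// block would start decoding at the length prefix instead of the data.
fn prepend_size(block: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(4 + block.len());
    out.extend_from_slice(&(block.len() as u32).to_le_bytes());
    out.extend_from_slice(block);
    out
}

fn main() {
    let block = vec![0xAA, 0xBB, 0xCC];
    let framed = prepend_size(&block);
    assert_eq!(&framed[0..4], &3u32.to_le_bytes()); // 4 extra bytes up front
    assert_eq!(&framed[4..], &block[..]);           // payload shifted by 4
    println!("{}", framed.len()); // 7
}
```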

@Tarcontar
Author

Hi,

thank you so much for your help!
I got it working using only the block compression from the lz4 Rust crate, as follows:

use lz4::block::{compress, decompress, CompressionMode};

const BYTE_LENGTH_OF_UINT32: usize = 4;

fn decode(&self, in_data: Vec<u8>) -> Vec<u8> {
    // Legacy header: compressed size (unused here), then original size,
    // both little-endian u32.
    let mut array: [u8; BYTE_LENGTH_OF_UINT32] = [0; BYTE_LENGTH_OF_UINT32];
    array.copy_from_slice(&in_data[0..BYTE_LENGTH_OF_UINT32]);
    let _compressed_size = u32::from_le_bytes(array);
    array.copy_from_slice(&in_data[BYTE_LENGTH_OF_UINT32..2 * BYTE_LENGTH_OF_UINT32]);
    let uncompressed_size = u32::from_le_bytes(array);

    // The rest of the buffer is a raw LZ4 block.
    decompress(&in_data[2 * BYTE_LENGTH_OF_UINT32..], Some(uncompressed_size as i32)).unwrap()
}

fn encode(&self, in_data: Vec<u8>) -> Vec<u8> {
    let original_size = in_data.len() as u32;
    // prepend_size = false: the legacy header below replaces the crate's own size prefix.
    let compressed_data = compress(&in_data, Some(CompressionMode::DEFAULT), false).unwrap();
    let compressed_size = (compressed_data.len() + 4) as u32;

    let compressed = compressed_size.to_le_bytes();
    let original = original_size.to_le_bytes();

    [[compressed, original].concat(), compressed_data].concat()
}
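The 8-byte framing in encode/decode above can be checked independently of the lz4 crate by swapping the compressor for an identity function; a minimal stdlib-only sketch:

```rust
// Round-trips the 8-byte legacy framing used above: compressed size + 4,
// then original size, both little-endian u32, followed by the payload.
// An identity function stands in for the lz4 crate's compress/decompress.
const U32_LEN: usize = 4;

fn frame(payload: &[u8]) -> Vec<u8> {
    // Mirrors `compressed_data.len() + 4` from the encode above.
    let compressed_size = (payload.len() + U32_LEN) as u32;
    let original_size = payload.len() as u32; // identity "compression": sizes match
    let mut out = Vec::new();
    out.extend_from_slice(&compressed_size.to_le_bytes());
    out.extend_from_slice(&original_size.to_le_bytes());
    out.extend_from_slice(payload);
    out
}

fn unframe(framed: &[u8]) -> (u32, u32, &[u8]) {
    let compressed_size = u32::from_le_bytes(framed[0..U32_LEN].try_into().unwrap());
    let original_size =
        u32::from_le_bytes(framed[U32_LEN..2 * U32_LEN].try_into().unwrap());
    (compressed_size, original_size, &framed[2 * U32_LEN..])
}

fn main() {
    let data: &[u8] = b"hello lz4net framing";
    let framed = frame(data);
    let (c, o, payload) = unframe(&framed);
    assert_eq!(c as usize, data.len() + U32_LEN);
    assert_eq!(o as usize, data.len());
    assert_eq!(payload, data);
    println!("ok"); // prints "ok"
}
```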

Thank you very much for your time and effort!

Best Regards
Tarcontar
