
Question regarding Lz4net archived repo #83

Closed
Tarcontar opened this issue Jun 7, 2023 · 6 comments
Labels
question Further information is requested

Comments

@Tarcontar

Hi,

Some legacy software I need to stay compatible with uses the archived version of lz4net.
I am now writing a Rust program that compresses data with the lz4 crate; that data will then be decompressed by the legacy software.
There is no way to update the dependency on lz4net, so I have to adapt my Rust code accordingly.
Is there documentation on the exact settings lz4net uses for compression and decompression, so that I can match them?

Best regards and thanks in advance
Tarcontar

@MiloszKrajewski
Owner

MiloszKrajewski commented Jun 7, 2023

This touches on a relatively common problem.

Let me start with an analogy. Imagine two systems that need to exchange a lot of numbers. The problem is that one produces a text file with the numbers separated by commas, while the other expects them to be separated by tabs.
The numbers are encoded exactly the same way, but the separators are different.

There is a BLOCK format and a STREAM format.
BLOCKs are the data, the meat, the numbers, while the STREAM wraps all the blocks together with all the headers, commas, and/or tabs.

A STREAM contains many BLOCKs: a stream header, then block header, BLOCK, block header, BLOCK, block header, BLOCK, etc.

Both stream and block headers are quite small (let's say 8 bytes), while blocks carry the data and are large (64K - 4MB). Makes sense?

SH BH BLOCK BH BLOCK BH BLOCK BH BLOCK BH BLOCK

The stream header carries information such as the overall length of the stream, which compression method was used, the default block size, etc. Each block header says something about its block: how many bytes it had before compression, how many bytes after compression, etc. And then there is a BLOCK of compressed LZ4 data.
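The layout above can be sketched as a tiny header parser. Note the field layout below (two little-endian u32 lengths) is purely illustrative; the actual lz4net field layout has to be taken from LZ4Stream.cs:

```rust
// Hypothetical block-header layout, for illustration only: two little-endian
// u32 fields (compressed length, then original length). The real lz4net
// stream/block header layout must be read from LZ4Stream.cs.
struct BlockHeader {
    compressed_len: u32,
    original_len: u32,
}

fn parse_block_header(bytes: &[u8]) -> Option<BlockHeader> {
    if bytes.len() < 8 {
        return None; // not enough bytes for an 8-byte header
    }
    let compressed_len = u32::from_le_bytes(bytes[0..4].try_into().ok()?);
    let original_len = u32::from_le_bytes(bytes[4..8].try_into().ok()?);
    Some(BlockHeader { compressed_len, original_len })
}

fn main() {
    let mut buf = Vec::new();
    buf.extend_from_slice(&123u32.to_le_bytes());
    buf.extend_from_slice(&456u32.to_le_bytes());
    let h = parse_block_header(&buf).unwrap();
    println!("{} {}", h.compressed_len, h.original_len); // prints "123 456"
}
```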

The BLOCK format in lz4net is exactly the same as in any other implementation of LZ4, but the STREAM format is not. It is a custom wrapping of LZ4 blocks.

You did not give me enough information about how lz4net is used, and lz4net has two APIs: BLOCK and (custom) STREAM. If it expects just BLOCKs, then it is almost trivial. If it expects a STREAM, you would need to implement your own code for the headers (~8 bytes), but you can copy the blocks without any modification.

You will need to implement this file in Rust:
https://github.com/MiloszKrajewski/lz4net/blob/master/src/LZ4/LZ4Stream.cs

(If you need to implement more it means you are doing something wrong)

It will be even less work if the Rust side only needs to write, or only needs to read, since then you only have to implement half of this custom stream handling.

@MiloszKrajewski MiloszKrajewski added the question Further information is requested label Jun 7, 2023
@MiloszKrajewski
Owner

I assume there are no further questions.

@Michaelschnabel-DM

Hi,
sorry for not responding in time. I was at a festival and had no internet connection for the last few days.

Thank you very much for your response!

This is the code to compress and decompress my byte arrays with the legacy lz4net NuGet package:
(screenshot: C# compress/decompress code, 2023-06-11)

Is the STREAM format involved here, or should it only be blocks with block headers?

Please let me know if you need any further information.

@Tarcontar
Author

Ah, I used the wrong account to respond, sorry.
And this is how I encode it in Rust using the lz4 crate:
(screenshot: Rust encode code, 2023-06-12)

When I try to decode this with the C# code above I get "LZ4 block is corrupted or invalid length has been given", but the length is correct; I just checked that.

Regards
Tarcontar

@MiloszKrajewski
Owner

MiloszKrajewski commented Jun 12, 2023

  • Good news first: your C# code uses only BLOCK mode (the one with raw data), which is identical across implementations, so it should not be a problem.
  • Bad news: can you change the C# code? Your Encode is killing all potential performance benefits with Concat. Are you able to modify this code, or has it been shipped and forgotten?
  • Bad news: you are using STREAM mode (the one with headers) in Rust.
  • Good news: although I don't know the details, BLOCK mode must be available in Rust (because it is used by STREAM mode anyway); you just used the wrong set of methods.

Anyway, it should be simple and not too much code. If you care about the performance of the C# Encode (Decode is fine), you should tweak the C# code a little, and in Rust you need to find the BLOCK mode API (not STREAM).
If you need help I can definitely improve it, guessing roughly 10x (byte array allocation with GetBytes and Concat is an absolute killer).

Looking at this:
https://docs.rs/crate/lz4/latest/source/src/block/mod.rs

it seems you need: compress_to_buffer and decompress_to_buffer

DO NOT use prepend_size: true, because it is not part of the BLOCK mode specification; it seems to be an extension specific to this library.
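To see why prepend_size matters: with prepend_size: true the crate appears to write an extra 4-byte little-endian length in front of the raw block, which a reader expecting a bare block would consume as if it were compressed data. A stdlib-only sketch (dummy bytes stand in for a real LZ4 block):

```rust
// Demonstrates, with dummy bytes standing in for a real LZ4 block, how a
// prepended 4-byte length shifts the payload: a reader expecting the bare
// block would start decoding at the length prefix instead of the data.
fn prepend_size(block: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(4 + block.len());
    out.extend_from_slice(&(block.len() as u32).to_le_bytes());
    out.extend_from_slice(block);
    out
}

fn main() {
    let block = vec![0xAA, 0xBB, 0xCC];
    let framed = prepend_size(&block);
    assert_eq!(&framed[0..4], &3u32.to_le_bytes()); // 4 extra bytes up front
    assert_eq!(&framed[4..], &block[..]);           // payload shifted by 4
    println!("{}", framed.len()); // 7
}
```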

@Tarcontar
Author

Hi,

thank you so much for your help!
I got it working using only the block compression from the lz4 Rust crate, as follows:

use lz4::block::{compress, decompress, CompressionMode};

const BYTE_LENGTH_OF_UINT32: usize = 4;

fn decode(&self, in_data: Vec<u8>) -> Vec<u8> {
    // Legacy header: compressed size (unused here), then original size,
    // both little-endian u32.
    let mut array: [u8; BYTE_LENGTH_OF_UINT32] = [0; BYTE_LENGTH_OF_UINT32];
    array.copy_from_slice(&in_data[0..BYTE_LENGTH_OF_UINT32]);
    let _compressed_size = u32::from_le_bytes(array);
    array.copy_from_slice(&in_data[BYTE_LENGTH_OF_UINT32..2 * BYTE_LENGTH_OF_UINT32]);
    let uncompressed_size = u32::from_le_bytes(array);

    // The rest of the buffer is a raw LZ4 block.
    decompress(&in_data[2 * BYTE_LENGTH_OF_UINT32..], Some(uncompressed_size as i32)).unwrap()
}

fn encode(&self, in_data: Vec<u8>) -> Vec<u8> {
    let original_size = in_data.len() as u32;
    // prepend_size = false: the legacy header below replaces the crate's own size prefix.
    let compressed_data = compress(&in_data, Some(CompressionMode::DEFAULT), false).unwrap();
    let compressed_size = (compressed_data.len() + 4) as u32;

    let compressed = compressed_size.to_le_bytes();
    let original = original_size.to_le_bytes();

    [[compressed, original].concat(), compressed_data].concat()
}
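The 8-byte framing in encode/decode above can be checked independently of the lz4 crate by swapping the compressor for an identity function; a minimal stdlib-only sketch:

```rust
// Round-trips the 8-byte legacy framing used above: compressed size + 4,
// then original size, both little-endian u32, followed by the payload.
// An identity function stands in for the lz4 crate's compress/decompress.
const U32_LEN: usize = 4;

fn frame(payload: &[u8]) -> Vec<u8> {
    // Mirrors `compressed_data.len() + 4` from the encode above.
    let compressed_size = (payload.len() + U32_LEN) as u32;
    let original_size = payload.len() as u32; // identity "compression": sizes match
    let mut out = Vec::new();
    out.extend_from_slice(&compressed_size.to_le_bytes());
    out.extend_from_slice(&original_size.to_le_bytes());
    out.extend_from_slice(payload);
    out
}

fn unframe(framed: &[u8]) -> (u32, u32, &[u8]) {
    let compressed_size = u32::from_le_bytes(framed[0..U32_LEN].try_into().unwrap());
    let original_size =
        u32::from_le_bytes(framed[U32_LEN..2 * U32_LEN].try_into().unwrap());
    (compressed_size, original_size, &framed[2 * U32_LEN..])
}

fn main() {
    let data: &[u8] = b"hello lz4net framing";
    let framed = frame(data);
    let (c, o, payload) = unframe(&framed);
    assert_eq!(c as usize, data.len() + U32_LEN);
    assert_eq!(o as usize, data.len());
    assert_eq!(payload, data);
    println!("ok"); // prints "ok"
}
```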

Thank you very much for your time and effort!

Best Regards
Tarcontar
