Encoding leads to other result than encoding with original c-source #1

Closed
Krensi opened this issue Aug 29, 2023 · 12 comments

Comments

@Krensi

Krensi commented Aug 29, 2023

Hi!

I wanted to use this library for decoding heatshrink compressed data. However, I was always getting an IllegalBackref error.
I decided to compare the encoding and decoding using the CLI (https://github.com/atomicobject/heatshrink) and the test data of the alpha test.
So I guess something is wrong with the encoding? In both cases, a window size of 11 and a lookahead of 4 is used.

Result alpha test:
image

Result using CLI with same data:
image

Maybe you can tell me what I did wrong.
Regards, Christian

@snakehand
Owner

Does decompressing both encodings yield the original data? If so, the string search may simply have picked different references to identical substrings, which does not affect the correctness of the library. If there are discrepancies in the decompressed data, I will have to investigate.
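
To illustrate the point about different references to identical substrings, here is a minimal sketch in Rust. It is not the library's code: the `Token` type and the `expand` and `two_encodings` functions are hypothetical names for a toy LZSS-style token stream, used only to show that two different token sequences can expand to the same bytes.

```rust
/// Toy LZSS-style token (hypothetical, not the heatshrink bit format).
#[derive(Debug, Clone, Copy)]
enum Token {
    /// Emit one byte as-is.
    Literal(u8),
    /// Copy `count` bytes starting `offset` bytes back in the output.
    Backref { offset: usize, count: usize },
}

/// Expands a token stream into bytes.
/// Panics if a backref points before the start of the output.
fn expand(tokens: &[Token]) -> Vec<u8> {
    let mut out = Vec::new();
    for &token in tokens {
        match token {
            Token::Literal(b) => out.push(b),
            Token::Backref { offset, count } => {
                // Copy byte by byte so overlapping references work.
                for _ in 0..count {
                    let b = out[out.len() - offset];
                    out.push(b);
                }
            }
        }
    }
    out
}

/// Two different encodings of the same input "abcabcabc".
fn two_encodings() -> (Vec<Token>, Vec<Token>) {
    let head = [Token::Literal(b'a'), Token::Literal(b'b'), Token::Literal(b'c')];
    // Encoder A: one long, overlapping reference.
    let mut a = head.to_vec();
    a.push(Token::Backref { offset: 3, count: 6 });
    // Encoder B: two shorter references to different positions.
    let mut b = head.to_vec();
    b.push(Token::Backref { offset: 3, count: 3 });
    b.push(Token::Backref { offset: 6, count: 3 });
    (a, b)
}

fn main() {
    let (a, b) = two_encodings();
    // The token streams differ, but the expanded data is identical.
    assert_eq!(expand(&a), expand(&b));
    println!("{}", String::from_utf8_lossy(&expand(&a)));
}
```

This is why comparing compressed bytes between two encoders is not a reliable check; comparing the round-tripped data is.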

@Krensi
Author

Krensi commented Sep 9, 2023

That's the result of test::alpha:

running 1 test
Encoded: 90d4b2b549a40a00001e001f00c9811b7ca05f1817c002da5f04025f0005
Decoded: 215295543402000000000000000000000000000000000000000000000000000000000000000000009302000000000000f202f102f0020000000000002f0400000000000000000000000000000000000000000000
test test::alpha ... ok

Decoding 90d4b2b549a40a00001e001f00c9811b7ca05f1817c002da5f04025f0005 using the C-Code leads to
215295543402000000000000000000000000000000000000000000000000000000000000000000009302000000000000F202F102F0020000000000002F0400000000000000000000000000000000000000000000 which is actually the same result

Here is my nushell output

heatshrink on  master [?] via C v12.2.0-gcc
❯ : (heatshrink -w 11 -l 4 -d in_compressed_dp.bin | into binary) == (0x[215295543402000000000000000000000000000000000000000000000000000000000000000000009302000000000000f202f102f0020000000000002f0400000000000000000000000000000000000000000000] | into binary)
true
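
The two hex dumps above differ only in letter case (`f202` vs `F202`), so they have to be compared as bytes rather than as strings. A minimal Rust sketch of that comparison follows; `hex_to_bytes` is a hypothetical helper written for this example, not part of any library mentioned here.

```rust
/// Parses a hex string (upper or lower case) into bytes.
/// Returns None on odd length or non-hex characters.
fn hex_to_bytes(s: &str) -> Option<Vec<u8>> {
    if s.len() % 2 != 0 || !s.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    (0..s.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&s[i..i + 2], 16).ok())
        .collect()
}

fn main() {
    // Short excerpts of the two dumps; only the letter case differs.
    let from_rust = hex_to_bytes("f202f102f002");
    let from_c = hex_to_bytes("F202F102F002");
    assert!(from_rust.is_some());
    assert_eq!(from_rust, from_c);
    println!("byte-wise equal: {}", from_rust == from_c);
}
```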

When I encode 215295543402000000000000000000000000000000000000000000000000000000000000000000009302000000000000F202F102F0020000000000002F0400000000000000000000000000000000000000000000 using the C code, the result is 90D4B2B549A408057C003E0100C9811B7CA05F1817C002DA5F04025F0005.

Trying to decode this result with this library leads to an illegal backref error. The test code is below, followed by a screenshot of the result.

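    #[test]
    fn decode() {
        // Stream produced by the C code from the alpha test data.
        let src = hex_literal::hex!("90D4B2B549A408057C003E0100C9811B7CA05F1817C002DA5F04025F0005");
        // Window size 11 and lookahead 4, as used for encoding.
        let cfg = Config::new(11, 4).unwrap();
        let mut dst1 = [0; 100];

        // This unwrap fails with the illegal backref error.
        decoder::decode(&src, &mut dst1, &cfg).unwrap();
    }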

image

Greetings, Christian

@snakehand
Owner

I found the problem. The C version fills its window buffer with zeros before compression and allows searching the full window. This means that the first run of zeros was referenced from before the start of the input data. I have removed the IllegalBackref error and made the decoder return zero-valued bytes for such references instead, to maintain compatibility. I have also added unit tests and fuzzing to verify this fix. Thanks for reporting this issue. I will close it once I have done some more testing and pushed an updated version.
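
For readers who want to see the effect, here is a minimal sketch of a decoder that treats references before the start of the output as zeros. It is not the crate's implementation: `BitReader`, `decode_sketch`, and `hex` are hypothetical names, and the bit layout (MSB-first, tag bit 1 for a literal byte, tag bit 0 for a backref whose index and count are each stored minus one) is an assumption based on reading the streams posted in this thread.

```rust
/// Reads bits MSB-first from a byte slice (hypothetical helper).
struct BitReader<'a> {
    data: &'a [u8],
    pos: usize, // current bit position
}

impl<'a> BitReader<'a> {
    /// Returns the next `n` bits as an integer, or None if input runs out.
    fn read(&mut self, n: usize) -> Option<usize> {
        if self.pos + n > self.data.len() * 8 {
            return None;
        }
        let mut value = 0;
        for _ in 0..n {
            let bit = (self.data[self.pos / 8] >> (7 - self.pos % 8)) & 1;
            value = (value << 1) | bit as usize;
            self.pos += 1;
        }
        Some(value)
    }
}

/// Decodes a stream; backrefs reaching before the output start yield 0.
fn decode_sketch(src: &[u8], window_bits: usize, lookahead_bits: usize) -> Vec<u8> {
    let mut bits = BitReader { data: src, pos: 0 };
    let mut out = Vec::new();
    while let Some(tag) = bits.read(1) {
        if tag == 1 {
            // Literal: the next 8 bits are the byte itself.
            match bits.read(8) {
                Some(b) => out.push(b as u8),
                None => break, // trailing padding
            }
        } else {
            // Backref: offset and count are stored minus one (assumed).
            let index = match bits.read(window_bits) {
                Some(v) => v,
                None => break,
            };
            let count = match bits.read(lookahead_bits) {
                Some(v) => v + 1,
                None => break,
            };
            let offset = index + 1;
            for _ in 0..count {
                // Emulate the zero-filled window of the C version.
                let b = if offset > out.len() { 0 } else { out[out.len() - offset] };
                out.push(b);
            }
        }
    }
    out
}

/// Parses a hex string; panics on invalid input (sketch only).
fn hex(s: &str) -> Vec<u8> {
    (0..s.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&s[i..i + 2], 16).unwrap())
        .collect()
}

fn main() {
    // The two streams from this thread, window 11 and lookahead 4.
    let from_c = decode_sketch(&hex("90D4B2B549A408057C003E0100C9811B7CA05F1817C002DA5F04025F0005"), 11, 4);
    let from_rust = decode_sketch(&hex("90d4b2b549a40a00001e001f00c9811b7ca05f1817c002da5f04025f0005"), 11, 4);
    assert_eq!(from_c, from_rust);
    println!("both streams decode to the same {} bytes", from_c.len());
}
```

Under these assumptions, the first backref in the C stream has an offset of 22 when only 6 bytes have been produced, which is exactly the kind of reference that used to be rejected.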

@snakehand
Owner

Fixed with release of version 0.2

@Krensi
Author

Krensi commented Sep 10, 2023

Thank you for the quick fix! :-)

@snakehand
Owner

snakehand commented Sep 10, 2023 via email

@Krensi
Author

Krensi commented Sep 11, 2023

My company has some IoT devices, and we use heatshrink to compress OTA firmware update files. We generate diff files with the jdiff algorithm and apply heatshrink afterward, which reduces the size by a further 20% to 30%.
Recently, I was developing a new protocol for transmitting data from the device to our server. I am a big fan of Google Protobuf and also wanted to see whether heatshrink could reduce the data size further. Depending on the data entropy, it sometimes does.
I am a firmware developer, and I use the original C code of heatshrink in our firmware. However, I like to have tools for testing, which is why I always write my own parsers (recently in Rust). To provide the parser functionality to our Node-based web backend, I used wasm-pack and compiled my parser to wasm, as you already guessed. :-)
This is how I stumbled upon this issue. To fix my use case I decided to use the C code, which was not as easy as expected because it depends on <stdlib.h> and so on. That apparently does not work with wasm-pack and wasm32-unknown-unknown, so I manually removed the dependencies on the standard library. But now I can use your lib instead, which I prefer. :-D
So if you could give me some guidance on how you would compile a C codebase with such dependencies to wasm, I would be glad. As already mentioned, we use jdiff (https://jojodiff.sourceforge.net/), and I am eager to either port it to Rust or include it in a sys-crate somehow.

Greetings,
Christian

@huming2207

huming2207 commented Sep 11, 2023

> I am a firmware developer, and I use the original C code of heatshrink in our firmware. However, I like to have tools for testing, which is why I always write my own parsers (recently in Rust). To provide the parser functionality to our Node-based web backend, I used wasm-pack and compiled my parser to wasm, as you already guessed. :-)
> This is how I stumbled upon this issue. To fix my use case I decided to use the C code, which was not as easy as expected because it depends on <stdlib.h> and so on. That apparently does not work with wasm-pack and wasm32-unknown-unknown, so I manually removed the dependencies on the standard library. But now I can use your lib instead, which I prefer. :-D

I have exactly the same reason for making my NodeJS binding. I'm also planning to make another WebAssembly one for web browsers that takes asset files in, bundles them into a FAT32 image, compresses it with heatshrink, and then pushes it to an embedded target (ESP32 and RAM-constrained Linux boards) for further use.

@Krensi
Author

Krensi commented Sep 11, 2023

> > I am a firmware developer, and I use the original C code of heatshrink in our firmware. However, I like to have tools for testing, which is why I always write my own parsers (recently in Rust). To provide the parser functionality to our Node-based web backend, I used wasm-pack and compiled my parser to wasm, as you already guessed. :-)
> > This is how I stumbled upon this issue. To fix my use case I decided to use the C code, which was not as easy as expected because it depends on <stdlib.h> and so on. That apparently does not work with wasm-pack and wasm32-unknown-unknown, so I manually removed the dependencies on the standard library. But now I can use your lib instead, which I prefer. :-D
>
> I have exactly the same reason for making my NodeJS binding. I'm also planning to make another WebAssembly one for web browsers that takes asset files in, bundles them into a FAT32 image, compresses it with heatshrink, and then pushes it to an embedded target (ESP32 and RAM-constrained Linux boards) for further use.

So if you could provide some guidance on wasm and WASI, I would be grateful! Or do you plan to write everything in pure Rust? I also discovered that using encryption and decryption with https://crates.io/crates/aes-gcm is not straightforward in a WASM context.

Greetings, Christian

@huming2207

> So if you could provide some guidance on wasm and WASI, I would be grateful!

@Krensi maybe try Zig? I remember that Zig can handle some cross-compiling and binding work for C code, and it supports WASM/WASI as well. You may need to implement some glue code in Zig, expose the API you want, and put it into your project.

@Krensi
Author

Krensi commented Sep 12, 2023

> @Krensi maybe try Zig? I remember that Zig can handle some cross-compiling and binding work for C code, and it supports WASM/WASI as well. You may need to implement some glue code in Zig, expose the API you want, and put it into your project.

I am interested in Zig anyway, so maybe I will give it a try. Please keep me updated about your wasm project if you plan to release it on GitHub. It sounds interesting! I am looking for projects to contribute to because that is a great way of learning Rust.

@Krensi
Author

Krensi commented Sep 12, 2023

-- Off topic --

https://wiki.seeedstudio.com/SeeedStudio_XIAO_Series_Introduction/

These are great! I can really recommend them
