Deterministic compression #140

Open
showvick opened this issue May 27, 2020 · 12 comments
@showvick

showvick commented May 27, 2020

We see great results with ISA-L, so we are planning to add it to our application. One of our requirements is that we get the same compressed output if we call ISA-L compression multiple times on the same payload data. With zlib we always get the same compressed output. Will ISA-L give the same output every time, or can the compressed output differ for the same payload? And if the compressed output differs, can its length differ as well?

As an example, first we compress payload A and get B. If we compress payload A again, will we get B again?
Case 1 :- compress (A) --> B
Case 2 (Recovery) :- compress (A) --> will be B as above?
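
The Case 1 / Case 2 question can also be checked empirically before relying on it in recovery. The sketch below is a minimal, hypothetical harness: `compress_fn`, `toy_rle`, and `is_deterministic` are illustrative names, not part of ISA-L or zlib, and the toy run-length encoder is only a stand-in so the example builds without any compression library. In a real test the function pointer would wrap the application's own compression call.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical compressor signature: writes at most out_cap bytes to out and
 * returns the number of bytes written, or 0 on failure. */
typedef size_t (*compress_fn)(const uint8_t *in, size_t in_len,
                              uint8_t *out, size_t out_cap);

/* Toy run-length encoder, used only as a stand-in for a real compressor.
 * Emits (count, byte) pairs with count in 1..255. Empty input yields 0. */
static size_t toy_rle(const uint8_t *in, size_t in_len,
                      uint8_t *out, size_t out_cap)
{
    size_t o = 0;
    for (size_t i = 0; i < in_len; ) {
        size_t run = 1;
        while (i + run < in_len && in[i + run] == in[i] && run < 255)
            run++;
        if (o + 2 > out_cap)
            return 0; /* output buffer too small */
        out[o++] = (uint8_t)run;
        out[o++] = in[i];
        i += run;
    }
    return o;
}

/* Compress the same input twice into separate buffers and report whether
 * the two results are byte-identical (same length, same contents).
 * Returns 1 if identical, 0 if they differ or either call failed. */
static int is_deterministic(compress_fn fn, const uint8_t *in, size_t in_len,
                            uint8_t *out_a, uint8_t *out_b, size_t out_cap)
{
    size_t len_a = fn(in, in_len, out_a, out_cap);
    size_t len_b = fn(in, in_len, out_b, out_cap);
    if (len_a == 0 || len_b == 0)
        return 0;
    return len_a == len_b && memcmp(out_a, out_b, len_a) == 0;
}
```

The lengths are compared before the contents because the question also asks whether the compressed length itself can differ between runs.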

Appreciate your help.

We use the following code to compress data:

        isal_deflate_stateless_init(&isalstream);
        isalstream.end_of_stream = 1;
        isalstream.flush = NO_FLUSH;
        isalstream.next_in = (Bytef *)cmpinstart_p;
        isalstream.avail_in = bytestocompress;
        isalstream.next_out = (Bytef *)cmpoutstart_p;
        isalstream.avail_out = length;
        isalstream.gzip_flag = IGZIP_ZLIB;

        isalstream.level = 2;
        isalstream.level_buf = blkalloc(NULL, 1, ISAL_DEF_LVL2_DEFAULT);
        isalstream.level_buf_size = ISAL_DEF_LVL2_DEFAULT;

        zret = isal_deflate_stateless(&isalstream);
@gbtucker
Contributor

Thanks for the feedback. Yes, the stateless API is deterministic. Within the same ISA-L version, and using the same input and parameters, you should get the same output.

If you haven't already, you may consider pre-allocating level_buf and reusing the same buffer on subsequent compression runs. This is purely for speed; it doesn't change the deterministic behavior of the output.

Please let us know when you have finished integration.

@showvick
Author

Great! Thanks for the prompt response.

@rjoursler
Contributor

There is one caveat to this response: the compressed output may differ if you run the compression on different CPU architectures. To get the best performance, the algorithms are tuned to the resources available on a given CPU architecture.

@showvick
Author

Thanks for the response.

This would be on the same machine, with the same CPUs. It is for our recovery logic: we use write-ahead logging, so the runtime code generates a compressed output and the recovery logic then recompresses the data. We expect the same output.

So, to confirm: on the same box (same CPUs), with the same ISA-L version and the same input and parameters, we should get the same output?

@rjoursler
Contributor

Correct.

@showvick
Author

We integrated ISA-L into our database, but we are seeing the non-determinism discussed above.

We are using the same box, the same ISA-L version (v2.29.0), and the same parameters, but we see two different compressed outputs.

The code that we are using to compress is the following:

        isal_deflate_stateless_init(&isalstream);
        isalstream.end_of_stream = 1;
        isalstream.flush = NO_FLUSH;
        isalstream.next_in = (Bytef *)cmpinstart_p;
        isalstream.avail_in = bytestocompress;
        isalstream.next_out = (Bytef *)cmpoutstart_p;
        isalstream.avail_out = length;
        isalstream.gzip_flag = IGZIP_ZLIB;

        isalstream.level = 2;
        isalstream.level_buf = blkalloc(NULL, 1, ISAL_DEF_LVL2_DEFAULT);
        isalstream.level_buf_size = ISAL_DEF_LVL2_DEFAULT;

        zret = isal_deflate_stateless(&isalstream);

I have attached the two compressed outputs (in binary) that ISA-L produced.

nondetermism.zip

Since the sizes are different, our recovery logic can't handle it, as we expect the same size.

Appreciate your help.

Thanks

@gbtucker
Contributor

I looked at these files. I could confirm that runtime.bin and replay.bin are indeed compressed with zlib (2-byte) headers and decompress to the same 16312-byte file, but there is a large difference in the compressed sizes: 9600 and 6528 bytes respectively. They also seem to have different deflate headers, and both contain some extra junk bytes at the end: replay has 442 extra bytes and runtime has 13. They will still decompress successfully if you remove that many bytes from the end. Even when compressing with levels 0-3 I only see output in the 6k - 7k range, so the 9600 - 13 is still hard to explain. Could blkalloc() be failing if it really is called every time, or is it called just once?

@showvick
Author

Thanks for looking. No, blkalloc can't fail. If it fails we abort the thread, so no compression would take place.

@showvick
Author

Note: Since this is part of the file system, we have to align the output to the sector boundary. The trailing bytes at the end can be ignored, as you did.
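
For readers comparing such aligned records, a minimal sketch of the padding step follows. It is an illustration only, not the poster's code: `round_up` and `pad_record` are hypothetical names, and the alignment value is a parameter because the thread does not state the sector size. Zero-filling the padding (an assumption, the thread does not say what the trailing bytes contain) keeps the bytes after the compressed payload from adding noise to a byte-for-byte comparison.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round len up to the next multiple of align (align must be > 0). */
static size_t round_up(size_t len, size_t align)
{
    return (len + align - 1) / align * align;
}

/* Zero-fill the space between the end of the compressed payload and the
 * next alignment boundary. Returns the padded record length, or 0 if the
 * buffer is too small to hold the padded record. */
static size_t pad_record(uint8_t *buf, size_t buf_cap,
                         size_t payload_len, size_t align)
{
    size_t padded = round_up(payload_len, align);
    if (padded > buf_cap)
        return 0;
    memset(buf + payload_len, 0, padded - payload_len);
    return padded;
}
```

When two records are compared, only the first `payload_len` bytes of each carry compressed data, so the payload length has to be recorded somewhere alongside the padded record.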

@showvick
Author

showvick commented Oct 1, 2020

We are still encountering this issue. Is there any diagnostic or debug info we can use to determine why we are getting two different outputs, such as a different strategy being chosen?

@gbtucker
Contributor

gbtucker commented Oct 1, 2020

Is it possible to dump the contents of isalstream in the two cases? I wasn't able to reproduce the compression of your example to 9600 bytes with any strategy, so I still think something else has changed in the initial conditions.
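
One way to compare the initial conditions of the two paths is to log a fingerprint of the input and the parameters just before each compression call. The sketch below assumes nothing about ISA-L internals: `struct param_snapshot` is a hypothetical stand-in that only mirrors the field names set in the snippet earlier in the thread (it is not the real stream structure), and `fnv1a64` and `format_fingerprint` are illustrative helpers. The caller would copy the values it passes to the compressor into the snapshot.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* 64-bit FNV-1a hash, used here only to fingerprint the input buffer. */
static uint64_t fnv1a64(const uint8_t *data, size_t len)
{
    uint64_t h = UINT64_C(0xcbf29ce484222325); /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= UINT64_C(0x100000001b3);          /* FNV prime */
    }
    return h;
}

/* Hypothetical snapshot of the parameters handed to the compressor. */
struct param_snapshot {
    uint32_t level;
    uint32_t level_buf_size;
    uint32_t avail_in;
    uint32_t avail_out;
    unsigned flush;
    unsigned gzip_flag;
    unsigned end_of_stream;
};

/* Format one log line describing the input and parameters of a run.
 * Returns the snprintf result: the length the full line needs. */
static int format_fingerprint(char *dst, size_t dst_cap, const char *tag,
                              const struct param_snapshot *p,
                              const uint8_t *in)
{
    return snprintf(dst, dst_cap,
                    "%s: in_hash=%016" PRIx64 " avail_in=%" PRIu32
                    " avail_out=%" PRIu32 " level=%" PRIu32
                    " level_buf_size=%" PRIu32
                    " flush=%u gzip_flag=%u end_of_stream=%u",
                    tag, fnv1a64(in, p->avail_in), p->avail_in, p->avail_out,
                    p->level, p->level_buf_size, p->flush, p->gzip_flag,
                    p->end_of_stream);
}
```

If the runtime and replay lines differ in any field, the two calls did not start from the same conditions; if they match and the outputs still differ, the cause lies elsewhere.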

@showvick
Author

showvick commented Oct 1, 2020

Thanks Greg. We will provide the isalstream for the two cases.
