Struggling with ZSTD_decompressBlock #227
Thanks for the notification @GregSlazinski. Indeed, the construction you describe is expected to work. I checked the test suite, and only the "single block" construction was tested so far. As multi-blocks are not part of the test suite yet, this construction can be buggy. On to debugging now... |
The latest update in the "dev" branch is an attempt at fixing this issue. |
What are the intended usage scenarios for ZSTD_compress and ZSTD_decompress versus ZSTD_compressBlock and ZSTD_decompressBlock? My understanding is that neither variant will look back into the previously processed block, and neither will add frame headers. So what is the difference, apart from the ..Block variants receiving a context? |
Thanks a lot for the quick fix. I've narrowed down the block that fails, and created a 1 MB sample binary file. Using the code above, you can see that it fails. From the decompressor I get: step 2: step 3: compressed and fail |
But the intended usage for |
I see. Thanks for the explanation. |
OK. I guess I understand what's going on.
Even if you properly handle uncompressed blocks directly in your application, which seems to be the case here, the problem is that the decoder's history is not updated with the content of those uncompressed blocks, so later compressed blocks cannot reference it. This is what happens here. Fixing this issue will be a bit more complex. Maybe it's time to reconsider whether supporting multi-blocks is really a good idea. Btw @GregSlazinski, is there any property that made you select this mode rather than |
I want to compress to the absolute smallest possible size, without unnecessary headers. I've found the most efficient way to compress every file is to use:
Additionally, in the above scheme, I pack UInts using the following code:
This allows encoding small UInts in just 1 byte. I think that surrounding blocks with frame containers bigger than just the block size (with the method described above) is a waste of space. In short: I just want to avoid the extra overhead, and pack as small as possible. |
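For illustration, here is a minimal LEB128-style sketch of such a variable-length UInt codec. The names encUIntV/decUIntV are invented here; the actual cmpIntV/decIntV implementation is not shown in the thread and may differ:

```c
#include <stddef.h>
#include <stdint.h>

/* Variable-length unsigned integer encoding, 7 bits per byte:
 * values below 128 take a single byte. Returns bytes written. */
size_t encUIntV(uint8_t *dst, uint32_t v)
{
    size_t n = 0;
    while (v >= 0x80) { dst[n++] = (uint8_t)(v | 0x80); v >>= 7; }
    dst[n++] = (uint8_t)v;
    return n;
}

/* Decodes a value written by encUIntV; returns bytes consumed. */
size_t decUIntV(const uint8_t *src, uint32_t *v)
{
    size_t n = 0; uint32_t out = 0; int shift = 0; uint8_t b;
    do { b = src[n++]; out |= (uint32_t)(b & 0x7F) << shift; shift += 7; } while (b & 0x80);
    *v = out;
    return n;
}
```

With this kind of codec, a block size such as 300 costs 2 bytes, while sizes below 128 cost only 1.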
I believe it's a very good use case which justifies I'm just wondering if, for sources > 128 KB, the header savings remain significant enough to justify |
I'd like to use ZSTD_compressBlock everywhere, rather than use 2 separate paths for small files vs large files. |
OK, I've been thinking about this issue for a while, and I believe I have a proposal. First, if it were for my own usage, I would advise using My guess is that the code is effectively simplified by using frames for sources > 128 KB. Anyway, I was so seduced by the concept that I was initially considering just applying it transparently from within But there was a problem regarding definition. In such a construction, So I've since switched sides. I believe the issue comes from There are 2 solutions to this:
I believe the second solution is likely the most transparent one. |
Hi, Thanks for the reply. Whichever solution you choose is fine by me. I have a lot of files, small and big, all of them are packed into one big archive PAK that has all files stored inside and packed as separate entries. Thanks for your help, this is the last issue that stops me from integrating this awesome library into my game engine. |
There's a problem with my latest suggestion:
This would require knowing the size to decompress. Hence, the prototype requirement would change from "needs maximum possible decompressed size" (aka output buffer size) to "needs exact decompressed size". Is that an issue? |
Hi, currently I store only the uncompressed size of the entire file (not of a single block, but of all the blocks that make up the file summed together). I'm assuming this information is stored in the ZSTD frame headers? If it's required, then I can provide it. I'm curious how this compares to 'LZ4_decompress_safe_continue', which doesn't require the exact decompressed size. Thank you |
Hence, they share same properties : For LZ4 though, that might not be a problem, because In contrast,
OK, so indeed having to manage the exact decompressed size of each block is an additional duty, which doesn't fit well. Maybe then the previous idea of "inserting data segments within ZSTD_DCtx* history context" would prove a better choice. It would need a new prototype though, something like:
Thinking harder about it: it might be possible to refactor Thoughts? |
Is my understanding correct that "ZSTD_insertBlock" would be called only for non-compressed blocks? In that case, I think that would be the best solution, because my decompressor currently works like this:
In this approach, the size for blocks that are not compressed is already known.
with this:
Is that correct?
I think this approach would be worse if it introduced a +4 byte overhead. It's much better to have 1 bit of overhead in a UInt that's already needed for the compressed block size, instead of adding 4 extra bytes. Thank you |
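The 1-bit scheme can be sketched as follows (my own illustration, not necessarily the format used here): fold the flag into the least significant bit of the size before varint-encoding the result, so the overhead over a bare size is exactly one bit.

```c
#include <stdint.h>

/* Hypothetical block header: fold a 1-bit "is uncompressed" flag
 * into the low bit of the size, then varint-encode the result.
 * Overhead versus storing the bare size is a single bit. */
uint32_t packBlockHeader(uint32_t size, int uncompressed)
{
    return (size << 1) | (uncompressed ? 1u : 0u);
}

void unpackBlockHeader(uint32_t header, uint32_t *size, int *uncompressed)
{
    *uncompressed = (int)(header & 1u);
    *size = header >> 1;
}
```

For most block sizes, the shifted value still fits in the same number of varint bytes as the unshifted one, which is the appeal over a separate 4-byte field.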
FWIW, I'd also prefer to avoid the 4-byte overhead if it adds to the ~15 bytes that are already there. If the 4-byte overhead applies only to blocks > 128 KB, then I don't care too much. My use case is that I could use zstd from a higher-level compressor, so I can communicate exact input and output sizes to it. |
Here's a stupid idea: if
This way, there is no custom logic to deal with compressed/uncompressed at the app level, and no need for custom headers to distinguish between compressed and uncompressed blocks (stored out of band, via the app's own framing). |
After thinking about it more, I've edited the comment above with a simpler method. |
@KrzysFR: that's indeed the core idea in the suggestion to make The problem is, it requires changing the prototype, from Hence the discussion investigating other potential ideas. |
I guess it depends on whether you want to completely delegate the task of framing to the application itself or not. With the current implementation, the application already needs to do it anyway, if only to store a Plus, some applications either already store the original size anyway (metadata), OR they know it from context (the size is constant, like the size of a raw video frame or a vector of N values, can be deduced from the file name, or from some other out-of-band information). I feel that the |
That's indeed the purpose of the
According to @GregSlazinski, he only stores the uncompressed size of the complete object, not of each block. And it would work fine, were it not for the uncompressed blocks. So the only remaining question is how to deal with uncompressed blocks in a way which keeps the initial spirit of the API. |
Don't deal with it: treat it like any other block, so that it can be used to encode the next block as well. My vote is for no special treatment of uncompressed blocks. I'm not familiar with the //...
for(; !src.end(); )
{
int chunk, size;
src.decIntV(chunk); // read compressed size
src.decIntV(size); // read original size
// read exactly 'chunk' bytes from 'src' (not sure if that's what .getFast(...) does?)
auto res=ZSTD_decompressBlock(ctx, dest.mem(), dest.left(), s.data(), chunk, size);
if(ZSTD_isError(res)) Exit(ZSTD_getErrorName(res)); // here the error occurs
// assert that res == size
if (!MemWrote(dest, size)) goto error; // this does: dest.mem+=size; and dest.left-=size;
}
//... Of course this means changing the file format, which may or may not be a big deal. |
OK, so this seems to be a vote in favor of the proposition: @GregSlazinski and @FrancescAlted consider these additional bytes a problem. Note this proposition uses the current format; it only changes the rule that "a compressed block should always be smaller than its uncompressed size". Well, no longer in this case. |
Are these +4 bytes stored inside the block, and do they contain the raw size? If yes, why store them in the block if the app can do it itself (probably with fewer than 4 bytes)? I also often have the issue of having to know the original size of the block before calling If it were stored in the block itself, I would have to call a Also, by 'change the file format', I was talking about the file format likely used by @GregSlazinski's code sample, which seems to use negative values for uncompressed size and positive for compressed. It would change to 2 positive integers and no if/else branch. If I'm not mistaken, |
Sort of, but not exactly. It contains 2 sections, one for literals and one for sequences. As stated, this is a consequence of using the compression format "as is". Currently, the proposal with the fewest undesired side effects seems to be adding a new prototype I was also considering adding a "mode" to |
ZSTD_insertBlock |
I see what you mean: since for uncompressed, raw size == compressed size, you don't need to store the same value twice. Using the MSB for this flag (the sign bit) may not be the best with varint, but you could go the other way and store But then how do you solve the issue of knowing in advance the raw size for an actual compressed block before calling And this still leaves you with having to if/else both the encoder and the decoder to deal with two types of blocks. |
FWIW, Blosc already uses the 1-bit approach for representing an uncompressed chunk in a specific place of the header. I'd prefer to go this route here too. |
@KrzysFR: for this I'm actually using a variant which I call cmpIntV - compress integer with a variable number of bytes. It works similarly to the unsigned Int version, but processes negative numbers in a similar manner as you've mentioned. As for storing the decompressed size: I store it at the start of the stream. However, I do it just once (the decompressed size of all blocks), and not for each block separately. Optionally you can store it in some metadata. As for having to do if/else: I don't see this as much of a problem. |
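A common way to map signed values onto small unsigned ones before variable-length encoding is zigzag encoding, as used by e.g. Protocol Buffers. The actual cmpIntV implementation is not shown in the thread and may well differ, so this is only a sketch:

```c
#include <stdint.h>

/* Zigzag mapping: interleaves negatives with positives
 * (0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ...) so small
 * magnitudes of either sign stay small before varint coding. */
uint32_t zigzagEnc(int32_t v)
{
    return ((uint32_t)v << 1) ^ (v < 0 ? 0xFFFFFFFFu : 0u);
}

int32_t zigzagDec(uint32_t u)
{
    return (int32_t)(u >> 1) ^ -(int32_t)(u & 1);
}
```

This avoids spending a full sign bit at a fixed position, which interacts badly with a varint's "small values are cheap" property.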
One last detail about The current proposed prototype is: That is: it inserts the block into Now, there is an implied condition here: With the above prototype, there is a risk that a user would reference the block directly from the "compressed" area, rather than the same block copied into its destination area. Anyway, the point is to avoid such confusion. Another prototype proposition would be something like:
The name looks the same, but the scope is different: I'm unsure if this is really better. Thoughts? |
Hi, I think the first version should be chosen. The second version would force the copy even when it's not needed, thus making it slower. There's no problem with the first version having to be called on the destination buffer, as long as it's mentioned in the header description. Thanks |
If I summarize in pseudo C++ code (probably does not compile) Current issue is that the decoder does not have a way to know note: fixed a few typos Encoder: // input: stream-like object with the plain text to encode
// output: stream-like object where to send the compressed output
const MAX_BLOCK_SIZE = 128 * 1024;
ZSTD_createCCtx(...)
ZSTD_compressBegin(..);
void* buf_in = malloc(MAX_BLOCK_SIZE);
void* buf_out = malloc(MAX_BLOCK_SIZE);
while(!input.eof())
{
int size = input.read_buffer(buf_in, capacity: MAX_BLOCK_SIZE);
// probably introduce a loop here to fill the buffer close to the limit,
// for socket-like streams with partial reads (ex: Transfer-Encoding: chunked)
int res = ZSTD_compressBlock(..., buf_out, MAX_BLOCK_SIZE, buf_in, size);
if (ZSTD_isError(res)) { /*... handle error */ }
if (res == 0)
{ // uncompressed block
output.write_signed_int(-size);
output.write_bytes(buf_in, size);
}
else
{ // compressed block
assert(res < MAX_BLOCK_SIZE);
output.write_signed_int(res);
output.write_bytes(buf_out, res);
}
}
// cleanup
free(buf_in);
free(buf_out);
ZSTD_freeCCtx(...) Decoder: // input: stream-like object with the compressed data to decode
// output: stream-like object where to send the decompressed output
const MAX_BLOCK_SIZE = 128 * 1024;
ZSTD_createDCtx(...);
ZSTD_decompressBegin(...);
void* buf_in = malloc(MAX_BLOCK_SIZE);
void* buf_out = malloc(MAX_BLOCK_SIZE);
while(!input.eof())
{
int chunk_size = input.read_signed_int();
int size = chunk_size < 0 ? -chunk_size : chunk_size;
int read = input.read_exactly(buf_in, size);
assert(read == size);
if (chunk_size < 0)
{ // uncompressed block
int res = ZSTD_insertBlock(..., buf_out, MAX_BLOCK_SIZE, buf_in, size);
if (ZSTD_isError(res)) { /* handle error */ }
output.write_bytes(buf_out, size);
}
else
{ // compressed block
int original_size = /* BUGBUG: how do I know this? */
int res = ZSTD_decompressBlock(...., buf_out, MAX_BLOCK_SIZE, buf_in, size, original_size);
if (ZSTD_isError(res)) { /* handle error */ }
assert(res == original_size);
output.write_bytes(buf_out, res);
}
}
// cleanup
free(buf_in);
free(buf_out);
ZSTD_freeDCtx(...); |
Now, if The app creates its own framing format where each block is:
For large blocks of max size 128KB: header size would be 4 bytes (131,072 needs 3 bytes in varint format unfortunately). For compressed block, depending on the compression ratio it would need 4 to 6 bytes (most probably 5?) For small blocks of up to 1KB, header size would be 2-3 bytes for uncompressed, and 3-4 bytes for compressed. Bonus: decoder knows in advance the size of the decompressed block (useful with some languages or API). Edit: again, this is a format that is decided by the application! Others could use a more compact encoding! The point being that the headers are not handled by ZSTD, but by the app Encoder: // input: stream-like object with the plain text to encode
// output: stream-like object where to send the compressed output
const MAX_BLOCK_SIZE = 128 * 1024;
ZSTD_createCCtx(...)
ZSTD_compressBegin(..);
void* buf_in = malloc(MAX_BLOCK_SIZE);
void* buf_out = malloc(MAX_BLOCK_SIZE);
while(!input.eof())
{
int size = input.read_buffer(buf_in, capacity: MAX_BLOCK_SIZE);
// probably introduce a loop here to fill the buffer close to the limit,
// for socket-like streams with partial reads (ex: Transfer-Encoding: chunked)
int res = ZSTD_compressBlock(..., buf_out, MAX_BLOCK_SIZE, buf_in, size);
if (ZSTD_isError(res)) { /*... handle error */ }
assert(res <= MAX_BLOCK_SIZE);
output.write_int(res);
output.write_int(res == size ? 0 : size); // raw size: 0 means "same as compressed size"
output.write_bytes(buf_out, res);
}
// cleanup
free(buf_in);
free(buf_out);
ZSTD_freeCCtx(...) Decoder: // input: stream-like object with the compressed data to decode
// output: stream-like object where to send the decompressed output
const MAX_BLOCK_SIZE = 128 * 1024;
ZSTD_createDCtx(...);
ZSTD_decompressBegin(...);
void* buf_in = malloc(MAX_BLOCK_SIZE);
void* buf_out = malloc(MAX_BLOCK_SIZE);
while(!input.eof())
{
int chunk_size = input.read_int();
int original_size = input.read_int();
if (!original_size) original_size = chunk_size;
int read = input.read_exactly(buf_in, chunk_size);
assert(read == chunk_size);
int res = ZSTD_decompressBlock(...., buf_out, MAX_BLOCK_SIZE, buf_in, chunk_size, original_size);
if (ZSTD_isError(res)) { /* handle error */ }
assert(res == original_size);
output.write_bytes(buf_out, res);
}
// cleanup
free(buf_in);
free(buf_out);
ZSTD_freeDCtx(...); It looks much simpler to me, but of course would need to change the existing block format internally... |
You don't need to store both the uncompressed size and the compressed size for each block; that's a waste of space.
(very simplified version) |
There is a first tentative implementation of |
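For reference, the prototype that eventually shipped in zstd's experimental (ZSTD_STATIC_LINKING_ONLY) API looks like the following; this is quoted from memory, so double-check against your zstd.h:

```c
/* Inserts an uncompressed block into the DCtx history, so that
 * subsequent compressed blocks can reference its content.
 * Returns blockSize, or an error code checkable with ZSTD_isError(). */
size_t ZSTD_insertBlock(ZSTD_DCtx* dctx, const void* blockStart, size_t blockSize);
```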
I thought Also, if you are compressing streams, you don't always know the total original size in advance, so you cannot use it as a header (a footer, maybe). |
@Cyan4973 awesome, thanks a lot! The low-level block compression is definitely the way to go, instead of the built-in frame API: The data file from the test comes from my game, and includes 3D mesh data, textures, sounds, etc. In this test I've compressed the entire file data into one ZSTD compressed stream. Thanks a lot; as far as I'm concerned, the issue can be closed.
This was one of the solutions considered; however, it looks like it was dropped in favor of the better 'ZSTD_insertBlock' solution. Thanks again Cyan! |
Ah, so this simplifies things a bit.... 👻 IF the compressed block DOES currently encode the original size, it would be great to have an API to access it (before calling But I get the feeling that there are possibly two (or more) use cases that would want to deal with blocks, but not with the same constraints. For completeness, here is what it would look like if there were an API with no special handling required for uncompressed blocks, but where framing would be the responsibility of the app (see updated code below). Pros:
Cons:
Notes:
Regarding extra bytes due to custom framing:
We would gain X bytes from 1), and have to spend Y bytes for 2) and would end up with a delta of Y-X bytes. Encoder: // input: stream-like object with the plain text to encode
// output: stream-like object where to send the compressed output
const MAX_BLOCK_SIZE = 128 * 1024;
ZSTD_createCCtx(...)
ZSTD_compressBegin(..);
void* buf_in = malloc(MAX_BLOCK_SIZE);
void* buf_out = malloc(MAX_BLOCK_SIZE);
while(!input.eof())
{
int size = input.read_buffer(buf_in, capacity: MAX_BLOCK_SIZE);
// probably introduce a loop here to fill the buffer close to the limit,
// for socket-like streams with partial reads (ex: Transfer-Encoding: chunked)
// compress block with no special handling here
int res = ZSTD_compressBlock_no_framing(..., buf_out, MAX_BLOCK_SIZE, buf_in, size);
if (ZSTD_isError(res)) { /*... handle error */ }
assert(res <= size); // size cannot be larger than original!
// write block header and compressed bytes
output.write_int(res); // compressed_size
output.write_int(res == size ? 0 : size - res); // raw_size: 0 if uncompressed; else delta
output.write_bytes(buf_out, res);
}
// note: we could add a 0 marker here to mean "end of stream", and let the decoder know to stop there?
// or have it followed by a footer (with total size, metadata, signature, ...)
output.flush();
// cleanup
free(buf_in);
free(buf_out);
ZSTD_freeCCtx(...) Decoder: // input: stream-like object with the compressed data to decode
// output: stream-like object where to send the decompressed output
const MAX_BLOCK_SIZE = 128 * 1024;
ZSTD_createDCtx(...);
ZSTD_decompressBegin(...);
void* buf_in = malloc(MAX_BLOCK_SIZE);
void* buf_out = malloc(MAX_BLOCK_SIZE);
while(!input.eof())
{
// read block header
int compressed_size = input.read_int(); // compressed_size
// note: possibly we could have 0 here mean "end of stream" ?
int raw_size = input.read_int(); // raw_size: 0 or delta
raw_size = raw_size == 0 ? compressed_size : raw_size + compressed_size;
// read compressed bytes
int read = input.read_exactly(buf_in, compressed_size);
assert(read == compressed_size);
// decompress block (must provide original size)
int res = ZSTD_decompressBlock_no_framing(...., buf_out, MAX_BLOCK_SIZE, buf_in, compressed_size, raw_size);
if (ZSTD_isError(res)) { /* handle error */ }
// write decompressed bytes to output
output.write_bytes(buf_out, res);
}
output.flush();
// cleanup
free(buf_in);
free(buf_out);
ZSTD_freeDCtx(...); |
An example of where both
You will know the size of the compressed bytes by reading the size of the value (which is stored in the k/v store's b-tree anyway). And you can get the original size from the key by reading If the k/v store offers direct pointer access to a memory-mapped file, then the encoder/decoder can read/write straight from/to the mmap, with no copy to temporary buffers needed. |
No, it does not. The frame does, and there is an API to consult it ( But the block does not store its own size, neither compressed nor uncompressed. |
Oh... 😢 🐼 But then how does the decoder know it is finished? It just decodes stuff until it runs out of input, with no other check than the max dst capacity? It means that corruption would only be caught after the fact, by comparing the result of I see now why it wasn't making sense to me... Since there is no way to flag a naked uncompressed block without adding at least one bit, I guess that we can only rely on Also, don't forget to check that the |
So, trying to make it run with all the checks and alarms: Version where it is possible to know in advance the total size of the source:
disclaimer: I'm writing this late, and with a pizza in one hand, so probably buggy! Encoder: // input: stream-like object with the plain text to encode
// output: stream-like object where to send the compressed output
const MAX_BLOCK_SIZE = 128 * 1024; // or less
ZSTD_createCCtx(...)
ZSTD_compressBegin(..);
void* buf_in = malloc(MAX_BLOCK_SIZE);
void* buf_out = malloc(MAX_BLOCK_SIZE);
long total = input.magically_know_total_size(); // Can't do that with: sockets, generators, JSON serializer, ...
output.write_int_64(total);
long read = 0;
while(read < total) {
// read next block from source
int size = input.read_buffer(buf_in, MAX_BLOCK_SIZE);
// probably introduce a loop here to fill the buffer close to the limit,
// for socket-like streams with partial reads (ex: Transfer-Encoding: chunked)
assert(size <= MAX_BLOCK_SIZE);
assert(read + size <= total);
read += size;
// try compressing
int res = ZSTD_compressBlock(..., buf_out, MAX_BLOCK_SIZE, buf_in, size);
if (ZSTD_isError(res)) { /*... handle error */ }
if (res == 0) {
// uncompressed
output.write_sint32(-size);
output.write_bytes(buf_in, size);
}
else {
// compressed
assert(res < size, "must reduce size");
output.write_sint32(res);
output.write_bytes(buf_out, res);
}
}
//optional: add checksum, signature, EOF marker...
cleanup:
free(buf_in);
free(buf_out);
ZSTD_freeCCtx(...) Decoder: // input: stream-like object with the compressed data to decode
// output: stream-like object where to send the decompressed output
const MAX_BLOCK_SIZE = 128 * 1024; // or less but at least as much as encoder
ZSTD_createDCtx(...);
ZSTD_decompressBegin(...);
void* buf_in = malloc(MAX_BLOCK_SIZE);
void* buf_out = malloc(MAX_BLOCK_SIZE);
long total = input.read_int64();
assert(total <= min(FREE_DISK_SPACE, SIZE_OF_THE_INTERNET)); // sanity check here
long decoded = 0;
while(!input.eof()) {
int chunk_size = input.read_sint32();
if (chunk_size == 0) {
// end of stream
break;
}
int size = chunk_size < 0 ? -chunk_size : chunk_size;
assert(size <= MAX_BLOCK_SIZE);
int read = input.read_exactly(buf_in, size);
if (read < size) { /* error truncated file */ }
assert(read == size);
int res;
int capacity = min(total - decoded, MAX_BLOCK_SIZE);
if (chunk_size < 0) {
// uncompressed
if (size > capacity) { /* error decoded too much probably corruption */ }
res = ZSTD_insertBlock(..., buf_out, capacity, buf_in, size);
}
else {
// compressed
res = ZSTD_decompressBlock(...., buf_out, capacity, buf_in, size);
//note: assumes that it fails if decoded size > capacity?
}
if (ZSTD_isError(res)) { /* handle error + potential overflow here */ }
if (decoded + res > total) { /* error decoded too much probably corruption */ }
output.write_bytes(buf_out, res);
decoded += res;
}
if (decoded != total) { /* error decoded not enough */ }
// optional: checksum, signature, ... ?
cleanup:
free(buf_in);
free(buf_out);
ZSTD_freeDCtx(...); the pizza is now cold 😢 |
Today I was able to perform more tests. However, when I tried using level 22, compression/decompression succeeded without errors, but the decompressed result file had a different hash than the original file. I'm providing an unoptimized compress/decompress function that I've simplified in order to reproduce the bug. The compressor reads the source file at each step into contiguous memory capable of storing the entire source. This should avoid any problems with the buffer being reused; however, the problem persists.
@Cyan4973: how can I provide the original file for testing? |
Yes, you are totally right. Knowing the exact position of the 1st wrong byte is likely to help. |
I'm not sure I properly follow:
What does it do? In particular: |
I made a "blind fix" (as I'm unable to reproduce the issue directly). |
Hi, if(!src.getFast(&d[d_pos], size))goto error; just reads from the 'src' file into the specified memory address, with the given size. Regarding "I made a "blind fix" (as I'm unable to reproduce your problem directly)": sure, I'll test this right away. On 8 July 2016 at 18:14, Yann Collet notifications@github.com wrote:
|
The 132 MB file now works OK, I'll do the test on the full 800+ MB file On 8 July 2016 at 18:30, Esenthel esenthel@hotmail.com wrote:
|
Thanks a lot, the big file now works OK too. Thanks, Greg On 8 July 2016 at 18:44, Esenthel esenthel@hotmail.com wrote:
|
Fixed in v0.7.3 |
Hi,
I'm having a problem when using the block-based methods.
If I use 'ZSTD_compressContinue' with 'ZSTD_decompressContinue', then my code works fine:
But if I replace them with 'ZSTD_compressBlock' and 'ZSTD_decompressBlock', (including writing/reading the compressed buffer size before each buffer), then decompression fails:
The error occurs at the second call to ZSTD_decompressBlock
First call succeeds:
chunk=96050
size=ZSTD_decompressBlock(ctx, dest.mem(), dest.left(), s.data(), chunk);
size=131072
Second call fails:
chunk=94707
size=ZSTD_decompressBlock(ctx, dest.mem(), dest.left(), s.data(), chunk);
size=18446744073709551605 ("Corrupted block detected")
Am I missing something obvious here?
When decompressing, the 'dest' File in this test is contiguous memory capable of storing the entire decompressed data.
And with each decompression call, I am advancing 'dest.mem' to the next decompressed chunk position.
Thanks for any help
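A side note on the error value above: zstd functions returning size_t report errors as values close to (size_t)-1, which is why the failing call prints 18446744073709551605. That is simply (size_t)-11 on a 64-bit platform, and ZSTD_isError() is the intended way to detect such values. A small self-contained illustration (no zstd required):

```c
#include <stdint.h>

/* zstd returns error codes as size_t values near (size_t)-1.
 * 18446744073709551605 is 2^64 - 11, i.e. a small negative code
 * viewed through an unsigned 64-bit size_t. */
uint64_t asSizeT(int negativeCode)
{
    return (uint64_t)(int64_t)negativeCode; /* sign-extend, then view as unsigned */
}
```

So the "size" printed above is not a real size at all; always run the return value through ZSTD_isError()/ZSTD_getErrorName() before using it.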