Pako 1.x -> 2.x raw decompression regression with intermediate flush(?) #235

Open · ngladitz opened this issue Aug 13, 2021 · 5 comments

Given the following package.json:

{
  "name": "pako-test",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "",
  "license": "ISC",
  "dependencies": {
    "pako1": "npm:pako@^1.0.11",
    "pako2": "npm:pako@^2.0.4"
  }
}

And the following index.mjs:

import {default as pako1} from 'pako1';
import * as pako2 from 'pako2';

const original = new Uint8Array([1,2,3,4,5,6,7,8,9,10]);
const extraInput = new Uint8Array([11,12]);

const deflator = new pako1.Deflate({raw: true});

deflator.push(original, 3); // 3 === Z_FULL_FLUSH, so this segment can later be inflated on its own
const partialLength = deflator.strm.total_out;

deflator.push(extraInput, true);

const fullCompressed = deflator.result;

const partialCompressed = fullCompressed.subarray(0, partialLength);

console.log("fullCompressed", fullCompressed);
console.log("partialCompressed", partialCompressed);

const decompressed1 = pako1.inflateRaw(partialCompressed);
const decompressed2 = pako2.inflateRaw(partialCompressed);

console.log("original", original);
console.log("decompressed1", decompressed1);
console.log("decompressed2", decompressed2);

Running node index.mjs, the output I get is:

fullCompressed Uint8Array(20) [
   98, 100,  98, 102, 97, 101, 99,
  231, 224, 228,   2,  0,   0,  0,
  255, 255, 227, 230,  1,   0
]
partialCompressed Uint8Array(16) [
  98, 100,  98, 102, 97, 101,
  99, 231, 224, 228,  2,   0,
   0,   0, 255, 255
]
original Uint8Array(10) [
  1, 2, 3, 4,  5,
  6, 7, 8, 9, 10
]
decompressed1 Uint8Array(10) [
  1, 2, 3, 4,  5,
  6, 7, 8, 9, 10
]
decompressed2 undefined

inflateRaw returns the expected output in version 1 but returns nothing (undefined) in version 2.

I might have missed an explicit / intended behavior change that explains this, and maybe there is a specific option that would also make this work with version 2; otherwise this looks like a regression?

puzrin (Member) commented Aug 13, 2021

See #234 (comment)

v2 changed the wrapper to the "standard" one, according to the zlib docs. v1 terminated processing on a sync mark, which caused problems with multistream files. v2 does NOT emit end on a sync mark, but it should still flush the data (and you can join it manually at any time).

In short, code that worked in v1 relied on buggy behaviour :). The wrapper is not magical and cannot fit all possible use cases. If you need partial compress/decompress, see the wrapper source and tweak it for your needs.

Also, if you know of popular multistream use cases, you can suggest tests to make sure this will not be broken in the future.

ngladitz (Author) commented

Thank you for the quick reply.

I am not sure which code specifically is the wrapper (I am not familiar with the internal architecture), and the referenced comment seems to cover deflation rather than inflation, but I tried the following (extending the code above):

const inflator2 = new pako2.Inflate({raw: true});
inflator2.push(partialCompressed, true);
console.log("inflator2", inflator2);

I do see the inflated content is still in inflator2.strm.output; if I understood you correctly it should have been flushed ... does that mean it should have been transferred to inflator2.chunks (currently empty array)?

Also not sure what multistream means specifically in this context; my use case is a compressed stream containing a stack of images (written natively with zlib). Each image is flushed explicitly with Z_FULL_FLUSH (and its location in the output stream is remembered / stored) to allow extracting specific single images from the stack without extracting the entire stack.
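
A minimal sketch of that access pattern, using pako 1.x where it still works (the image data, the offsets bookkeeping, and all names below are made up for illustration; 3 is Z_FULL_FLUSH in zlib's constants):

const images = [new Uint8Array(100).fill(1), new Uint8Array(100).fill(2)];

const deflate = new pako1.Deflate({raw: true});
const offsets = [0];
for (const img of images) {
  deflate.push(img, 3);                  // Z_FULL_FLUSH after every image
  offsets.push(deflate.strm.total_out);  // remember where each image ends
}
deflate.push(new Uint8Array(0), true);   // finish the stream
const stack = deflate.result;

// Because Z_FULL_FLUSH resets the compression state, each slice between two
// recorded offsets can be decompressed on its own.
const slice = stack.subarray(offsets[1], offsets[2]);
console.log(pako1.inflateRaw(slice));    // second image in 1.x; undefined with pako 2.x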

puzrin (Member) commented Aug 13, 2021

I am not sure which code specifically is the wrapper

https://github.com/nodeca/pako/tree/master/lib - the two files there (deflate.js and inflate.js) are the top-level wrappers around the zlib port. If you wish to do partial compress/decompress, you should understand what happens there and be able to patch or replace them for your needs.

I do see the inflated content is still in inflator2.strm.output; if I understood you correctly it should have been flushed ... does that mean it should have been transferred to inflator2.chunks (currently empty array)?

See the wrapper sources first. In theory, Z_xxx_FLUSH should force pending output into .chunks, but I haven't used this mode. The only thing I can say is that the wrapper uses the same logic as the zlib docs.

Also, you can create your own custom wrapper for your needs, and use zlib directly.

Also not sure what multistream means specifically in this context;

It means deflated data consisting of several parts => having a "sync mark" inside.
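
That mark is visible in the dump above: a sync/full flush appends an empty stored block, so a flushed segment ends with the byte pattern 0, 0, 255, 255. A small check, for illustration (endsWithSyncMark is just a made-up helper):

const endsWithSyncMark = (buf) =>
  buf.length >= 4 &&
  buf[buf.length - 4] === 0x00 && buf[buf.length - 3] === 0x00 &&
  buf[buf.length - 2] === 0xff && buf[buf.length - 1] === 0xff;

console.log(endsWithSyncMark(partialCompressed)); // true  - stops at the sync mark
console.log(endsWithSyncMark(fullCompressed));    // false - ends with the final block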

ngladitz (Author) commented

Thank you for elaborating.

The current condition for the onData call, which by default appends to .chunks, is

if (strm.avail_out === 0 || status === Z_STREAM_END) {

So there is currently no output chunk (even when flushing) unless the output buffer is completely full or the end of the deflate stream was reached (Z_STREAM_END).

Would you consider changing this to also consume the data (call onData()) when _flush_mode > 0?
I'd then be able to get by with manually merging chunks from .chunks or overriding onData().

Being able to manually trigger / force finalization (onEnd()) might also be nice if it indeed can't be the default for some use cases.

I suppose I could otherwise write my own wrapper (though the zlib interface itself isn't exported(?)) or manually concatenate the data from both .chunks and .strm.output, but I guess I'd then be depending somewhat on implementation details.
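
A rough sketch of that manual route against the current v2 wrapper (inflateRawPartial is a made-up helper; .strm.output and .strm.next_out are internals of the zlib port rather than documented API, so this may break between releases):

function inflateRawPartial(data) {
  const inflator = new pako2.Inflate({raw: true});
  inflator.push(data, 2); // 2 === Z_SYNC_FLUSH
  if (inflator.err) throw new Error(inflator.msg || String(inflator.err));

  // Gather the chunks already handed to onData plus whatever is still
  // pending in the wrapper's output buffer.
  const strm = inflator.strm;
  const pending = strm.output ? strm.output.subarray(0, strm.next_out) : new Uint8Array(0);
  const parts = inflator.chunks.concat([pending]);

  // Flatten all parts into a single Uint8Array.
  const total = parts.reduce((sum, part) => sum + part.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const part of parts) { out.set(part, offset); offset += part.length; }
  return out;
}

console.log(inflateRawPartial(partialCompressed)); // 1..10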

puzrin (Member) commented Aug 13, 2021

I have no objection to improving flush in the inflate wrapper, but I don't know how to do it right. The previous attempt to add "everything possible" (in v1) ended with a cryptic and unmaintainable wrapper without tests, so I had to drop all the reinvented wheels and redo everything from scratch.

If anyone has time, I'd suggest investigating the logic of the Node.js native zlib wrapper. It's battle-tested and should be OK.

Or you can just copy-paste the inflate wrapper from v1. If you have only one use case, you don't need to worry about a universal solution.
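
For reference, Node's built-in zlib already covers this case through its documented finishFlush option, which may be a useful model (a sketch against the partialCompressed data from the repro above):

import zlib from 'node:zlib';

// finishFlush: Z_SYNC_FLUSH tells the wrapper not to insist on a complete
// stream, so input that stops at a sync mark inflates without an error.
const out = zlib.inflateRawSync(partialCompressed, {
  finishFlush: zlib.constants.Z_SYNC_FLUSH
});
console.log(new Uint8Array(out)); // 1..10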
