Question: is there a way to determine the byte size of a packed message without actually decoding it? #40

Open
royaltm opened this issue Jun 27, 2016 · 4 comments

royaltm commented Jun 27, 2016

The use case might be to determine only the boundaries of a msgpack item, or to count sequenced msgpack-encoded items.

e.g.

var data1 = encode("some data"); // <Buffer a9 73 6f 6d 65 20 64 61 74 61>
var data2 = encode(123); // <Buffer 7b>
var data = Buffer.concat([data1, data2]);
// hypothetical api:
decodeSize(data) // -> 10
decodeSize(data.slice(10)) // -> 1
decodeSize(data.slice(0, 9)) // -> undefined (truncated data)

Perhaps an interface similar to Decode().

Decode().on('data', (n) => console.log(`number of bytes: ${n}`)).decodeSize(data);

Another question: is there a way to encode a message into a provided buffer, with an offset and a maximum length?

// hypothetical api: encode(data[, buffer[, offset[, maxLength]]])
encode("some data", new Buffer(100), 10, 50);
Owner

kawanet commented Jun 27, 2016

@royaltm

The byte length of a msgpack item is variable. Detecting the number of items or their byte lengths would take almost as long as decoding the msgpack stream. What is a use case that needs to determine the boundaries of a msgpack item or to count sequenced msgpack items?
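To illustrate why the length is variable, here is a small sketch that reads only the first byte of the two items from the example above. It covers just the two msgpack formats that appear in that example (positive fixint and fixstr). The helper name `describeFirstByte` is hypothetical and not part of msgpack-lite.

```javascript
// Hypothetical helper: inspects the first byte of a msgpack item.
// Only positive fixint (0x00-0x7f) and fixstr (0xa0-0xbf) are handled here.
function describeFirstByte(b) {
  if (b <= 0x7f) {
    // The byte is the value itself, so the item is one byte long.
    return { type: "positive fixint", total: 1 };
  }
  if (b >= 0xa0 && b <= 0xbf) {
    // The low 5 bits hold the string's byte length; add 1 for the header.
    return { type: "fixstr", total: 1 + (b & 0x1f) };
  }
  return { type: "other", total: undefined };
}

console.log(describeFirstByte(0xa9)); // first byte of encode("some data")
console.log(describeFirstByte(0x7b)); // the only byte of encode(123)
```

For maps and arrays the header only gives the number of children, so the total size is known only after walking every child.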


// hypothetical api: encode(data[, buffer[, offset[, maxLength]]])
encode("some data", new Buffer(100), 10, 50);

I can guess a use case for the above: it may allow fewer memory copies.
msgpack-lite has an Encoder object, which is not well documented.
It allows something similar, as shown below:

var msgpack = require("msgpack-lite");
var e = new msgpack.Encoder();
e.on("data", console.log); // debug only
var buf = e.buffer = new Buffer(100);
var start = e.offset = 10;
e.encode("some data");
var length = e.offset - start;
msgpack.decode(buf.slice(10, e.offset)); // debug only

This requires a buffer of sufficient length.

Author

royaltm commented Jun 28, 2016

Yes, I know that detecting the size would require recursively iterating over maps and arrays, but decoding the data also means allocating objects and garbage-collecting them later. I need to scan a large stream of msgpack-encoded objects and cherry-pick only the n-th of them. For now I can just throw away whatever items I don't need, but that's an obvious waste.

Thanks for the hint regarding 2nd question. This is a little bit verbose, but that will do :).

Owner

kawanet commented Jun 28, 2016

decoding data means also allocating Objects and later gc'ing them.

Right.

I need to scan over large stream of msgpacked objects only cherrypicking n'th of them.

It would need a kind of dryrun feature that returns the byte length instead of the decoded object.

function getNth(source, n, callback) {
  var dry = msgpack.createCodec({dryrun: true}); // proposal
  var wet = msgpack.createCodec({dryrun: false}); // normal behavior codec
  var codec = n ? dry : wet; // first codec
  var decoder = msgpack.createDecodeStream({codec: codec});
  var cnt = 0;
  source.pipe(decoder).on("data", function (data) {
    if (cnt === n) callback(null, data);
    decoder.codec = (cnt === n-1) ? wet : dry; // next codec
    cnt++;
  }).on("error", callback);
}
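As a rough sketch of what such a dry run could compute, the function below walks the msgpack format headers and returns the byte length of one item without building any objects. It is written against the msgpack format specification only and does not use msgpack-lite. The name `decodeSize` is borrowed from the hypothetical API earlier in this thread, and `nthRange` is likewise a hypothetical helper; neither exists in the library.

```javascript
// Total size (header + payload) of fixed-width formats, keyed by first byte.
var FIXED = {
  0xc0: 1, 0xc2: 1, 0xc3: 1,                    // nil, false, true
  0xca: 5, 0xcb: 9,                             // float 32/64
  0xcc: 2, 0xcd: 3, 0xce: 5, 0xcf: 9,           // uint 8/16/32/64
  0xd0: 2, 0xd1: 3, 0xd2: 5, 0xd3: 9,           // int 8/16/32/64
  0xd4: 3, 0xd5: 4, 0xd6: 6, 0xd7: 10, 0xd8: 18 // fixext 1/2/4/8/16
};
// Length-prefixed formats: [width of length field, extra bytes for ext type].
var SIZED = {
  0xc4: [1, 0], 0xc5: [2, 0], 0xc6: [4, 0],     // bin 8/16/32
  0xc7: [1, 1], 0xc8: [2, 1], 0xc9: [4, 1],     // ext 8/16/32
  0xd9: [1, 0], 0xda: [2, 0], 0xdb: [4, 0]      // str 8/16/32
};
// Containers: [width of count field, child items per entry].
var NESTED = {
  0xdc: [2, 1], 0xdd: [4, 1],                   // array 16/32
  0xde: [2, 2], 0xdf: [4, 2]                    // map 16/32
};

// Hypothetical: byte length of the msgpack item starting at `start`,
// or undefined if the data is truncated. Iterative, so deep nesting
// does not grow the call stack.
function decodeSize(buf, start) {
  start = start || 0;
  var pos = start;
  var pending = 1; // items (including nested children) still to skip
  while (pending > 0) {
    if (pos >= buf.length) return undefined;
    var b = buf[pos];
    pending--;
    if (b <= 0x7f || b >= 0xe0) pos += 1;                        // fixint
    else if (b <= 0x8f) { pos += 1; pending += 2 * (b & 0x0f); } // fixmap
    else if (b <= 0x9f) { pos += 1; pending += b & 0x0f; }       // fixarray
    else if (b <= 0xbf) pos += 1 + (b & 0x1f);                   // fixstr
    else if (FIXED[b]) pos += FIXED[b];
    else {
      var spec = SIZED[b] || NESTED[b];
      if (!spec) throw new Error("invalid msgpack byte 0x" + b.toString(16));
      var width = spec[0];
      if (pos + 1 + width > buf.length) return undefined;
      var n = buf.readUIntBE(pos + 1, width); // big-endian length or count
      pos += 1 + width;
      if (SIZED[b]) pos += spec[1] + n;       // skip the payload bytes
      else pending += spec[1] * n;            // queue the children
    }
  }
  return pos > buf.length ? undefined : pos - start;
}

// Hypothetical: byte range of the n-th (0-based) item in a buffer of
// concatenated items, or undefined if there are fewer complete items.
function nthRange(buf, n) {
  var pos = 0;
  for (var i = 0; ; i++) {
    var size = decodeSize(buf, pos);
    if (size === undefined) return undefined;
    if (i === n) return { start: pos, end: pos + size };
    pos += size;
  }
}

// The example data from the top of this thread: "some data" followed by 123.
var data = Buffer.from("a9736f6d6520646174617b", "hex");
console.log(decodeSize(data));              // 10
console.log(decodeSize(data, 10));          // 1
console.log(decodeSize(data.slice(0, 9)));  // undefined (truncated data)
console.log(nthRange(data, 1));             // { start: 10, end: 11 }
```

Only the selected range would then need to be passed to a real decoder, for example `msgpack.decode(data.slice(range.start, range.end))`. This sketch works on a complete in-memory buffer; a streaming version would additionally have to carry partial items across chunk boundaries.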

Author

royaltm commented Jun 28, 2016

Yes, exactly. The dryrun would do the trick. Is there any chance you would consider this feature?


2 participants