
Commit

Initial README.md.
pgriess committed Aug 18, 2010
1 parent 9b4d8d1 commit 5f6dd28
Showing 2 changed files with 45 additions and 0 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
README.html
44 changes: 44 additions & 0 deletions README.md
@@ -0,0 +1,44 @@
A streaming tokenizer for [NodeJS](http://nodejs.org).

Parsing data coming off the wire in an event-driven environment can be a
difficult proposition, with naive implementations buffering all received data
in memory until a message has been received in its entirety. Not only is this
inefficient from a memory standpoint, but it may not be possible to determine
that a message has been fully received without attempting to parse it. This
requires a parser that can gracefully handle incomplete messages and pick up
where it left off. To make this task easier, `node-strtok` provides

* Tokenizing primitives for common network datatypes (e.g. signed and
  unsigned integers in various endiannesses).
* A callback-driven approach well suited to an asynchronous environment (e.g.
  to allow the application to asynchronously ask another party for
  information about what the next type should be).
* An easily extensible type system for adding support for new,
  application-defined types to the core (a sketch of a custom type follows
  the Usage example below).

## Usage

Below is an example of a parser for a simple protocol. Each message is
prefixed with a big-endian unsigned 32-bit integer used as a length
specifier, followed by a sequence of opaque bytes with length equal to the
value read earlier.

    var strtok = require('strtok');

    var s = ... /* a net.Stream workalike */;

    var numBytes = -1;

    strtok.parse(s, function(v, cb) {
        if (v === undefined) {
            return strtok.UINT32_BE;
        }

        if (numBytes == -1) {
            numBytes = v;
            return new strtok.BufferType(v);
        }

        console.log('Read ' + v.toString('ascii'));
        numBytes = -1;
    });
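
As a rough sketch of the extensible type system mentioned above (assuming a
type is a plain object exposing a byte length `len` and a `get(buf, off)`
decoder, mirroring the shape of the built-in primitives; the exact interface
may differ), a custom 24-bit big-endian unsigned integer type might look like
this:

    // Hypothetical application-defined type: a 24-bit big-endian unsigned
    // integer. The { len, get(buf, off) } shape is an assumption based on
    // the built-in primitive types.
    var UINT24_BE = {
        len: 3,
        get: function(buf, off) {
            return (buf[off] << 16) | (buf[off + 1] << 8) | buf[off + 2];
        }
    };

    // Used just like a built-in type: return it from the parse callback to
    // request the next token.
    strtok.parse(s, function(v, cb) {
        if (v === undefined) {
            return UINT24_BE;
        }

        console.log('Read 24-bit value: ' + v);
        return UINT24_BE;
    });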

4 comments on commit 5f6dd28

@ry commented on 5f6dd28 Aug 19, 2010

would love to see a msgpack benchmark

@pgriess (Owner, Author) commented

Meh. Numbers aren't great: node-msgpack is about 10x faster when de-serializing. Check out 2ed8a58.

node-msgpack can unpack 50k objects in 0.8s - 0.9s

node-strtok can unpack the same 50k objects in 9.8 - 10.1s

@pgriess (Owner, Author) commented

Getting rid of Buffer.slice() operations improved throughput 3x.
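
For context, a rough illustration of this kind of change (a sketch only, not
the actual diff): decode primitives in place at an offset rather than
allocating an intermediate slice for every token.

    // Illustrative only: not the actual node-strtok code.
    // Before: allocate a sub-Buffer for every token, then decode it.
    function readUint32Sliced(buf, off) {
        var slice = buf.slice(off, off + 4); // extra object allocation per read
        return ((slice[0] << 24) | (slice[1] << 16) | (slice[2] << 8) | slice[3]) >>> 0;
    }

    // After: decode directly from the source buffer using the offset.
    function readUint32InPlace(buf, off) {
        return ((buf[off] << 24) | (buf[off + 1] << 16) | (buf[off + 2] << 8) | buf[off + 3]) >>> 0;
    }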

@pgriess (Owner, Author) commented

Both arrays and primitives are actually faster in JS than native MsgPack.

However, when packing and unpacking {'abcdef' : 1}, we see the following:

native: 1379ms
js:     3592ms

The major difference appears to be in the setting of named properties on JavaScript objects. If we omit that step in the MsgPack JS parser, we instead see:

native: 1329ms
js:     2144ms
