node-binary and its deps are slow #4

stephank opened this Issue May 10, 2011 · 7 comments


4 participants

stephank commented May 10, 2011

I'm looking at performance for an application that does quite a bit of work in MongoDB using the node-mongodb-native driver. Mongo speaks a binary protocol, and the driver does pure-JS parsing, which turned up high in our profiling results. (There's a C parser component, but it doesn't actually fully parse the protocol, which means it's still noticeable when firing lots and lots of small queries.)

Because the JS binary parsing code was rather ugly, my first idea was to see what would happen if I replaced it with something else. In this case, I tried node-binary, but results actually got quite a bit worse. For reference, commit a922e3e of my fork contains that work.

I wrote up a small sample, based on the stream.js example:

var Binary = require('../index');
var EventEmitter = require('events').EventEmitter;

var emitter = new EventEmitter();
Binary(emitter)
  .loop(function () {
    // parse the fields of one message here (elided)
  });

var chunk = Binary.put()
  // build a small test message here (elided)
  .buffer();

var start = new Date();
for (var i = 10000; i; i--) {
  emitter.emit("data", chunk);
}

process.on("exit", function () {
  var end = new Date();
  console.log("Done: " + ((end - start) / 1000) + "s");
});

When profiling this, I got pretty much the same results I see when profiling the actual application I'm developing. Interestingly, node-mongodb-native's original binary parsing code was slowest in bit shifting code, while I really don't see much of that in profiling results for node-binary.

By far the most expensive parts seem to be in chainsaw and node-buffers. Here are some of the top results I get in the bottom-up profile:

ticks parent  name
4834   35.1%  __write
1203   24.9%    Function: walker /home/shteef/Source/node-binary/node_modules/chainsaw/node_modules/traverse/index.js:96
1033   85.9%      Function: walk /home/shteef/Source/node-binary/node_modules/chainsaw/node_modules/traverse/index.js:92
1033  100.0%        Function: /home/shteef/Source/node-binary/node_modules/chainsaw/node_modules/traverse/index.js:7
1033  100.0%          Function: Chainsaw.saw.saw.chain /home/shteef/Source/node-binary/node_modules/chainsaw/index.js:18
1033  100.0%            Function: Chainsaw.saw.saw.nest /home/shteef/Source/node-binary/node_modules/chainsaw/index.js:56
 170   14.1%      Function: Object.keys.forEach.Traverse /home/shteef/Source/node-binary/node_modules/chainsaw/node_modules/traverse/index.js:160
 170  100.0%        LazyCompile: forEach native array.js:935
 170  100.0%          Function: walker /home/shteef/Source/node-binary/node_modules/chainsaw/node_modules/traverse/index.js:96
 170  100.0%            Function: walk /home/shteef/Source/node-binary/node_modules/chainsaw/node_modules/traverse/index.js:92
1113   23.0%    Function: Buffer.copy buffer.js:331
1113  100.0%      Function: Buffers.slice /home/shteef/Source/node-binary/node_modules/buffers/index.js:135
1113  100.0%        Function: dispatch /home/shteef/Source/node-binary/index.js:40
1113  100.0%          Function: getBytes /home/shteef/Source/node-binary/index.js:21
1113  100.0%            Function: /home/shteef/Source/node-binary/index.js:82

In addition, 23.5% of the execution time was spent in the GC. Together with the above, that accounts for roughly 60% of execution time.

To be honest, I can't tell right away what's so expensive about walker. That it calls into __write could be the result of a bunch of things.


stephank commented May 11, 2011

Looking a bit more at this, a new saw is created for every saw.nest, which also happens on every loop iteration. That's rather expensive for tight message parsing loops like the one I use.
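A toy model may make that per-iteration cost concrete. The names and structure below are illustrative only, not chainsaw's actual implementation; the point is just that a new saw means a new closure per registered method:

```javascript
// Toy model (not the real chainsaw code) of the cost described above:
// every nest() builds a brand-new saw, and with it a fresh closure for
// every registered method. A parser that nests once per loop iteration
// therefore allocates methods-times-iterations closures.
function makeSaw(methods) {
    var saw = { chain: {} };
    Object.keys(methods).forEach(function (name) {
        // one new closure per method, per saw
        saw.chain[name] = function () {
            return methods[name].apply(saw, arguments);
        };
    });
    saw.nest = function (cb) {
        var inner = makeSaw(methods); // a whole new saw every nesting
        cb.call(inner.chain);
        return inner;
    };
    return saw;
}

var created = 0;
var saw = makeSaw({ word8: function () { created++; return this; } });
for (var i = 0; i < 10000; i++) {
    saw.nest(function () { this.word8(); }); // 10000 nests, 10000 new saws
}
```

With one method and 10000 iterations that is already 10000 throwaway closures; the real chainsaw saw carries several methods, multiplying the count.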

As for the Buffer.copy, that's likely because a slice is taken for every (tiny) field node-binary tries to parse, which results in a new Buffer created and copied into. I wonder if Buffers.slice can return a new Buffers view instead?
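A sketch of that zero-copy idea, using a minimal stand-in for node-buffers' Buffers container (the names and layout here are illustrative, not the real module's API):

```javascript
// Minimal stand-in for a chunked buffer container, to illustrate the
// "return a view instead of copying" idea. Not node-buffers' actual code.
function BufferList(chunks) {
    this.chunks = chunks || [];
}

// Zero-copy slice: returns a new BufferList whose chunks are views into
// the originals (Buffer.prototype.slice shares memory with its parent),
// instead of allocating a fresh Buffer and copying bytes into it.
BufferList.prototype.sliceView = function (start, end) {
    var result = [];
    var pos = 0;
    for (var i = 0; i < this.chunks.length && pos < end; i++) {
        var c = this.chunks[i];
        if (pos + c.length > start) {
            // this chunk overlaps [start, end): take a view of the overlap
            result.push(c.slice(Math.max(start - pos, 0),
                                Math.min(end - pos, c.length)));
        }
        pos += c.length;
    }
    return new BufferList(result);
};

// Buffer.from is the modern spelling; 2011-era code would use new Buffer().
var bl = new BufferList([Buffer.from([1, 2, 3]), Buffer.from([4, 5, 6])]);
var view = bl.sliceView(2, 5); // chunks covering bytes 2..4: [3] and [4, 5]
```

The trade-off is aliasing: callers that mutate the view would see the change in the underlying chunks, which a copying slice hides.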

I like the way node-binary does its thing in syntax. I'm wondering if the things I'm trying to use it for line up with what you originally had in mind when writing it.


substack commented May 12, 2011

I was mostly trying to stay sane by being abstract with node-binary since node-bufferlist was madness to write and test and debug. The module is probably in good enough shape right now that performance enhancements would be a useful thing to start hacking on.

I really like the syntax you've come up with. I hope to see performance improvements forthcoming!


justmoon commented Jun 30, 2011

Alright, so: I ran into this issue from a different angle. I was trying to debug a memory leak in my own application. I eventually traced it to node-binary.

Here is a simple test script demonstrating the leak. After 5000 iterations, node-binary has instantiated over 500,000 closures consuming more than 144 MB of memory. So what's going on?

On every iteration loop() calls tap(), which calls saw.nest() which creates a new saw and calls builder(), which creates new closures for every single method in the saw.

Why is tap() so expensive? Because nested tap()s allow some pretty fancy scheduling. In essence, node-binary allows every tap callback handler to add a new data listener at any time, and it'll figure out where it goes in the chain and correctly shepherd the data to the right place.

To fix the issue I made a very simple change in node-binary/index.js:

@@ -121,7 +125,7 @@ = function (em, eventName) {
                 this.tap(function () {
                     if (end)
-                });
+                }.bind(this));

This means that loops still create a nested context every time, but at least the next iteration builds on the original context, meaning that the new context can be garbage collected. This reduces the memory leak by about half down to 76 MB.

So what's still leaking? Well, Chainsaw actually keeps a history of all the actions it has ever done, which is used for the down(), trap() and jump() features, none of which node-binary uses, I think.

My idea, then, is to add a new mode to Chainsaw where it just Array.shift()s each action off of the actions array (and disables down, trap and jump), and change node-binary to use that mode. Then no history of actions would be kept anymore, and I think that should solve the memory leak.
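A sketch of that mode (hypothetical names; the real change went into the pull request against node-chainsaw): actions are consumed as they run instead of being appended to a history, so memory stays bounded, at the cost of down/trap/jump, which replay history and would have to be disabled.

```javascript
// Sketch of the proposed "no history" mode. Illustrative only: a queue
// that shifts each action off as it executes, rather than keeping a
// permanent record of everything it has ever done.
function ActionQueue() {
    this.actions = [];
}
ActionQueue.prototype.push = function (action) {
    this.actions.push(action);
};
ActionQueue.prototype.step = function () {
    var action = this.actions.shift(); // consume, don't remember
    if (action) action();
    return this.actions.length;       // how much work remains
};

var q = new ActionQueue();
var ran = [];
['a', 'b', 'c'].forEach(function (name) {
    q.push(function () { ran.push(name); });
});
q.step(); // runs 'a', leaves two actions queued
```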

substack, I'd love to get your initial thoughts on my approach as soon as possible. I'll probably take a break for a few hours and then start working on this. Don't worry if you don't get this message in time, but just in case you do: some quick feedback would help out a lot. Cheers!


substack commented Jun 30, 2011

Sounds groovy. I mostly wrote chainsaw to keep myself sane prototyping all of this. I don't care if it has to be lobotomized or removed later for performance.


justmoon commented Jun 30, 2011

Awesome, thanks for the quick reply.


justmoon commented Jun 30, 2011

Both suggested fixes are implemented and submitted:

Fix for infinitely recursing chainsaws: #8
Fix for Chainsaw keeping complete call history by default: substack/node-chainsaw#2

There are still massive performance enhancements possible, for example by moving Chainsaw towards prototypal inheritance. But this fixes the memory leaks, which, at least in my case, were by far the more problematic issue; and since memory operations are quite expensive, it should already be quite a performance boost.

@evangoer evangoer pushed a commit to evangoer/yeti that referenced this issue Nov 15, 2013

@reid reid Only call tap() once to improve performance. b17ebc4

gkubisa referenced this issue in sidorares/node-dbus Jun 19, 2014


Memory leak from bus.invoke #21
