
Bad performance? #297

Closed
tsgautier opened this issue May 20, 2013 · 22 comments

Comments

@tsgautier

Hello, I was noticing some slowdown in my application recently. Among other things, we switched to using q heavily. I wasn't sure what the problem was, so I wrote a micro-benchmark for a couple of the libraries we use that are core to our application.

I am not sure if the following is valid; maybe someone can comment. If it is valid, it seems to me that there is a real problem in the q library that should be addressed, as the results I am getting are at least 10-100x slower than I would have expected.

q = require 'q'

chain = (p, val) ->
  p.then -> val

doit = ->
    original = q.defer()
    p = original.promise
    i = 0  # local counter; the outer `count` drives the benchmark loop
    p = chain p, i until ++i >= 5
    original.resolve 'foo'

count = 0
max = 10000
start = new Date().getTime()
doit() until ++count >= max
end = new Date().getTime()
elapsed = end - start
console.log "took #{elapsed} ms average ", count / (elapsed / 1000), " macro TPS ", 5 * count / (elapsed / 1000), " micro TPS"

After running this several times, I get consistently the following results (my machine is recent model MacBook Air with a Core i7 2GHz processor):

took 7214 ms average 1386.193512614361 macro TPS 6930.967563071805 micro TPS

More troublesome than the result is the huge pause I get after printing out the data which indicates to me a big consumption of memory.

Maybe I am doing something wrong that someone would like to point out?

(To put things in perspective, I wrote a similar benchmark for the postal.js library: it delivers 5,000 messages in about 20 ms, roughly 35x faster. Obviously postal and q are apples and oranges, but for pure in-memory operations I surely did not expect q's operations to take this long, even if we account for a 5x difference.)

@domenic
Collaborator

domenic commented May 20, 2013

Could you write that code in JavaScript, which is the language of this repository?

@robotlolita

Is that CoffeeScript?

From this snippet, the console.log call and the elapsed-time computation don't use promises at all. Promises/A+ (of which Q is an implementation) requires that all .then callbacks be invoked asynchronously, so in your example the logging and timing happen before the promises are resolved, which would explain the "huge pause" at the end (those callbacks are still scheduled to run at some later time).

@tsgautier
Author

Thanks for the comments. Here is my revised implementation. I think this now waits for all 10,000 chains to resolve before reporting. The result I get from this was approximately:

took 11248 ms average 889.0469416785206 macro TPS 4445.234708392603 micro TPS

There is no more pause so I think you were correct on that count.

// Generated by CoffeeScript 1.6.2
(function() {
  var chain, count, doit, finish, i, max, q, setupChain, start, _i;

  q = require('q');

  count = 0;

  max = 10000;

  start = new Date().getTime();

  finish = function() {
    var elapsed, end;

    end = new Date().getTime();
    elapsed = end - start;
    return console.log("took " + elapsed + " ms average ", count / (elapsed / 1000), "macro TPS", 5 * count / (elapsed / 1000), "micro TPS");
  };

  chain = function(p, val) {
    return p.then(function() {
      return val;
    });
  };

  setupChain = function() {
    var i, original, p, _i;

    original = q.defer();
    p = original.promise;
    for (i = _i = 1; _i <= 5; i = ++_i) {
      p = chain(p, i);
    }
    original.resolve('foo');
    return p;
  };

  doit = function() {
    return setupChain().then(function() {
      if (++count >= max) {
        return finish();
      }
    });
  };

  for (i = _i = 1; 1 <= max ? _i <= max : _i >= max; i = 1 <= max ? ++_i : --_i) {
    doit();
  }

}).call(this);

@tsgautier
Author

I think maybe this demonstration is more concise.

$ took 2763 ms average 3619.2544335866814 TPS

// Generated by CoffeeScript 1.6.2
(function() {
  var count, doit, finish, i, max, q, start, _i;

  q = require('q');

  count = 0;

  max = 10000;

  start = new Date().getTime();

  finish = function() {
    var elapsed, end;

    end = new Date().getTime();
    elapsed = end - start;
    return console.log("took " + elapsed + " ms average ", count / (elapsed / 1000), "TPS");
  };

  doit = function() {
    return q.when('value').then(function() {
      if (++count >= max) {
        return finish();
      }
    });
  };

  for (i = _i = 1; 1 <= max ? _i <= max : _i >= max; i = 1 <= max ? ++_i : --_i) {
    doit();
  }

}).call(this);

@domenic
Collaborator

domenic commented May 21, 2013

Yes. Asynchronous code is generally slower than synchronous code. If you are using Q for synchronous code (e.g. switching synchronous code to Q-based code), you are making a mistake. If you are using Q for asynchronous code, it won't be that much slower than other asynchronous solutions---which you don't seem to be comparing against.

@tsgautier
Author

By a small amount, maybe. By 1 or 2 orders of magnitude? No, I don't agree.

Here is some async code that demonstrates the in-memory performance I expected:

took 6 ms average 1666666.6666666667 TPS

// Generated by CoffeeScript 1.6.2
(function() {
  var count, doit, events, finish, i, max, obj, start, _i;

  events = require('events');

  count = 0;

  max = 10000;

  start = new Date().getTime();

  finish = function() {
    var elapsed, end;

    end = new Date().getTime();
    elapsed = end - start;
    return console.log("took " + elapsed + " ms average ", count / (elapsed / 1000), "TPS");
  };

  obj = new events.EventEmitter();

  obj.on('test', function() {
    if (++count >= max) {
      return finish();
    }
  });

  doit = function() {
    return obj.emit('test');
  };

  for (i = _i = 1; 1 <= max ? _i <= max : _i >= max; i = 1 <= max ? ++_i : --_i) {
    doit();
  }

}).call(this);

@domenic
Collaborator

domenic commented May 21, 2013

Even ignoring your continued rudeness in posting another language: event emitters are synchronous, not asynchronous.

@domenic
Collaborator

domenic commented May 21, 2013

Use setImmediate for a real test.

@tsgautier
Author

I was in the middle of correcting my mistake: I accidentally had the CoffeeScript version in my buffer and was adding the JavaScript version. I'm pretty sure I'm not the one being rude.

@tsgautier
Author

How about this?

took 14 ms average 714285.7142857143 TPS

// Generated by CoffeeScript 1.6.2
(function() {
  var count, doit, events, finish, i, max, obj, start, _i;

  events = require('events');

  count = 0;

  max = 10000;

  start = new Date().getTime();

  finish = function() {
    var elapsed, end;

    end = new Date().getTime();
    elapsed = end - start;
    return console.log("took " + elapsed + " ms average ", count / (elapsed / 1000), "TPS");
  };

  obj = new events.EventEmitter();

  doit = function() {
    return setTimeout(function() {
      if (++count >= max) {
        return finish();
      }
    }, 0);
  };

  for (i = _i = 1; 1 <= max ? _i <= max : _i >= max; i = 1 <= max ? ++_i : --_i) {
    doit();
  }

}).call(this);

@kriskowal
Owner

It’s true, @domenic’s response was terse and hurried. I’ll apologize on his behalf as his co-maintainer. Let’s try to keep it technical, where we all do best.

I agree that the order-of-magnitude difference is suspicious. We'll have to do a profile and see where the time goes.

Thank you for updating with JavaScript. It does help me since I don’t use much CoffeeScript tooling. I know Domenic does and his comments were for my benefit.

@domenic
Collaborator

domenic commented May 21, 2013

You need to call doit from inside doit to simulate a chain of promises. You can't just run a bunch of setTimeouts in parallel.

@tsgautier
Author

That's what I did in my first example. But then I simplified it to just using when/then and saw the same timing. Note that these last two examples aren't using q; I was just trying to write some async code that performs the same number of operations, to measure its performance.

Keep in mind I am well aware that micro-benchmarks are always dangerous and one should be immediately skeptical - I get that.

But I think I get what you are saying: my simulation of chained promises isn't very good. Nonetheless, let's focus on the simple case; it exhibits similar characteristics, so reproducing the problem doesn't seem to require a chain.

@domenic
Collaborator

domenic commented May 21, 2013

To be clearer: you need nested (recursive) setTimeouts to simulate nested promises. Your latest code does not contain any nested setTimeout calls.

@tsgautier
Author

Right, let's focus on just running a bunch of promises in parallel as in this code where I am running q.when/then in parallel and timing the total time it takes for all the parallel events to resolve. The code below, in a micro-benchmark over 10k parallel ops, took ~3s to complete.

doit = function() {
    return q.when('value').then(function() {
      if (++count >= max) {
        return finish();
      }
    });
  };

@robotlolita

Here's a proper test case for this. It executes 100 parallel computations asynchronously, where each computation is made up of 5 sequential asynchronous operations. Q is faster because my callback version is naïve and each asynchronous operation is executed in the next event loop, whereas in Q this is optimised internally.

correction: Q is only faster on Firefox. In Chromium the callback version is faster.

http://jsperf.com/callbacks-vs-promises-a

@tsgautier
Author

I think it's better if we don't worry about the chain of promises. While a chain simulates my actual use case more accurately, after several iterations of different tests on this thread it seems clear that a simple benchmark of n asynchronous callbacks, or n q resolves, shows similar behavior and is much easier to understand and examine in a micro-benchmark scenario.

@domenic
Collaborator

domenic commented Jun 6, 2013

Using the branch that turns off long stack traces by default (see #311) and running this test suite, I get the results

                      A single simple async operation
          49,311 op/s » with an immediately-fulfilled promise
         436,358 op/s » with direct setImmediate usage
              64 op/s » with direct setTimeout(…, 0)

                      1000 operations in parallel
              48 op/s » with immediately-fulfilled promises
             897 op/s » with direct setImmediate usage

Q is slower than we would like, but only about 10x slower than callbacks. Running many promises in parallel seems to make Q about 20x slower.

@sobotklp

sobotklp commented Jun 6, 2013

We've been using q@0.8.2 for a moderately large API (~800 reqs/sec) with persistent websocket connections. Upon upgrading to q@0.9.4, we noticed a ~400% increase in CPU use as well as a memory leak. Aside from using Q.fcall() instead of Q.call(), there were no changes to the way we use Q promises. Unfortunately, we couldn't find a fix for these issues other than reverting to the older version of q.

@domenic
Collaborator

domenic commented Jun 6, 2013

@sobotklp you appear to have introduced long stack traces in your upgrade. You can turn them off with Q.longStackJumpLimit = 0. In version 0.9.5 they will be off by default.

@sobotklp

sobotklp commented Jun 6, 2013

@domenic cool thanks! :) I'll give that a try.

@domenic
Collaborator

domenic commented Jul 9, 2013

I think it's time to close this. Thanks for prompting us to finally get a performance test suite up and running. If you have any further cases that you think are slower than they should be, we'd appreciate them either over in #312, or even better as pull requests that add to the existing suite.

@domenic domenic closed this as completed Jul 9, 2013