
Bad performance? #297

Closed
tsgautier opened this issue May 20, 2013 · 22 comments

Comments

@tsgautier

Hello, I was noticing some slowdown in my application recently. Among other things, we switched to using q heavily. I wasn't sure what the problem was, so I wrote a micro-benchmark for a couple of the libraries we use that are core to our application.

I am not sure if the following is valid; maybe someone can comment. If it is valid, it seems to me that there is a real problem in the q library that should be addressed, as the results I am getting are at least 10-100x slower than I would have expected.

q = require 'q'

chain = (p, val) ->
  p.then -> val

doit = ->
    original = q.defer()
    p = original.promise
    i = 0  # local counter; the outer `count` drives the benchmark loop
    p = chain p, i until ++i >= 5
    original.resolve 'foo'

count = 0
max = 10000
start = new Date().getTime()
doit() until ++count >= max
end = new Date().getTime()
elapsed = end - start
console.log "took #{elapsed} ms average ", count / (elapsed / 1000), " macro TPS ", 5 * count / (elapsed / 1000), " micro TPS"

After running this several times, I get consistently the following results (my machine is recent model MacBook Air with a Core i7 2GHz processor):

took 7214 ms average 1386.193512614361 macro TPS 6930.967563071805 micro TPS

More troublesome than the result is the huge pause I get after printing out the data which indicates to me a big consumption of memory.

Maybe I am doing something wrong that someone would like to point out?

(To put things in perspective, I wrote a similar benchmark for the postal.js library: it delivers 5,000 messages in about 20 ms, roughly 35x faster. Obviously postal and q are apples and oranges, but for pure in-memory operations I surely did not expect q's operations to take this long, even if we account for a 5x difference.)

@domenic
Collaborator

domenic commented May 20, 2013

Could you write that code in JavaScript, which is the language of this repository?

@robotlolita

Is that CoffeeScript?

From this snippet, the console.log call and the elapsed-time computation don't use promises at all. Promises/A+ (of which Q is an implementation) requires that all .then callbacks be invoked asynchronously, so in your example the logging and timing happen before the promises are resolved, which would explain the "huge pause" at the end (those callbacks are still scheduled to run at some later time).

@tsgautier
Author

Thanks for the comments. Here is my revised implementation. I think this now waits for all 10,000 chains to resolve before reporting. The result I get from this was approximately:

took 11248 ms average 889.0469416785206 macro TPS 4445.234708392603 micro TPS

There is no more pause so I think you were correct on that count.

// Generated by CoffeeScript 1.6.2
(function() {
  var chain, count, doit, finish, i, max, q, setupChain, start, _i;

  q = require('q');

  count = 0;

  max = 10000;

  start = new Date().getTime();

  finish = function() {
    var elapsed, end;

    end = new Date().getTime();
    elapsed = end - start;
    return console.log("took " + elapsed + " ms average ", count / (elapsed / 1000), "macro TPS", 5 * count / (elapsed / 1000), "micro TPS");
  };

  chain = function(p, val) {
    return p.then(function() {
      return val;
    });
  };

  setupChain = function() {
    var i, original, p, _i;

    original = q.defer();
    p = original.promise;
    for (i = _i = 1; _i <= 5; i = ++_i) {
      p = chain(p, i);
    }
    original.resolve('foo');
    return p;
  };

  doit = function() {
    return setupChain().then(function() {
      if (++count >= max) {
        return finish();
      }
    });
  };

  for (i = _i = 1; 1 <= max ? _i <= max : _i >= max; i = 1 <= max ? ++_i : --_i) {
    doit();
  }

}).call(this);

@tsgautier
Author

I think maybe this demonstration is more concise.

$ took 2763 ms average 3619.2544335866814 TPS

// Generated by CoffeeScript 1.6.2
(function() {
  var count, doit, finish, i, max, q, start, _i;

  q = require('q');

  count = 0;

  max = 10000;

  start = new Date().getTime();

  finish = function() {
    var elapsed, end;

    end = new Date().getTime();
    elapsed = end - start;
    return console.log("took " + elapsed + " ms average ", count / (elapsed / 1000), "TPS");
  };

  doit = function() {
    return q.when('value').then(function() {
      if (++count >= max) {
        return finish();
      }
    });
  };

  for (i = _i = 1; 1 <= max ? _i <= max : _i >= max; i = 1 <= max ? ++_i : --_i) {
    doit();
  }

}).call(this);

@domenic
Collaborator

domenic commented May 21, 2013

Yes. Asynchronous code is generally slower than synchronous code. If you are using Q for synchronous code (e.g. switching synchronous code to Q-based code), you are making a mistake. If you are using Q for asynchronous code, it won't be that much slower than other asynchronous solutions---which you don't seem to be comparing against.

@tsgautier
Author

By a small amount, maybe. By 1 or 2 orders of magnitude? No, I don't agree.

Here is some async code that demonstrates the in-memory performance I expected:

took 6 ms average 1666666.6666666667 TPS

// Generated by CoffeeScript 1.6.2
(function() {
  var count, doit, events, finish, i, max, obj, start, _i;

  events = require('events');

  count = 0;

  max = 10000;

  start = new Date().getTime();

  finish = function() {
    var elapsed, end;

    end = new Date().getTime();
    elapsed = end - start;
    return console.log("took " + elapsed + " ms average ", count / (elapsed / 1000), "TPS");
  };

  obj = new events.EventEmitter();

  obj.on('test', function() {
    if (++count >= max) {
      return finish();
    }
  });

  doit = function() {
    return obj.emit('test');
  };

  for (i = _i = 1; 1 <= max ? _i <= max : _i >= max; i = 1 <= max ? ++_i : --_i) {
    doit();
  }

}).call(this);

@domenic
Collaborator

domenic commented May 21, 2013

Even ignoring your continued rudeness in posting another language: event emitters are synchronous, not asynchronous.

@domenic
Collaborator

domenic commented May 21, 2013

Use setImmediate for a real test.

@tsgautier
Author

I was in the middle of correcting my mistake: I accidentally had the CoffeeScript version in my buffer and was adding the JavaScript version. I'm pretty sure I'm not the one being rude.

@tsgautier
Author

How about this?

took 14 ms average 714285.7142857143 TPS

// Generated by CoffeeScript 1.6.2
(function() {
  var count, doit, events, finish, i, max, obj, start, _i;

  events = require('events');

  count = 0;

  max = 10000;

  start = new Date().getTime();

  finish = function() {
    var elapsed, end;

    end = new Date().getTime();
    elapsed = end - start;
    return console.log("took " + elapsed + " ms average ", count / (elapsed / 1000), "TPS");
  };

  obj = new events.EventEmitter();

  doit = function() {
    return setTimeout(function() {
      if (++count >= max) {
        return finish();
      }
    }, 0);
  };

  for (i = _i = 1; 1 <= max ? _i <= max : _i >= max; i = 1 <= max ? ++_i : --_i) {
    doit();
  }

}).call(this);

@kriskowal
Owner

It’s true, @domenic’s response was terse and hurried. I’ll apologize on his behalf as his co-maintainer. Let’s try to keep it technical, where we all do best.

I agree that the order-of-magnitude difference is suspicious. We'll have to do a profile and see where the time goes.

Thank you for updating with JavaScript. It does help me since I don’t use much CoffeeScript tooling. I know Domenic does and his comments were for my benefit.

@domenic
Collaborator

domenic commented May 21, 2013

You need to call doit from inside doit to simulate a chain of promises. You can't just run a bunch of setTimeouts in parallel.

@tsgautier
Author

That's what I did in my first example. But then I simplified it to just using when/then and saw the same timing. Note that these last two examples aren't using q; I was just trying to write some async code that performs the same number of operations, to measure its performance.

Keep in mind I am well aware that micro-benchmarks are always dangerous and one should be immediately skeptical - I get that.

But I think I get what you are saying: my simulation of chained promises isn't very good. Nonetheless, let's focus on the simple case; it exhibits similar characteristics, so reproducing the problem doesn't seem to require a chain.

@domenic
Collaborator

domenic commented May 21, 2013

To be clearer: you need nested (recursive) setTimeouts to simulate nested promises. Your latest code does not contain any nested setTimeout calls.

@tsgautier
Author

Right, let's focus on just running a bunch of promises in parallel as in this code where I am running q.when/then in parallel and timing the total time it takes for all the parallel events to resolve. The code below, in a micro-benchmark over 10k parallel ops, took ~3s to complete.

doit = function() {
    return q.when('value').then(function() {
      if (++count >= max) {
        return finish();
      }
    });
  };

@robotlolita

Here's a proper test case for this. It executes 100 parallel computations asynchronously, where each computation is made up of 5 sequential asynchronous operations. Q is faster because my callback version is naïve and each asynchronous operation is executed in the next event loop, whereas in Q this is optimised internally.

correction: Q is only faster on Firefox. In Chromium the callback version is faster.

http://jsperf.com/callbacks-vs-promises-a

@tsgautier
Author

I think it's better if we don't worry about the chain of promises. While a chain simulates my actual use case more accurately, after several iterations of different tests on this thread it seems clear that a simple benchmark of n asynchronous callbacks, or n q resolves, shows similar behavior and is much easier to understand and examine in a micro-benchmark scenario.

@domenic
Collaborator

domenic commented Jun 6, 2013

Using the branch that turns off long stack traces by default (see #311) and running this test suite, I get the results

                      A single simple async operation
          49,311 op/s » with an immediately-fulfilled promise
         436,358 op/s » with direct setImmediate usage
              64 op/s » with direct setTimeout(…, 0)

                      1000 operations in parallel
              48 op/s » with immediately-fulfilled promises
             897 op/s » with direct setImmediate usage

Q is slower than we would like, but only about 10x slower than callbacks. Running many promises in parallel seems to make Q about 20x slower.

@sobotklp

sobotklp commented Jun 6, 2013

We've been using q@0.8.2 for a moderately large API (~800 reqs/sec) with persistent websocket connections. Upon upgrading to q@0.9.4, we noticed a ~400% increase in CPU use as well as a memory leak. Aside from using Q.fcall() instead of Q.call(), there were no changes to the way we use Q promises. Unfortunately, we couldn't find a fix for these issues other than reverting to the older version of q.

@domenic
Collaborator

domenic commented Jun 6, 2013

@sobotklp you appear to have introduced long stack traces in your upgrade. You can turn them off with Q.longStackJumpLimit = 0. In version 0.9.5 they will be off by default.

@sobotklp

sobotklp commented Jun 6, 2013

@domenic cool thanks! :) I'll give that a try.

@domenic
Collaborator

domenic commented Jul 9, 2013

I think it's time to close this. Thanks for prompting us to finally get a performance test suite up and running. If you have any further cases that you think are slower than they should be, we'd appreciate them either over in #312, or even better as pull requests that add to the existing suite.

@domenic domenic closed this as completed Jul 9, 2013