Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP

Loading…

performance, 4x slower that raw node http server? #107

Closed
twojcik opened this Issue · 34 comments

6 participants

@twojcik

Hi,
I'd like to use node-restify in my project, but simple benchmark shows it's about 4x slower than raw node server.
I had about 4k rps for restify and 12-13k for raw node (I used cluster to utilze all cpu cores).
I know 4k is not bad, but I need very high throughput api. Are there any options in restify I could turn off to get better perf ?(I've turned off md5 hashing, but don't know what else I could turn off)

@mcavage
Owner

Hi,

I haven't benchmarked it lately; there definitely is a penalty perf hit to pick up routing (RegEx's are awesome, but slow). Also - what all interceptors are you using?(i.e., if you're contrasting restify with content parsing to node "raw" byte streaming, you need to account for that).

Some gists to illustrate what you're comparing would be helpful.

m

@twojcik

Hi ,
Thanks. Code for restify server I used for benchmark attached below. I must admit that my api will have actually only single endpoint, so probably will use some custom made solution.
Cheers,
Tadek

var cluster = require('cluster');
var restify = require('restify');

if (cluster.isMaster) {

for (var i = 0; i < 8; i++) {
cluster.fork();
}

cluster.on('death', function(worker) {
console.log('worker ' + worker.pid + ' died');
});
} else {
var server = restify.createServer();

server.use(restify.queryParser());
server.use(restify.bodyParser());

server.get('/event',function handle(req, res, next){
// res.keepAlive=true;

res.end("hello world\n");

});
server.listen(9080, function() {
console.log('listening: %s', server.url);
});
}

@mcavage
Owner

Ok - and what was the corresponding node server and client?

It's fine if you don't use it ;) - I do want to know what it is you're saying is degraded though (well, like I said, there's some fixed cost, but it shouldn't be 4x).

m

@twojcik

To run benchmark I used ab tool ( ab -n 60000 -c 200 http://localhost:9080/).

Raw node server looked very similar to restify:

var cluster = require('cluster');
var http = require('http');

if (cluster.isMaster) {

for (var i = 0; i < 8; i++) {
cluster.fork();
}

cluster.on('death', function(worker) {
console.log('worker ' + worker.pid + ' died');
});
} else {
http.Server(function(req, res) {
res.writeHead(200);
res.end("hello world\n");
}).listen(8000);
}

I've benchmarked also node with POST request and results were similar.
var cluster = require('cluster');
var http = require('http');
if (cluster.isMaster) {

for (var i = 0; i < 8; i++) {
cluster.fork();
}

cluster.on('death', function(worker) {
console.log('worker ' + worker.pid + ' died');
});
} else {

http.createServer(function(request, response) {

  if ( request.method === 'POST' ) {
      // the body of the POST is JSON payload.
      var data = '';
      request.addListener('data', function(chunk) { data += chunk; });
      request.addListener('end', function() {
          var d= JSON.parse(data);
          response.writeHead(200, {'content-type': 'text/plain' });
          response.end("post");
      });
  }
  else
  {
          response.writeHead(200, {'content-type': 'text/plain' });
          response.end("get");
  }

}).listen(8888);

}

For me if cost(perf degradation) would be not more that 20% I would go with restify (and maybe will but not for that particular api endpoint)
Thanks,
Tadek

@mcavage
Owner

Ok, thanks Tadek.

I'll look into this this week. I have a suspicion of a few places that can be cleaned up. A 4x degradation is definitely not acceptable - I too look for a 10-20% overhead :)

~Mark

@twojcik

Thanks Mark.

I know that my benchmark was basically hello world, but my use case is acutally very similar to hello world in terms on computing on server and sending back information (I'll just issue query to Redis and send response with few chars).
I'd love to see only 10-20% overhead when using restify:)
Tadek

Tadek

@mcavage
Owner

Hi Tadek,

So, I ran some preliminary tests here, but in VMWare (I wanted to run on an Illumos derivative so I could get DTrace). Here's my sample apps (which I took from yours):

Basic Node Server

var cluster = require('cluster');
var http = require('http');
var os = require('os');
var url = require('url');

if (cluster.isMaster) {
    for (var i = 0; i < os.cpus().length; i++)
        cluster.fork();

    return cluster.on('death', function(worker) {
        console.error('worker %s died', worker.pid);
    });
}

var server = http.createServer(function (req, res) {
    var u = url.parse(req.url, true);
    var data = JSON.stringify({hello: u.query.name});
    res.writeHead(200, {
        'content-type': 'application/json',
        'content-length': data
    });
    res.end(data);
});

server.listen(8080);

Restify Server

var cluster = require('cluster');
var os = require('os');

var restify = require('restify');

if (cluster.isMaster) {
    for (var i = 0; i < os.cpus().length; i++)
        cluster.fork();


    return cluster.on('death', function(worker) {
        console.error('worker %d died', worker.pid);
    });
}

var server = restify.createServer();
server.use(restify.queryParser());
server.get('/', function handle(req, res, next) {
    res.contentType = 'json';
    res.send({hello: req.params.name});
});

server.listen(9080, function() {
    console.log('listening: %s', server.url);
});

And then I ran your exact ab test, and got a ~10% degradation for restify (in terms of throughput - there is a latency hit I want to understand better):

Basic

[mark@88be5e8a-1053-40ab-8ec9-c2d6f8c89bbb ~]$ ab -n 60000 -c 200 http://127.0.0.1:8080/?name=mark
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)
Completed 6000 requests
Completed 12000 requests
Completed 18000 requests
Completed 24000 requests
Completed 30000 requests
Completed 36000 requests
Completed 42000 requests
Completed 48000 requests
Completed 54000 requests
Completed 60000 requests
Finished 60000 requests


Server Software:        
Server Hostname:        127.0.0.1
Server Port:            8080

Document Path:          /?name=mark
Document Length:        16 bytes

Concurrency Level:      200
Time taken for tests:   79.199 seconds
Complete requests:      60000
Failed requests:        4
   (Connect: 4, Receive: 0, Length: 0, Exceptions: 0)
Write errors:           0
Total transferred:      7200000 bytes
HTML transferred:       960000 bytes
Requests per second:    757.58 [#/sec] (mean)
Time per request:       263.998 [ms] (mean)
Time per request:       1.320 [ms] (mean, across all concurrent requests)
Transfer rate:          88.78 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0  255 471.1      0    1164
Processing:     0    7  13.1      1     127
Waiting:        0    5  10.4      1     126
Total:          0  262 468.4      1    1170

Percentage of the requests served within a certain time (ms)
  50%      1
  66%     32
  75%     51
  80%   1128
  90%   1130
  95%   1135
  98%   1140
  99%   1141
 100%   1170 (longest request)
[mark@88be5e8a-1053-40ab-8ec9-c2d6f8c89bbb ~]$

restify

[mark@88be5e8a-1053-40ab-8ec9-c2d6f8c89bbb ~]$ ab -n 60000 -c 200 http://127.0.0.1:9080/?name=mark
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)
Completed 6000 requests
Completed 12000 requests
Completed 18000 requests
Completed 24000 requests
Completed 30000 requests
Completed 36000 requests
Completed 42000 requests
Completed 48000 requests
Completed 54000 requests
Completed 60000 requests
Finished 60000 requests


Server Software:        restify
Server Hostname:        127.0.0.1
Server Port:            9080

Document Path:          /?name=mark
Document Length:        16 bytes

Concurrency Level:      200
Time taken for tests:   87.934 seconds
Complete requests:      60000
Failed requests:        2
   (Connect: 2, Receive: 0, Length: 0, Exceptions: 0)
Write errors:           0
Total transferred:      31980255 bytes
HTML transferred:       960000 bytes
Requests per second:    682.33 [#/sec] (mean)
Time per request:       293.115 [ms] (mean)
Time per request:       1.466 [ms] (mean, across all concurrent requests)
Transfer rate:          355.16 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0  265 481.1      0    3395
Processing:     1   26  54.2      3     369
Waiting:        0   25  53.6      3     369
Total:          1  291 474.1      8    3551

Percentage of the requests served within a certain time (ms)
  50%      8
  66%     92
  75%    222
  80%   1129
  90%   1136
  95%   1141
  98%   1147
  99%   1159
 100%   3551 (longest request)
[mark@88be5e8a-1053-40ab-8ec9-c2d6f8c89bbb ~]$

So there's certainly some latency hit going on for restify in the 50%/average case, not so much at the P99 level (P100 gets skewed, but I'm not very concerned about that). There was also a CPU utilization increase, from roughly 10% to 15%. Anyway, there's certainly some work I can do to profile and make this better, but I'm not seeing a 4X degradation. I'm going to get some real hardware next and look at it there, but I figured I'd at least share my initial numbers.

@twojcik

Thanks Mark for interest in this. I've run benchmark on Ubuntu 12 with no virtualization at all, my CPU is I7 920 2.66Ghz (4 cores with HT).
I've rerun benchmarks using your code, results below:

Raw node.js server:

This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 5000 requests
Completed 10000 requests
Completed 15000 requests
Completed 20000 requests
Completed 25000 requests
Completed 30000 requests
Completed 35000 requests
Completed 40000 requests
Completed 45000 requests
Completed 50000 requests
Finished 50000 requests


Server Software:        
Server Hostname:        localhost
Server Port:            8080

Document Path:          /?name=tadek
Document Length:        17 bytes

Concurrency Level:      200
Time taken for tests:   4.090 seconds
Complete requests:      50000
Failed requests:        0
Write errors:           0
Total transferred:      7950000 bytes
HTML transferred:       850000 bytes
Requests per second:    12224.37 [#/sec] (mean)
Time per request:       16.361 [ms] (mean)
Time per request:       0.082 [ms] (mean, across all concurrent requests)
Transfer rate:          1898.12 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        1    6   1.2      6      12
Processing:     1   10   3.3     10      34
Waiting:        1    8   3.2      8      32
Total:          3   16   3.5     16      40

Percentage of the requests served within a certain time (ms)
  50%     16
  66%     17
  75%     18
  80%     19
  90%     21
  95%     23
  98%     26
  99%     28
 100%     40 (longest request)

Restify:

This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 5000 requests
Completed 10000 requests
Completed 15000 requests
Completed 20000 requests
Completed 25000 requests
Completed 30000 requests
Completed 35000 requests
Completed 40000 requests
Completed 45000 requests
Completed 50000 requests
Finished 50000 requests

Server Software: restify
Server Hostname: localhost
Server Port: 9080

Document Path: /?name=tadek
Document Length: 17 bytes

Concurrency Level: 200
Time taken for tests: 12.678 seconds
Complete requests: 50000
Failed requests: 0
Write errors: 0
Total transferred: 26700253 bytes
HTML transferred: 850000 bytes
Requests per second: 3943.78 #/sec
Time per request: 50.713 ms
Time per request: 0.254 ms
Transfer rate: 2056.64 [Kbytes/sec] received

Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.7 0 17
Processing: 1 50 49.0 35 505
Waiting: 1 50 49.0 35 505
Total: 1 50 49.0 36 505

Percentage of the requests served within a certain time (ms)
50% 36
66% 55
75% 70
80% 81
90% 115
95% 146
98% 185
99% 224
100% 505 (longest request)

CPU utilization for each core was about 12% for restify , for raw node it was about 8%, memory consumption for raw node was also smaller - about 26-35 MB, for restify 48-50 MB.

To summarize raw node >12K rps, and time per request: 0.082 ms, restify about 4k rps and time per request: 0.254 ms.
I agree, it's not 4x but little more than 3x what is still a lot.
(sorry for bad formatting)

@mcavage
Owner

Hmm...well you're getting dramatically different raw numbers than me :\

I'll get an ubuntu up and see what that does.

@mcavage
Owner

Hi Tadek,

Ok - so I'm still not getting the raw numbers you are (I was running Linux under KVM), which is a separate problem, but I do see on Linux specifically restify is ~3.5x worse than "raw" node. So at least I can recreate this degradation, and AFAICT, it's platform specific, which is weird, but I'll start looking into it.

m

@twojcik

Mark, I see, so basically when running on SmartOS I should get only ~10% lower perf? I probably will use joyent/smartos as hosting provider so if it doesn't happen on Solaris if would work perfectly for me:)
Does that degradation can be connected with DTraceProvider?
I get warnings in terminal when running restify that:

[Error: Cannot find module './DTraceProviderBindings']

Off topic:
When using ab with keep alive (-k option) I get:
restify: 4379.51 #/sec
raw node server: 40185.69 [#/sec]

Raw node server gives me 40k rps what is 10x of restify.

@mcavage
Owner

Well, that's what I'm seeing. I still haven't recreated your raw numbers of 12k out of even raw node, in either place. What sort of configuration of machines and ab did you use to get that? Was it on the same box or a different one? (in all of these cases there's still tons of CPU for node to burn, so, theoretically I should be getting more). So

And yeah, I would suspect there's something dramatic happening with the way restify is handling the absence of the DTrace module on Linux, and if that's true, this will be an easy fix.

@twojcik

I didn't made any special tweaks to my ubuntu(12.04) installation, only thing I did was increasing number of open file descriptors which is 1024 by default. I used the same machine to run both benchmark and node.js server (I know it's not best way to do benchmark). ab was run as : ab -c 200 -n 50000 http://localhost:8080/?name=tadek . In benchmarks CPU utilization wasn't higher that 12-15%.

@mcavage
Owner

Well - I'm not super concerned about whether it's good/bad to run this on the same box. Clearly it's not pegging CPU, and it takes the network out of the question. At any rate, I'm not seeing those numbers, and I don't have a "real" server to go try ubuntu easily available. I have a spare box I can get going though, so I'll get to it- for now I'm going to just get the delta in-line on Linux to around 10% (i.e., what it is everywhere else). The DTrace bit wasn't it - I hacked up a version that took it out and it's not a changed profile.

@mcavage
Owner

Oh - and thanks for confirming your stack, I'll just keep going with that.

@twojcik

Thanks Mark. If it's not DTrace , I'm really curious what it could be.

@twojcik

BTW. I used node v0.6.6 to run benchmarks as restify doesn't install/work on v.07.6.

@mcavage
Owner

Oh, that's weird re:0.7.6 - what's the error? (I've been using it on mac).

The numbers I've been running are on 0.6.13.

@twojcik

I get following error when installin via npm install restify:
ERR! error rolling back restify@1.3.0 Error: ENOTEMPTY, rmdir '/home/tadek/node_modules/restify/node_modules/http-signature/node_modules/sprintf'
ERR! Unsupported
ERR! Not compatible with your version of node/npm: ctype@0.3.1
ERR! Required: {"node":"0.4 || 0.5 || 0.6"}
ERR! Actual: {"npm":"1.1.8","node":"0.7.6"}
ERR!
ERR! System Linux 3.2.0-18-generic
ERR! command "/home/tadek/.nvm/v0.7.6/bin/node" "/home/tadek/.nvm/v0.7.6/bin/npm" "install" "restify"

@mcavage
Owner

Oh! Right, it's fixed in master, not npm. Sorry.

@twojcik

Good to know ! Thanks!

@mcavage
Owner

Hey Tadek,

Ok, so after burning my afternoon getting profiling working here, I'm not seeing anything pop out. I commented out a bunch of code in restify, and didn't see improvements, so, this is really not obvious what's afoot. I'll keep looking, but I kind of have to timebox it. I.e., I don't have it fixed today. I'll get back to it in a day or so when I clear out some other things.

m

@twojcik

Sure no problem. If it works OK on smartos it's fine for me. I've just run few quick benchmarks on joyentcloud.com with raw nodejs server and when using keep alive mode of ab it performs quite ok, but without that it's quite slow . I've tested both machines with two and four cores, in analytics it showed very spiky charts, probably due to CPU bursting, but I'm not sure. If you're interested I could post some results tomorrow.

@mcavage
Owner
@mcavage
Owner

Hi Tadek,

Ok, I did a lot of tweaking of code in the last commit to hopefully bring this up to par. So, if you take the res.send magic API out of the path, there's now about a 25-30% hit over a "raw" node api (i.e., routing). There's probably not much more I can do to increase the performance of that code path, as ultimate I have to do "stuff". res.send is better, but still a big degradation over the raw node APIs (mostly because it does a lot of magic). I'll keep plugging away at some tricks to make that faster, but I wanted to at least get this to a point where it's (hopefully) acceptable for you.

So, the general gist is, if you have some performance critical portion of your app and some "control plane" stuff where perf isn't as big a deal, just stay off the res.send api in the former, and do whatever in the latter. So, for example, here's the test case I'm still relying on:

var cluster = require('cluster');
var http = require('http');
var os = require('os');
var url = require('url');

var Logger = require('bunyan');
var restify = require('./lib');

if (cluster.isMaster) {
    for (var i = 0; i < os.cpus().length; i++)
        cluster.fork();


    return cluster.on('death', function(worker) {
        console.error('worker %d died', worker.pid);
    });
}


var server;
if (process.argv[2] === 'restify') {
  console.error('restify server');

  server = restify.createServer({
    log: new Logger({
      level: 'warn',
      stream: process.stderr,
      name: 'gh107'
    })
  });

  server.get('/hello/:name', function handle(req, res, next) {
    var data = JSON.stringify({hello: req.params.name});
    res.writeHead(200, {
        'content-type': 'application/json',
        'content-length': data.length
    });
    res.end(data);
  });
} else {
  console.error('raw node');
  server = http.createServer(function (req, res) {
    var u = url.parse(req.url);
    var data = JSON.stringify({hello: u.query.name});
    res.writeHead(200, {
        'content-type': 'application/json',
        'content-length': data.length
    });
    res.end(data);
  });
}


server.listen(8080, function () {
  console.log('Up');
});

Doing something like that gives me a roughly 20-30% decrease in the restify case over the node case (I'm getting around 13k requests/second out of the restify server vs around 17k requests/second in the simpler path on my MBP).

Basically to make this faster, I did what I've been avoiding doing, which is tacking all the helper APIs back onto the node HTTP prototype, as opposed to keeping it cleanly separated before with defineGetter's. Turns out those are a big hit in V8 (the previous version of restify just tacked onto the prototype, which is why it was faster than 1.0). Anyway, it's more or less restored back to that in master now. I'm going to hold off publishing this for a bit until I'm sure it's stable, since I've made some invasive changes. You can dep on it via npm with the stanza below, in the meantime.

"dependencies": {
    "restify": "git://github.com/mcavage/node-restify.git#4d8f00071941e7e706c3092576721b502082310b"
}
@twojcik

Thanks Mark !
I've benchmarked both again using wrk tool (it issues keep-alived requests). Results below. I can see difference between raw node server and resstify, but indeed much smaller than initially ! Great work, thanks!

wrk -t10 -c1000 -r1m http://localhost:8080/?name=test

Raw node:

Making 1000000 requests to http://localhost:8080/?name=test
10 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 20.18ms 6.54ms 51.30ms 80.46%
Req/Sec 4.31k 596.42 5.00k 59.62%
1000064 requests in 20.36s, 126.85MB read
Socket errors: connect 0, read 0, write 0, timeout 37

Requests/sec: 49123.82
Transfer/sec: 6.23MB

wrk -t10 -c1000 -r1m http://localhost:8080/hello/test

Restify:

Making 1000000 requests to http://localhost:8080/hello/test
10 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 34.47ms 16.13ms 122.42ms 82.32%
Req/Sec 2.69k 496.75 3.00k 70.47%
1000061 requests in 32.96s, 141.15MB read
Socket errors: connect 0, read 0, write 0, timeout 199

Requests/sec: 30346.02
Transfer/sec: 4.28MB
@spmason

Any idea when this is going to make its way onto stable/npm?

FYI I came looking for this issue after reading this article: http://blog.perfectapi.com/2012/benchmarking-apis-using-perfectapi-vs-express.js-vs-restify.js/ - restify does really bad vs express/perfectapi in his benchmark, these fixes should fix this by the sounds of it..

@mcavage
Owner

Hi,

Yeah, I've been using it for a while in my internal things out of master directly and it's stable. 1.4.0 is npm as of right now :)

Yeah, the meat of the slowness in what's stable now is the whole extension vs tacking onto the prototype. So, it should be better (that said, it's on my short-term plan to really try to get more performance out of the restify core. I really think it should be possible to get <= 10% degradation of a "raw" node server).

m

@spmason

Great news! Thanks!

@DukeyToo

I am the author of the benchmarks at http://blog.perfectapi.com/2012/benchmarking-apis-using-perfectapi-vs-express.js-vs-restify.js/

I reran the tests today on a medium EC2 instance and restify performance is indeed much improved, and is now on a par (and perhaps slightly better) than expressjs and perfectapi.

The basic data from the new benchmarks is at
https://gist.github.com/2361848

@mcavage
Owner
@snostorm

4d8f000 made it so that when no query string is provided in a request that req.query becomes undefined versus the previous empty object (req.params doesn't have this issue, but I use queryParser({mapParams: false});)

I can post more details and an example a little later. Potentially in a new ticket, though this problem was related to this original issue.

My quick workaround was to add an empty object setting level of middleware (coffee-script below)

server.use (req, res, next) ->
    req.query or= {}
    next()
@mcavage
Owner

Hi snostorm,

I opened #124 to track that. I'm going to resolve this perf bug for now.

@mcavage mcavage closed this
@mohdejaz

hello,

Is this perf issue resolved?

I'm seeing same slow performance vs raw node javascript.

Could author provide what was the fix ?

Regards

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.