Benchmarks #4

Open
X4 opened this Issue Dec 11, 2011 · 6 comments

@X4

X4 commented Dec 11, 2011

Hi Mr. Schwartz,

I came across your announcement by chance (http://www.sencha.com/forum/showthread.php?160128-Announcing-SilkJS)
and found your description of your benchmark results a little misleading, so I was curious and tested it myself.

It would make sense to share your:
Machine specs (+ cores)
Kernel parameters (if any)
NIC bandwidth
File size of the test file (100 B, 1 KB, 512 KB, 1 MB)

so that comparing becomes easier in case someone has the same machine/setup. It also helps with optimizing your server.

I can recommend weighttp; ab is single-threaded and utilizes only one core/CPU.
Your server doesn't scale linearly, so varying req/s depending on request count and concurrency level is normal.
Enabling keep-alive also further improves results.

I get about 4.8k to 5k req/s on a 1.3 GHz Core2Duo :) I know it's weak, but hey, I wanted to share my results.
With weighttp and the same parameters, I get 27k req/s on a heavily optimized nginx, 23k req/s on a heavily optimized lighttpd, and 56k req/s on G-WAN without optimization. I'm sorry, I haven't had the chance to test Node.js yet.

$: ab -t 30 -c 50 -k http://localhost:9090/anchor.png
...
Server Software:        SILK
Server Hostname:        localhost
Server Port:            9090

Document Path:          /anchor.png
Document Length:        523 bytes

Concurrency Level:      50
Time taken for tests:   10.402 seconds
Complete requests:      50000
Failed requests:        0
Write errors:           0
Keep-Alive requests:    50000
Total transferred:      37700000 bytes
HTML transferred:       26150000 bytes
Requests per second:    4806.58 [#/sec] (mean)
Time per request:       10.402 [ms] (mean)
Time per request:       0.208 [ms] (mean, across all concurrent requests)
Transfer rate:          3539.22 [Kbytes/sec] received

Connection Times (ms)
          min  mean[+/-sd] median   max
Connect:        0    1  50.3      0    3005
Processing:     0   10   9.5      8     176
Waiting:        0   10   9.5      8     176
Total:          0   10  52.0      8    3094

Percentage of the requests served within a certain time (ms)
  50%      8
  66%     12
  75%     15
  80%     17
  90%     21
  95%     25
  98%     30
  99%     32
 100%   3094 (longest request)



$: weighttp -n 100000 -c 100 -t 2 -k "http://localhost:9090/anchor.png"
...
finished in 19 sec, 787 millisec and 667 microsec, 5053 req/s, 3721 kbyte/s
requests: 100000 total, 100000 started, 100000 done, 100000 succeeded, 0 failed, 0 errored
traffic: 75400000 bytes total, 23100000 bytes http, 52300000 bytes data

Btw. ApacheBench ignores the -t flag here ;) (it stopped after 50,000 requests instead of running for the requested 30 seconds)

I think using 250 workers is a little naive, because the time lost to context switches is enormous; it's better to map threads to CPUs. But that's my humble opinion; tell me if I'm wrong :)
On a 6-core Xeon processor, for example, you can use up to about 10 pthreads; beyond that you won't notice an improvement, only a slow decrease in performance.

Cheers!

@mschwartz

Owner

mschwartz commented Dec 11, 2011
Your email is somewhat confusing to me.

4806 requests/second seems very good to me given the limited processing power of your machine. The 0.208 ms mean is well under 1 ms, which is quite good.

You may edit the config.js file in the httpd/ directory and change the numChildren value to something less than 250 if you like.

However, if you are going to test with 50 concurrent connections, you will need at least 50 children to serve them.

Enabling keep-alive will always improve results because your client program doesn't have to build up and tear down a socket for each request, which is a fairly expensive operation.

I'd be interested in seeing similar benchmarks on your hardware against Apache, Node.js, lighttpd, nginx, and/or whatever other servers you have available.

I think you are wrong about 250 children being naive because of context switches. The vast majority of the time is spent sending the data, and the processes block during that. There is little context-switch penalty, since the OS won't schedule your blocked processes.

I'd also point out that there are no pthreads in SilkJS, just pure OS processes. Each process is fully isolated from the others via the MMU.

Regards,



@X4

X4 commented Dec 11, 2011
Thank you for giving a quick response :)

"I'd also point out that there are no pthreads in SilkJS, just pure OS processes."
Oh yes, I know. I saw in gdb that G-WAN, for example, uses pthreads, and I know that pthreads have become very lightweight compared to earlier.

OK, sorry, I didn't know the number of children is configurable.

Alright, I can benchmark Apache, Node.js, etc. soon and release the results in a paste. It'll be an apples-vs-oranges benchmark though, because G-WAN, Node.js, and SilkJS are application servers, while nginx, lighttpd, and Apache are pure web servers.
I was just noting that you can further optimize your server :) Check out https://github.com/vendu/OS-Zero/; the zmallock implementation there is pretty efficient, and I've been told it's even faster than jemalloc.


@mschwartz

Owner

mschwartz commented Dec 11, 2011
Thanks for the OS-Zero tip. I'll definitely look at it.

V8 doesn't support threading, or SilkJS would be pthreaded instead of pre-fork...

Cheers



@nathanaschbacher

nathanaschbacher commented Apr 22, 2012

You could run V8 Isolates in a pthread like threads_a_gogo does in Node. No?


@mschwartz

Owner

mschwartz commented Apr 22, 2012
I saw this about NodeJS:

https://groups.google.com/forum/?fromgroups#!topic/nodejs/zLzuo292hX0

Seems they wanted to implement V8 Isolates, then backed all that code out of the main code base.

From what I've read about Isolates, you still need a Locker around entering a JavaScript context, so you end up with big contention for the lock.

SilkJS was originally entirely pthread-based, but for C++ pages (not JavaScript). I truly wish V8 had the ability to run multiple threads concurrently in the same context. There would be no pre-forking in that case, just pre-threading.


@coderbuzz

coderbuzz commented Jul 13, 2013

Here are my quick benchmarks

HP ProBook 4420s - Intel i5 CPU 2.67GHz, 4.00 GB RAM
Debian Crunchbang Linux x32

$ ab -t 30 -c 50 -k http://127.0.0.1/anchor.png
Apache/2.2.22 (Debian) Server at 127.0.0.1 Port 80

  • Requests per second: 8421.41 #/sec

$ ab -t 30 -c 50 -k http://127.0.0.1:9090/anchor.png
SilkJS Server at 127.0.0.1 Port 9090

  • Requests per second: 8752.05 #/sec

$ ab -t 30 -c 50 -k http://127.0.0.1:8000/anchor.png
Nodejs Server at 127.0.0.1 Port 8000

  • Requests per second: 2117.88 #/sec

$ ab -t 30 -c 50 -k http://127.0.0.1:8000/anchor.png
Nodejs Server at 127.0.0.1 Port 8000 - Cluster 4 Core CPU

  • Requests per second: 4274.60 #/sec

UPDATE:

$ ab -t 30 -c 50 -k http://127.0.0.1:8080/anchor.png
G-WAN Server at 127.0.0.1 Port 8080

  • Requests per second: 84900.89 #/sec

HP ProBook 4420s - Intel i5 CPU 2.67GHz, 4.00 GB RAM
Windows 8 x64

$ ab -t 30 -c 50 -k http://127.0.0.1:9000/anchor.png
Pashero 32bit Server at 127.0.0.1 Port 9000

  • Requests per second: 11034.23 #/sec
