core: optimize timers, conn handler and responses for throughput under high concurrency #94

Bogdanp · 2020-06-07T17:36:19Z

This change improves the web server's throughput with many (1k+) concurrent connections. Of the bunch, the most important changes are those to the timer manager and the change to stop peeking in handle-connection/cm. I'd recommend looking at the commits individually for readability.

The change to add-missing-headers also fixes a bug: previously, the check for whether a header was already set on the response was case-sensitive, even though HTTP header names are case-insensitive.
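To illustrate the kind of check this refers to, here is a minimal sketch of a case-insensitive header lookup. It is not the code from this PR: `header-present?` and `example-headers` are hypothetical names, and headers are modelled as plain pairs of byte strings rather than the web server's own header struct.

```racket
#lang racket/base

;; Hypothetical helper (not the web server's actual code): returns #t when
;; a header named `name` appears in `headers`, ignoring case.
;; `headers` is modelled as a list of (name . value) pairs of byte strings.
(define (header-present? name headers)
  (define target (bytes->string/latin-1 name))
  (for/or ([h (in-list headers)])
    (string-ci=? target (bytes->string/latin-1 (car h)))))

;; Example data for illustration only.
(define example-headers
  (list (cons #"content-type" #"text/plain")
        (cons #"Server" #"Racket")))

(header-present? #"Content-Type" example-headers) ; matches despite the case difference
(header-present? #"Date" example-headers)         ; no such header in the list
```

With a case-sensitive comparison, the first lookup would miss `content-type` and a second `Content-Type` header would be added to the response.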

On my local machine (macOS, 2.4GHz), this app performs as follows when 4k requests per second are made for a minute after it's been warmed up:

master:

$ cat targets | vegeta attack -duration=60s -rate 4000/s | vegeta report
Requests      [total, rate, throughput]         240000, 4000.02, 3999.85
Duration      [total, attack, wait]             1m0s, 1m0s, 2.585ms
Latencies     [min, mean, 50, 90, 95, 99, max]  542.112µs, 3.733ms, 3.649ms, 4.776ms, 5.147ms, 5.946ms, 70.843ms
Bytes In      [total, mean]                     1440000, 6.00
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:240000
Error Set:

this branch:

$ cat targets | vegeta attack -duration=60s -rate 4000/s | vegeta report
Requests      [total, rate, throughput]         240000, 4000.02, 4000.00
Duration      [total, attack, wait]             1m0s, 1m0s, 288.141µs
Latencies     [min, mean, 50, 90, 95, 99, max]  174.083µs, 295.654µs, 271.062µs, 381.418µs, 462.48µs, 646.391µs, 5.472ms
Bytes In      [total, mean]                     1440000, 6.00
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:240000
Error Set:

On one of my servers, running the TechEmpower Benchmarks against the two branches (racket is running master and racket-perf is running this branch) yields:

https://www.techempower.com/benchmarks/#section=test&shareid=669bfab7-9242-4c26-8921-a4fe9ccd8530&hw=ph&test=composite&a=2

The maximum throughput does not increase by much (only about 4k/s in the json and plaintext benchmarks), but take a look at the Data Table tab for each test, in particular the plaintext test. Performance now suffers less when there are many concurrent connections.

There are more changes that could be made, but this is all I had time for this weekend and I think these are some nice gains.

`seconds->date' takes a second parameter controlling whether or not to return a UTC date; this is much faster than generating a date in the local timezone. Instead of formatting a string and then converting it to bytes, we write the result directly to a bytestring, avoiding allocations and parsing of the format string.
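As a rough sketch of the approach described above (not the actual code from this commit), the following fills a preallocated byte string with an HTTP-style date. The names `seconds->http-date-bytes`, `put-digits!`, `DAYS` and `MONTHS` are made up for this example, and the date layout is assumed to be the usual `Sun, 06 Nov 1994 08:49:37 GMT` form.

```racket
#lang racket/base

;; Lookup tables for the fixed-width day and month names.
(define DAYS #(#"Sun" #"Mon" #"Tue" #"Wed" #"Thu" #"Fri" #"Sat"))
(define MONTHS #(#"Jan" #"Feb" #"Mar" #"Apr" #"May" #"Jun"
                 #"Jul" #"Aug" #"Sep" #"Oct" #"Nov" #"Dec"))

;; Write `n` as `width` zero-padded decimal digits into `bs` at `start`.
(define (put-digits! bs start width n)
  (let loop ([i (+ start width -1)] [n n])
    (when (>= i start)
      (bytes-set! bs i (+ 48 (remainder n 10))) ; 48 is ASCII "0"
      (loop (sub1 i) (quotient n 10)))))

;; Hypothetical helper: build the date directly in a 29-byte buffer
;; instead of formatting a string and converting it to bytes.
(define (seconds->http-date-bytes secs)
  (define d (seconds->date secs #f)) ; #f for local-time? yields a UTC date
  (define bs (bytes-copy #"___, __ ___ ____ __:__:__ GMT"))
  (bytes-copy! bs 0 (vector-ref DAYS (date-week-day d)))
  (put-digits! bs 5 2 (date-day d))
  (bytes-copy! bs 8 (vector-ref MONTHS (sub1 (date-month d))))
  (put-digits! bs 12 4 (date-year d))
  (put-digits! bs 17 2 (date-hour d))
  (put-digits! bs 20 2 (date-minute d))
  (put-digits! bs 23 2 (date-second d))
  bs)

(seconds->http-date-bytes 0) ; the Unix epoch: #"Thu, 01 Jan 1970 00:00:00 GMT"
```

The only allocations per call are the date struct and the 29-byte result; no format string is parsed.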

Removing the peek in handle-connection/cm improves throughput by about 500 qps on my machine. The peek was redundant: read-request already effectively does the same thing, waiting for request data to come in and raising an exception if the connection is closed.

samth

This is great!

samth · 2020-06-08T13:43:28Z