
core: optimize timers, conn handler and responses for throughput under high concurrency #94

Merged
merged 14 commits into racket:master on Jun 16, 2020

Conversation

Sponsor Contributor

@Bogdanp Bogdanp commented Jun 7, 2020

This change improves the web server's throughput with many (1k+) concurrent connections. The most important changes are those to the timer manager and the removal of the peek in handle-connection/cm. I'd recommend reviewing the commits individually for readability.

The change to add-missing-headers also fixes a bug: previously, the check for whether a header was already set on the response was case-sensitive.

On my local machine (macOS, 2.4GHz), this app performs as follows when 4k requests per second are made for a minute after it's been warmed up:

  • master:
$ cat targets | vegeta attack -duration=60s -rate 4000/s | vegeta report
Requests      [total, rate, throughput]         240000, 4000.02, 3999.85
Duration      [total, attack, wait]             1m0s, 1m0s, 2.585ms
Latencies     [min, mean, 50, 90, 95, 99, max]  542.112µs, 3.733ms, 3.649ms, 4.776ms, 5.147ms, 5.946ms, 70.843ms
Bytes In      [total, mean]                     1440000, 6.00
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:240000
Error Set:
  • this branch:
$ cat targets | vegeta attack -duration=60s -rate 4000/s | vegeta report
Requests      [total, rate, throughput]         240000, 4000.02, 4000.00
Duration      [total, attack, wait]             1m0s, 1m0s, 288.141µs
Latencies     [min, mean, 50, 90, 95, 99, max]  174.083µs, 295.654µs, 271.062µs, 381.418µs, 462.48µs, 646.391µs, 5.472ms
Bytes In      [total, mean]                     1440000, 6.00
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:240000
Error Set:

On one of my servers, running the TechEmpower Benchmarks against the two branches (racket is running master and racket-perf is running this branch) yields:

https://www.techempower.com/benchmarks/#section=test&shareid=669bfab7-9242-4c26-8921-a4fe9ccd8530&hw=ph&test=composite&a=2

The maximum throughput does not increase by much (only about 4k req/s in the json and plaintext benchmarks), but take a look at the Data Table tab for each test, in particular the plaintext test: performance now degrades less when there are many concurrent connections.

There are more changes that could be made, but this is all I had time for this weekend and I think these are some nice gains.

`seconds->date' takes a second parameter controlling whether or not to
return a UTC date; this is much faster than generating a date in the
local timezone.
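As a sketch of what that commit relies on: `seconds->date' (from racket/base) takes an optional local-time? argument, and passing #f yields a UTC date without the slower local-timezone conversion.

```racket
#lang racket/base
;; Sketch of the idea in the commit above: ask seconds->date for a
;; UTC date directly instead of converting in the local timezone.
;; The optional second argument is local-time?; #f means UTC.
(define utc-now (seconds->date (current-seconds) #f))
;; A UTC date carries a zero timezone offset.
(date-time-zone-offset utc-now) ; => 0
```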

Instead of formatting a string and then converting it to bytes, we write the result directly to a bytestring, avoiding allocations and parsing of the format string.

This improves throughput by about 500 qps on my machine, and read-request already effectively does the same thing that the peek was doing: it waits for request data to come in and raises an exception if the connection is closed.
Sponsor Member

@samth samth left a comment


This is great!

(void)]
[(list-rest (list-rest start end) rest)
(define out (connection-o-port conn))
Sponsor Member


Can we move this definition out of the loop?

web-server-lib/web-server/http/response.rkt (resolved)
@@ -163,14 +207,13 @@
;; a responder can run indefinitely as long as it writes
;; *something* every (current-send-timeout) seconds.
(reset-connection-response-send-timeout! conn)
- (fprintf to-client "~a\r\n" (number->string bytes-read-or-eof 16))
+ (cprintf to-client #"~a\r\n" (number->string bytes-read-or-eof 16))
Sponsor Member


I wonder if a number->bytes function would be useful

Sponsor Contributor Author

@Bogdanp Bogdanp Jun 14, 2020


In the standard library? I think it would be. I feel like I've had to write something like it in a few libraries.
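For illustration only (this helper is hypothetical and not part of this PR or the standard library), a minimal number->bytes along the lines being discussed might look like:

```racket
#lang racket/base
;; Hypothetical sketch, not code from this PR: convert a number to
;; its byte-string representation in a given radix, so call sites can
;; write bytes directly instead of routing through fprintf.
(define (number->bytes n [radix 10])
  (string->bytes/latin-1 (number->string n radix)))

;; Chunk sizes in chunked transfer encoding are hexadecimal:
(number->bytes 255 16) ; => #"ff"
```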

@@ -348,17 +392,18 @@
(match ranges
[(list)
; Final boundary (must start on new line; ends with a new line)
- (fprintf (connection-o-port conn) "--~a--\r\n" boundary)
+ (cprintf (connection-o-port conn) #"--~a--\r\n" boundary)
Sponsor Member


Why is boundary guaranteed not to be #f here?

Sponsor Contributor Author


A boundary is always generated in output-file (line 306) when there is more than one range. The then clause of the if surrounding this loop handles the case where there is a single range, and there are never 0 ranges: if the request doesn't specify a list of ranges, then a single range covering the entire file is created at line 346.

web-server-lib/web-server/http/response.rkt (resolved)
[#"Connection" #"close"])
empty)
hs)))
(define-syntax (add-missing-headers stx)
Sponsor Member


Is this mostly used with empty initial headers? I think that case could be optimized to not do any of the checking here.

Sponsor Contributor Author

@Bogdanp Bogdanp Jun 14, 2020


It's used to add the headers iff they don't already exist, so I think the checking is necessary: an app could provide its own versions of these headers.
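As a sketch of the kind of check being discussed (illustrative only, not the PR's exact code; it assumes headers carry byte-string field names):

```racket
#lang racket/base
;; Illustrative sketch, not the PR's exact code: check whether a
;; header with the given field name is already present, comparing
;; names case-insensitively, as the fix in this PR requires.
(define (header-present? name headers)
  (for/or ([h (in-list headers)])
    (string-ci=? (bytes->string/latin-1 name)
                 (bytes->string/latin-1 (car h)))))

(header-present? #"content-type"
                 (list (cons #"Content-Type" #"text/html"))) ; => #t
```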

;; Limitation on this add-timer: thunk cannot make timer
;; requests directly, because it's executed directly by
;; the timer-manager thread
(define ch (make-async-channel))
Sponsor Member


My recollection is that async-channels are relatively heavyweight -- I wonder if this could be supported more directly by the event system.

Contributor


I think it should be changed to use the thread's mailbox. This code was written before thread mailboxes existed, and we don't need the async channel to be separate from the servicing thread.

Sponsor Contributor Author

Bogdanp commented Jun 8, 2020

Thanks for the comments! I'll make changes to address them this weekend.

Sponsor Contributor Author

Bogdanp commented Jun 14, 2020

I've addressed all the comments. The numbers now look like this:

Requests      [total, rate, throughput]         240000, 4000.03, 4000.01
Duration      [total, attack, wait]             1m0s, 1m0s, 265.431µs
Latencies     [min, mean, 50, 90, 95, 99, max]  168.712µs, 261.615µs, 248.794µs, 309.237µs, 354.592µs, 541.572µs, 3.149ms
Bytes In      [total, mean]                     1440000, 6.00
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:240000
Error Set:

Sponsor Member

@samth samth left a comment


This is great.

#:mutable)
(define-logger web-server/timer)

(struct timer-manager (thd))
Sponsor Member


Is this struct necessary now?

Sponsor Contributor Author


No, but I kept it to minimize the changes since it probably doesn't add much overhead (relative to other things). I'll remove it if necessary when I do some more profiling.

(log-web-server/timer-debug "timers: ~a, next deadline in ~.sms" (hash-count timers) (- first-deadline (current-inexact-milliseconds)))
(sync
(handle-evt
(thread-receive-evt)
Sponsor Member


It would be nice if this produced the message, instead of the event, so that you didn't have to deal with the thread system twice in this code.

Sponsor Contributor Author

@Bogdanp Bogdanp Jun 14, 2020


Yeah, I've written enough code like this now that I'm used to it, but the behavior of thread-receive-evt was definitely surprising at first. Looking at thread/thread.rkt, it wouldn't be hard to add a variant that dequeues a message, but I haven't found a good name for it. thread-dequeue-evt feels like it would cause confusion, and I haven't come up with anything better.
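To make the surprising behavior concrete, a minimal sketch of the pattern: syncing on (thread-receive-evt) only signals that a message is available; the sync result is the event itself, and thread-receive must still be called to dequeue the message.

```racket
#lang racket/base
;; Minimal sketch of the thread-receive-evt pattern discussed above.
;; The event becomes ready when a message arrives, but it does not
;; carry the message; thread-receive dequeues it separately.
(define worker
  (thread
   (lambda ()
     (sync (handle-evt (thread-receive-evt)
                       (lambda (_evt)
                         (printf "got: ~a\n" (thread-receive))))))))
(thread-send worker 'tick)
(thread-wait worker) ; prints "got: tick"
```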

@jeapostrophe
Contributor

This looks ready to me; are there any other plans?

Sponsor Contributor Author

Bogdanp commented Jun 16, 2020

I plan to do some more profiling in the coming weeks, but I don't know exactly when that will be. In the meantime, I think these changes can be merged.

@jeapostrophe jeapostrophe merged commit 4210bea into racket:master Jun 16, 2020
@jeapostrophe
Contributor

Awesome!
