New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
core: optimize timers, conn handler and responses for throughput under high concurrency #94
Conversation
`seconds->date' takes a second parameter controlling whether or not to return a UTC date; this is much faster than generating a date in the local timezone. Instead of formatting a string and then converting it to bytes, we write the result directly to a bytestring, avoiding allocations and parsing of the format string.
This improves throughput by about 500 qps on my machine and read-request already effectively does the same thing that the peek was doing: it waits for request data to come in and raises an exception if the connection is closed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great!
(void)] | ||
[(list-rest (list-rest start end) rest) | ||
(define out (connection-o-port conn)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move this definition out of the loop
@@ -163,14 +207,13 @@ | |||
;; a responder can run indefinitely as long as it writes | |||
;; *something* every (current-send-timeout) seconds. | |||
(reset-connection-response-send-timeout! conn) | |||
(fprintf to-client "~a\r\n" (number->string bytes-read-or-eof 16)) | |||
(cprintf to-client #"~a\r\n" (number->string bytes-read-or-eof 16)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if a number->bytes
function would be useful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the standard library? I think it would be. I feel like I've had to write something like it in a few libraries.
@@ -348,17 +392,18 @@ | |||
(match ranges | |||
[(list) | |||
; Final boundary (must start on new line; ends with a new line) | |||
(fprintf (connection-o-port conn) "--~a--\r\n" boundary) | |||
(cprintf (connection-o-port conn) #"--~a--\r\n" boundary) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is boundary
guaranteed not to be #f
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A boundary is always generated in output-file
(line 306) when there is more than one range. The then
clause of the if
surrounding this loop handles the case where there is a single range and there are never 0 ranges: if the request doesn't specify a list of ranges, then a single range for the entire file is created at line 346.
[#"Connection" #"close"]) | ||
empty) | ||
hs))) | ||
(define-syntax (add-missing-headers stx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this mostly used with empty initial headers? I think that case could be optimized to not do any of the checking here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's used to add the headers iff they don't exist so I think the checking is necessary since an app could provide its own versions of these headers.
;; Limitation on this add-timer: thunk cannot make timer | ||
;; requests directly, because it's executed directly by | ||
;; the timer-manager thread | ||
(define ch (make-async-channel)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My recollection is that async-channels are relatively heavyweight -- I wonder if this could be supported more directly by the event system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be changed to use the thread's mailbox. This code was written before thread mailboxes and we don't need to the async channel to be separate for the servicing thread.
Thanks for the comments! I'll make changes to address them this weekend. |
I've addressed all the comments. The numbers now look like this:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great.
#:mutable) | ||
(define-logger web-server/timer) | ||
|
||
(struct timer-manager (thd)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this struct necessary now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, but I kept it to minimize the changes since it probably doesn't add much overhead (relative to other things). I'll remove it if necessary when I do some more profiling.
(log-web-server/timer-debug "timers: ~a, next deadline in ~.sms" (hash-count timers) (- first-deadline (current-inexact-milliseconds))) | ||
(sync | ||
(handle-evt | ||
(thread-receive-evt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice if this produced the message, instead of the event, so that you didn't have to deal with the thread system twice in this code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I've written enough code like this now that I'm used to it, but the behavior of thread-receive-evt
was definitely surprising initially. Looking at thread/thread.rkt
, it wouldn't be hard to add a variant that dequeues a message, but I haven't been able to come up with a good name for it. Calling it thread-dequeue-evt
feels like it would cause confusion, but I haven't been able to come up with anything better.
This looks ready to me; are there any other plans? |
I plan to do some more profiling in the coming weeks but I don't know exactly when that will be. In the mean time, I think these can be merged. |
Awesome! |
This change improves the web server's throughput with many (1k+) concurrent connections. Of the bunch, the most important changes are those to the timer manager and the change to stop peeking in
handle-connection/cm
. I'd recommend looking at the commits individually for readability.The change to
add-missing-headers
also fixes an issue: previously, the check if a header was already set on the response was case-sensitive.On my local machine (macOS, 2.4GHz), this app performs as follows when 4k requests per second are made for a minute after it's been warmed up:
On one of my servers, running the TechEmpower Benchmarks against the two branches (
racket
is running master andracket-perf
is running this branch) yields:https://www.techempower.com/benchmarks/#section=test&shareid=669bfab7-9242-4c26-8921-a4fe9ccd8530&hw=ph&test=composite&a=2
The maximum throughput does not increase by much (only about 4k/s in the json and plaintext benchmarks), but take a look at the
Data Table
tab for each test, in particular the plaintext test. Performance now suffers less when there are many concurrent connections.There are more changes that could be made, but this is all I had time for this weekend and I think these are some nice gains.