feat(connlib): decrease connection setup latency #4022
Conversation
This is being dealt with in algesten/str0m#477.

This has been merged, we just need to update our fork now.
Just to make sure I'm understanding this from a high level:

- `snownet` uses a single-threaded event loop to manage things (like a game rendering engine).
- The rate at which we wake up the loop to process new data is fixed with a timer.
- Previously, we waited until the next tick to handle the new candidates that may have arrived between our last tick and now.
- With this PR, we "interrupt" instead, and reset the timer for the main event loop back to 0.

Do I have that right?
rust/connlib/tunnel/src/lib.rs (Outdated)

```diff
 fn poll_timeout(&mut self, cx: &mut Context<'_>) -> Poll<()> {
     if let Some(timeout) = self.node.poll_timeout() {
         let timeout = tokio::time::Instant::from_std(timeout);

         if timeout != self.timeout.deadline() {
             self.timeout.as_mut().reset(timeout)
         }
     }

-    Poll::Pending
+    ready!(self.timeout.poll_unpin(cx));
+    self.node.handle_timeout(self.timeout.deadline().into());
+
+    Poll::Ready(())
 }
```
Okay, so this is exactly like quinn-proto: this tells the driver when to next wake up the state machine for a timeout?
Yes!
rust/connlib/tunnel/src/lib.rs (Outdated)

```rust
sockets: Sockets::new()?,
stats_timer: tokio::time::interval(Duration::from_secs(60)),
timeout: Box::pin(tokio::time::sleep_until(Instant::now().into())),
```
This is when the connection is first starting? Why is the timeout set to `now`? Just because it must have some value, and `now` is a value that will be in the past, so it won't have a spurious wakeup later?
Yes.
I guess it would be nicer to set it to `None` and thus correctly handle the (hypothetical) case of `Node::poll_timeout` returning `None`.
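A minimal, std-only sketch of that suggestion (the types and field names here are illustrative, not connlib's actual API): storing the driver's deadline as an `Option<Instant>` distinguishes "no timer armed" from "fire now", instead of parking a dummy `sleep_until(now)`.

```rust
use std::time::{Duration, Instant};

/// Hypothetical sans-IO node: `poll_timeout` may legitimately be `None`
/// when it has nothing scheduled.
struct Node {
    next_timeout: Option<Instant>,
}

/// Hypothetical driver: mirrors the node's view into its own timer slot.
struct Driver {
    node: Node,
    deadline: Option<Instant>,
}

impl Driver {
    /// Re-arm (or disarm) the driver's timer from the node's current view.
    fn sync_timeout(&mut self) {
        match self.node.next_timeout {
            // Deadline changed: re-arm the timer.
            Some(t) if self.deadline != Some(t) => self.deadline = Some(t),
            // Nothing to wait for: disarm, so there are no spurious wakeups.
            None => self.deadline = None,
            // Unchanged: keep the already-armed timer.
            _ => {}
        }
    }
}

fn main() {
    let now = Instant::now();
    let mut driver = Driver {
        node: Node { next_timeout: Some(now + Duration::from_millis(50)) },
        deadline: None,
    };
    driver.sync_timeout();
    assert_eq!(driver.deadline, Some(now + Duration::from_millis(50)));

    driver.node.next_timeout = None;
    driver.sync_timeout();
    assert!(driver.deadline.is_none());
    println!("ok");
}
```

In a real tokio driver, the `Option<Instant>` would wrap an `Option<Pin<Box<Sleep>>>`, polled only when `Some`.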
```rust
cx.waker().wake_by_ref();

fn poll_timeout(&mut self, cx: &mut Context<'_>) -> Poll<()> {
    if let Some(timeout) = self.node.poll_timeout() {
```
The `Node` has its own timeout in addition to the `ConnectionState`? What does the `ConnectionState`'s timeout do?
`Node`'s `poll_timeout` only indicates the next time `handle_timeout` needs to be called, and it will always return the same timeout until something changes in its internal state; it's not a future, just an `Instant`.

`ConnectionState`'s timeout is a future, which is used to keep track of `Node`'s timeout and have the executor poll the `Tunnel` when it fires so we can send it down to `Node` to handle it.
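The division of labour described above can be simulated without any async runtime. The sketch below (illustrative names, not connlib's actual code) shows the sans-IO contract: the node only *reports* its next deadline; the driver owns the clock, "sleeps" until that deadline, and feeds the time back in via `handle_timeout`.

```rust
use std::time::{Duration, Instant};

/// Hypothetical sans-IO state machine.
struct Node {
    next: Instant,
    handled: u32,
}

impl Node {
    /// Returns the same deadline until internal state changes; not a future.
    fn poll_timeout(&self) -> Option<Instant> {
        Some(self.next)
    }

    /// Advances internal state; the next `poll_timeout` changes as a result.
    fn handle_timeout(&mut self, now: Instant) {
        self.handled += 1;
        self.next = now + Duration::from_millis(50);
    }
}

/// Driver loop: "sleep" until each deadline (simulated here by jumping the
/// clock), then hand the time back to the node.
fn drive(node: &mut Node, until: Instant) {
    while let Some(deadline) = node.poll_timeout() {
        if deadline > until {
            break;
        }
        node.handle_timeout(deadline);
    }
}

fn main() {
    let start = Instant::now();
    let mut node = Node { next: start, handled: 0 };
    // Simulate a 120 ms window: the node fires at t = 0, 50, and 100 ms.
    drive(&mut node, start + Duration::from_millis(120));
    assert_eq!(node.handled, 3);
    println!("ok");
}
```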
```rust
// After any state change, check what the new timeout is and reset it if necessary.
if self.connections_state.poll_timeout(cx).is_ready() {
    cx.waker().wake_by_ref()
}

Poll::Pending
```
Just because I'm still getting used to the `poll_` style, do I understand this right?

- We call `poll_timeout` on this `connections_state` object and pass it our context, so if it needs to register a wakeup, it can do that.
- If the connections state has something ready, we do a `wake_by_ref` to tell the driver/runtime, "Immediately call me again, there might be more work to do."
- We return `Pending` because whatever `connections_state` needs to do is already handled above here, and we don't want to duplicate calling it in these other two blocks. But probably the runtime will immediately wake up our task again, we'll check `connections_state`, and we'll either tick it or return some `Ready` it returned to us.
- In general, if we return `Ready` or have a subtask (sub state machine?) that's ready, the caller should keep polling us until we return `Pending` and neither we nor our subtasks register an immediate wakeup? E.g. if there were no timeouts somehow, we would be polled until everything was done, and then only wake up when a new packet comes in?
- And the runtime doesn't actually have a concept of "keep polling until", because `wake_by_ref` is how we tell it, "I know I'm saying `Pending`, but call me again as soon as you can schedule me, I may have more work to do."
- I'm guessing `connections_state` manages several connections, so internally its `poll_timeout` checks on something like a timer heap; if we have 5 connections, it just tells us which timeout will happen earliest.
- All this runs in a single thread because the runtime overhead of locks and the dev overhead of multi-threading (having to deal with everything being `&self` and mixing sync and async) is not worth it, since we are not CPU-bound. If we needed an outrageous amount of bandwidth, like 1 Tbps, or if cryptography were more expensive, one CPU core would not suffice, but for our case of ballpark 1 Gbps, there's no need to use extra cores?
Yes, you've got it all right :)

The runtime will end up polling a future which is actually never `Ready`, because that would shut it down.
At the moment, the event loops are still a bit fragile with regard to the polling rules, but I am hoping to refactor that and make it crystal clear and easy to follow!
> All this runs in a single thread because the runtime overhead of locks and the dev overhead of multi-threading (having to deal with everything being `&self` and mixing sync and async) is not worth it, since we are not CPU-bound. If we needed an outrageous amount of bandwidth, like 1 Tbps, or if cryptography were more expensive, one CPU core would not suffice, but for our case of ballpark 1 Gbps, there's no need to use extra cores?

That is the bet, yeah, and so far it seems to turn out okay :)
There is even more optimisation potential, like io_uring, to further reduce CPU overhead. It would be interesting to know what link speeds you need before this design fails.
I left some questions trying to make sure I understand it. Also:

- Are there tests in CI that show the latency decreasing?
- Is the `str0m` dep already pointing to their `main`, or does that get updated separately? You mentioned that they merged a patch, but I don't see a recent release on their repo.
Replying to Jamil's comment:

There can't be a fixed wake-up like the 60 FPS graphics tick in a game engine, right? The 50 ms limit just means that some part of the code refused to "tick" any faster than every 50 ms, so if some handshake needed 14 ticks to complete, that adds up to 700 ms? But in general, if there are no incoming packets, it can sleep for any number of seconds until the next timeout, which is probably a timeout for a keepalive packet, right?

I assume there's something like a timer heap internally, and then the runtime drives it by polling something like Quinn's [`poll_timeout`](https://docs.rs/quinn-proto/0.10.6/quinn_proto/struct.Connection.html#method.poll_timeout) and setting an async timer to sleep until that timeout?
I think I clicked the wrong button in GitHub and made 5 reviews. Oops. Anyway, I approve it.
Not a test directly, but together with the other PR that prints the setup time, it can be observed in the logs.
It is not yet updated, no, because of the timeline of when I opened this PR and the PR to str0m.
@jamilbk Almost correct.
Force-pushed from 493e3d5 to 09b3663.
`snownet` is built in a sans-IO way, which means it doesn't have internal timers or IO. It is up to the upper layer to correctly check `poll_timeout` and call `handle_timeout` as soon as that expires. When we want to be called again (i.e. the result of `poll_timeout`) may change every time `snownet`'s internal state changes. This is especially critical during the initial setup of a connection.

As we learn about our own candidates and candidates from the other party, we form new pairs. To actually detect whether a pair is a viable network path, we need to send a STUN request. When to send STUN requests is controlled by time. A newly formed pair should send a STUN request as soon as possible to minimize latency.

Previously, we did not update the timer upon which we "wake" `snownet` using `handle_timeout`. As such, we waited unnecessarily long before sending STUN requests to newly formed pairs. With this patch, we check `poll_timeout` at the end of the `Tunnel`'s `poll` function and immediately call `handle_timeout` in case we need to.

Currently, `str0m` throttles updates to `handle_timeout` into 50 ms blocks, which still creates some delay. With that commented out, I observed improvements of ~0.7 s for establishing new connections. Most of the time, the 2nd ping already goes through!