
feat(connlib): decrease connection setup latency #4022

Merged: thomaseizinger merged 2 commits into main from feat/connlib/always-poll-timeout on Mar 9, 2024

Conversation

thomaseizinger (Member)

snownet is built in a SANS-IO way, which means it doesn't have internal timers or IO. It is up to the upper layer to check poll_timeout and call handle_timeout as soon as that deadline expires. The point at which we want to be woken again (i.e. the result of poll_timeout) may change every time snownet's internal state changes. This is especially critical during the initial setup of a connection.

As we learn about our own candidates and the candidates from the other party, we form new pairs. To detect whether a pair is a viable network path, we need to send a STUN request, and when those requests are sent is driven entirely by timers. A newly formed pair should send its first STUN request as soon as possible to minimize latency.

Previously, we did not update the timer that "wakes" snownet via handle_timeout. As a result, we waited unnecessarily long before sending STUN requests to newly formed pairs. With this patch, we check poll_timeout at the end of the Tunnel's poll function and immediately call handle_timeout if the deadline has already passed.
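
For illustration, the contract looks roughly like this. This is a minimal sketch with a made-up `Node`, not snownet's real API: after every state change, the driver re-reads `poll_timeout` and, if the deadline has already passed, calls `handle_timeout` immediately instead of waiting for a previously armed timer.

```rust
use std::time::{Duration, Instant};

// Stand-in for `snownet::Node`; the real state machine is more involved.
struct Node {
    next_timeout: Option<Instant>,
}

impl Node {
    /// When `handle_timeout` needs to be called next. Pure accessor, no IO.
    fn poll_timeout(&self) -> Option<Instant> {
        self.next_timeout
    }

    /// Advances the state machine to `now`, e.g. emitting due STUN requests.
    fn handle_timeout(&mut self, now: Instant) {
        // A real node would queue binding requests for fresh candidate pairs here.
        self.next_timeout = Some(now + Duration::from_millis(50));
    }
}

/// Called by the upper layer after every state change (new candidates, new pairs, ...).
fn drive(node: &mut Node) {
    while let Some(deadline) = node.poll_timeout() {
        let now = Instant::now();

        if deadline > now {
            // Arm an async timer for `deadline` and go back to sleep.
            break;
        }

        node.handle_timeout(now);
    }
}

fn main() {
    let mut node = Node { next_timeout: Some(Instant::now()) };
    drive(&mut node);
}
```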

Currently, str0m throttles updates to handle_timeout in 50ms blocks which still creates some delay. With that commented out, I observed improvements of ~0.7s for establishing new connections. Most of the time, the 2nd ping already goes through!



github-actions bot commented Mar 7, 2024

Terraform Cloud Plan Output

Plan: 8 to add, 7 to change, 8 to destroy.



github-actions bot commented Mar 7, 2024

Performance Test Results

TCP

| Test Name | Received/s | Sent/s | Retransmits |
|---|---|---|---|
| direct-tcp-client2server | 62.1 MiB (-3%) | 62.6 MiB (-3%) | 68 (+33%) |
| direct-tcp-server2client | 53.3 MiB (-2%) | 53.6 MiB (-2%) | 37 (-5%) |
| relayed-tcp-client2server | 29.4 MiB (+1%) | 31.7 MiB (+1%) | 67 (-66%) |
| relayed-tcp-server2client | 27.8 MiB (-0%) | 30.6 MiB (+1%) | 741 (+187%) |

UDP

| Test Name | Total/s | Jitter | Lost |
|---|---|---|---|
| direct-udp-client2server | 50.0 MiB (-0%) | 0.21ms (+12%) | 0.21% (+Infinity%) |
| direct-udp-server2client | 50.0 MiB (-0%) | 0.19ms (-16%) | 0.00% (NaN%) |
| relayed-udp-client2server | 50.0 MiB (-0%) | 0.35ms (+5%) | 19.37% (+2%) |
| relayed-udp-server2client | 50.0 MiB (-0%) | 0.85ms (+106%) | 27.60% (+8%) |

@thomaseizinger (Member, Author)

> Currently, str0m throttles updates to handle_timeout in 50ms blocks which still creates some delay. With that commented out, I observed improvements of ~0.7s for establishing new connections. Most of the time, the 2nd ping already goes through!

This is being dealt with in algesten/str0m#477.

@thomaseizinger (Member, Author)

> Currently, str0m throttles updates to handle_timeout in 50ms blocks which still creates some delay. With that commented out, I observed improvements of ~0.7s for establishing new connections. Most of the time, the 2nd ping already goes through!
>
> This is being dealt with in algesten/str0m#477.

This has been merged, we just need to update our fork now.

@jamilbk (Member) left a comment


Just to make sure I'm understanding this from a high level:

  • snownet uses a single-threaded event loop to manage things (like a game rendering engine)
  • The rate at which we wake up the loop to process new data is fixed with a timer
  • Previously, we waited until the next tick to handle the new candidates that may have arrived between our last tick and now
  • With this PR, we "interrupt" instead, and reset the timer for the main eventloop back to 0.

Do I have that right?

Comment on lines 390 to 402
```diff
 fn poll_timeout(&mut self, cx: &mut Context<'_>) -> Poll<()> {
     if let Some(timeout) = self.node.poll_timeout() {
         let timeout = tokio::time::Instant::from_std(timeout);

         if timeout != self.timeout.deadline() {
             self.timeout.as_mut().reset(timeout)
         }
     }

-    Poll::Pending
+    ready!(self.timeout.poll_unpin(cx));
+    self.node.handle_timeout(self.timeout.deadline().into());
+
+    Poll::Ready(())
```
Collaborator:

Okay, so this is exactly like quinn-proto: this tells the driver when to next wake up the state machine for a timeout?

@thomaseizinger (Member, Author):

Yes!

```rust
sockets: Sockets::new()?,
stats_timer: tokio::time::interval(Duration::from_secs(60)),
timeout: Box::pin(tokio::time::sleep_until(Instant::now().into())),
```
Collaborator:

This is when the connection is first starting? Why is the timeout set to now? Just because it must have some value, and now is a value that will be in the past so it won't have a spurious wakeup later?

@thomaseizinger (Member, Author):

Yes.

@thomaseizinger (Member, Author):

I guess it would be nicer to set it to None and thus correctly handle the (hypothetical) case of Node::poll_timeout returning None.
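
Roughly like this (an illustrative sketch only, assuming the same tokio and futures utilities the eventloop already uses; field names are made up and this is not the actual connlib code):

```rust
use std::pin::Pin;
use std::task::{Context, Poll};

use futures::FutureExt; // for `poll_unpin`
use tokio::time::{sleep_until, Instant, Sleep};

struct ConnectionState {
    // `None` means "the node currently has no deadline", so no timer is armed at all.
    timeout: Option<Pin<Box<Sleep>>>,
}

impl ConnectionState {
    /// Mirrors the node's deadline into the (optional) timer.
    fn reset_timeout(&mut self, deadline: Option<Instant>) {
        let Some(deadline) = deadline else {
            self.timeout = None; // Nothing to wake up for.
            return;
        };

        match &mut self.timeout {
            Some(timer) if timer.deadline() != deadline => timer.as_mut().reset(deadline),
            Some(_) => {} // Deadline unchanged; keep the existing timer armed.
            none => *none = Some(Box::pin(sleep_until(deadline))),
        }
    }

    /// Resolves with the deadline once the timer fires; pending forever if there is none.
    fn poll_timeout(&mut self, cx: &mut Context<'_>) -> Poll<Instant> {
        let Some(timer) = self.timeout.as_mut() else {
            return Poll::Pending;
        };

        futures::ready!(timer.poll_unpin(cx));

        Poll::Ready(timer.deadline())
    }
}
```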


```rust
cx.waker().wake_by_ref();

fn poll_timeout(&mut self, cx: &mut Context<'_>) -> Poll<()> {
    if let Some(timeout) = self.node.poll_timeout() {
```
Collaborator:

The Node has its own timeout in addition to the ConnectionState?

What does the ConnectionState's timeout do?

Collaborator:

Node's poll_timeout only indicates the next time handle_timeout needs to be called, and it keeps returning the same value until something changes in the Node's internal state. It's not a future, just an Instant.

ConnectionState's timeout is a future. It is used to keep track of Node's timeout and to have the executor poll the Tunnel when that deadline is reached, so we can pass the event down to Node to handle it.

Comment on lines +126 to 131
```rust
// After any state change, check what the new timeout is and reset it if necessary.
if self.connections_state.poll_timeout(cx).is_ready() {
    cx.waker().wake_by_ref()
}

Poll::Pending
```
Collaborator:

Just because I'm still getting used to the poll_ style, do I understand this right?

  • We call poll_timeout on this connections_state object and pass it our context, so if it needs to register a wakeup, it can do that
  • If the connections state has something ready, we do a wake_by_ref to tell the driver / runtime, "Immediately call me again, there might be more work to do"
  • We return Pending because whatever connections_state needs to do is already handled above here and we don't want to duplicate calling it in these other two blocks. But probably the runtime will immediately wake up our task again, we'll check connections_state, and we'll either tick it or return some Ready it returned to us.
  • In general if we return Ready or have a subtask (sub state machine?) that's ready, the caller should keep polling us until we return Pending and neither us nor our subtasks register an immediate wakeup? e.g. if there were no timeouts somehow, we would be polled until everything was done, and then only wake up when a new packet comes in?
  • And the runtime doesn't actually have a concept of "Keep polling until" because wake_by_ref is how we tell it, "I know I'm saying Pending but call me again as soon as you can schedule me, I may have more work to do"
  • I'm guessing connections_state manages several connections, so internally its poll_timeout checks on something like a timer heap, so if we have 5 connections, it just tells us which timeout will happen earliest.
  • All this runs in a single thread because the runtime overhead of locks and the dev overhead of multi-threading (having to deal with everything being &self and mixing sync and async) is not worth it, since we are not CPU-bound. If we needed an outrageous amount of bandwidth, like 1 Tbps, or if cryptography was more expensive, one CPU core would not suffice, but for our case of ballpark 1 Gbps, there's no need to use extra cores?

@thomaseizinger (Member, Author):

Yes, you've got it all right :)

The runtime ends up polling a future that is never actually Ready, because returning Ready would shut the eventloop down.
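
For illustration (a made-up sketch, not our actual eventloop), the "never Ready" pattern plus the wake_by_ref trick looks roughly like this:

```rust
use std::convert::Infallible;
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

// Illustrative only: a top-level future in the style discussed above.
// `Eventloop` and its field are made up for this sketch.
struct Eventloop {
    has_more_work: bool,
}

impl Future for Eventloop {
    // Returning `Ready` would terminate the task, so the output type makes
    // that value impossible to construct.
    type Output = Infallible;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // ... drain sockets, advance state machines, reset timers ...

        if self.has_more_work {
            self.has_more_work = false;

            // We still return `Pending`, but ask the runtime to schedule us
            // again as soon as possible: there may be more work to do.
            cx.waker().wake_by_ref();
        }

        Poll::Pending
    }
}
```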

@thomaseizinger (Member, Author):

At the moment, the eventloops are still a bit fragile with regard to the polling rules, but I am hoping to refactor that and make it crystal clear and easy to follow!

@thomaseizinger (Member, Author):

> All this runs in a single thread because the runtime overhead of locks and the dev overhead of multi-threading (having to deal with everything being &self and mixing sync and async) is not worth it, since we are not CPU-bound. If we needed an outrageous amount of bandwidth, like 1 Tbps, or if cryptography was more expensive, one CPU core would not suffice, but for our case of ballpark 1 Gbps, there's no need to use extra cores?

That is the bet, yeah, and so far it seems to be turning out okay :)

There is even more optimisation potential, like io_uring, to further reduce CPU overhead. It would be interesting to know what link speeds you need before this design fails.

@ReactorScram (Collaborator) left a comment

I left some questions trying to make sure I understand it. Also:

  • Are there tests in CI that show the latency decreasing?
  • Is the str0m dep already pointing to their main or does that get updated separately? You mentioned that they merged a patch, but I don't see a recent release on their repo.

Replying to Jamil's comment:

There can't be a fixed wake-up like the 60 FPS graphics tick in a game engine, right?

The 50 ms limit just means that some part of the code refused to 'tick' any faster than every 50 ms, so if some handshake needed 14 ticks to complete, that adds up to 700 ms? But in general if there are no incoming packets, it can sleep for any number of seconds until the next timeout, which is probably a timeout for a keepalive packet, right?

I assume there's something like a timer heap internally, and then the runtime drives it by polling something like Quinn's poll_timeout (https://docs.rs/quinn-proto/0.10.6/quinn_proto/struct.Connection.html#method.poll_timeout) and setting an async timer to sleep until that timeout?
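
To make that guess concrete, something like this is what I have in mind (purely illustrative, made-up names, not the actual implementation):

```rust
use std::time::Instant;

// Illustrative only: `Connection` stands in for whatever per-connection state
// is kept internally; the real crates may organise this differently.
struct Connection {
    next_timeout: Option<Instant>,
}

impl Connection {
    fn poll_timeout(&self) -> Option<Instant> {
        self.next_timeout
    }
}

/// The aggregate timeout is simply the earliest deadline across all connections;
/// the driver then arms a single async timer for that instant.
fn poll_timeout(connections: &[Connection]) -> Option<Instant> {
    connections.iter().filter_map(|c| c.poll_timeout()).min()
}
```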

@ReactorScram (Collaborator)

I think I clicked the wrong button in GitHub and made 5 reviews. Oops. Anyway, I approve it.

@thomaseizinger (Member, Author)

> Are there tests in CI that show the latency decreasing?

Not a test directly, but together with the other PR that prints the setup time, it can be observed in the logs.

> Is the str0m dep already pointing to their main or does that get updated separately? You mentioned that they merged a patch, but I don't see a recent release on their repo.

It is not yet updated, no, because of the timing of when I opened this PR relative to the PR to str0m.

@thomaseizinger (Member, Author) commented Mar 8, 2024

> • snownet uses a single-threaded event loop to manage things (like a game rendering engine)
> • The rate at which we wake up the loop to process new data is fixed with a timer
> • Previously, we waited until the next tick to handle the new candidates that may have arrived between our last tick and now
> • With this PR, we "interrupt" instead, and reset the timer for the main eventloop back to 0.

@jamilbk Almost correct.

  • snownet doesn't have an eventloop at all; only the Tunnel does. A snownet::Node is just a state machine. The eventloop calls it and translates its outputs into side effects (sketched below).
  • We don't reset the timer to 0 but to whatever the Node wants it to be.
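
For example (illustrative only; the real Node emits richer events than this made-up one):

```rust
use std::io;
use std::net::{SocketAddr, UdpSocket};

// Made-up event type standing in for whatever the state machine emits.
enum Event {
    Transmit { dst: SocketAddr, payload: Vec<u8> },
}

// The state machine only *describes* IO; the eventloop performs it.
fn translate(socket: &UdpSocket, event: Event) -> io::Result<()> {
    match event {
        Event::Transmit { dst, payload } => {
            socket.send_to(&payload, dst)?;
        }
    }

    Ok(())
}
```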

@thomaseizinger force-pushed the feat/connlib/always-poll-timeout branch from 493e3d5 to 09b3663 on March 9, 2024 09:08
@thomaseizinger added this pull request to the merge queue on Mar 9, 2024
Merged via the queue into main with commit a2f289f on Mar 9, 2024
64 checks passed
@thomaseizinger deleted the feat/connlib/always-poll-timeout branch on March 9, 2024 09:42
github-merge-queue bot pushed a commit that referenced this pull request Mar 11, 2024
…#4047)

As part of implementing #4022,
I realised that I probably composed all the SANS-IO components in the
wrong way. This PR fixes this and adds some documentation on why.
Interestingly, this makes several parts of the design simpler because we end up passing around less state.