
feat(connlib): introduce Session::reconnect #4116

Merged: 8 commits into main from feat/connlib/reconnect-command, Mar 14, 2024

Conversation

@thomaseizinger (Member) commented Mar 13, 2024

I ended up calling it reconnect because that is really what we are doing:

  • We reconnect to the portal.
  • We "reconnect" to all relays, i.e. refresh the allocations.

I decided not to use an ICE restart. An ICE restart clears the local as well as the remote credentials, meaning we would need to run another instance of the signalling protocol. The current control plane does not support this, and it is also unnecessary in our situation. In the case of an actual network change (e.g. WiFi to cellular), refreshing the allocations will turn up new candidates, as that is how we discovered the original ones in the first place. Because we always operate in ICE trickle mode, those candidates are sent to the remote via the control plane and we start testing them.

As those new paths become available, str0m will automatically nominate them in case the current one runs into an ICE timeout. Here is a screen recording of the Linux CLI client where Session::reconnect is triggered via the SIGHUP signal:

Screencast.from.2024-03-14.11-16-47.webm

Provides the infrastructure for: #4028.
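For illustration, the SIGHUP wiring could look roughly like this on the client side (a minimal sketch assuming a tokio runtime; `forward_sighup` is a hypothetical helper and the closure would call the new `Session::reconnect`, whose exact signature is not shown here):

```rust
use tokio::signal::unix::{signal, SignalKind};

/// Forwards every SIGHUP to connlib's reconnect, e.g. `|| session.reconnect()`.
async fn forward_sighup(reconnect: impl Fn()) -> std::io::Result<()> {
    let mut sighup = signal(SignalKind::hangup())?;

    // Each reconnect re-connects to the portal and refreshes the relay
    // allocations; new candidates are trickled to the remote as they appear.
    while sighup.recv().await.is_some() {
        reconnect();
    }

    Ok(())
}
```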


@github-actions (bot) commented Mar 13, 2024

Terraform Cloud Plan Output

Plan: 8 to add, 7 to change, 8 to destroy.

Terraform Cloud Plan

@github-actions (bot) commented Mar 13, 2024

Performance Test Results

TCP

| Test Name | Received/s | Sent/s | Retransmits |
|---|---|---|---|
| direct-tcp-client2server | 204.3 MiB (+2%) | 204.8 MiB (+2%) | 247 (+42%) |
| direct-tcp-server2client | 198.9 MiB (-2%) | 199.9 MiB (-2%) | 605 (-14%) |
| relayed-tcp-client2server | 138.5 MiB (-2%) | 139.2 MiB (-2%) | 126 (-32%) |
| relayed-tcp-server2client | 138.1 MiB (-1%) | 138.5 MiB (-1%) | 181 (+21%) |

UDP

| Test Name | Total/s | Jitter | Lost |
|---|---|---|---|
| direct-udp-client2server | 50.0 MiB (-0%) | 0.05ms (-20%) | 0.00% (NaN%) |
| direct-udp-server2client | 50.0 MiB (-0%) | 0.03ms (+29%) | 0.00% (NaN%) |
| relayed-udp-client2server | 50.0 MiB (+0%) | 0.11ms (-31%) | 0.00% (NaN%) |
| relayed-udp-server2client | 50.0 MiB (+0%) | 0.06ms (-24%) | 0.00% (NaN%) |

@thomaseizinger thomaseizinger changed the base branch from refactor/connlib/commands to refactor/connlib/no-runtime March 13, 2024 03:33
Base automatically changed from refactor/connlib/no-runtime to main March 14, 2024 00:17
@thomaseizinger thomaseizinger marked this pull request as ready for review March 14, 2024 00:21
@thomaseizinger thomaseizinger requested review from ReactorScram, conectado and jamilbk and removed request for ReactorScram and conectado March 14, 2024 00:21
@ReactorScram (Collaborator) commented

Will this be okay if it's called multiple times quickly? The network change detection on Windows is "bouncy". I could probably debounce it with a timer if needed, e.g. when we get an event, wait until 1 full second of no events and then tell connlib to refresh / reconnect.

@thomaseizinger (Member, Author) commented

> Will this be okay if it's called multiple times quickly? The network change detection on Windows is "bouncy". I could probably debounce it with a timer if needed, e.g. when we get an event, wait until 1 full second of no events and then tell connlib to refresh / reconnect.

It should be okay, see the screen-recording above. We will reconnect to the portal each time it is called but that does not affect the data plane.

I would like it to be okay, i.e. if anything, connlib should debounce, not the apps.
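For illustration, the kind of debounce ReactorScram describes could look roughly like this (a sketch assuming a tokio runtime and a hypothetical channel of network-change events; not code from this PR):

```rust
use std::time::Duration;
use tokio::sync::mpsc;
use tokio::time::{sleep_until, Instant};

/// Waits for 1 full second of silence on `events` before firing `reconnect` once.
async fn debounce_reconnects(mut events: mpsc::Receiver<()>, reconnect: impl Fn()) {
    const QUIET_PERIOD: Duration = Duration::from_secs(1);

    while events.recv().await.is_some() {
        let mut deadline = Instant::now() + QUIET_PERIOD;

        loop {
            tokio::select! {
                // Another event arrived before the quiet period elapsed: push the deadline out.
                Some(()) = events.recv() => deadline = Instant::now() + QUIET_PERIOD,
                // One full second of no events: trigger a single reconnect.
                () = sleep_until(deadline) => break,
            }
        }

        reconnect();
    }
}
```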

@ReactorScram (Collaborator) left a comment

LGTM. Should I also call this when the DNS servers change? And after that next DNS refactor is ready, will that be replaced with a more specific "Update DNS only" command?

rust/connlib/snownet/src/node.rs (review thread resolved)
```rust
let backoff = self
    .backoff
    .next_backoff()
    .expect("to have backoff right after resetting");
```
Collaborator:

Suggested change:

```diff
-    .expect("to have backoff right after resetting");
+    .expect("should have backoff Instant right after resetting");
```

@thomaseizinger (Member, Author) commented

> LGTM. Should I also call this when the DNS servers change? And after that next DNS refactor is ready, will that be replaced with a more specific "Update DNS only" command?

We will have a dedicated function to update DNS servers.

With the new `reconnect` command, clients can initiate this directly so we don't need to change the backoff.
```diff
@@ -64,6 +65,12 @@ where
         loop {
             match self.rx.poll_recv(cx) {
                 Poll::Ready(Some(Command::Stop)) | Poll::Ready(None) => return Poll::Ready(Ok(())),
                 Poll::Ready(Some(Command::Reconnect)) => {
```
Collaborator:

I wonder if using a channel for this is the most convenient way to go about it.

We might want a different mechanism so that multiple reconnects aren't queued up. I was thinking we could use a Notify; that way we don't need to worry about the bounded channel, and there's no point in doing multiple reconnects in a row anyway, we only want to react to the latest one.

Collaborator:

Yeah, if they're guaranteed idempotent, that would keep the channel from filling up. We did a similar thing for on_update_resources in the Tauri Client:

`self.notify_controller.notify_one();`

It was possible, if I only allowed 5 items in the channel and connlib rapidly sent on_update_resources events, that the channel might fill up and error (since it's not allowed to block the callbacks) before the GUI got around to dealing with them.

So the channel was replaced with a Notify plus something the reader can poll when it's notified, same as if it were a channel that dropped all but the most recent event.

I think we also considered tokio's watch and it wasn't a perfect fit. Notify has worked well.
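To make that pattern concrete, it looks roughly like this (a sketch with hypothetical names and a simplified resource type; not the actual Tauri client code):

```rust
use std::sync::{Arc, Mutex};
use tokio::sync::Notify;

/// Shared state the callbacks overwrite; the reader only ever sees the latest value.
struct Shared {
    latest_resources: Mutex<Vec<String>>,
    notify: Notify,
}

impl Shared {
    /// Called from connlib's callback: overwrite the value and wake the reader.
    /// Coalesces naturally: many rapid updates leave at most one pending wakeup.
    fn on_update_resources(&self, resources: Vec<String>) {
        *self.latest_resources.lock().unwrap() = resources;
        self.notify.notify_one();
    }
}

/// GUI side: wake up when notified and read whatever is current, as if it were
/// a channel that dropped all but the most recent event.
async fn controller_loop(shared: Arc<Shared>) {
    loop {
        shared.notify.notified().await;
        let resources = shared.latest_resources.lock().unwrap().clone();
        // ... hand `resources` to the UI ...
        let _ = resources;
    }
}
```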

Member (Author):

Unfortunately, Notify doesn't have a poll API so it would be a bit clunky to use. If debouncing is what we want, then I can add a small delay to the sending of the command through the channel and cancel the current send if we get another one.

Collaborator:

Huh, yeah it doesn't. And it can't be replicated with AtomicWaker?
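For reference, a poll-able equivalent can be put together from futures' AtomicWaker and an AtomicBool; a rough sketch (not part of this PR):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::task::{Context, Poll};

use futures::task::AtomicWaker;

/// A poll-able, coalescing "reconnect requested" flag: many `notify()` calls
/// collapse into a single wakeup of the poller.
struct PollNotify {
    pending: AtomicBool,
    waker: AtomicWaker,
}

impl PollNotify {
    fn new() -> Self {
        Self {
            pending: AtomicBool::new(false),
            waker: AtomicWaker::new(),
        }
    }

    fn notify(&self) {
        self.pending.store(true, Ordering::SeqCst);
        self.waker.wake();
    }

    /// Returns `Ready` once per batch of notifications, otherwise registers the waker.
    fn poll_notified(&self, cx: &mut Context<'_>) -> Poll<()> {
        // Register before checking the flag so a notification racing with the
        // check still wakes us up.
        self.waker.register(cx.waker());

        if self.pending.swap(false, Ordering::SeqCst) {
            Poll::Ready(())
        } else {
            Poll::Pending
        }
    }
}
```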

Collaborator:

Hm, a size-1 channel would also achieve the same effect as Notify.

Collaborator:

Hm, so it's a size-1 channel that uses try_send for both commands; if I somehow did a reconnect and a stop in the same tick, the stop would be silently ignored?

Member (Author):

Yes. The Stop isn't super critical though. If you drop Session, the Runtime gets dropped and with it, all tasks should be stopped.
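Concretely, the caveat being discussed (a sketch, not the actual connlib channel code):

```rust
use tokio::sync::mpsc;

enum Command {
    Reconnect,
    Stop,
}

fn same_tick_demo() {
    let (tx, _rx) = mpsc::channel::<Command>(1);

    // The first command fills the single slot.
    assert!(tx.try_send(Command::Reconnect).is_ok());

    // A second command before the receiver catches up is rejected with `Full`;
    // unless the caller handles the error, the `Stop` is silently dropped.
    assert!(tx.try_send(Command::Stop).is_err());
}
```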

Collaborator:

Even Stop doesn't really need a command so much as an "I want you to be running / not running" flag and a way to notify when it changes.

With channels we have to trade off between three problems: the sender may block, sends may fail silently, or sends may panic.

Member (Author):

Are you suggesting using shared memory instead of channels and just notifying when to re-read the shared memory?

Collaborator:

Kinda, an AtomicBool, so it's not like it has to lock.
But I wrote my last comment at the same time as you wrote "The Stop isn't super critical", so it might just be something to file as an issue and merge this PR anyway.

@conectado (Collaborator) left a comment

Looks good, just left some non-blocking comments.

rust/linux-client/src/main.rs (review thread resolved)
@thomaseizinger thomaseizinger added this pull request to the merge queue Mar 14, 2024
@thomaseizinger (Member, Author) commented

Merging for now, we can always act upon #4116 (comment) in a follow-up.

Merged via the queue into main with commit d092e22 Mar 14, 2024
134 checks passed
@thomaseizinger thomaseizinger deleted the feat/connlib/reconnect-command branch March 14, 2024 23:34