feat(connlib): introduce Session::reconnect #4116

Conversation
Will this be okay if it's called multiple times quickly? The network change detection on Windows is "bouncy". I could probably de-bounce it with a timer if needed, e.g. when we get an event, wait until a full second passes with no events, then tell connlib to refresh / reconnect.

It should be okay, see the screen-recording below. We will reconnect to the portal each time it is called, but that does not affect the data plane.

I would like it to be okay, i.e. if anything, connlib should de-bounce and not the apps.
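For illustration, a minimal sketch of the debounce idea floated above, assuming a tokio-based client. The `events` receiver and the `Session` stub are hypothetical stand-ins, not connlib's actual API:

```rust
use std::time::Duration;
use tokio::sync::mpsc;

// Hypothetical stand-in for connlib's session handle.
struct Session;
impl Session {
    fn reconnect(&self) { /* would forward to connlib */ }
}

/// Waits until a full second passes with no network-change events,
/// then issues a single reconnect for the whole burst.
async fn debounced_reconnect(mut events: mpsc::Receiver<()>, session: &Session) {
    while events.recv().await.is_some() {
        // Absorb follow-up events until the stream has been quiet for 1s.
        while let Ok(Some(())) =
            tokio::time::timeout(Duration::from_secs(1), events.recv()).await
        {}
        session.reconnect(); // one reconnect per burst of events
    }
}
```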
LGTM. Should I also call this when the DNS servers change? And once the next DNS refactor is ready, will that be replaced with a more specific "update DNS only" command?
```rust
let backoff = self
    .backoff
    .next_backoff()
    .expect("to have backoff right after resetting");
```
.expect("to have backoff right after resetting"); | |
.expect("should have backoff Instant right after resetting"); |
We will have a dedicated function to update DNS servers.
For background processes, it is common to have them reload their configuration upon SIGHUP.
With the new `reconnect` command, clients can initiate this directly, so we don't need to change the backoff.
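A hedged sketch of that SIGHUP wiring for a Unix client; the signal plumbing uses tokio's real `signal` API, while the `Session` stub is a hypothetical stand-in for connlib's handle:

```rust
use tokio::signal::unix::{signal, SignalKind};

// Hypothetical stand-in for connlib's session handle.
struct Session;
impl Session {
    fn reconnect(&self) { /* introduced by this PR */ }
}

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let session = Session;
    let mut sighup = signal(SignalKind::hangup())?;
    while sighup.recv().await.is_some() {
        // SIGHUP conventionally means "reload"; here it re-establishes the
        // portal connection and refreshes the TURN allocations.
        session.reconnect();
    }
    Ok(())
}
```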
```diff
@@ -64,6 +65,12 @@ where
     loop {
         match self.rx.poll_recv(cx) {
             Poll::Ready(Some(Command::Stop)) | Poll::Ready(None) => return Poll::Ready(Ok(())),
+            Poll::Ready(Some(Command::Reconnect)) => {
```
I wonder if using a channel for this is the most convenient way to go about it. We might want to use a different mechanism so that multiple reconnects aren't queued up. I was thinking we could use a `Notify`; that way we don't need to worry about the bounded channel, and since there's no point in doing multiple reconnects in a row, we only want to act on the latest one.
Yeah, if they're guaranteed idempotent, that would keep the channel from filling up. We did a similar thing for `on_update_resources` in the Tauri Client:

```rust
self.notify_controller.notify_one();
```

If I only allowed 5 items in the channel and connlib rapidly sent `on_update_resources` events, it was possible for the channel to fill up and error (since it's not allowed to block the callbacks) before the GUI got around to dealing with them. So the channel was replaced with a `Notify` plus something the reader can poll when it's notified, same as if it were a channel that dropped all but the most recent event. I think we also considered tokio's `watch`, but it wasn't a perfect fit. `Notify` has worked well.
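A hedged sketch of that pattern (illustrative names, tokio's real `Notify`): the writer overwrites shared state and notifies; the reader only ever observes the latest value, so bursts of updates coalesce:

```rust
use std::sync::Mutex;
use tokio::sync::Notify;

/// Holds only the most recent value; intermediate updates coalesce.
struct Latest<T> {
    value: Mutex<Option<T>>,
    notify: Notify,
}

impl<T> Latest<T> {
    fn publish(&self, v: T) {
        *self.value.lock().unwrap() = Some(v); // overwrite: only the newest survives
        self.notify.notify_one(); // never blocks and never fills up
    }

    async fn next(&self) -> T {
        loop {
            if let Some(v) = self.value.lock().unwrap().take() {
                return v;
            }
            self.notify.notified().await; // wait for the next publish
        }
    }
}

#[tokio::main]
async fn main() {
    let latest = Latest { value: Mutex::new(None), notify: Notify::new() };
    latest.publish(1);
    latest.publish(2); // overwrites 1; the reader never sees it
    assert_eq!(latest.next().await, 2);
}
```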
Unfortunately, `Notify` doesn't have a `poll` API, so it would be a bit clunky to use. If debouncing is what we want, then I can add a small delay to the sending of the command through the channel and cancel the current send if we get another one.
Huh, yeah, it doesn't. And it can't be replicated with an `AtomicWaker`?
Hm, a size-1 channel would also achieve the same effect as `Notify`.
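A quick sketch of that latch behaviour with tokio's bounded channel and `try_send`; the `Command` enum mirrors the one in the diff hunk above:

```rust
use tokio::sync::mpsc;

#[derive(Debug)]
enum Command {
    Reconnect,
    #[allow(dead_code)]
    Stop,
}

fn main() {
    let (tx, mut rx) = mpsc::channel::<Command>(1);

    // The first send occupies the single slot; the duplicate is dropped,
    // which is exactly the coalescing we want for idempotent reconnects.
    tx.try_send(Command::Reconnect).unwrap();
    assert!(tx.try_send(Command::Reconnect).is_err()); // channel is full

    assert!(matches!(rx.try_recv(), Ok(Command::Reconnect)));
    assert!(rx.try_recv().is_err()); // only one command was queued
}
```

This coalescing is also the source of the concern raised next: with a single slot shared by all commands, a `Stop` can be dropped too.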
Hm, so it's a 1-sized channel that uses `try_send` for both commands, so if I did `reconnect` and `stop` in the same tick somehow, the `stop` will be silently ignored?
Yes. The `Stop` isn't super critical though. If you drop `Session`, the `Runtime` gets dropped and with it, all tasks should be stopped.
Even the `Stop` doesn't need a command so much as an "I want you to be running / not be running" flag and a way to notify when it's changed. With channels, we have to trade off between three problems: the sender may block, sends may fail silently, or sends may panic.
Are you suggesting using shared memory instead of channels and just notifying when to re-read the shared memory?
Kinda. An `AtomicBool`, so it's not like it has to lock. But I wrote my last comment at the same time as you wrote "The Stop isn't super critical", so it might just be something to file as an issue and merge this PR anyway.
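For illustration, a hedged sketch of that flag-based alternative (an assumption, not code from this PR): the writer can never block, fail silently, or panic, which sidesteps all three channel trade-offs mentioned above:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use tokio::sync::Notify;

/// "I want you to be running / not be running" flag plus a wake-up.
#[derive(Default)]
struct RunFlag {
    running: AtomicBool,
    changed: Notify,
}

impl RunFlag {
    fn set_running(&self, run: bool) {
        self.running.store(run, Ordering::SeqCst);
        self.changed.notify_one(); // wake the task so it re-reads the flag
    }

    async fn stopped(&self) {
        while self.running.load(Ordering::SeqCst) {
            self.changed.notified().await;
        }
    }
}

#[tokio::main]
async fn main() {
    let flag = Arc::new(RunFlag::default());
    flag.set_running(true);

    let task = {
        let flag = flag.clone();
        tokio::spawn(async move { flag.stopped().await })
    };

    flag.set_running(false); // never blocks and never errors
    task.await.unwrap();
}
```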
Looks good, just left some non-blocking comments.

Merging for now, we can always act upon #4116 (comment) in a follow-up.
I ended up calling it `reconnect` because that is really what we are doing. I decided not to use an ICE restart: an ICE restart clears the local as well as the remote credentials, meaning we would need to run another instance of the signalling protocol. The current control plane does not support this, and it is also unnecessary in our situation. In the case of an actual network change (e.g. Wi-Fi to cellular), refreshing the allocations will turn up new candidates, as that is how we discovered our original ones in the first place. Because we constantly operate in ICE trickle mode, those will be sent to the remote via the control plane and we start testing them. As those new paths become available, str0m will automatically nominate them in case the current one runs into an ICE timeout.

Here is a screen-recording of the Linux CLI client where `Session::reconnect` is triggered via the SIGHUP signal: Screencast.from.2024-03-14.11-16-47.webm
Provides the infrastructure for: #4028.