feat(android): detect network and dns changes and send them to connlib #4163
Conversation
```diff
@@ -69,10 +77,14 @@ impl Session {
     ///
     /// In case of destructive network state changes, i.e. the user switched from wifi to cellular,
     /// reconnect allows connlib to re-establish connections faster because we don't have to wait for timeouts first.
-    pub fn reconnect(&mut self) {
+    pub fn reconnect(&self) {
```
If this is `mut`, we could break aliasing rules.
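A minimal illustration of the hazard (the names here are illustrative, not the PR's code):

```rust
// The JVM holds a single raw Session pointer and may call into the FFI
// layer from several threads at once.
struct Session;

unsafe fn ffi_reconnect(session: *const Session) {
    // Materializing `&mut Session` from the shared raw pointer would be UB
    // the moment two threads do it concurrently: two aliasing `&mut`.
    // let session = &mut *(session.cast_mut()); // aliasing violation

    // A shared reference is sound; any mutation goes through interior
    // mutability (channels, atomics, mutexes) inside Session.
    let session = &*session;
    let _ = session;
}
```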
I have some code style suggestions for Kotlin; I can't really comment on the Rust changes, so I'll let @thomaseizinger and @ReactorScram approve those.
```kotlin
val connectivityManager =
    getSystemService(ConnectivityManager::class.java) as ConnectivityManager
connectivityManager.unregisterNetworkCallback(networkCallback!!)
```
Suggested change:

```kotlin
networkCallback?.let {
    val connectivityManager =
        getSystemService(ConnectivityManager::class.java) as ConnectivityManager
    connectivityManager.unregisterNetworkCallback(it)
    networkCallback = null
}
```
I think I assumed that `shutdown()` would be safe to call idempotently.
Or better yet, move this to a helper function, `stopNetworkMonitoring(connlibSessionPtr)`, like the Apple client.
Why would we want to call `shutdown` multiple times in a row?
I feel like `shutdown` should only be called when the tunnel is no longer used.
We don't, but there's an edge case if `onDisconnect` and the user disconnect happen at the same time.
The system can shut us down too, but I don't think that calls `disconnect`.
Just trying to avoid crashing in our only cleanup handler is all.
Ah, I see. Then we probably want to change `disconnect` to not use `connlibSessionPtr!!`; I can do that in this PR.
Changed it here: 0ce223b. Let me know if I should roll the change to `disconnect` back 😃
Having problems reproducing the CI errors locally.
```rust
if let Some(mut dns_update) = self.dns_update.take() {
    match dns_update.poll_unpin(cx) {
        Poll::Ready(dns) => {
            self.tunnel.set_dns(dns);
```
I'd rather have `Tunnel::set_dns` debounce internally than out here. We can just capture the timestamp of now and delay acting on it in the `ClientState`.
That will allow us to unit-test the debouncing, and it doesn't require messing around with futures. All we need is an `Option` with an `Instant`, and to act on it in `handle_timeout`.
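A rough sketch of that shape (everything except `handle_timeout` is a hypothetical name):

```rust
use std::net::IpAddr;
use std::time::{Duration, Instant};

const DNS_DEBOUNCE: Duration = Duration::from_secs(1);

struct ClientState {
    // Resolvers received from the platform, plus the deadline to apply them.
    pending_dns: Option<(Vec<IpAddr>, Instant)>,
}

impl ClientState {
    fn update_system_resolvers(&mut self, dns: Vec<IpAddr>, now: Instant) {
        // Every new update restarts the debounce window.
        self.pending_dns = Some((dns, now + DNS_DEBOUNCE));
    }

    fn handle_timeout(&mut self, now: Instant) {
        let Some((_, deadline)) = self.pending_dns.as_ref() else {
            return;
        };
        if now < *deadline {
            return;
        }
        let (dns, _) = self.pending_dns.take().expect("checked above");
        // Apply `dns` here, e.g. by queueing an event for the Tunnel.
        let _ = dns;
    }
}
```

Unit tests can then drive `handle_timeout` with fabricated `Instant`s and assert that nothing happens before the deadline.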
The downside with this approach is that `handle_timeout` would need to call into `Tunnel` to call `update_interface`, so we would need to return some event from `handle_timeout`. If that's acceptable, I almost have an implementation for this.
`handle_timeout` shouldn't return anything; instead, we should queue an event of some kind that is then consumed by the `Tunnel` so it can update its IO state.
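Sketched with hypothetical names, the queued-event pattern could look like this:

```rust
use std::collections::VecDeque;
use std::net::IpAddr;

// Hypothetical event type emitted by ClientState and consumed by the Tunnel.
enum ClientEvent {
    DnsServersChanged { dns: Vec<IpAddr> },
}

struct ClientState {
    buffered_events: VecDeque<ClientEvent>,
}

impl ClientState {
    // handle_timeout (or anything else) queues events instead of returning them...
    fn queue_dns_update(&mut self, dns: Vec<IpAddr>) {
        self.buffered_events
            .push_back(ClientEvent::DnsServersChanged { dns });
    }

    // ...and the Tunnel drains them from its own event loop to update its IO state.
    fn poll_next_event(&mut self) -> Option<ClientEvent> {
        self.buffered_events.pop_front()
    }
}
```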
Why is this part of the Android PR?
We're moving to `set_dns`; otherwise, this would break all other platforms.
Yeah, maybe it would be better to land all of these in a staging branch? Otherwise main is gonna go wonky for a bit while we test manually.
Just an idea
I think it is better to merge it like this so we have everything on main. This PR should work A-OK like this for all platforms.
I think we could extract the `set_dns` change to its own PR, but that is quite a lot of work at this point.
```rust
channel: tokio::sync::mpsc::Sender<Command>,
dns_updater: tokio::sync::mpsc::UnboundedSender<Vec<IpAddr>>,
```
Why have two channels? Can we just make the command one unbounded?
I just didn't want to make the channel for the other events unbounded
Why not? It is not like the app will memory DoS connlib with a million commands that we can't process.
If it won't, then set the limit to a million and panic if the impossible happens?
Also, it could be a `watch` or `Notify`+`ArcSwap` or something, right? No question of memory leaks when out-of-date DNS server vecs don't even matter.
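For illustration, a minimal `watch`-based sketch (not the PR's code), where stale values are simply overwritten:

```rust
use std::net::IpAddr;

#[tokio::main]
async fn main() {
    // A watch channel keeps only the latest value, so out-of-date DNS
    // server lists are overwritten instead of piling up.
    let (tx, mut rx) = tokio::sync::watch::channel(Vec::<IpAddr>::new());

    tx.send(vec!["1.1.1.1".parse().unwrap()]).unwrap();
    tx.send(vec!["8.8.8.8".parse().unwrap()]).unwrap(); // replaces the previous value

    rx.changed().await.unwrap();
    assert_eq!(*rx.borrow(), vec!["8.8.8.8".parse::<IpAddr>().unwrap()]);
}
```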
Bounded channels normally preemptively allocate enough to handle the worst-case scenario; I don't think it'd make sense to set it to a million.
Also, I think it'd look a bit weird and could throw someone off when debugging.
> Also it could be a watch or Notify+ArcSwap or something, right? No question of memory leaks when out-of-date DNS server vecs don't even matter.

I think this is what we discussed about having dumb plumbing; it's more logic that we can move inside the tunnel.
We could go with `watch` without the logic to notify on update, but that will also look a bit weird, having the notify/debounce logic distributed between the session and the tunnel.
We are also only shipping around pointers or ZSTs here, so the memory footprint should be tiny.
Let's set the bound to 100 and panic on error. It should never be full in normal operation, and if it is, that is a bug.
It will not be disconnected unless the app calls commands after `on_disconnect`, which is also a bug.
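A sketch of that convention (the `Command` variants here are illustrative):

```rust
use std::net::IpAddr;
use tokio::sync::mpsc;

#[derive(Debug)]
enum Command {
    Reconnect,
    SetDns(Vec<IpAddr>),
}

fn command_channel() -> (mpsc::Sender<Command>, mpsc::Receiver<Command>) {
    // Commands are pointer-sized or small vecs; a bound of 100 should never
    // fill up in normal operation, so hitting it indicates a bug.
    mpsc::channel(100)
}

fn send_command(tx: &mpsc::Sender<Command>, cmd: Command) {
    // Full or disconnected both mean a bug per the discussion above,
    // so fail loudly instead of silently dropping the command.
    tx.try_send(cmd)
        .expect("command channel full or disconnected; this is a bug");
}
```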
> It will not be disconnected unless the app calls commands after on_disconnect which is also a bug.

This can also happen if the user disconnects manually at the same time `on_disconnect` is called; super unlikely, but possible.
```rust
    let _ = self.channel.try_send(Command::Reconnect);
}

pub fn set_dns(&self, new_dns: Vec<IpAddr>) {
    self.dns_updater.send(new_dns).expect("Developer error: As long as the session is up the dns update receiver must not be dropped");
```
This can be hit if the app calls connlib after it emitted `on_disconnect`. I don't think we should panic here but just drop the message.
You can make it a `debug_assert!` if you want, but there is no need to crash a production app, I'd say.
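A sketch of the non-panicking variant of the `set_dns` shown above:

```rust
pub fn set_dns(&self, new_dns: Vec<IpAddr>) {
    // After on_disconnect the receiver may already be gone; dropping the
    // update is harmless, so don't crash a production app over it.
    let result = self.dns_updater.send(new_dns);
    debug_assert!(
        result.is_ok(),
        "DNS update receiver dropped while the session is still up"
    );
}
```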
rust/connlib/tunnel/src/client.rs
```rust
        == HashSet::<&IpAddr>::from_iter(new_dns.iter())
})
{
    return;
```
Add a debug log here.
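For example, assuming the crate's existing `tracing` setup, the early return above could become:

```rust
        == HashSet::<&IpAddr>::from_iter(new_dns.iter())
})
{
    // Make the no-op visible when debugging DNS change handling.
    tracing::debug!("New DNS servers are identical to the current ones; skipping update");
    return;
```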
```rust
callback_handler
    .get_system_default_resolvers()
    .expect("We expect to get the system's DNS")
    .unwrap_or_default(),
```
Same here, can we avoid calling our own callbacks and instead initialize it with the data that the callbacks return?
rust/connlib/tunnel/src/client.rs
```rust
let Some(system_resolvers) = self.role_state.system_resolvers.as_ref().cloned() else {
    return Ok(());
};

let effective_dns_servers =
    effective_dns_servers(config.upstream_dns.clone(), system_resolvers);
```
Isn't this duplicated state? If our effective DNS servers contain the system resolvers, why can't we just check our current DNS mapping instead of tracking the system resolvers separately?
We are already tracking so much state; we should avoid redundancy as much as we can.
Because if we have some upstream DNS, the effective DNS doesn't contain the system's resolvers.
Can we unit test this somehow? Seems fragile to me :)
We need to stop reaching into fields of `ClientState` from the `Tunnel` and make proper APIs that manipulate `ClientState`.
> Can we unit test this somehow? Seems fragile to me :)

I can add some unit tests for `effective_dns_servers`, but is that what you want to unit-test?

> We need to stop reaching into fields of ClientState from the Tunnel and make proper APIs that manipulate ClientState.

We can add a `ClientState::get_dns` that returns this `effective_dns_servers`, since both parts of this function are stored there. What do you think about that?
What I'd like to test, for example, is that setting the same DNS servers is a no-op. We could make a function that, given a set of DNS servers, outputs an `Option` of a `dns_mapping` and only update `Io` if that is `Some`. Maybe there is more stuff we can do; we just need to sit down, de-couple the side-effects from the state management, and translate it into values that we pass around and can then write tests against.
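A sketch of such a decoupled helper and the no-op test (all names hypothetical):

```rust
use std::collections::HashSet;
use std::net::IpAddr;

// Hypothetical pure helper: returns Some(new mapping) only when the servers
// actually changed, which makes "same DNS is a no-op" directly testable.
fn new_dns_mapping(current: &[IpAddr], proposed: Vec<IpAddr>) -> Option<Vec<IpAddr>> {
    let unchanged = current.iter().collect::<HashSet<_>>()
        == proposed.iter().collect::<HashSet<_>>();

    if unchanged {
        return None; // caller leaves Io untouched
    }

    Some(proposed)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn same_dns_servers_is_a_noop() {
        let current: Vec<IpAddr> = vec!["1.1.1.1".parse().unwrap()];
        assert!(new_dns_mapping(&current, current.clone()).is_none());
    }
}
```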
Accidentally made separate comments instead of a full review. I have some concerns about how some things are implemented, especially regarding the redundant state in `ClientState` and the duplicated channel.
I'll try to review this today in between sessions.
I don't really understand the Android part but the FFI looks clean :)
```rust
    _: JClass,
    session: *const SessionWrapper,
) {
    (*session).inner.reconnect();
```
Out of curiosity, what happens if we are passed a null pointer? Will we segfault?
Can we somehow signal this via `JNIEnv`? Like throw an exception in Android?
> Out of curiosity, what happens if we are passed a null pointer? Will we segfault?

It's UB, but normally we will segfault.

> Can we somehow signal this via JNIEnv? Like throw an exception in Android?

I'm not sure; I'd need to research it.
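For reference, a guarded variant using the `jni` crate's `throw_new` might look roughly like this (the JNI symbol name and the stand-in types are assumptions based on the snippet above):

```rust
use jni::objects::JClass;
use jni::JNIEnv;

// Stand-ins for the PR's actual types.
struct Session;
impl Session {
    fn reconnect(&self) {}
}
struct SessionWrapper {
    inner: Session,
}

#[no_mangle]
pub unsafe extern "system" fn Java_dev_firezone_android_ConnlibSession_reconnect(
    mut env: JNIEnv,
    _: JClass,
    session: *const SessionWrapper,
) {
    // Dereferencing a null pointer is UB (usually a segfault), so check
    // first and surface the bug to Android as a Java exception instead.
    if session.is_null() {
        let _ = env.throw_new(
            "java/lang/NullPointerException",
            "connlib session pointer is null",
        );
        return;
    }

    (*session).inner.reconnect();
}
```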
This completely removes `get_system_default_resolvers` for Android.