
feat(android): detect network and dns changes and send them to connlib #4163

Merged

1 commit merged into main from feat/android-react-to-network on Mar 21, 2024

Conversation

@conectado (Collaborator) commented Mar 15, 2024

This completely removes get_system_default_resolvers for Android.


@jamilbk jamilbk changed the title from feat(kotlin): detect network and dns changes to feat(android): detect network and dns changes on Mar 15, 2024
@conectado conectado force-pushed the feat/android-react-to-network branch from b95e428 to 021c116 on March 18, 2024 21:19
github-actions bot commented Mar 18, 2024

Terraform Cloud Plan Output

Plan: 9 to add, 8 to change, 9 to destroy.

Terraform Cloud Plan

github-actions bot commented Mar 19, 2024

Performance Test Results

TCP

| Test Name | Received/s | Sent/s | Retransmits |
| --- | --- | --- | --- |
| direct-tcp-client2server | 221.8 MiB (+1%) | 224.0 MiB (+1%) | 246 (+64%) |
| direct-tcp-server2client | 230.9 MiB (+2%) | 232.3 MiB (+2%) | 117 (-75%) |
| relayed-tcp-client2server | 148.2 MiB (+1%) | 149.0 MiB (+1%) | 188 (+23%) |
| relayed-tcp-server2client | 152.3 MiB (+2%) | 152.8 MiB (+2%) | 217 (+10%) |

UDP

| Test Name | Total/s | Jitter | Lost |
| --- | --- | --- | --- |
| direct-udp-client2server | 50.0 MiB (+0%) | 0.29ms (+717%) | 0.00% (NaN%) |
| direct-udp-server2client | 50.0 MiB (+0%) | 0.01ms (-30%) | 0.00% (NaN%) |
| relayed-udp-client2server | 50.0 MiB (+0%) | 0.17ms (+42%) | 0.00% (NaN%) |
| relayed-udp-server2client | 50.0 MiB (+0%) | 0.07ms (+30%) | 0.00% (NaN%) |

@@ -69,10 +77,14 @@ impl Session {
///
/// In case of destructive network state changes, i.e. the user switched from wifi to cellular,
/// reconnect allows connlib to re-establish connections faster because we don't have to wait for timeouts first.
-    pub fn reconnect(&mut self) {
+    pub fn reconnect(&self) {
conectado (Collaborator Author):

If this took &mut self, we could break aliasing rules.
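A minimal sketch of the aliasing concern, assuming the Command channel shown later in this PR: the Java side can invoke FFI entrypoints from several threads against the same raw Session pointer, so materializing a &mut Session in two of them at once would be UB. Taking &self and funnelling mutation through a channel keeps it sound.

```rust
use std::net::IpAddr;

enum Command {
    Reconnect,
    SetDns(Vec<IpAddr>),
}

pub struct Session {
    // Mutation goes through the event loop instead of `&mut self`, so
    // concurrent FFI calls only ever create shared references.
    channel: tokio::sync::mpsc::Sender<Command>,
}

impl Session {
    pub fn reconnect(&self) {
        // `try_send` needs only `&self`; the event loop applies the change.
        let _ = self.channel.try_send(Command::Reconnect);
    }
}
```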

@jamilbk (Member) left a comment:

Have some code style suggestions for Kotlin; I can't really comment on the Rust changes, so I'll let @thomaseizinger and @ReactorScram approve those.

Comment on lines 206 to 208
val connectivityManager =
getSystemService(ConnectivityManager::class.java) as ConnectivityManager
connectivityManager.unregisterNetworkCallback(networkCallback!!)
Member:

Suggested change
val connectivityManager =
getSystemService(ConnectivityManager::class.java) as ConnectivityManager
connectivityManager.unregisterNetworkCallback(networkCallback!!)
networkCallback?.let {
    val connectivityManager =
        getSystemService(ConnectivityManager::class.java) as ConnectivityManager
    // Use `it`: `networkCallback` is a mutable nullable property, so it
    // can't be smart-cast to non-null inside the lambda.
    connectivityManager.unregisterNetworkCallback(it)
    networkCallback = null
}

I think I assumed that shutdown() would be safe to call idempotently.

Member:

Or better yet, move this to a helper function, stopNetworkMonitoring(connlibSessionPtr), like the Apple client.

conectado (Collaborator Author):

Why would we want to call shutdown multiple times in a row?

conectado (Collaborator Author):

I feel like shutdown should only be called when the tunnel is no longer used.

Member:

We don't, but there's an edge case if onDisconnect fires and the user disconnects at the same time.

The system can shut us down too, but I don't think that calls disconnect.

Just trying to avoid crashing in our only cleanup handler is all.

conectado (Collaborator Author):

Ah, I see. Then we probably want to change disconnect to not use connlibSessionPtr!!; I can do that in this PR.

conectado (Collaborator Author) commented Mar 20, 2024:

Changed it here: 0ce223b. Let me know if I should roll back the change to disconnect 😃

rust/connlib/clients/shared/src/eventloop.rs (outdated, resolved)
@conectado conectado requested a review from jamilbk March 20, 2024 00:03
conectado (Collaborator Author):

Having trouble reproducing the CI errors locally.

if let Some(mut dns_update) = self.dns_update.take() {
match dns_update.poll_unpin(cx) {
Poll::Ready(dns) => {
self.tunnel.set_dns(dns);
Member:

I'd rather have Tunnel::set_dns debounce internally than out here. We can just capture the timestamp of now and delay acting on it in the ClientState.

Member:

That will allow us to unit-test the debouncing, and it doesn't require messing around with futures. All we need is an Option with an Instant, and to act on it in handle_timeout.
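A minimal sketch of this proposal, with hypothetical names (pending_dns, DNS_DEBOUNCE, apply_dns) since the final shape isn't settled in this thread:

```rust
use std::net::IpAddr;
use std::time::{Duration, Instant};

// Hypothetical debounce window; the actual value would be tuned.
const DNS_DEBOUNCE: Duration = Duration::from_secs(1);

struct ClientState {
    // Latest DNS update, together with the time it arrived.
    pending_dns: Option<(Instant, Vec<IpAddr>)>,
}

impl ClientState {
    fn update_system_resolvers(&mut self, now: Instant, dns: Vec<IpAddr>) {
        // Overwrite any pending update; only the most recent one matters.
        self.pending_dns = Some((now, dns));
    }

    fn handle_timeout(&mut self, now: Instant) {
        match self.pending_dns.take() {
            // Debounce window elapsed: act on the update.
            Some((received_at, dns)) if now >= received_at + DNS_DEBOUNCE => {
                self.apply_dns(dns)
            }
            // Not due yet (or nothing pending): keep waiting.
            other => self.pending_dns = other,
        }
    }

    fn apply_dns(&mut self, _dns: Vec<IpAddr>) {
        // Would queue an event for the Tunnel in the real design.
    }
}
```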

conectado (Collaborator Author):

The downside of this approach is that handle_timeout would need to call into Tunnel to call update_interface, so we would need to return some event from handle_timeout. If that's acceptable, I almost have an implementation for this.

Member:

handle_timeout shouldn't return anything; instead, we should queue an event of some kind that is then consumed by the Tunnel so it can update its IO state.
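A sketch of that event-queue shape, again with hypothetical names (buffered_events, poll_next_event):

```rust
use std::collections::VecDeque;
use std::net::IpAddr;

// Hypothetical event type; DnsChanged is the case relevant here.
enum Event {
    DnsChanged(Vec<IpAddr>),
}

struct ClientState {
    buffered_events: VecDeque<Event>,
}

impl ClientState {
    // Called from `handle_timeout`: queue the side-effect instead of
    // returning it ...
    fn queue_dns_changed(&mut self, dns: Vec<IpAddr>) {
        self.buffered_events.push_back(Event::DnsChanged(dns));
    }

    // ... and let the Tunnel drain events and update its IO state.
    fn poll_next_event(&mut self) -> Option<Event> {
        self.buffered_events.pop_front()
    }
}
```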

Member:

Why is this part of the Android PR?

conectado (Collaborator Author):

Because we're moving to set_dns; otherwise, this would break all other platforms.

@jamilbk (Member) commented Mar 20, 2024:

Yeah maybe it would be better to land all of these into a staging branch? Otherwise main is gonna go wonky for a bit while we test manually.

Just an idea

conectado (Collaborator Author):

I think it is better to merge it like this so we have everything on main. This PR should work a-ok like this for all platforms.

conectado (Collaborator Author):

I think we could extract the set_dns change into its own PR, but that is quite a lot of work at this point.

Comment on lines 31 to 32
channel: tokio::sync::mpsc::Sender<Command>,
dns_updater: tokio::sync::mpsc::UnboundedSender<Vec<IpAddr>>,
Member:

Why have two channels? Can we just make the command one unbounded?

conectado (Collaborator Author):

I just didn't want to make the channel for the other events unbounded

Member:

Why not? It is not like the app will memory DoS connlib with a million commands that we can't process.

Collaborator:

If it won't, then set the limit to a million and panic if the impossible happens?

Collaborator:

Also it could be a watch or Notify+ArcSwap or something, right? No question of memory leaks when out-of-date DNS server vecs don't even matter.

conectado (Collaborator Author):

Bounded channels normally preemptively allocate enough to handle the worst-case scenario, so I don't think it'd make sense to set it to a million.

Also, I think it'd look a bit weird and could throw someone off when debugging.

conectado (Collaborator Author):

> Also it could be a watch or Notify+ArcSwap or something, right? No question of memory leaks when out-of-date DNS server vecs don't even matter.

I think this is what we discussed about having dumb plumbing; it's more logic that we can move inside the tunnel.

We could go with watch without the notify-if-updated logic, but that would also look a bit weird, with the notify/debounce logic split between the session and the tunnel.

Member:

We are also only shipping around pointers or ZSTs here, so the memory footprint should be tiny.

Let's set the bound to 100 and panic on error. It should never be full in normal operation, and if it is, that is a bug.

It will not be disconnected unless the app sends commands after on_disconnect, which is also a bug.

conectado (Collaborator Author):

> It will not be disconnected unless the app sends commands after on_disconnect, which is also a bug.

This can also happen if the user disconnects manually at the same time on_disconnect is called; super unlikely, but possible.
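A sketch of the bound-of-100-plus-panic approach from the comment above; Command mirrors the snippet below, everything else is illustrative:

```rust
use std::net::IpAddr;

enum Command {
    Reconnect,
    SetDns(Vec<IpAddr>),
}

fn make_channel() -> (
    tokio::sync::mpsc::Sender<Command>,
    tokio::sync::mpsc::Receiver<Command>,
) {
    // 100 slots is far more than normal operation needs, so a full (or
    // disconnected) channel indicates a bug and is worth a loud panic.
    tokio::sync::mpsc::channel(100)
}

fn send_command(channel: &tokio::sync::mpsc::Sender<Command>, cmd: Command) {
    channel
        .try_send(cmd)
        .expect("command channel should never be full or disconnected");
}
```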

let _ = self.channel.try_send(Command::Reconnect);
}

pub fn set_dns(&self, new_dns: Vec<IpAddr>) {
self.dns_updater.send(new_dns).expect("Developer error: As long as the session is up the dns update receiver must not be dropped");
Member:

This can be hit if the app calls connlib after it emitted on_disconnect. I don't think we should panic here but just drop the message.

Member:

You can make it a debug_assert if you want, but there is no need to crash a production app, I'd say.
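A sketch of the non-panicking variant being suggested, reusing the dns_updater field from the snippet above; the debug_assert only fires in debug builds:

```rust
use std::net::IpAddr;

pub struct Session {
    dns_updater: tokio::sync::mpsc::UnboundedSender<Vec<IpAddr>>,
}

impl Session {
    pub fn set_dns(&self, new_dns: Vec<IpAddr>) {
        // A closed receiver means the app called us after `on_disconnect`.
        // That is a bug worth surfacing in debug builds, but not worth
        // crashing a production app over: just drop the update.
        let result = self.dns_updater.send(new_dns);
        debug_assert!(result.is_ok(), "set_dns called after on_disconnect");
    }
}
```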

== HashSet::<&IpAddr>::from_iter(new_dns.iter())
})
{
return;
Member:

Add a debug log here.
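A sketch of the early return with the requested debug log, reconstructed around the HashSet comparison shown above (the tracing macro is an assumption):

```rust
use std::collections::HashSet;
use std::net::IpAddr;

fn set_dns(current_dns: &[IpAddr], new_dns: Vec<IpAddr>) {
    // If the new servers equal the current ones (ignoring order), do nothing.
    if HashSet::<&IpAddr>::from_iter(current_dns.iter())
        == HashSet::<&IpAddr>::from_iter(new_dns.iter())
    {
        tracing::debug!("DNS servers unchanged, skipping update");
        return;
    }
    // ... otherwise rebuild the DNS mapping and update the interface.
}
```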

Comment on lines 558 to 561
callback_handler
.get_system_default_resolvers()
.expect("We expect to get the system's DNS")
.unwrap_or_default(),
Member:

Same here: can we avoid calling our own callbacks and instead initialize it with the data that the callbacks return?

Comment on lines 197 to 202
let Some(system_resolvers) = self.role_state.system_resolvers.as_ref().cloned() else {
return Ok(());
};

let effective_dns_servers =
effective_dns_servers(config.upstream_dns.clone(), system_resolvers);
Member:

Isn't this duplicated state? If our effective DNS servers contain the system resolvers, why can't we just check our current DNS mapping instead of tracking the system resolvers separately?

We are already tracking so much state; we should avoid redundancy as much as we can.

conectado (Collaborator Author):

Because if we have some upstream DNS configured, the effective DNS doesn't contain the system's resolvers.
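A sketch of that precedence rule, assuming a simplified signature for effective_dns_servers (the real function likely does more, e.g. mapping to sentinel addresses):

```rust
use std::net::IpAddr;

// Upstream DNS servers, when configured, replace the system resolvers
// entirely, so the system resolvers can't be recovered from the output.
fn effective_dns_servers(
    upstream_dns: Vec<IpAddr>,
    system_resolvers: Vec<IpAddr>,
) -> Vec<IpAddr> {
    if !upstream_dns.is_empty() {
        return upstream_dns;
    }

    system_resolvers
}
```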

Member:

Can we unit test this somehow? Seems fragile to me :)

Member:

We need to stop reaching into fields of ClientState from the Tunnel and make proper APIs that manipulate ClientState.

conectado (Collaborator Author) commented Mar 20, 2024:

> Can we unit test this somehow? Seems fragile to me :)

I can add some unit tests for effective_dns_servers, but is that what you want to unit-test?

> We need to stop reaching into fields of ClientState from the Tunnel and make proper APIs that manipulate ClientState.

We can add a ClientState::get_dns that returns this effective_dns_servers, since both inputs of this function are stored there. What do you think about that?

Member:

What I'd like to test, for example, is that setting the same DNS servers is a no-op.

For example, we could make a function that, given a set of DNS servers, outputs an Option of dns_mapping, and only update Io if that is Some. Maybe there is more stuff we can do; we just need to sit down, de-couple the side-effects from the state management, and translate it into values that we pass around and can then write tests against.
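A sketch of that testable shape, with hypothetical names (update_dns_mapping, DnsMapping):

```rust
use std::collections::HashSet;
use std::net::IpAddr;

type DnsMapping = Vec<IpAddr>; // stand-in for the real mapping type

// Pure function: returns Some(new mapping) only when the servers changed,
// so "same servers is a no-op" becomes a one-line unit test.
fn update_dns_mapping(current: &[IpAddr], new: Vec<IpAddr>) -> Option<DnsMapping> {
    let same = HashSet::<&IpAddr>::from_iter(current.iter())
        == HashSet::<&IpAddr>::from_iter(new.iter());

    (!same).then_some(new)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn same_servers_are_a_noop() {
        let dns: Vec<IpAddr> = vec!["1.1.1.1".parse().unwrap()];

        assert!(update_dns_mapping(&dns, dns.clone()).is_none());
    }
}
```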

@thomaseizinger (Member) left a comment:

Accidentally made separate comments rather than a full review. I have some concerns about how some things are implemented, especially regarding the redundant state in ClientState and the duplicated channel.

github-merge-queue bot pushed a commit that referenced this pull request Mar 20, 2024
@conectado conectado force-pushed the feat/android-react-to-network branch from 904c265 to 9279de2 on March 20, 2024 19:23
@thomaseizinger (Member):

I'll try to review this today in-between sessions.

@conectado conectado changed the title from feat(android): detect network and dns changes to feat(connlib): detect network changes on android and add set_dns for tunnel on Mar 20, 2024
@conectado conectado marked this pull request as draft March 21, 2024 00:23
@conectado conectado force-pushed the feat/android-react-to-network branch from 14cc68a to 3326eda on March 21, 2024 01:08
github-merge-queue bot pushed a commit that referenced this pull request Mar 21, 2024
@conectado conectado changed the title from feat(connlib): detect network changes on android and add set_dns for tunnel to feat(android): detect network changes on Mar 21, 2024
@conectado conectado changed the base branch from main to refactor/use-set-dns-instead-of-callback March 21, 2024 01:09
@conectado conectado force-pushed the refactor/use-set-dns-instead-of-callback branch from 309375e to 56d7ded on March 21, 2024 01:12
@conectado conectado force-pushed the feat/android-react-to-network branch from 3326eda to cb1fb29 on March 21, 2024 01:14
github-merge-queue bot pushed a commit that referenced this pull request Mar 21, 2024
Base automatically changed from refactor/use-set-dns-instead-of-callback to main March 21, 2024 01:36
@conectado conectado force-pushed the feat/android-react-to-network branch from cb1fb29 to bf66d65 on March 21, 2024 01:47
@conectado conectado marked this pull request as ready for review March 21, 2024 01:47
@conectado conectado changed the title from feat(android): detect network changes to feat(android): detect network and dns changes and send them to connlib on Mar 21, 2024
@thomaseizinger (Member) left a comment:

I don't really understand the Android part but the FFI looks clean :)

_: JClass,
session: *const SessionWrapper,
) {
(*session).inner.reconnect();
Member:

Out of curiosity, what happens if we are passed a null-ptr? Will we segfault?

Member:

Can we somehow signal this via JNIEnv? Like throw an exception in Android?

conectado (Collaborator Author):

> Out of curiosity, what happens if we are passed a null-ptr? Will we segfault?

It's UB, but normally we will segfault.

> Can we somehow signal this via JNIEnv? Like throw an exception in Android?

I'm not sure; I'd need to research it.
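A sketch of what that could look like with the jni crate, whose JNIEnv::throw_new raises a Java exception on the calling thread; the exported symbol name and the Session/SessionWrapper stubs here are hypothetical:

```rust
use jni::objects::JClass;
use jni::JNIEnv;

// Minimal stand-ins for the types in the snippet above.
struct Session;
impl Session {
    fn reconnect(&self) {}
}
struct SessionWrapper {
    inner: Session,
}

#[no_mangle]
pub unsafe extern "system" fn Java_Session_reconnect(
    mut env: JNIEnv,
    _: JClass,
    session: *const SessionWrapper,
) {
    // Guard against a null pointer instead of dereferencing it (UB).
    if session.is_null() {
        let _ = env.throw_new(
            "java/lang/NullPointerException",
            "connlib session pointer is null",
        );
        return;
    }

    (*session).inner.reconnect();
}
```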

@conectado conectado added this pull request to the merge queue Mar 21, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 21, 2024
@conectado conectado added this pull request to the merge queue Mar 21, 2024
Merged via the queue into main with commit db62e7b Mar 21, 2024
138 checks passed
@conectado conectado deleted the feat/android-react-to-network branch March 21, 2024 02:26