feat(snownet): minimize delay when roaming #4246

thomaseizinger · 2024-03-21T13:45:58Z

Currently, we need to wait for the timeout of the current candidate pair during reconnect before we nominate a new one. To speed this up, we can preemptively invalidate candidates we have previously discovered via our Allocations, i.e. relay candidates and srflx candidates.
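To illustrate the idea, here is a minimal sketch. It assumes a hypothetical `Agent` type that owns our local candidates; none of these names are the actual snownet or str0m API. On reconnect, everything learned through an allocation (server-reflexive and relayed candidates) is dropped, and host candidates are kept.

```rust
use std::net::SocketAddr;

/// Kind of ICE candidate. The names follow ICE terminology; the types are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CandidateKind {
    Host,
    ServerReflexive, // "srflx": our address as observed by a STUN/TURN server
    Relayed,         // address allocated for us on a TURN relay
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct Candidate {
    kind: CandidateKind,
    addr: SocketAddr,
}

/// Hypothetical per-connection agent holding our local candidates.
#[derive(Debug, Default)]
struct Agent {
    local_candidates: Vec<Candidate>,
}

impl Agent {
    /// Removes every candidate that was learned through an allocation and
    /// returns the removed ones, so the caller can act on them.
    fn invalidate_allocation_candidates(&mut self) -> Vec<Candidate> {
        let (stale, keep): (Vec<_>, Vec<_>) = self
            .local_candidates
            .drain(..)
            .partition(|c| c.kind != CandidateKind::Host);
        self.local_candidates = keep;
        stale
    }
}
```

In this sketch the caller would run `invalidate_allocation_candidates` as soon as a reconnect is triggered, rather than waiting for the current candidate pair to time out.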

vercel · 2024-03-21T13:46:04Z

The latest updates on your projects.

1 Ignored Deployment

Name	Status	Preview	Comments	Updated (UTC)
firezone	⬜️ Ignored (Inspect)	Visit Preview		Mar 22, 2024 1:18am

thomaseizinger · 2024-03-21T13:46:54Z

This is still a draft because there is an edge-case in which we immediately go into Disconnected as a result of invalidating the candidates. Relevant discussion is here: algesten/str0m#486 (comment)

github-actions · 2024-03-21T13:48:32Z

Terraform Cloud Plan Output

Plan: 9 to add, 8 to change, 0 to destroy.

Terraform Cloud Plan

github-actions · 2024-03-21T13:55:21Z

Performance Test Results

TCP

Test Name	Received/s	Sent/s	Retransmits
direct-tcp-client2server	222.2 MiB (+0%)	223.7 MiB (+0%)	219 (+59%)
direct-tcp-server2client	225.4 MiB (-0%)	226.8 MiB (-1%)	174 (-75%)
relayed-tcp-client2server	143.0 MiB (-5%)	143.7 MiB (-5%)	153 (+3%)
relayed-tcp-server2client	155.5 MiB (-1%)	155.8 MiB (-1%)	177 (+2%)

UDP

Test Name	Total/s	Jitter	Lost
direct-udp-client2server	50.0 MiB (+0%)	0.19ms (+583%)	0.00% (NaN%)
direct-udp-server2client	50.0 MiB (+0%)	0.01ms (-3%)	0.00% (NaN%)
relayed-udp-client2server	50.0 MiB (+0%)	0.18ms (+150%)	0.00% (NaN%)
relayed-udp-server2client	50.0 MiB (+0%)	0.05ms (-9%)	0.00% (NaN%)

thomaseizinger · 2024-03-21T14:27:15Z

@firezone/engineering I am getting pretty good results for the downtime when switching networks with this PR. I'd appreciate some testing on other platforms. My test setup was:

Laptop is connected to hotspot from phone
Phone has cellular & WiFi
Continuously ping a resource
Toggle WiFi on and off (I also learned today that Android can hotspot AND use the wifi it is connected to for Internet!)
Send SIGHUP to firezone-linux-client after toggling the connection

I am getting a downtime of about 5 seconds before the pings resume:

64 bytes from 10.0.32.101: icmp_seq=7 ttl=62 time=239 ms
64 bytes from 10.0.32.101: icmp_seq=8 ttl=62 time=238 ms
64 bytes from 10.0.32.101: icmp_seq=9 ttl=62 time=240 ms
64 bytes from 10.0.32.101: icmp_seq=14 ttl=62 time=252 ms
64 bytes from 10.0.32.101: icmp_seq=15 ttl=62 time=256 ms
64 bytes from 10.0.32.101: icmp_seq=16 ttl=62 time=277 ms

Note that I didn't send the SIGHUP signal instantly; there was probably a second or more of delay. So from the time the app receives the signal until it has a working connection again, it might only be about 4 seconds.
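The downtime can be read off the ping log by looking for the largest gap in `icmp_seq`. Here is a small sketch of that arithmetic; the helper names are made up for illustration, and it assumes Linux `ping` output with its default interval of one second between requests.

```rust
/// Extracts the `icmp_seq` value from one line of Linux `ping` output.
fn parse_icmp_seq(line: &str) -> Option<u32> {
    let rest = line.split("icmp_seq=").nth(1)?;
    rest.split_whitespace().next()?.parse().ok()
}

/// Returns the largest number of consecutive missing replies in the output.
fn longest_outage(output: &str) -> u32 {
    let seqs: Vec<u32> = output.lines().filter_map(parse_icmp_seq).collect();
    seqs.windows(2)
        .map(|w| w[1].saturating_sub(w[0]).saturating_sub(1))
        .max()
        .unwrap_or(0)
}
```

For the log above, the sequence jumps from 9 to 14, so `longest_outage` returns 4 missing replies. At one ping per second, that is roughly 5 seconds between the last reply before the switch and the first one after it.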

jamilbk · 2024-03-21T15:24:31Z

Nice!

@thomaseizinger I can test this on Apple once #4133 is nearly finished, so blocked on that atm.

ReactorScram · 2024-03-21T16:59:37Z

On Windows 6905491

Had the laptop on my home Wi-Fi
Pinged ifconfig.net a few times
Switched to the iPhone's hotspot
Couldn't ping ifconfig.net for a very long time, was getting "Error with the DNS fallback lookup"
It started pinging again eventually
Turned off iPhone's Wi-Fi - No change. Maybe it was using cellular for the hotspot
Turned iPhone's Wi-Fi back on - Got a warning about "This will disconnect hotspot users"
The laptop eventually reconnects to the iPhone

Ping logs during that reconnection:

Reply from 172.67.199.190: bytes=32 time=76ms TTL=53
Request timed out.
Request timed out.
General failure.
Request timed out.
Request timed out.
Reply from 172.67.199.190: bytes=32 time=38ms TTL=53

Not sure if I tested the right thing.

thomaseizinger · 2024-03-21T20:43:48Z

@ReactorScram That seems like a plausible test and something users might do.

Just for science reasons, can you try other ways of changing your observed public address?

Moving your laptop from one WiFi with working Internet to another should also do the trick for example. Does connlib disconnect its connection at some point? i.e. do you ever see "ICE timeout" in the logs?

thomaseizinger · 2024-03-21T20:45:57Z

Also, is "reconnect" triggered correctly as a result? You can identify that by seeing "Connected to the portal" in the logs and "Allocation mismatch".

thomaseizinger · 2024-03-21T23:31:31Z

I've now patched our str0m fork to do what I want here. We may have to come back for a different solution eventually if we can't get this upstreamed. For now, this will do, and it makes reconnect more stable because we don't trigger a connection failure, which would cause all sorts of state, like cached DNS queries, to be cleared.

thomaseizinger · 2024-03-21T23:32:24Z

For reference, this is the patch that is now included: algesten/str0m#489

conectado · 2024-03-22T01:03:07Z

Just tested this PR on Android

Switching networks with no noticeable downtime now 🚀 🎉

The way I tested it: load ifconfig.net, switch networks, then load it again. It loads immediately, no matter how many times we switch networks.

What I'd like to test next is some long-lived connection but I assume that it will work a-ok, I will make some setup to test this on android

thomaseizinger · 2024-03-22T01:04:47Z

> What I'd like to test next is some long-lived connection but I assume that it will work a-ok, I will make some setup to test this on android

Can we deploy some server to download a large file?

conectado · 2024-03-22T01:18:49Z

> > What I'd like to test next is some long-lived connection but I assume that it will work a-ok, I will make some setup to test this on android
>
> Can we deploy some server to download a large file?

@bmanifold might have an idea on how to do that

jamilbk · 2024-03-22T01:19:57Z

> > What I'd like to test next is some long-lived connection but I assume that it will work a-ok, I will make some setup to test this on android
>
> Can we deploy some server to download a large file?

How big of a file? We already have GitHub -- we can add a repo with LFS support which supports up to 5 GB.

conectado

Even if we depend on a fork I think it's good to merge this ASAP since it improves UX a lot

jamilbk · 2024-03-22T01:21:41Z

@thomaseizinger You can also use ionice to make downloads last longer:

/usr/bin/ionice -c2 -n7 rsync \
--bwlimit=1000 /path/to/source /path/to/dest/

thomaseizinger · 2024-03-22T01:22:10Z

> > > What I'd like to test next is some long-lived connection but I assume that it will work a-ok, I will make some setup to test this on android
> >
> > Can we deploy some server to download a large file?
>
> How big of a file? We already have GitHub -- we can add a repo with LFS support which supports up to 5 GB.

I guess we could also test using a speedtest?

thomaseizinger · 2024-03-22T01:23:29Z

> Even if we depend on a fork I think it's good to merge this ASAP since it improves UX a lot

We already did depend on a fork :)

conectado · 2024-03-22T01:24:47Z

> > > > What I'd like to test next is some long-lived connection but I assume that it will work a-ok, I will make some setup to test this on android
> > >
> > > Can we deploy some server to download a large file?
> >
> > How big of a file? We already have GitHub -- we can add a repo with LFS support which supports up to 5 GB.
>
> I guess we could also test using a speedtest?

Just tested with speed.cloudflare.com and it seems to resume connection reliably

But I think something that we deploy ourselves would be better, otherwise I can't be sure we're connected through firezone

AndrewDryga · 2024-03-22T01:50:42Z

We can upload a file to Google Cloud Storage, but you can also find a large docker image and pull it - free large binary hosting :).

bmanifold · 2024-03-22T02:37:36Z

> > > What I'd like to test next is some long-lived connection but I assume that it will work a-ok, I will make some setup to test this on android
> >
> > Can we deploy some server to download a large file?
>
> @bmanifold might have an idea on how to do that

Multiple good ideas have been thrown out, but as another suggestion, if we don't care about what the file actually is, I just threw together a quick PoC in golang that creates an HTTP server that will generate an arbitrarily large random file and stream the result so that it doesn't take up any disk space or much memory on the server. I just used my personal laptop to make a curl request to download a 1GB file from my work laptop and it worked fine.

All we'd need to do from there is create a docker container and then either use it in CI or we could also run it in AWS/GCP if needed. That said, we may want to be careful of downloading large files from AWS/GCP VMs outside of the AWS/GCP network as we could run up our network traffic bill.
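As a rough illustration of the streaming approach (this is not the Go PoC mentioned above, just a std-only Rust sketch with made-up function names), the body can be generated chunk by chunk and written straight to the socket, so neither disk space nor much memory is needed. The xorshift generator is an assumption chosen to avoid dependencies; it only produces filler bytes and is not cryptographically secure.

```rust
use std::io::{self, Write};

/// Tiny xorshift64 generator; NOT cryptographically secure, just filler bytes.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Writes `total` pseudo-random bytes to `out` in chunks of at most 8 KiB,
/// so memory use stays constant regardless of the requested size.
fn stream_random<W: Write>(out: &mut W, total: u64, seed: u64) -> io::Result<()> {
    let mut rng = XorShift64(seed.max(1)); // xorshift must not start at 0
    let mut chunk = [0u8; 8192];
    let mut remaining = total;
    while remaining > 0 {
        let n = remaining.min(chunk.len() as u64) as usize;
        // Fill the chunk 8 bytes at a time; the last word may be shorter.
        for word in chunk[..n].chunks_mut(8) {
            let bytes = rng.next_u64().to_le_bytes();
            word.copy_from_slice(&bytes[..word.len()]);
        }
        out.write_all(&chunk[..n])?;
        remaining -= n as u64;
    }
    Ok(())
}

/// Writes a minimal HTTP/1.1 response whose body is `total` random bytes.
fn write_response<W: Write>(out: &mut W, total: u64, seed: u64) -> io::Result<()> {
    write!(
        out,
        "HTTP/1.1 200 OK\r\nContent-Type: application/octet-stream\r\nContent-Length: {}\r\nConnection: close\r\n\r\n",
        total
    )?;
    stream_random(out, total, seed)
}
```

Since `std::net::TcpStream` implements `Write`, `write_response` could be called on each connection accepted from a `std::net::TcpListener`. A real server would also need to read the request and take the size from it.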

AndrewDryga · 2024-03-22T05:11:12Z

@bmanifold I have a small coding challenge for you. Use Elixir's Plug to stream endless random bytes :). You can see an example in our CSV export

thomaseizinger · 2024-03-22T05:54:12Z

> > > > What I'd like to test next is some long-lived connection but I assume that it will work a-ok, I will make some setup to test this on android
> > >
> > > Can we deploy some server to download a large file?
> >
> > @bmanifold might have an idea on how to do that
>
> Multiple good ideas have been thrown out, but as another suggestion, if we don't care about what the file actually is, I just threw together a quick PoC in golang that creates an HTTP server that will generate an arbitrarily large random file and stream the result so that it doesn't take up any disk space or much memory on the server. I just used my personal laptop to make a curl request to download a 1GB file from my work laptop and it worked fine.

I built something similar for the integration test I want to write 😁

We could actually deploy that to staging too, hadn't thought of that!

thomaseizinger · 2024-03-22T05:56:46Z

Let's hold off on this for now. I want to see if my integration test passes with all the fixes we are putting in. Once that is done, we can think about deploying such a dummy server to staging to do more testing :)

thomaseizinger · 2024-03-22T05:57:33Z

This has been validated on multiple systems, so I am going ahead and merging it.

bmanifold · 2024-03-22T14:50:49Z

> @bmanifold I have a small coding challenge for you. Use Elixir's Plug to stream endless random bytes :). You can see an example in our CSV export

😄 I like it. I might try that this weekend.

ReactorScram · 2024-03-22T14:54:43Z

I know for sure you can do it in a screenful of hyper :P https://six-five-six-four.com/git/reactor/ptth/src/commit/5965a76c7549643d471d4637b55245f1b37cb32c/crates/ptth_relay/src/lib.rs#L480-L526

jamilbk · 2024-03-22T15:36:25Z

Building a new Android client with this.

thomaseizinger marked this pull request as ready for review March 22, 2024 01:16

thomaseizinger added 3 commits March 22, 2024 11:18

Invalidate previous candidates upon allocation refresh

dac8ad9

Increase max pairs to avoid dropping useful ones during roaming

3cdb6f6

Bump str0m dependency to include latest patches

069e772

thomaseizinger force-pushed the feat/connlib/faster-reconnect branch from 7e575a3 to 069e772 Compare March 22, 2024 01:18

conectado approved these changes Mar 22, 2024

View reviewed changes

thomaseizinger added this pull request to the merge queue Mar 22, 2024

Merged via the queue into main with commit 3fe8f6d Mar 22, 2024
138 checks passed

thomaseizinger deleted the feat/connlib/faster-reconnect branch March 22, 2024 06:10
