feat(snownet): minimize delay when roaming #4246
This is still a draft because there is an edge-case in which we immediately go into |
@firezone/engineering I am getting pretty good results for the downtime when switching networks with this PR. I'd appreciate some testing on other platforms. My test setup was:
I am getting a downtime of about 5 seconds before the pings resume:
Note that I also didn't send the SIGHUP signal instantly; there was probably a delay of a second or a bit more. So from the time the app receives the signal until it has a working connection again, it might only be about 4 seconds. |
Nice! @thomaseizinger I can test this on Apple once #4133 is nearly finished, so blocked on that atm. |
On Windows 6905491
Ping logs during that reconnection:
Not sure if I tested the right thing. |
@ReactorScram That seems like a plausible test and something users might do. Just for science reasons, can you try other ways of changing your observed public address? Moving your laptop from one WiFi with working Internet to another should also do the trick for example. Does connlib disconnect its connection at some point? i.e. do you ever see "ICE timeout" in the logs? |
Also, is "reconnect" triggered correctly as a result? You can identify that by seeing "Connected to the portal" in the logs and "Allocation mismatch". |
I've now patched our str0m fork to do what I need for this. We may have to come back for a different solution eventually if we can't get this upstreamed. For now, this will do, and it makes reconnect more stable because we don't trigger a connection failure, which would clear all sorts of state such as cached DNS queries. |
For reference, this is the patch that is now included: algesten/str0m#489 |
Just tested this PR on Android. Switching networks now works with no noticeable downtime 🚀 🎉 The way I tested it, load What I'd like to test next is some long-lived connection, but I assume that it will work a-ok. I will make some setup to test this on Android. |
Can we deploy some server to download a large file? |
@bmanifold might have an idea on how to do that |
How big of a file? We already have GitHub -- we can add a repo with LFS support which supports up to 5 GB. |
Even if we depend on a fork I think it's good to merge this ASAP since it improves UX a lot
@thomaseizinger You can also use
|
I guess we could also test using a speedtest? |
We already did depend on a fork :) |
Just tested with speed.cloudflare.com and it seems to resume the connection reliably. But I think something that we deploy ourselves would be better; otherwise I can't be sure we're connected through Firezone. |
We can upload a file to Google Cloud Storage, but you can also find a large docker image and pull it - free large binary hosting :). |
Multiple good ideas have already been suggested, but here is another one. If we don't care about what the file actually is: I just threw together a quick PoC in Go that creates an HTTP server which generates an arbitrarily large random file and streams the result, so that it doesn't take up any disk space or much memory on the server. I used my personal laptop to make a curl request to download a 1GB file from my work laptop and it worked fine. All we'd need to do from there is create a docker container and then either use it in CI or run it in AWS/GCP if needed. That said, we may want to be careful about downloading large files from AWS/GCP VMs to machines outside of the AWS/GCP network, as we could run up our network traffic bill. |
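The Go PoC mentioned above is not shown in this thread. As a rough sketch of the same idea (a response body of arbitrary size generated on the fly with constant memory), here is a std-only Rust version. Rust is used only because the crate under discussion is Rust; the names `XorShift64` and `stream_random_bytes` are hypothetical, and the generator is a plain xorshift filler, not a cryptographic RNG.

```rust
use std::io::{self, Write};

/// Tiny xorshift64 generator. NOT cryptographically secure; it only
/// produces filler bytes for a download test.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Writes exactly `len` pseudo-random bytes to `out`, one 8 KiB chunk at a
/// time, so memory use stays constant regardless of `len`.
fn stream_random_bytes<W: Write>(out: &mut W, len: u64, seed: u64) -> io::Result<()> {
    // The xorshift state must never be zero, otherwise it only emits zeros.
    let mut rng = XorShift64(seed.max(1));
    let mut chunk = [0u8; 8192];
    let mut remaining = len;

    while remaining > 0 {
        // Refill the chunk with fresh pseudo-random words.
        for word in chunk.chunks_exact_mut(8) {
            word.copy_from_slice(&rng.next_u64().to_le_bytes());
        }
        // The last chunk may be shorter than the buffer.
        let n = remaining.min(chunk.len() as u64) as usize;
        out.write_all(&chunk[..n])?;
        remaining -= n as u64;
    }
    Ok(())
}
```

In an HTTP server, `out` would be the client's socket after the response headers (including a `Content-Length` equal to `len`) have been written; any `Write` implementation works, which also makes the function easy to exercise against a `Vec<u8>`.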
@bmanifold I have a small coding challenge for you. Use Elixir's Plug to stream endless random bytes :). You can see an example in our CSV export |
I built something similar for the integration test I want to write 😁 We could actually deploy that to staging too, hadn't thought of that! |
Let's hold off on this for now; I want to see if my integration test passes with all the fixes we are putting in. Once that is done, we can think about deploying such a dummy server to staging to do more testing :) |
This has been validated on multiple systems, so I am going ahead and merging it. |
😄 I like it. I might try that this weekend. |
I know for sure you can do it in a screenful of |
Building a new Android client with this. |
Currently, we need to wait for the timeout of the current candidate pair during reconnect before we nominate a new one. To speed this up, we can preemptively invalidate candidates we have previously discovered via our Allocations, i.e. relay candidates and srflx candidates.
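To make the idea in the description concrete, here is a minimal, self-contained sketch of dropping allocation-derived candidates up front instead of waiting for the active pair to time out. This is an illustration only: `CandidateKind`, `Candidate`, `LocalCandidates`, and `invalidate_allocation_candidates` are hypothetical names invented for this example and are not the snownet or str0m API.

```rust
use std::net::SocketAddr;

/// How a local candidate was discovered (simplified, hypothetical model).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CandidateKind {
    Host,
    /// Learned from a STUN binding response (srflx).
    ServerReflexive,
    /// Learned from a TURN allocation (relay).
    Relayed,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct Candidate {
    kind: CandidateKind,
    addr: SocketAddr,
}

/// Hypothetical stand-in for the local candidate set of an ICE agent.
#[derive(Debug, Default)]
struct LocalCandidates {
    candidates: Vec<Candidate>,
}

impl LocalCandidates {
    /// Adds a candidate unless an identical one is already present.
    fn add(&mut self, candidate: Candidate) {
        if !self.candidates.contains(&candidate) {
            self.candidates.push(candidate);
        }
    }

    /// Removes every candidate that was discovered via an allocation
    /// (srflx and relay) and returns the removed ones, so the caller can
    /// tell the remote peer to stop using them.
    fn invalidate_allocation_candidates(&mut self) -> Vec<Candidate> {
        let (invalidated, kept): (Vec<_>, Vec<_>) = std::mem::take(&mut self.candidates)
            .into_iter()
            .partition(|c| {
                matches!(
                    c.kind,
                    CandidateKind::ServerReflexive | CandidateKind::Relayed
                )
            });
        self.candidates = kept;
        invalidated
    }
}
```

On reconnect, a caller would invoke `invalidate_allocation_candidates()` before gathering fresh candidates from the new network, so that pairs built on the stale srflx and relay addresses are no longer considered and a new pair can be nominated without first waiting for a timeout.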