fix(windows): prevent routing loops for TCP connections#6584

Merged
thomaseizinger merged 33 commits into main from test/windows-tcp-packet-loop on Sep 5, 2024
@ReactorScram ReactorScram commented Sep 3, 2024

In #6032, we attempted to fix routing loops for Windows and did so successfully for UDP packets. For TCP sockets, we believed that binding the socket to an interface would be enough to prevent routing loops. This assumption is wrong.

On Windows, a call to bind() affects card selection only for incoming traffic, not for outgoing traffic.

Thus, on a client running in a multi-homed system (i.e., more than one interface card), it's the network stack that selects the card to use, and it makes its selection based solely on the destination IP, which in turn is based on the routing table. A call to bind() will not affect the choice of the card in any way.

On most of our testing machines, this problem didn't surface, but it turns out that on some machines, especially those with WiFi cards, there is a conflict between the routes added on the system. In particular, with the Internet resource active, we also add a catch-all route that we want to have the highest priority, i.e. Windows SHOULD send all traffic to our TUN device, except for traffic that we generate ourselves, like TCP connections to the portal or UDP packets sent to gateways, relays or DNS servers.

It appears that on some systems, mostly with Ethernet adapters, Windows picks the "correct" interface for our socket and sends traffic via that, but on other systems it doesn't. TCP sockets are only used for the WebSocket connection to the portal. Without that connection, Firezone completely stops working because we can't send any control messages.

To reliably fix this issue, we need to add a dedicated route for the target IP of each TCP socket that is more specific than the Internet resource route (0.0.0.0/0) but otherwise identical. We do this as part of creating a new TCP socket. This route is for the default interface and thus isn't automatically removed when Firezone exits.
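
Why the more specific route wins can be illustrated with a small model of route selection. This is only a sketch and not the code in this PR: `Route`, `select_iface`, the interface names and the metrics are hypothetical, and it assumes that the longest matching prefix is chosen first, with the metric only breaking ties between equally specific routes. Real Windows route selection takes more inputs into account.

```rust
use std::cmp::Reverse;
use std::net::Ipv4Addr;

/// One routing table entry (hypothetical model, not a Windows API type).
#[derive(Debug, Clone, PartialEq)]
struct Route {
    dest: Ipv4Addr,
    prefix_len: u8, // assumed to be in 0..=32
    metric: u32,
    iface: &'static str,
}

impl Route {
    /// Returns true if `ip` falls inside `dest/prefix_len`.
    fn matches(&self, ip: Ipv4Addr) -> bool {
        let mask = if self.prefix_len == 0 {
            0
        } else {
            u32::MAX << (32 - u32::from(self.prefix_len))
        };
        (u32::from(ip) & mask) == (u32::from(self.dest) & mask)
    }
}

/// Picks the interface for `ip`: the longest prefix wins,
/// a lower metric only decides between equally specific routes.
fn select_iface(table: &[Route], ip: Ipv4Addr) -> Option<&'static str> {
    table
        .iter()
        .filter(|r| r.matches(ip))
        .max_by_key(|r| (r.prefix_len, Reverse(r.metric)))
        .map(|r| r.iface)
}
```

In this model, with two 0.0.0.0/0 routes (one for the TUN device with the lower metric, one for the physical adapter), traffic to the portal IP resolves to the TUN device. Once a /32 route for the portal IP is added on the physical adapter, that route is selected for the portal while every other destination still goes to the TUN device.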

We implement a RAII guard that attempts to drop the route on a best-effort basis. Despite this RAII guard, the route can linger if Firezone is forced to exit or exits in some otherwise unclean way. To avoid lingering routes, we always delete all routing table entries matching the IP of the portal just before we add a new one.
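
The guard pattern described above can be sketched roughly as follows. This is an illustration under assumptions, not the implementation in this PR: `RouteTable` is an in-memory stand-in for the OS routing table, and `RouteGuard::add` is a hypothetical name. The real code has to call into Windows instead.

```rust
use std::cell::RefCell;
use std::net::Ipv4Addr;
use std::rc::Rc;

/// In-memory stand-in for the OS routing table (hypothetical):
/// each entry is a host route `(destination IP, interface name)`.
type RouteTable = Rc<RefCell<Vec<(Ipv4Addr, &'static str)>>>;

/// Owns one host route and removes it again when dropped.
struct RouteGuard {
    table: RouteTable,
    ip: Ipv4Addr,
}

impl RouteGuard {
    /// Adds a host route for `ip`, first deleting any entries for the same
    /// IP that an earlier, unclean exit may have left behind.
    fn add(table: &RouteTable, ip: Ipv4Addr, iface: &'static str) -> RouteGuard {
        {
            let mut routes = table.borrow_mut();
            routes.retain(|(dest, _)| *dest != ip);
            routes.push((ip, iface));
        }
        RouteGuard {
            table: Rc::clone(table),
            ip,
        }
    }
}

impl Drop for RouteGuard {
    fn drop(&mut self) {
        let ip = self.ip;
        // Best effort: if the table cannot be borrowed right now, skip the
        // cleanup instead of panicking inside `drop`.
        if let Ok(mut routes) = self.table.try_borrow_mut() {
            routes.retain(|(dest, _)| *dest != ip);
        }
    }
}
```

Holding the guard for as long as the TCP socket lives ties the route's lifetime to the socket's. Deleting stale entries before adding covers the case where `drop` never ran, for example after a forced exit.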

Fixes: #6591.

@ReactorScram ReactorScram self-assigned this Sep 3, 2024
github-actions bot commented Sep 4, 2024

🐰Bencher

Report: Thu, September 5, 2024 at 06:02:33 UTC
Project: Firezone
Branch: test/windows-tcp-packet-loop
Testbed: github-actions

🚨 1 ALERT: Threshold Boundary Limit exceeded!
| Benchmark | Measure (units) | Status | Value | Lower Boundary | Upper Boundary |
| --- | --- | --- | --- | --- | --- |
| relayed-tcp-client2server | Throughput (bits/s) | 🚨 | 235,923,773.80 (-5.45%) | 239,823,019.39 (101.65%) | |

All benchmark results:

| Benchmark | Status | Throughput Results, bits/s (Δ%) | Throughput Lower Boundary, bits/s (%) |
| --- | --- | --- | --- |
| direct-tcp-client2server | ✅ | 246,138,522.82 (-0.24%) | 238,243,185.63 (96.79%) |
| direct-tcp-server2client | ✅ | 262,640,763.25 (+4.43%) | 243,505,501.87 (92.71%) |
| direct-udp-client2server | ✅ | 282,585,181.30 (-2.62%) | 271,288,650.24 (96.00%) |
| direct-udp-server2client | ✅ | 416,871,445.28 (+3.74%) | 388,459,068.77 (93.18%) |
| relayed-tcp-client2server | 🚨 | 235,923,773.80 (-5.45%) | 239,823,019.39 (101.65%) |
| relayed-tcp-server2client | ✅ | 261,172,962.91 (+0.59%) | 249,955,441.77 (95.70%) |
| relayed-udp-client2server | ✅ | 233,064,089.98 (+0.41%) | 220,658,728.83 (94.68%) |
| relayed-udp-server2client | ✅ | 326,352,740.91 (-3.26%) | 316,328,576.73 (96.93%) |

thomaseizinger and others added 22 commits September 4, 2024 22:54
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Development

Successfully merging this pull request may close these issues.

bug(rust/gui-client/windows): Internet stops working in certain conditions
