
Feature Request: More intelligent routing peer selection for relayed connections #4603

@tkloda

Describe the problem

When a NetBird client establishes a direct P2P connection to multiple routing peers advertising networks with the same metric, it correctly measures the latency to each peer and selects the one with the lowest latency for routing. This ensures an optimal traffic path.

However, when a client's connection to the routing peers is relayed (e.g., due to a symmetric NAT or restrictive firewall), the latency is reported as 0s. In this state, the mechanism for selecting the active routing peer appears to fall back to a random or round-robin selection.

This fallback behavior can lead to a severely suboptimal routing situation. For example, a user in Europe might be assigned a routing peer in the US, even if a European peer is available, resulting in significantly increased latency and a degraded user experience for all their routed traffic.

To Reproduce

  1. Set up at least two routing peers in geographically distant locations (e.g., one in US-West and one in EU-Central).

  2. Configure both peers to advertise the same network route (e.g., 0.0.0.0/0) with the same metric.

  3. Configure a client that is behind a network environment known to cause relayed connections (e.g., a symmetric NAT).

  4. Ensure the client can connect to the management plane and both routing peers (even if relayed).

  5. Run netbird status -d on the client.

  6. Observe that the connection Type to the routing peers is Relayed.

  7. Observe that the Latency for these peers is 0s.

  8. Observe that the assigned routing peer for the advertised network is not consistently the geographically closest one and may be selected randomly upon reconnection.

Expected behavior

Routing peer selection over relayed connections should behave as closely as possible to selection over P2P connections.

When multiple routing peers advertise the same network with an identical metric, the client should automatically select the peer with the lowest end-to-end latency, even when the connection to those peers is relayed.

This would align the logic for both connection types, providing a consistent and optimal routing experience for all users. While a direct RTT measurement isn't possible, the system should implement a mechanism to estimate or calculate this latency to inform its decision. The current fallback to random selection should be replaced with this performance-based logic.

The final outcome should be that a user is always routed through the fastest path available, regardless of whether their connection is P2P or relayed.
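To make the request concrete, here is a minimal Go sketch of the kind of selection logic described above. It is only an illustration under stated assumptions, not NetBird's actual implementation: the `Candidate` type, its fields, and the idea of estimating relayed latency as client-to-relay RTT plus relay-to-peer RTT are all hypothetical. A latency of 0 is treated as "unknown" rather than as the best possible value.

```go
package main

import (
	"fmt"
	"time"
)

// Candidate is a hypothetical view of one routing peer. It is not a NetBird type.
type Candidate struct {
	Name        string
	Relayed     bool
	DirectRTT   time.Duration // measured RTT on a P2P connection; 0 means unknown
	ClientRelay time.Duration // assumption: RTT from the client to the relay
	RelayPeer   time.Duration // assumption: RTT from the relay to the routing peer
}

// EstimatedLatency returns the latency used for comparison and whether it is known.
// For relayed peers the two legs are summed (an assumed estimation strategy).
func (c Candidate) EstimatedLatency() (time.Duration, bool) {
	if !c.Relayed {
		return c.DirectRTT, c.DirectRTT > 0
	}
	if c.ClientRelay <= 0 || c.RelayPeer <= 0 {
		return 0, false
	}
	return c.ClientRelay + c.RelayPeer, true
}

// PickRoutingPeer returns the candidate with the lowest known latency estimate.
// If no candidate has a known estimate, the first candidate is returned so the
// choice is at least deterministic instead of random.
func PickRoutingPeer(cands []Candidate) (Candidate, bool) {
	if len(cands) == 0 {
		return Candidate{}, false
	}
	best := cands[0]
	bestLat, bestKnown := best.EstimatedLatency()
	for _, c := range cands[1:] {
		lat, known := c.EstimatedLatency()
		if known && (!bestKnown || lat < bestLat) {
			best, bestLat, bestKnown = c, lat, true
		}
	}
	return best, true
}

func main() {
	// Example values are made up for illustration.
	peers := []Candidate{
		{Name: "us-west", Relayed: true, ClientRelay: 20 * time.Millisecond, RelayPeer: 140 * time.Millisecond},
		{Name: "eu-central", Relayed: true, ClientRelay: 20 * time.Millisecond, RelayPeer: 8 * time.Millisecond},
	}
	best, _ := PickRoutingPeer(peers)
	fmt.Println(best.Name) // prints "eu-central"
}
```

With this approach a European client behind a symmetric NAT would prefer the EU peer as long as both legs of the relayed path can be measured; how those per-leg RTTs would actually be obtained is left open here.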

Are you using NetBird Cloud?

Self-host NetBird's control plane.

NetBird version

0.59.3

Is any other VPN software installed?

None
