test_three_node_network_connectivity fails: asymmetric CheckConnectivity prevents NAT traversal
Summary
The test_three_node_network_connectivity test fails because peer-to-peer connections cannot be established when peers join at different times. Investigation shows that the CheckConnectivity message flow is asymmetric: only one peer initiates an outbound connection. This prevents NAT traversal, which requires a bidirectional packet exchange.
Test Configuration
- 3 nodes: 1 gateway + 2 regular peers
- All nodes configured with min_connections=2, max_connections=2
- Peers join sequentially: Gateway at T0, Peer1 at T+12s, Peer2 at T+17s
Expected Behavior
Full mesh connectivity:
- Gateway: 2 connections (Peer1 + Peer2)
- Peer1: 2 connections (Gateway + Peer2)
- Peer2: 2 connections (Gateway + Peer1)
Actual Behavior
Partial connectivity - peers stuck at 1 connection each:
- Gateway: 2 connections ✓
- Peer1: 1 connection (Gateway only) ✗
- Peer2: 1 connection (Gateway only) ✗
Root Cause Analysis
Two Bugs Identified
Bug #1: Off-by-one error in max_connections check ✅ FIXED
Location: crates/core/src/ring/connection_manager.rs:169
Problem:
```rust
} else if total_conn >= self.max_connections { // Should be >
    tracing::debug!(%peer_id, "Rejected connection, max connections reached");
    false
```

With max_connections=2, the condition >= means peers reject connections when total_conn=2, allowing only 1 connection instead of 2.
Fix: Change to total_conn > self.max_connections
Impact: This prevented Peer1 from accepting Peer2's connection attempt, but was not the only issue.
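A minimal sketch of why the two comparisons behave differently. It assumes, as the arithmetic above implies, that total_conn already counts the candidate connection being considered. The admits and simulate functions are hypothetical stand-ins, not the actual code in connection_manager.rs.

```rust
/// Hypothetical stand-in for the admission check in connection_manager.rs.
/// ASSUMPTION: `total_conn` already includes the candidate connection.
fn admits(total_conn: usize, max_connections: usize, use_fix: bool) -> bool {
    if use_fix {
        // Fixed check: reject when total_conn > max, i.e. admit while total_conn <= max.
        total_conn <= max_connections
    } else {
        // Original check: reject when total_conn >= max, i.e. admit while total_conn < max.
        total_conn < max_connections
    }
}

/// Offer `offers` connections one at a time and return how many are admitted.
fn simulate(max_connections: usize, offers: usize, use_fix: bool) -> usize {
    let mut established = 0;
    for _ in 0..offers {
        // Count the candidate itself (see ASSUMPTION above).
        let total_conn = established + 1;
        if admits(total_conn, max_connections, use_fix) {
            established += 1;
        }
    }
    established
}

fn main() {
    // max_connections = 2, matching the test configuration.
    println!("original check: {} connection(s)", simulate(2, 5, false)); // 1
    println!("fixed check:    {} connection(s)", simulate(2, 5, true)); // 2
}
```

Under that assumption the original comparison caps a node at max_connections - 1 connections.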
Bug #2: Asymmetric CheckConnectivity prevents NAT traversal ❌ PRIMARY ISSUE
Problem: When the gateway introduces two peers, it only sends CheckConnectivity to ONE of them, not both.
Evidence from debug logs:
- Peer1 asks gateway for connections (18:58:16):
  FindOptimalPeer(joiner=Peer1, ideal_location=random) → Gateway response: "No desirable peer found" (Peer2 hasn't joined yet)
- Peer2 asks gateway for connections (18:58:21):
  FindOptimalPeer(joiner=Peer2, ideal_location=random) → Gateway response: "Found desirable peer: Peer1" → Gateway sends: CheckConnectivity(target=Peer1, joiner=Peer2)
- Peer1 receives CheckConnectivity:
  ```
  [18:58:21.469767] Peer1: Accepting connection from, joiner: Peer2
  [18:58:21.471767] Peer1: Connecting to peer, remote: Peer2
  [18:58:21.471954] Peer1: Starting outbound connection to 127.0.0.1:51683 (Peer2)
  ```
  → Peer1 initiates outbound NAT traversal to Peer2 ✅
- Peer2's transport layer rejects Peer1's packets:
  ```
  [18:58:21.472823] Peer2: unexpected packet from non-gateway node, remote_addr: 127.0.0.1:37039 (Peer1)
  ```
  → Peer2 has NOT initiated an outbound connection to Peer1, so it rejects the inbound packets ✗
Why NAT Traversal Fails
NAT traversal requires bidirectional packet exchange:
From crates/core/src/transport/connection_handler.rs:405-412:
```rust
if let Some((packets_sender, open_connection)) = ongoing_connections.remove(&remote_addr) {
    // Process packet from expected peer
    // ...
} else if !self.is_gateway {
    tracing::debug!(%remote_addr, "unexpected packet from non-gateway node");
    continue; // Reject packet
}
```

Non-gateway peers only accept inbound packets from peers they have initiated outbound connections to (i.e., peers in the ongoing_connections map).
The ongoing_connections map is only populated when the peer calls NodeEvent::ConnectPeer, which happens when processing CheckConnectivity (connect.rs:305-310).
Result:
- Peer1 receives CheckConnectivity(joiner=Peer2) → adds Peer2 to ongoing_connections → accepts Peer2's packets ✅
- Peer2 NEVER receives CheckConnectivity(joiner=Peer1) → never adds Peer1 to ongoing_connections → rejects Peer1's packets ✗
- NAT traversal fails because only one side is sending packets
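The accept/drop rule from connection_handler.rs and its consequence can be modeled in a few lines. This is a simplified sketch: Node, connect_peer, accepts_packet and handshake_possible are hypothetical names, the real ongoing_connections map is reduced to a set of addresses, and a gateway's handling of unsolicited packets is reduced to "accept". It compares the observed one-way introduction with a hypothetical two-way introduction in which both peers receive CheckConnectivity.

```rust
use std::collections::HashSet;
use std::net::SocketAddr;

/// Hypothetical, simplified model of a node's inbound packet filter.
struct Node {
    is_gateway: bool,
    /// Stand-in for `ongoing_connections`: addresses this node has started an
    /// outbound connection to. The real map also carries per-connection state;
    /// a set is enough to model the accept/drop decision.
    ongoing_connections: HashSet<SocketAddr>,
}

impl Node {
    fn new(is_gateway: bool) -> Self {
        Node { is_gateway, ongoing_connections: HashSet::new() }
    }

    /// Stand-in for processing NodeEvent::ConnectPeer after CheckConnectivity.
    fn connect_peer(&mut self, remote: SocketAddr) {
        self.ongoing_connections.insert(remote);
    }

    /// Same decision as the quoted branch: packets from expected peers are
    /// accepted; unexpected packets are dropped unless this node is a gateway.
    /// (The real code removes the entry on first use; this model only checks it.)
    fn accepts_packet(&self, remote: SocketAddr) -> bool {
        self.ongoing_connections.contains(&remote) || self.is_gateway
    }
}

/// Returns true if each peer would accept the other's packets.
fn handshake_possible(introduce_both: bool) -> bool {
    // Addresses taken from the debug logs in this issue.
    let peer1_addr: SocketAddr = "127.0.0.1:37039".parse().unwrap();
    let peer2_addr: SocketAddr = "127.0.0.1:51683".parse().unwrap();
    let mut peer1 = Node::new(false);
    let mut peer2 = Node::new(false);

    // Observed flow: only Peer1 gets CheckConnectivity(joiner=Peer2).
    peer1.connect_peer(peer2_addr);
    if introduce_both {
        // Hypothetical: Peer2 also gets CheckConnectivity(joiner=Peer1).
        peer2.connect_peer(peer1_addr);
    }

    peer1.accepts_packet(peer2_addr) && peer2.accepts_packet(peer1_addr)
}

fn main() {
    println!("one-way introduction: {}", handshake_possible(false)); // false
    println!("two-way introduction: {}", handshake_possible(true)); // true
}
```

The model ignores the actual handshake and retransmission logic; it only captures the accept/drop decision that the logs show blocking the Peer1/Peer2 connection.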
Timing Race Condition
The asymmetry occurs because FindOptimalPeer responses depend on when the gateway learns about each peer:
- Peer1 joins (taken here as T0): sends FindOptimalPeer immediately via aggressive_initial_connections()
  - Gateway hasn't seen Peer2 yet → "No desirable peer found"
  - Peer1 learns about: nobody
- Peer2 joins at T+5s: sends FindOptimalPeer immediately
  - Gateway NOW knows about Peer1 → "Found: Peer1"
  - Gateway sends CheckConnectivity(target=Peer1, joiner=Peer2)
  - Peer2 learns about: Peer1
- Later retries (T+10s, T+32s): Peer1 sends more FindOptimalPeer requests via connection_maintenance()
  - But Peer1 has no peer knowledge to share (it only knows Gateway)
  - Gateway queries Peer1: "Who should I connect to?"
  - Peer1 responds: "No desirable peer found" (it only knows Gateway, which is in the skip list)
The early-joining peer never learns about late-joining peers.
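A minimal sketch of the timing race, assuming a recommendation simply returns the first connected peer that is neither the joiner nor in the skip list. The recommend function is hypothetical; it ignores ideal_location and any distance-based selection the real FindOptimalPeer handling may perform.

```rust
/// Hypothetical stand-in for answering FindOptimalPeer: return the first
/// connected peer that is not the joiner and not in the skip list.
fn recommend<'a>(connections: &[&'a str], joiner: &str, skip: &[&str]) -> Option<&'a str> {
    connections
        .iter()
        .copied()
        .find(|peer| *peer != joiner && !skip.iter().any(|s| s == peer))
}

fn main() {
    // Peer1 asks right after joining: the gateway only knows Peer1 itself.
    let first = recommend(&["Peer1"], "Peer1", &[]);
    // Peer2 asks five seconds later: the gateway now also knows Peer1.
    let second = recommend(&["Peer1", "Peer2"], "Peer2", &[]);
    // Later retry answered by Peer1, whose only connection is the gateway,
    // which is already in the skip list.
    let retry = recommend(&["Gateway"], "Peer1", &["Gateway"]);
    println!("{first:?} {second:?} {retry:?}");
}
```

In this model only the second query returns a peer, so only one side of the Peer1/Peer2 pair is ever introduced.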
Questions for Nacho
1. Is the asymmetric CheckConnectivity intentional?
   - Should the protocol send CheckConnectivity to BOTH peers when introducing them?
   - Or is there another mechanism that should trigger Peer2 to connect to Peer1?
2. How should NAT traversal work with one-way CheckConnectivity?
   - The transport layer rejects unexpected packets from non-gateway nodes (connection_handler.rs:405-412)
   - But if only one peer initiates outbound, the other peer will reject packets
   - Should the transport layer handle unsolicited peer connections differently?
3. Should peers retry FindOptimalPeer after initial join?
   - Currently connection_maintenance() sends FindOptimalPeer periodically
   - But if peers have no peer knowledge, they can't help the gateway find connections
   - Should the gateway proactively push peer introductions when new peers join?
4. Is this a known limitation in small networks?
   - The protocol works fine in large networks where many peers have diverse connections
   - But in 3-node networks, the timing race is deterministic
   - Should there be special handling for small networks?
Reproduction
```bash
cd ~/code/freenet/freenet-core/main

# Apply off-by-one fix
sed -i 's/total_conn >= self.max_connections/total_conn > self.max_connections/' \
  crates/core/src/ring/connection_manager.rs

# Run test (will still fail due to asymmetric CheckConnectivity)
cargo test --test connectivity test_three_node_network_connectivity -- --nocapture

# Expected: Peers stuck at 1 connection after 60s
# Logs show: Peer2 rejects Peer1's packets as "unexpected from non-gateway node"
```

Debug Logs
Full debug logs with detailed message flow available at: /tmp/connectivity_debug.log
Key events:
- 18:58:16: Peer1 FindOptimalPeer → "No desirable peer found"
- 18:58:21: Peer2 FindOptimalPeer → "Found: Peer1"
- 18:58:21: Gateway → Peer1: CheckConnectivity(joiner=Peer2)
- 18:58:21: Peer1 initiates outbound to Peer2
- 18:58:21: Peer2 rejects Peer1's packets ("unexpected from non-gateway")
- 18:58:54: Gateway asks Peer1 for recommendations → "No desirable peer"
Related Issues
- Issue #1904 (test_three_node_network_connectivity fails: peers don't connect to each other): Original test failure report
- PR #1906 (Fix peer recommendation to use full skip_connections set): Skip-list fix (prevents recommending already-connected peers), merged
- This issue: Documents remaining asymmetric CheckConnectivity problem
Test Location
crates/core/tests/connectivity.rs:424 - test_three_node_network_connectivity
[AI-assisted debugging and comment]