-
-
Notifications
You must be signed in to change notification settings - Fork 107
fix: use Joiner enum to properly handle NAT peer address discovery #2158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes NAT peer address discovery by introducing a Joiner enum that explicitly models whether a peer knows its public address. NAT peers start as Unknown (just a public key) and are upgraded to Known (with full PeerId) when a gateway observes their address.
Key changes:
- Introduces the
Joinerenum to distinguish between peers with known vs unknown addresses - Gateways now observe and communicate the real source address to NAT peers via
ObservedAddressmessages - NAT peers update their internal address when receiving
ObservedAddress
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| crates/core/src/operations/connect.rs | Adds Joiner enum and updates connection logic to handle address discovery |
| crates/core/src/ring/connection_manager.rs | Adds update_peer_address method for NAT peers to update their address |
| crates/core/src/ring/mod.rs | Passes is_gateway flag to initiate_join_request |
| crates/core/src/node/p2p_impl.rs | Passes is_gateway flag to initiate_join_request |
| crates/core/src/node/network_bridge/p2p_protoc.rs | Updates peer initialization logic based on gateway status and NAT configuration |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| pub fn with_observed_address(&self, addr: SocketAddr) -> Self { | ||
| match self { | ||
| Joiner::Unknown(key) => Joiner::Known(PeerId::new(addr, key.clone())), | ||
| Joiner::Known(peer_id) => Joiner::Known(PeerId::new(addr, peer_id.pub_key.clone())), |
Copilot
AI
Nov 26, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Joiner::Known branch creates a new PeerId even if the address hasn't changed. Consider checking if the address is different before creating a new instance to avoid unnecessary allocations: if peer_id.addr == addr { self.clone() } else { Joiner::Known(PeerId::new(addr, peer_id.pub_key.clone())) }
| Joiner::Known(peer_id) => Joiner::Known(PeerId::new(addr, peer_id.pub_key.clone())), | |
| Joiner::Known(peer_id) => { | |
| if peer_id.addr == addr { | |
| self.clone() | |
| } else { | |
| Joiner::Known(PeerId::new(addr, peer_id.pub_key.clone())) | |
| } | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Applied - avoids unnecessary allocation when address hasn't changed.
| let mut this_peer = self.peer_key.lock(); | ||
| if let Some(ref mut peer) = *this_peer { | ||
| // Only update if current address is a placeholder (localhost) | ||
| if peer.addr.ip().is_loopback() { |
Copilot
AI
Nov 26, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check is_loopback() only validates the IP, but the comment mentions "placeholder (localhost)" suggesting a specific address. Consider also checking for the specific 127.0.0.1 placeholder or documenting why any loopback address should be treated as a placeholder. This could prevent unintended updates if a peer legitimately uses a non-127.0.0.1 loopback address.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The loopback check is correct - we use 127.0.0.1:port as the placeholder when a peer doesn't know its public address. Any loopback IP (127.x.x.x) should be treated as a placeholder since peers shouldn't legitimately have a loopback public address. The comment and code are consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like this placeholder business, seems a bit hackish, cannot we make it explicit that the public address is not known by encoding it with an option?
| tracing::warn!( | ||
| joiner = %self.request.joiner, | ||
| "connect: cannot accept joiner without known address" |
Copilot
AI
Nov 26, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This warning is logged when a joiner without a known address reaches the acceptance logic, but this should never happen after the observed_addr upgrade at line 361. This suggests a logic issue: if observed_addr is Some, the joiner is upgraded to Known at line 361, so the else branch at line 391 should be unreachable. Consider adding a debug assertion or restructuring the logic to make this invariant clearer.
| tracing::warn!( | |
| joiner = %self.request.joiner, | |
| "connect: cannot accept joiner without known address" | |
| debug_assert!( | |
| false, | |
| "connect: joiner should always have a known address after observed_addr upgrade; joiner: {}", | |
| self.request.joiner |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This warning is actually valid and not unreachable. The observed_addr upgrade at line 361 only happens when req.observed_addr is Some. But in early protocol stages (before gateway sends ObservedAddress), the joiner may still be Unknown. This warning helps identify cases where acceptance logic is reached without address discovery - useful for debugging protocol issues.
NAT peers cannot know their public address at startup. Previously they would advertise 127.0.0.1 which caused connection failures when other peers tried to connect to them. This PR introduces a `Joiner` enum with two variants: - `Unknown(TransportPublicKey)` - NAT peer before address discovery - `Known(PeerId)` - peer with known address (gateway or after discovery) When a gateway receives a ConnectRequest with `Unknown` joiner, it: 1. Observes the real public address from the UDP source 2. Sends an ObservedAddress message back to the joiner 3. Upgrades the joiner to `Known` with the discovered address E2E tested with technic (NAT peer) connecting to nova (gateway): - Peer sends: `joiner: Unknown(v6MWKgqHgBpgJJmn)` - Gateway observes and sends ObservedAddress - Peer updates: `old_addr: 127.0.0.1:31337, new_addr: 136.62.52.28:31337` Closes #2156 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
6123273 to
d9dfc64
Compare
|
@claude Please carefully review this PR, both at a high level (is it the right approach broadly), and the details. |
|
Claude finished @sanity's task —— View job Comprehensive PR ReviewI've completed a thorough review of PR #2158 covering both the high-level architectural approach and implementation details. Here are my findings: ✅ High-Level Approach: Excellent DesignThe introduction of the
🔍 Implementation Review: Strong with Minor ConcernsCritical Issues: None FoundThe implementation is solid. All 244 tests pass, E2E testing validates the flow, and the code handles the state transitions correctly. Areas of Concern1. Placeholder Address Pattern (iduartgomez's concern is valid)Location: pub fn update_peer_address(&self, new_addr: SocketAddr) -> bool {
// Only update if current address is a placeholder (localhost)
if peer.addr.ip().is_loopback() { ... }
}Issue: While the
Recommendation: Consider making However, this change would be invasive (affecting many parts of the codebase). If you decide to keep the current approach:
2. Optimization Opportunity (Already Applied)Location: pub fn with_observed_address(&self, addr: SocketAddr) -> Self {
match self {
Joiner::Unknown(key) => Joiner::Known(PeerId::new(addr, key.clone())),
Joiner::Known(peer_id) => {
// Avoid allocation if address hasn't changed
if peer_id.addr == addr {
self.clone()
} else {
Joiner::Known(PeerId::new(addr, peer_id.pub_key.clone()))
}
}
}
}✅ Good: You already applied the copilot suggestion to avoid unnecessary allocations when the address hasn't changed. This is a solid optimization. 3. Warning Log Reachability (copilot concern, but your response is correct)Location: } else {
tracing::warn!(
joiner = %self.request.joiner,
"connect: cannot accept joiner without known address"
);
}The copilot reviewer suggested this is unreachable after the However, the code could be clearer. Consider adding a comment: } else {
// This can happen in early protocol stages before ObservedAddress is sent.
// Useful for debugging protocol issues where acceptance logic is reached
// before address discovery completes.
tracing::warn!(
joiner = %self.request.joiner,
"connect: cannot accept joiner without known address"
);
}4. Panic Message Clarity (copilot suggestion, but not critical)Location: .expect("Gateway must have public_address configured");The copilot suggestion to make this more specific ("in network_api settings or peer_id") is reasonable but not critical. The current message is clear enough for developers. Low priority. 📋 Code Quality ObservationsStrengths
Areas for Improvement
🎯 Specific Code Review Notes
|
iduartgomez
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR seems extreme hackish (like a lot of the connect/handshake changes) but since all this is being reworked in other PRs and I guess we are basing off this one will merge.
…way param The update_peer_address method was added by PR #2158, which was reverted. Remove the call and add the missing is_gateway parameter to initiate_join_request. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Problem
NAT peers cannot know their public address at startup (issue #2156). Previously, they would advertise
127.0.0.1as their address, which caused connection failures when other peers tried to connect back to them.The root cause: when a NAT peer starts, it doesn't know its public IP. The existing code stored a placeholder address (
127.0.0.1:port), but this placeholder was treated as a real address and propagated through the network.Previous Approach
An earlier PR (#2088) attempted to fix this by rewriting addresses at the transport boundary - a hacky approach that treated the symptom rather than the cause.
This Solution
Introduce a
Joinerenum that explicitly models the two states:Key insight: If we don't know our external address, we shouldn't store a placeholder - we should model that uncertainty explicitly. This forces correct handling throughout the codebase.
Flow for NAT peers:
ConnectRequestwithjoiner: Unknown(pub_key)ObservedAddressmessage with the discovered addressKnown(PeerId)with the correct addressObservedAddressand updates its own addressChanges:
Joinerenum inconnect.rswith helper methodsConnectRequest.joinerfromPeerKeyLocationtoJoinerRelayState::handle_requestto upgrade joiner whenobserved_addris availableis_gatewayparameter toinitiate_join_requestto determine if peer knows its addressis_gatewayTesting
Unit tests: All 244 tests pass
E2E test with technic (NAT peer behind residential NAT) connecting to nova (gateway):
Fixes
Closes #2156
[AI-assisted - Claude]