fix: stabilize three-node connectivity handshake #2136
@claude Please review this PR
Claude finished @sanity's task
PR Review for fix-three-node-flake
Tasks
Review Summary
This PR effectively fixes the root cause of the three-node connectivity flake. The solution is clean, well-tested, and properly scoped.
✅ Strengths
🔍 Code Analysis
handshake.rs changes (crates/core/src/node/network_bridge/handshake.rs:127-173):
connectivity.rs changes (crates/core/tests/connectivity.rs:11-14, 262, 281-284):
🤔 Minor Observations
💡 Questions / Suggestions
🎯 Testing Validation
The PR description mentions:
This is solid evidence the fix works. The test is now deterministic and reproducible.
✅ Verdict
LGTM - This is a high-quality fix that:
The only suggestion is the optional logging enhancement for unexpected inbound connections, but that's not blocking.
Pull request overview
This PR fixes a race condition in three-node connectivity tests where peers sharing the same IP address were being mis-identified during inbound connection handshakes. The gateway's expectation tracker previously fell back to any peer with a matching IP, causing peer1 connections to be incorrectly recorded as peer2.
Key Changes:
- Refactored ExpectedInboundTracker to use exact socket address matching instead of IP-based lookup with port fallback
- Added regression test ensuring peers on the same IP remain distinct during handshake
- Introduced deterministic node locations for reproducible three-node test failures
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| crates/core/src/node/network_bridge/handshake.rs | Replaced IP-based tracking (HashMap<IpAddr, Vec<ExpectedInbound>>) with exact socket tracking (HashMap<SocketAddr, ExpectedInbound>), simplified register/drop_peer/consume methods, removed IP fallback logic, and added regression test |
| crates/core/tests/connectivity.rs | Added fixed_three_node_locations() helper for deterministic ring positions and configured test_three_node_network_connectivity to use it for reproducible failures |
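The tracker change summarized in the table can be sketched roughly as follows. This is a minimal illustration and not the code from handshake.rs: the method names register, drop_peer, and consume come from the summary above, but their signatures and the peer_label field of ExpectedInbound are assumptions made for the example.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

/// Hypothetical stand-in for the real `ExpectedInbound` record;
/// the `peer_label` field is an assumption for illustration only.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ExpectedInbound {
    pub peer_label: String,
}

/// Tracks expected inbound connections keyed by the exact socket address,
/// so two peers on the same IP but different ports never collide.
#[derive(Debug, Default)]
pub struct ExpectedInboundTracker {
    entries: HashMap<SocketAddr, ExpectedInbound>,
}

impl ExpectedInboundTracker {
    /// Record that we expect an inbound connection from `addr`.
    pub fn register(&mut self, addr: SocketAddr, peer_label: &str) {
        self.entries.insert(
            addr,
            ExpectedInbound {
                peer_label: peer_label.to_string(),
            },
        );
    }

    /// Forget an expectation, e.g. when the peer is dropped.
    pub fn drop_peer(&mut self, addr: &SocketAddr) -> Option<ExpectedInbound> {
        self.entries.remove(addr)
    }

    /// Take the expectation for exactly this socket address, if any.
    /// There is deliberately no fallback to other entries on the same IP.
    pub fn consume(&mut self, addr: &SocketAddr) -> Option<ExpectedInbound> {
        self.entries.remove(addr)
    }
}
```

With this shape, registering 127.0.0.1:50001 and 127.0.0.1:50002 as separate peers and consuming each address returns that peer's own entry; an address with no registered expectation yields None instead of borrowing a neighbour's entry.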
Thanks for the thorough review! I added a warning when we get an inbound connection without a matching expectation (socket lookup still handles the peer correctly). Patch is in the latest push. 🙌
…cture
PR #2136 changed ExpectedInboundTracker from HashMap<IpAddr, Vec<ExpectedInbound>> to HashMap<SocketAddr, ExpectedInbound>. The transactions_for() test helper was added in this branch before that change and wasn't updated during the merge.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Problem
test_three_node_network_connectivity was flaking on main (see CI run 19622111181) with only one live connection and PUT timeouts. The gateway sometimes mis-identified inbound peers when multiple peers shared the same IP because the inbound expectation tracker fell back to any entry with the same IP. That caused peer1 dials to be recorded as peer2, leaving peer1 disconnected.
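The failure mode described here can be reproduced in miniature. The sketch below is hypothetical: IpKeyedTracker, Expected, and the exact fallback rule are assumptions chosen to mirror the description (an IP-keyed map that falls back to any entry with the same IP), not the code that was on main.

```rust
use std::collections::HashMap;
use std::net::{IpAddr, SocketAddr};

/// Hypothetical expectation record: which peer we expect, and on which port.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Expected {
    pub label: String,
    pub port: u16,
}

/// Hypothetical IP-keyed tracker with a same-IP fallback (the buggy shape).
#[derive(Debug, Default)]
pub struct IpKeyedTracker {
    entries: HashMap<IpAddr, Vec<Expected>>,
}

impl IpKeyedTracker {
    pub fn register(&mut self, addr: SocketAddr, label: &str) {
        self.entries.entry(addr.ip()).or_default().push(Expected {
            label: label.to_string(),
            port: addr.port(),
        });
    }

    /// Prefer an exact port match, but otherwise fall back to the first
    /// entry registered for the same IP. The fallback is the problem.
    pub fn consume(&mut self, addr: SocketAddr) -> Option<Expected> {
        let list = self.entries.get_mut(&addr.ip())?;
        if list.is_empty() {
            return None;
        }
        let idx = list
            .iter()
            .position(|e| e.port == addr.port())
            .unwrap_or(0); // same-IP fallback: may pick a different peer
        Some(list.remove(idx))
    }
}
```

If only peer2 (say 127.0.0.1:50002) has a registered expectation when peer1 dials in from 127.0.0.1:50001, consume hands peer1's connection the peer2 entry, and peer2's own later connection finds nothing left to match.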
Solution
Testing