Open
Labels
- A-developer-xp (Area: developer experience)
- A-networking (Area: Networking, ring protocol, peer discovery)
- E-medium (Experience needed to fix/implement: Medium / intermediate)
- P-medium (Medium priority)
- T-bug (Type: Something is broken)
Description
Problem
test_ping_blocked_peers_solution completes successfully (all contract operations succeed, state propagates correctly), but fails with a WebSocket protocol error during teardown:
Connection reset without closing handshake
Context
This issue was discovered while fixing two related bugs:
- PR #2105 ("fix: propagate connection errors for blocked peer fallback (#2021)"): Connection error propagation for blocked peers
- Stacked work: PUT/subscribe parent-child completion (branch fix/put-subscribe-completion)
After fixing the PUT completion and counter overflow bugs, the test's functional requirements all pass:
- ✅ Contract PUT succeeds
- ✅ Nodes GET the contract despite being mutually blocked
- ✅ Subscribe operations complete
- ✅ State updates propagate through gateway
- ✅ All nodes converge to consistent state
The WebSocket reset occurs after operations complete, suggesting a teardown/cleanup issue rather than a functional bug.
Investigation Findings
From previous debugging (see /tmp/blocked-peers-final.txt):
- Counter overflow spam is gone after DashMap fixes
- Test runs to completion functionally
- WebSocket reset happens late in execution
- Likely related to connection lifecycle/cleanup during test shutdown
Possible Causes
- Test teardown race - WebSocket clients closing before nodes fully shut down
- Connection cleanup timing - Transport layer closing connections while WS still active
- Pre-existing issue - May have existed but was masked by other failures
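The first two causes share one shape: one side of the connection goes away before the close handshake finishes. The toy model below is only a sketch of that ordering problem. It uses std channels in place of real WebSocket connections, and every name in it (`Frame`, `Teardown`, `spawn_node`, `close_client`) is hypothetical and not taken from the codebase.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;
use std::time::Duration;

/// Frames exchanged in this toy model (not a real WebSocket implementation).
#[derive(Debug, PartialEq)]
enum Frame {
    Data(String),
    Close,
}

/// How the client side saw the connection end.
#[derive(Debug, PartialEq)]
enum Teardown {
    /// The node echoed our Close frame before going away.
    Clean,
    /// The node disappeared without echoing Close (the "reset" case).
    ResetWithoutHandshake,
}

/// Hypothetical node: echoes data frames. On Close it replies with Close
/// only when `graceful` is true, then exits, which drops its sender.
fn spawn_node(graceful: bool) -> (Sender<Frame>, Receiver<Frame>, thread::JoinHandle<()>) {
    let (to_node, node_rx) = channel::<Frame>();
    let (node_tx, from_node) = channel::<Frame>();
    let handle = thread::spawn(move || {
        for frame in node_rx {
            match frame {
                Frame::Data(s) => {
                    let _ = node_tx.send(Frame::Data(s));
                }
                Frame::Close => {
                    if graceful {
                        let _ = node_tx.send(Frame::Close);
                    }
                    return;
                }
            }
        }
    });
    (to_node, from_node, handle)
}

/// Client-side close: send Close, then wait for the node to echo it.
fn close_client(to_node: &Sender<Frame>, from_node: &Receiver<Frame>) -> Teardown {
    let _ = to_node.send(Frame::Close);
    loop {
        match from_node.recv_timeout(Duration::from_secs(1)) {
            Ok(Frame::Close) => return Teardown::Clean,
            Ok(Frame::Data(_)) => continue, // drain late data frames
            Err(_) => return Teardown::ResetWithoutHandshake,
        }
    }
}
```

In this model, `close_client` returns `Teardown::Clean` only when the node stays alive long enough to echo the Close frame. If the real teardown follows the same pattern, waiting for the handshake on the client side before stopping the nodes would be the ordering to check first.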
Next Steps
- Add detailed logging around WebSocket lifecycle (connect, close, shutdown)
- Check if issue reproduces in other WebSocket-based tests
- Investigate connection cleanup order during node shutdown
- Consider if this is test-specific or affects production scenarios
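For the logging step, one possible shape is a small recorder that timestamps lifecycle events and can report whether node shutdown was observed before the close handshake was acknowledged. This is a sketch under assumptions: `WsEvent`, `LifecycleLog`, and the event names are hypothetical, and a real implementation would more likely hook into the project's existing logging setup than print to stderr.

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Hypothetical lifecycle events worth logging around teardown.
#[derive(Debug, Clone, Copy, PartialEq)]
enum WsEvent {
    Connected,
    CloseSent,
    CloseAcked,
    NodeShutdown,
}

/// Thread-safe recorder: each event is stored with its offset from `start`.
struct LifecycleLog {
    start: Instant,
    events: Mutex<Vec<(Duration, WsEvent)>>,
}

impl LifecycleLog {
    fn new() -> Self {
        Self {
            start: Instant::now(),
            events: Mutex::new(Vec::new()),
        }
    }

    /// Record an event and echo it to stderr with its elapsed time.
    fn record(&self, event: WsEvent) {
        let at = self.start.elapsed();
        eprintln!("[{:?}] ws lifecycle: {:?}", at, event);
        self.events.lock().unwrap().push((at, event));
    }

    /// True if NodeShutdown was recorded before CloseAcked,
    /// or if the node shut down and CloseAcked was never recorded.
    fn shutdown_before_close_ack(&self) -> bool {
        let events = self.events.lock().unwrap();
        let shutdown = events.iter().position(|(_, e)| *e == WsEvent::NodeShutdown);
        let acked = events.iter().position(|(_, e)| *e == WsEvent::CloseAcked);
        match (shutdown, acked) {
            (Some(s), Some(a)) => s < a,
            (Some(_), None) => true,
            _ => false,
        }
    }
}
```

Asserting `!log.shutdown_before_close_ack()` at the end of a test would turn the suspected ordering problem into an explicit failure instead of a late protocol error.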
Test Status
The test is currently marked #[ignore] with a TODO-MUST-FIX note so that PRs carrying the functional fixes can merge. It should be re-enabled once the WebSocket issue is resolved.
Related
- PR #2105 ("fix: propagate connection errors for blocked peer fallback (#2021)"): Connection error propagation
- Issue #2092 ("Observing high send packet rate"): Reserved connection counter underflow (fixed as part of stacked work)
[AI-assisted - Claude]
Status: Triage