fix: register subscribers for locally-cached contracts (issue #2001) #2004
…multiple_clients_subscription

Addresses issue #2001, where the test_multiple_clients_subscription integration test exhibits 20-40% flakiness due to a race condition. The test was starting UPDATE operations before cross-node subscriptions had fully propagated through the ring, causing notifications to be missed by Client 3 (on a different node).

This fix adds a 5-second delay after all subscription confirmations are received, allowing time for subscriptions to propagate across nodes before the UPDATE operation begins.

This is a pragmatic short-term solution. A proper architectural fix would involve making SUBSCRIBE operations truly synchronous with acknowledgment from the target node (see issue #2001 for discussion).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
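The commit names a synchronous SUBSCRIBE with an acknowledgment from the target node as the proper alternative to a fixed delay. The sketch below is one minimal way to express that idea using standard-library channels. `SubscribeRequest`, `spawn_target_node`, and `subscribe_sync` are hypothetical names, not Freenet APIs, and a real implementation would go through the node's network layer rather than an in-process thread.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread;
use std::time::Duration;

/// Hypothetical message sent to the node that hosts the contract.
struct SubscribeRequest {
    contract_key: String,
    /// Channel on which the target node confirms the registration.
    ack: Sender<String>,
}

/// Hypothetical target node: registers the subscriber, then acknowledges.
fn spawn_target_node(requests: Receiver<SubscribeRequest>) -> thread::JoinHandle<Vec<String>> {
    thread::spawn(move || {
        let mut registered = Vec::new();
        for req in requests {
            // Registration happens *before* the ack is sent.
            registered.push(req.contract_key.clone());
            let _ = req.ack.send(req.contract_key);
        }
        registered
    })
}

/// Blocks until the target node confirms registration, or the timeout expires.
fn subscribe_sync(node: &Sender<SubscribeRequest>, key: &str, timeout: Duration) -> Result<(), String> {
    let (ack_tx, ack_rx) = mpsc::channel();
    node.send(SubscribeRequest { contract_key: key.to_string(), ack: ack_tx })
        .map_err(|_| "target node is gone".to_string())?;
    match ack_rx.recv_timeout(timeout) {
        Ok(confirmed) if confirmed == key => Ok(()),
        Ok(other) => Err(format!("ack for unexpected key {other}")),
        Err(e) => Err(format!("no ack: {e}")),
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let node = spawn_target_node(rx);
    // No fixed sleep: an UPDATE could start as soon as this returns Ok.
    subscribe_sync(&tx, "contract-a", Duration::from_secs(1)).expect("subscription confirmed");
    drop(tx); // close the channel so the node thread exits its loop
    let registered = node.join().unwrap();
    assert_eq!(registered, vec!["contract-a".to_string()]);
    println!("registered: {registered:?}");
}
```

The property that matters is ordering: in this model the target registers the subscriber before acknowledging, so once `subscribe_sync` returns `Ok` the registration already exists. The timeout keeps a lost acknowledgment from hanging the caller indefinitely.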
## Root Cause

When a contract was already cached locally, the SUBSCRIBE operation took a fast path that:

1. Verified contract exists locally
2. Sent `SubscribeResponse` to client
3. **Never called `add_subscriber()` to register the subscription**

This caused UPDATE operations to find no subscribers for locally-cached contracts, silently dropping notifications.

## Why It Was Flaky

- Contract on remote node: Normal flow → `add_subscriber()` called → works
- Contract locally cached: Fast path → `add_subscriber()` skipped → breaks
- Timing-dependent: Whether contract is local when subscription occurs

## The Fix

Added call to `add_subscriber()` in the local subscription fast path (`crates/core/src/operations/subscribe.rs:91-98`), ensuring subscriptions are registered regardless of cache location.

## Verification

Ran test 15 times - all passes (100% success rate vs. previous 60-80%)

## Impact

- Fixes intermittent test failures
- Ensures UPDATE notifications work correctly for locally-cached contracts
- No performance impact (single DashMap insertion)
- Removes need for 5-second delay workaround in test

[AI-assisted debugging and comment]
Pull Request Overview
Fixes a critical bug where subscriptions for locally-cached contracts were not being registered, causing UPDATE notifications to be silently dropped. This resolves the flaky test_multiple_clients_subscription test that was failing 20-40% of the time.
Key Changes:
- Added `add_subscriber()` call in the local subscription fast path to register subscriptions in the DashMap
- Added informational logging to track subscription registration flow
- Removed unnecessary 5-second delay workaround from the test
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| crates/core/src/operations/subscribe.rs | Added subscriber registration logic for locally-cached contracts to ensure UPDATE notifications are properly delivered |
| crates/core/tests/operations.rs | Added logging statement to track subscription completion timing |
This looks good, but what happens if the peer later acquires downstream/upstream external subscriptions on subsequent updates?
@iduartgomez Great question! The fix handles this correctly. Here's what happens:

## Scenario: Local subscription first, external subscriptions later

Initial state (what the fix addresses):

Later UPDATE from another peer (external subscription chain):

## Why this works correctly

## Edge cases handled

The fix ensures that when a contract is locally cached, the local subscription is registered immediately before completing the SUBSCRIBE operation. Any subsequent external subscriptions augment this list, creating a proper subscription tree where UPDATE notifications flow to all registered paths. [AI-assisted debugging and comment]
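The reply's point, that later external subscriptions augment rather than replace the locally registered subscriber, can be modeled with a small registry. This is a hypothetical sketch: `Subscriber`, `SubscriberRegistry`, and `fan_out` are illustrative names, a plain `HashMap` stands in for the DashMap mentioned in the PR, and the real subscription types in `crates/core` differ.

```rust
use std::collections::HashMap;

/// Hypothetical subscriber kinds; the real types in crates/core differ.
#[derive(Clone, Debug, PartialEq, Eq)]
enum Subscriber {
    LocalClient(u32),
    Peer(String),
}

#[derive(Default)]
struct SubscriberRegistry {
    // contract key -> everyone who should receive UPDATEs for it
    subs: HashMap<String, Vec<Subscriber>>,
}

impl SubscriberRegistry {
    /// Idempotent registration: adding the same subscriber twice is a no-op.
    fn add_subscriber(&mut self, key: &str, sub: Subscriber) {
        let list = self.subs.entry(key.to_string()).or_default();
        if !list.contains(&sub) {
            list.push(sub);
        }
    }

    /// Subscribers an UPDATE should be forwarded to, skipping the sender (if any).
    fn fan_out(&self, key: &str, from: Option<&Subscriber>) -> Vec<Subscriber> {
        match self.subs.get(key) {
            Some(list) => list.iter().filter(|s| Some(*s) != from).cloned().collect(),
            None => Vec::new(),
        }
    }
}

fn main() {
    let mut reg = SubscriberRegistry::default();
    // 1. Local fast path (the fix): the local client is registered immediately.
    reg.add_subscriber("contract-a", Subscriber::LocalClient(3));
    // 2. Later, a downstream peer subscribes through this node.
    reg.add_subscriber("contract-a", Subscriber::Peer("peer-b".into()));
    // 3. An UPDATE arrives from an upstream peer and reaches both registered paths.
    let upstream = Subscriber::Peer("peer-u".into());
    let targets = reg.fan_out("contract-a", Some(&upstream));
    assert_eq!(targets, vec![Subscriber::LocalClient(3), Subscriber::Peer("peer-b".into())]);
    println!("{targets:?}");
}
```

Two design choices in this sketch are assumptions rather than claims about Freenet: registration is idempotent, so a re-subscribing client is not notified twice, and `fan_out` skips the sender so an UPDATE does not bounce back to the peer it came from.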
## Summary

Fixes the flaky `test_multiple_clients_subscription` test (issue #2001) by ensuring subscriptions are properly registered for locally-cached contracts.

## Problem
The test exhibited a 20-40% failure rate due to missed UPDATE notifications. Through systematic investigation with targeted logging, I discovered the root cause:

When a contract was already cached locally, the SUBSCRIBE operation took a fast path that:

1. Verified the contract exists locally
2. Sent `SubscribeResponse` to the client
3. **Never called `add_subscriber()` to register the subscription**

This caused UPDATE operations to find no subscribers for locally-cached contracts, silently dropping notifications.
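To make the failure mode concrete, here is a minimal model of the fast path. It is a hypothetical sketch, not the code in `subscribe.rs`: `Node`, `subscribe`, `notify_targets`, and the `register_on_fast_path` flag are invented for illustration, with the flag toggling between the buggy behavior (no registration) and the fixed one.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for the node's state; not the actual Freenet types.
#[derive(Default)]
struct Node {
    local_contracts: HashSet<String>,
    subscribers: HashMap<String, Vec<u32>>, // contract key -> client ids
}

#[derive(Debug, PartialEq)]
enum SubscribeOutcome {
    /// Contract was local; the client got a SubscribeResponse immediately.
    LocalFastPath,
    /// Contract not cached; a network SUBSCRIBE would start here (elided).
    NeedsNetwork,
}

impl Node {
    fn add_subscriber(&mut self, key: &str, client: u32) {
        let list = self.subscribers.entry(key.to_string()).or_default();
        if !list.contains(&client) {
            list.push(client);
        }
    }

    /// `register_on_fast_path = false` reproduces the missing-registration bug;
    /// `true` models the fix.
    fn subscribe(&mut self, key: &str, client: u32, register_on_fast_path: bool) -> SubscribeOutcome {
        if self.local_contracts.contains(key) {
            // 1. Contract verified to exist locally.
            if register_on_fast_path {
                // The fix: register before answering the client.
                self.add_subscriber(key, client);
            }
            // 2. A SubscribeResponse would be sent to the client here.
            return SubscribeOutcome::LocalFastPath;
        }
        SubscribeOutcome::NeedsNetwork
    }

    /// UPDATE delivery only reaches registered subscribers.
    fn notify_targets(&self, key: &str) -> Vec<u32> {
        self.subscribers.get(key).cloned().unwrap_or_default()
    }
}

fn main() {
    for fixed in [false, true] {
        let mut node = Node::default();
        node.local_contracts.insert("contract-a".into());
        let outcome = node.subscribe("contract-a", 3, fixed);
        assert_eq!(outcome, SubscribeOutcome::LocalFastPath);
        println!("fixed={fixed}: UPDATE reaches {:?}", node.notify_targets("contract-a"));
    }
}
```

With the flag off, the client still gets its response, yet `notify_targets` returns an empty list, which is the silent drop described above; with the flag on, the client's id is returned.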
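## Why It Was Flaky

- Contract on remote node: normal flow → `add_subscriber()` called → works
- Contract locally cached: fast path → `add_subscriber()` skipped → breaks
- Timing-dependent: the outcome depends on whether the contract is already local when the subscription occurs

## The Fix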
File: `crates/core/src/operations/subscribe.rs:91-98`

Added a call to `add_subscriber()` in the local subscription fast path, ensuring subscriptions are registered regardless of cache location.

## Verification
Ran the test 15 times: all passed (100% success rate vs. the previous 60-80%).
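As a rough sanity check, assume runs are independent and the unfixed code passed each run with probability $p$ in the reported 60-80% range. The chance that the old code would have produced a clean 15-run streak by luck is then $p^{15}$:

```latex
P(\text{15/15 passes} \mid p) = p^{15}, \qquad
0.8^{15} \approx 0.035, \qquad
0.6^{15} \approx 4.7 \times 10^{-4}
```

Under these assumptions, a 15/15 streak would occur by chance at most about 3.5% of the time with the old behavior, which supports the fix without strictly proving it.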
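## Impact

- Fixes intermittent test failures
- Ensures UPDATE notifications work correctly for locally-cached contracts
- No performance impact (single DashMap insertion)
- Removes need for 5-second delay workaround in test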
## Test Changes

Also removed the 5-second delay workaround from `test_multiple_clients_subscription` that was added in commit cbd0849, as it's no longer needed with the proper fix.

Closes #2001
[AI-assisted debugging and comment]