snapshot: track completed sync terms for idempotent standby requests#461
snapshot: track completed sync terms for idempotent standby requests#461
Conversation
WalkthroughThis PR adds completion tracking per standby node and term to SnapshotManager via a new unordered_map-based data structure, introduces three helper methods to manage completion state (check, mark, erase), and integrates completion checks into snapshot sync registration and SyncWithStandby flow to prevent duplicate syncs and clean up stale data. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Tip You can make CodeRabbit's review stricter and more nitpicky using the `assertive` profile, if that's what you prefer.Change the |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tx_service/src/store/snapshot_manager.cpp (1)
102-117:⚠️ Potential issue | 🟠 MajorTreat superseded standby terms as safe duplicates here too.
Line 105 and Line 123 only dedupe exact completed terms. Once the same node already has a newer barrier, pending task, or completed term, an out-of-order request for an older term still falls through to
falsehere, even thoughRegisterSubscriptionBarrier()already ignores those older terms as stale. In a reordered RPC sequence, termTcan start getting rejected after termT+1is known, which breaks the idempotent/stale-request behavior this PR is aiming for. Consider returningtruewhenever the request term is older than the newest known barrier/pending/completed term for that node.Also applies to: 120-135
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tx_service/src/store/snapshot_manager.cpp` around lines 102 - 117, When handling an incoming snapshot sync request, treat any request whose term is older than the newest known term for that standby as a safe duplicate: when node_it == subscription_barrier_.end() first check IsSnapshotSyncCompletedLocked(req->standby_node_id(), req->standby_node_term()) OR whether the node’s newest known term (from subscription_barrier_, any pending-task structures, or the completed-term tracking used by IsSnapshotSyncCompletedLocked) is greater than req->standby_node_term(); if so return true instead of false. Likewise, when node_it != subscription_barrier_.end(), compare req->standby_node_term() against the stored barrier/pending term (node_it->second or the pending task entry) and return true if the request term is strictly less (stale). Apply the same “older-than-newest-known-term => return true” logic to the other similar block around lines 120-135 so superseded terms are treated as idempotent duplicates; use subscription_barrier_, IsSnapshotSyncCompletedLocked, and the pending/completed-term tracking already present to derive the newest known term.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@tx_service/src/store/snapshot_manager.cpp`:
- Around line 102-117: When handling an incoming snapshot sync request, treat
any request whose term is older than the newest known term for that standby as a
safe duplicate: when node_it == subscription_barrier_.end() first check
IsSnapshotSyncCompletedLocked(req->standby_node_id(), req->standby_node_term())
OR whether the node’s newest known term (from subscription_barrier_, any
pending-task structures, or the completed-term tracking used by
IsSnapshotSyncCompletedLocked) is greater than req->standby_node_term(); if so
return true instead of false. Likewise, when node_it !=
subscription_barrier_.end(), compare req->standby_node_term() against the stored
barrier/pending term (node_it->second or the pending task entry) and return true
if the request term is strictly less (stale). Apply the same
“older-than-newest-known-term => return true” logic to the other similar block
around lines 120-135 so superseded terms are treated as idempotent duplicates;
use subscription_barrier_, IsSnapshotSyncCompletedLocked, and the
pending/completed-term tracking already present to derive the newest known term.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6ee12fea-568a-4e7e-8713-2ce1b311fb0a
📒 Files selected for processing (2)
tx_service/include/store/snapshot_manager.htx_service/src/store/snapshot_manager.cpp
Summary by CodeRabbit
Release Notes