Replace fixed-size standby message buffer with memory-bounded queue#211
Replace fixed-size standby message buffer with memory-bounded queue#211
Conversation
WalkthroughConvert raw Changes
Sequence Diagram(s)sequenceDiagram
participant Op as Operation (commit/abort)
participant CCE as CcEntry / LruEntry
participant NB as KeyGapLockAndExtraData
participant CcShard as CcShard
participant Stream as StreamSender
Note over Op: Commit/Truncate triggers standby forward
Op->>CCE: request standby forward
CCE->>NB: SetForwardEntry(std::unique_ptr) — ownership stored
CCE->>CcShard: ForwardStandbyMessage(std::move(unique_ptr)) — transfer ownership
CcShard->>CcShard: assign sequence_id, enqueue in history_standby_msg_ and seq_id_to_entry_map_
alt memory limit exceeded
CcShard->>CcShard: CheckAndFreeUnneededEntries() → evict oldest entries, set out_of_sync
end
CcShard->>Stream: send message (uses raw ptr from unique_ptr)
Stream-->>CcShard: send result (success/fail)
alt send failed
CcShard->>CcShard: schedule retry (lookup by seq_id in map) or trigger resubscribe if evicted
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (4)
🧰 Additional context used🧠 Learnings (2)📚 Learning: 2025-11-11T07:10:40.346ZApplied to files:
📚 Learning: 2025-10-09T03:56:58.811ZApplied to files:
🧬 Code graph analysis (2)src/cc/cc_shard.cpp (1)
include/cc/cc_shard.h (1)
🪛 markdownlint-cli2 (0.18.1).cursor_knowledge/standby-message-buffer-memory-bounded.md52-52: Fenced code blocks should have a language specified (MD040, fenced-code-language) 🔇 Additional comments (11)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/remote/cc_node_service.cpp (1)
1622-1637: Roll back candidate tracking when start-following fails
IfCheckLeaderTermfails on any shard after earlier shards have already executedAddCandidateStandby, we return early withresponse->set_error(true)but never remove the entries that were inserted on those earlier shards. The new candidate bookkeeping therefore leaves the standby marked as “in progress” even though the start-following attempt failed, which can skew memory-bounded eviction logic downstream. Please add a rollback path (e.g., enqueue aWaitableCcover all shards that callsRemoveCandidateStandby(node_id)) before returning in the error branch, or defer callingAddCandidateStandbyuntil after the loop succeeds for every shard. Otherwise the candidate set can permanently accumulate stale nodes.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
include/cc/catalog_cc_map.h(2 hunks)include/cc/cc_entry.h(1 hunks)include/cc/cc_shard.h(5 hunks)include/cc/non_blocking_lock.h(4 hunks)include/cc/object_cc_map.h(9 hunks)include/standby.h(1 hunks)src/cc/cc_shard.cpp(10 hunks)src/cc/non_blocking_lock.cpp(1 hunks)src/remote/cc_node_service.cpp(2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-11-11T07:10:40.324Z
Learnt from: lzxddz
Repo: eloqdata/tx_service PR: 199
File: include/cc/local_cc_shards.h:233-234
Timestamp: 2025-11-11T07:10:40.324Z
Learning: In the LocalCcShards class in include/cc/local_cc_shards.h, the EnqueueCcRequest methods use `shard_code & 0x3FF` followed by `% cc_shards_.size()` to distribute work across processor cores for load balancing. This is intentional and separate from partition ID calculation. The 0x3FF mask creates a consistent distribution range (0-1023) before modulo by actual core count.
Applied to files:
include/cc/cc_shard.hsrc/cc/cc_shard.cpp
📚 Learning: 2025-10-09T03:56:58.811Z
Learnt from: thweetkomputer
Repo: eloqdata/tx_service PR: 150
File: include/cc/local_cc_shards.h:626-631
Timestamp: 2025-10-09T03:56:58.811Z
Learning: For the LocalCcShards class in include/cc/local_cc_shards.h: Writer locks (unique_lock) should continue using the original meta_data_mux_ (std::shared_mutex) rather than fast_meta_data_mux_ (FastMetaDataMutex) at this stage. Only reader locks may use the FastMetaDataMutex wrapper.
Applied to files:
include/cc/cc_shard.hsrc/cc/cc_shard.cppsrc/cc/non_blocking_lock.cpp
🧬 Code graph analysis (5)
include/cc/cc_entry.h (1)
src/cc/non_blocking_lock.cpp (4)
SetForwardEntry(753-757)SetForwardEntry(753-754)ReleaseForwardEntry(764-768)ReleaseForwardEntry(765-765)
include/cc/non_blocking_lock.h (2)
include/cc/cc_entry.h (1)
SetForwardEntry(383-387)src/cc/non_blocking_lock.cpp (4)
SetForwardEntry(753-757)SetForwardEntry(753-754)ReleaseForwardEntry(764-768)ReleaseForwardEntry(765-765)
include/cc/cc_shard.h (1)
src/cc/cc_shard.cpp (10)
CcShard(58-200)CcShard(202-210)AddCandidateStandby(2541-2547)AddCandidateStandby(2541-2541)RemoveCandidateStandby(2549-2558)RemoveCandidateStandby(2549-2549)CheckAndFreeUnneededEntries(2560-2642)CheckAndFreeUnneededEntries(2560-2560)CalculateEntryMemorySize(2532-2539)CalculateEntryMemorySize(2532-2533)
src/cc/cc_shard.cpp (1)
include/cc/cc_shard.h (4)
standby_buffer_memory_limit_(1170-1170)CcShard(233-926)total_standby_buffer_memory_usage_(1168-1168)next_forward_sequence_id_(1171-1171)
src/cc/non_blocking_lock.cpp (1)
include/cc/cc_entry.h (2)
SetForwardEntry(383-387)ForwardEntry(377-381)
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/cc/cc_shard.cpp (1)
2757-2777: CRITICAL: Fix backwards candidate eviction condition.The condition
if (oldest_seq_id <= candidate_it->second)incorrectly evicts candidates. This matches the issue flagged in the past review comment.Problem: When we evict
oldest_seq_id=10and a candidate needsstart_seq_id=15, the condition10 <= 15evaluates totrue, causing premature removal even though the candidate doesn't need that message yet. Conversely, whenoldest_seq_id=15and candidate needsstart_seq_id=10, the condition15 <= 10evaluates tofalse, so we don't remove the candidate even though we just evicted an entry they need.Fix: Reverse the comparison:
- if (oldest_seq_id <= candidate_it->second) + if (oldest_seq_id >= candidate_it->second)This ensures candidates are only removed when we evict an entry at or after their required starting sequence.
🧹 Nitpick comments (2)
include/cc/object_cc_map.h (1)
953-955: Null out the locals after moving the entry.After
ReleaseForwardEntry()succeeds, the localforward_entry(and the adjacentforward_req) still look non-null even though ownership has moved away. Later guards such asif (forward_entry)will keep triggering despite the entry being gone, which is confusing and makes accidental reuse easier. Please set both locals tonullptrimmediately after the move so their state mirrors reality.src/cc/cc_shard.cpp (1)
2648-2653: Consider accepting unique_ptr parameter for clearer ownership semantics.Taking a raw pointer and immediately wrapping it in
unique_ptrobscures ownership transfer at call sites. Consider changing the signature to:void ForwardStandbyMessage(std::unique_ptr<StandbyForwardEntry> entry)This makes ownership transfer explicit in the function signature.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
include/cc/catalog_cc_map.h(2 hunks)include/cc/cc_entry.h(1 hunks)include/cc/cc_shard.h(5 hunks)include/cc/non_blocking_lock.h(4 hunks)include/cc/object_cc_map.h(9 hunks)include/standby.h(1 hunks)src/cc/cc_shard.cpp(10 hunks)src/cc/non_blocking_lock.cpp(1 hunks)src/remote/cc_node_service.cpp(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- include/cc/cc_entry.h
- src/remote/cc_node_service.cpp
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-11-11T07:10:40.324Z
Learnt from: lzxddz
Repo: eloqdata/tx_service PR: 199
File: include/cc/local_cc_shards.h:233-234
Timestamp: 2025-11-11T07:10:40.324Z
Learning: In the LocalCcShards class in include/cc/local_cc_shards.h, the EnqueueCcRequest methods use `shard_code & 0x3FF` followed by `% cc_shards_.size()` to distribute work across processor cores for load balancing. This is intentional and separate from partition ID calculation. The 0x3FF mask creates a consistent distribution range (0-1023) before modulo by actual core count.
Applied to files:
src/cc/cc_shard.cppinclude/cc/cc_shard.h
📚 Learning: 2025-10-09T03:56:58.811Z
Learnt from: thweetkomputer
Repo: eloqdata/tx_service PR: 150
File: include/cc/local_cc_shards.h:626-631
Timestamp: 2025-10-09T03:56:58.811Z
Learning: For the LocalCcShards class in include/cc/local_cc_shards.h: Writer locks (unique_lock) should continue using the original meta_data_mux_ (std::shared_mutex) rather than fast_meta_data_mux_ (FastMetaDataMutex) at this stage. Only reader locks may use the FastMetaDataMutex wrapper.
Applied to files:
src/cc/cc_shard.cppinclude/cc/cc_shard.hinclude/cc/object_cc_map.hsrc/cc/non_blocking_lock.cpp
🧬 Code graph analysis (4)
src/cc/cc_shard.cpp (1)
include/cc/cc_shard.h (4)
standby_buffer_memory_limit_(1170-1170)CcShard(233-926)total_standby_buffer_memory_usage_(1168-1168)next_forward_sequence_id_(1171-1171)
include/cc/cc_shard.h (1)
src/cc/cc_shard.cpp (10)
CcShard(58-200)CcShard(202-210)AddCandidateStandby(2541-2547)AddCandidateStandby(2541-2541)RemoveCandidateStandby(2549-2558)RemoveCandidateStandby(2549-2549)CheckAndFreeUnneededEntries(2560-2642)CheckAndFreeUnneededEntries(2560-2560)CalculateEntryMemorySize(2532-2539)CalculateEntryMemorySize(2532-2533)
include/cc/non_blocking_lock.h (2)
include/cc/cc_entry.h (1)
SetForwardEntry(383-387)src/cc/non_blocking_lock.cpp (4)
SetForwardEntry(753-757)SetForwardEntry(753-754)ReleaseForwardEntry(764-768)ReleaseForwardEntry(765-765)
src/cc/non_blocking_lock.cpp (1)
include/cc/cc_entry.h (2)
SetForwardEntry(383-387)ForwardEntry(377-381)
🔇 Additional comments (13)
include/cc/cc_shard.h (5)
31-31: LGTM: Include added for new data structure.The
<deque>include supports the newhistory_standby_msg_member.
238-238: LGTM: Destructor declaration added.The destructor is needed to properly clean up the new memory-bounded standby buffer structures.
955-957: LGTM: Public API additions for candidate standby tracking.The new methods provide clean lifecycle management for candidate standby nodes transitioning to subscribed state.
1074-1076: LGTM: Memory accounting helper added.The private helper encapsulates entry size calculation for memory tracking.
1159-1170: LGTM: Memory-bounded buffer replaces fixed-size approach.The new design uses:
history_standby_msg_(deque of unique_ptr) for ownership and FIFO evictionseq_id_to_entry_map_for O(1) lookups during resendcandidate_standby_nodes_to track nodes transitioning from candidate to subscribed- Memory tracking fields to enforce the configured limit
This addresses the PR objectives to improve memory efficiency.
src/cc/cc_shard.cpp (8)
111-113: LGTM: Standby buffer memory limit configured.Allocating 10% of node memory per shard for the standby buffer provides a reasonable bound while allowing flexibility for replication lag tolerance.
202-210: LGTM: Destructor cleanup order is correct.Clearing
history_standby_msg_first ensures entries are freed before clearingseq_id_to_entry_map_, which contains non-owning pointers.
221-225: LGTM: Initialization is explicit and safe.While the structures are already empty after construction, explicit clearing makes the initialization intent clear and safe for potential reinitialization paths.
2541-2547: LGTM: Candidate tracking added with logging.The implementation correctly tracks candidate standby nodes. The map insertion will overwrite if a node is added twice, which is acceptable for this use case.
2549-2558: LGTM: Safe candidate removal with logging.The implementation safely checks for existence before removal and logs the operation for observability.
2560-2642: LGTM: Entry garbage collection logic is sound.The function correctly:
- Computes the minimum sequence ID needed across all subscribed and candidate standbys
- Removes entries from the front of the deque (FIFO) that are no longer needed
- Updates memory accounting as entries are freed
The logic properly handles the case where no standbys exist (clears everything).
2796-2907: LGTM: Resend logic correctly handles memory-bounded buffer.The refactored resend logic properly:
- Uses
seq_id_to_entry_map_for O(1) lookups instead of linear scan- Handles evicted entries gracefully by sending out-of-sync notifications to lagging standbys
- Calls
CheckAndFreeUnneededEntries()after retries to reclaim memory from entries no longer neededThe batch limit of 500 messages per standby prevents excessive blocking.
3007-3022: LGTM: Complete reset of standby state.The function properly clears all memory-bounded buffer structures and resets memory accounting when standby sequence is reset.
1ed633d to
c2da4ad
Compare
|
@CodeRabbit review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
.cursor_knowledge/standby-message-buffer-memory-bounded.md (1)
216-228: Reinforce memory calculation accuracy concerns and add validation guidance.The "Open Questions" section raises valid concerns about memory calculation accuracy and eviction churn. Consider adding:
- Memory calculation accuracy: Protobuf heap allocation patterns vary. Suggest caching per-entry size at allocation time rather than recalculating during eviction.
- Test coverage: Add guidance for test scenarios validating eviction behavior, candidate tracking lifecycle, and out-of-sync notification handling.
- Monitoring: Recommend observability points for tracking memory pressure events and eviction frequency to tune the 10% memory limit in production.
Consider expanding the "Open Questions & Known Risks" section with concrete mitigation strategies and test case patterns for validation.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.cursor_knowledge/standby-message-buffer-memory-bounded.md(1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
.cursor_knowledge/standby-message-buffer-memory-bounded.md
50-50: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
| ``` | ||
| tx_service/include/cc/ | ||
| - cc_shard.h (data structures, method declarations) | ||
| - cc_entry.h (SetForwardEntry, ReleaseForwardEntry) | ||
| - non_blocking_lock.h (forward_entry_ as unique_ptr) | ||
| - object_cc_map.h (allocation point, ownership transfer) | ||
| - catalog_cc_map.h (catalog allocation point) | ||
|
|
||
| tx_service/src/cc/ | ||
| - cc_shard.cpp (ForwardStandbyMessage, ResendFailedForwardMessages, memory tracking, eviction) | ||
| - cc_entry.cpp (SetForwardEntry, ReleaseForwardEntry implementations) | ||
| - non_blocking_lock.cpp (SetForwardEntry, ReleaseForwardEntry implementations) | ||
|
|
||
| tx_service/src/remote/ | ||
| - cc_node_service.cpp (StandbyStartFollowing, ResetStandbySequenceId handlers) | ||
|
|
||
| tx_service/include/ | ||
| - standby.h (StandbyForwardEntry structure, Remove Reset() method) | ||
| ``` |
There was a problem hiding this comment.
Add language identifier to code fence.
The file structure code block lacks a language specification. Use text or plaintext for the fenced code block.
-**File Structure:**
-```
+**File Structure:**
+```text
tx_service/include/cc/
- cc_shard.h (data structures, method declarations)
- cc_entry.h (SetForwardEntry, ReleaseForwardEntry)
- non_blocking_lock.h (forward_entry_ as unique_ptr)
- object_cc_map.h (allocation point, ownership transfer)
- catalog_cc_map.h (catalog allocation point)
tx_service/src/cc/
- cc_shard.cpp (ForwardStandbyMessage, ResendFailedForwardMessages, memory tracking, eviction)
- cc_entry.cpp (SetForwardEntry, ReleaseForwardEntry implementations)
- non_blocking_lock.cpp (SetForwardEntry, ReleaseForwardEntry implementations)
tx_service/src/remote/
- cc_node_service.cpp (StandbyStartFollowing, ResetStandbySequenceId handlers)
tx_service/include/
- standby.h (StandbyForwardEntry structure, Remove Reset() method)
-```
+```🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
50-50: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
In .cursor_knowledge/standby-message-buffer-memory-bounded.md around lines 50 to
68 the fenced file-structure block is missing a language identifier; update the
opening fence from ``` to ```text (or ```plaintext) so the block reads ```text
and keep the closing ``` as-is, ensuring the diffs reflect this change.
|
@CodeRabbit review |
✅ Actions performedReview triggered.
|
MrGuin
left a comment
There was a problem hiding this comment.
Should CheckAndFreeUnneededEntries when history_standby_msg_ is no longer needed, like RemoveSubscribedStandby and ResetStandbySequenceId.
src/cc/cc_shard.cpp
Outdated
| assert(entry != nullptr); | ||
| // Memory size = protobuf message size + overhead | ||
| // Overhead: bool in_use_ + uint64_t sequence_id_ = 9 bytes | ||
| return entry->Message().ByteSizeLong() + sizeof(bool) + sizeof(uint64_t); |
There was a problem hiding this comment.
From ChatGpt, seems that ByteSizeLong iterates all the protobuf message fields, the CcMessage. If it takes quite some time maybe we should only calucalte the necessary fields set.
And ByteSizeLong might not be consistent with the real memory allocated.
| // In ResetStandbySequenceId handler: | ||
| ccs.RemoveCandidateStandby(node_id); | ||
| ccs.AddSubscribedStandby(node_id, start_seq_id, standby_node_term); | ||
| ccs.CheckAndFreeUnneededEntries(); // Free entries no longer needed by this candidate |
There was a problem hiding this comment.
From the code, ResetStandbySequenceId handler does not call ccs.CheckAndFreeUnneededEntries().
There was a problem hiding this comment.
yes I have removed this line since candidate standby is not removed here, it is just moved to official standby and the referenced msg has not changed
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (3)
.cursor_knowledge/standby-message-buffer-memory-bounded.md (2)
50-68: Add language identifier to fenced code block.Use "text" to satisfy MD040.
Apply:
-**File Structure:** -``` +**File Structure:** +```text ...--- `88-94`: **Fix stale overhead comment and formula (no in_use_ field).** Remove sizeof(bool) and describe actual members. Apply: ```diff -uint64_t CalculateEntryMemorySize(const StandbyForwardEntry *entry) const { - // protobuf size + overhead (bool in_use_ + uint64_t sequence_id_) - return entry->Message().ByteSizeLong() + sizeof(bool) + sizeof(uint64_t); -} +uint64_t CalculateEntryMemorySize(const StandbyForwardEntry *entry) const { + // protobuf size + overhead (uint64_t sequence_id_) + return entry->Message().ByteSizeLong() + sizeof(uint64_t); +}src/cc/cc_shard.cpp (1)
2532-2539: Fix memory accounting: remove nonexistent bool and stale comment.Incorrect overhead skews eviction decisions.
Apply:
-uint64_t CcShard::CalculateEntryMemorySize( - const StandbyForwardEntry *entry) const -{ - assert(entry != nullptr); - // Memory size = protobuf message size + overhead - // Overhead: bool in_use_ + uint64_t sequence_id_ = 9 bytes - return entry->Message().ByteSizeLong() + sizeof(bool) + sizeof(uint64_t); -} +uint64_t CcShard::CalculateEntryMemorySize( + const StandbyForwardEntry *entry) const +{ + assert(entry != nullptr); + // Memory size = protobuf message size + overhead + // Overhead: uint64_t sequence_id_ + return entry->Message().ByteSizeLong() + sizeof(uint64_t); +}
🧹 Nitpick comments (2)
src/cc/cc_shard.cpp (2)
2549-2558: Free memory promptly after removing a candidate.Call CheckAndFreeUnneededEntries() to drop entries that are no longer needed once a candidate is removed.
Apply:
void CcShard::RemoveCandidateStandby(uint32_t node_id) { auto it = candidate_standby_nodes_.find(node_id); if (it != candidate_standby_nodes_.end()) { candidate_standby_nodes_.erase(it); LOG(INFO) << "Removed candidate standby node " << node_id << " from shard " << core_id_; + CheckAndFreeUnneededEntries(); } }
2738-2756: Avoid repeated ByteSizeLong() calls in hot path.Cache entry size once at enqueue time (store in StandbyForwardEntry) to reduce protobuf traversal during add/evict loops.
I can draft a small change adding a cached_size_ field in StandbyForwardEntry and using it here; say the word.
Also applies to: 2628-2634
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
.cursor_knowledge/standby-message-buffer-memory-bounded.md(1 hunks)include/cc/catalog_cc_map.h(2 hunks)include/cc/cc_entry.h(1 hunks)include/cc/cc_shard.h(6 hunks)include/cc/non_blocking_lock.h(4 hunks)include/cc/object_cc_map.h(9 hunks)include/standby.h(1 hunks)src/cc/cc_shard.cpp(10 hunks)src/cc/non_blocking_lock.cpp(1 hunks)src/remote/cc_node_service.cpp(4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- include/cc/catalog_cc_map.h
- src/cc/non_blocking_lock.cpp
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-11-11T07:10:40.346Z
Learnt from: lzxddz
Repo: eloqdata/tx_service PR: 199
File: include/cc/local_cc_shards.h:233-234
Timestamp: 2025-11-11T07:10:40.346Z
Learning: In the LocalCcShards class in include/cc/local_cc_shards.h, the EnqueueCcRequest methods use `shard_code & 0x3FF` followed by `% cc_shards_.size()` to distribute work across processor cores for load balancing. This is intentional and separate from partition ID calculation. The 0x3FF mask creates a consistent distribution range (0-1023) before modulo by actual core count.
Applied to files:
include/cc/cc_shard.hsrc/cc/cc_shard.cppsrc/remote/cc_node_service.cpp
📚 Learning: 2025-10-09T03:56:58.811Z
Learnt from: thweetkomputer
Repo: eloqdata/tx_service PR: 150
File: include/cc/local_cc_shards.h:626-631
Timestamp: 2025-10-09T03:56:58.811Z
Learning: For the LocalCcShards class in include/cc/local_cc_shards.h: Writer locks (unique_lock) should continue using the original meta_data_mux_ (std::shared_mutex) rather than fast_meta_data_mux_ (FastMetaDataMutex) at this stage. Only reader locks may use the FastMetaDataMutex wrapper.
Applied to files:
include/cc/cc_shard.hinclude/cc/object_cc_map.hinclude/cc/non_blocking_lock.hsrc/cc/cc_shard.cpp
🧬 Code graph analysis (4)
include/cc/cc_entry.h (1)
src/cc/non_blocking_lock.cpp (4)
SetForwardEntry(753-757)SetForwardEntry(753-754)ReleaseForwardEntry(764-768)ReleaseForwardEntry(765-765)
include/cc/cc_shard.h (1)
src/cc/cc_shard.cpp (10)
CcShard(58-200)CcShard(202-210)AddCandidateStandby(2541-2547)AddCandidateStandby(2541-2541)RemoveCandidateStandby(2549-2558)RemoveCandidateStandby(2549-2549)CheckAndFreeUnneededEntries(2560-2642)CheckAndFreeUnneededEntries(2560-2560)CalculateEntryMemorySize(2532-2539)CalculateEntryMemorySize(2532-2533)
include/cc/non_blocking_lock.h (2)
include/cc/cc_entry.h (1)
SetForwardEntry(383-387)src/cc/non_blocking_lock.cpp (4)
SetForwardEntry(753-757)SetForwardEntry(753-754)ReleaseForwardEntry(764-768)ReleaseForwardEntry(765-765)
src/cc/cc_shard.cpp (1)
include/cc/cc_shard.h (3)
standby_buffer_memory_limit_(1171-1171)total_standby_buffer_memory_usage_(1169-1169)next_forward_sequence_id_(1172-1172)
🪛 markdownlint-cli2 (0.18.1)
.cursor_knowledge/standby-message-buffer-memory-bounded.md
50-50: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (4)
include/cc/object_cc_map.h (1)
873-877: Ownership transfer of StandbyForwardEntry looks correct.Make-unique + SetForwardEntry, then ReleaseForwardEntry to either forward or free is clean and leak-safe. LGTM.
Also applies to: 953-956, 1116-1119, 1135-1137, 1187-1195, 1393-1396, 1436-1438, 2432-2435
src/cc/cc_shard.cpp (3)
3012-3017: Reset path LGTM.History cleared and counters reset; consistent with new model.
2648-2650: Forwarding ownership flow LGTM.Taking ownership, sending, and conditional enqueue/evict are consistent and safe.
Also applies to: 2715-2796
2560-2642: Entry reclamation triggers integrated well.Min-needed logic and post-retry/post-unsubscribe cleanup look correct.
Also applies to: 2899-2907, 2934-2938
include/cc/cc_shard.h
Outdated
| // Calculate memory size of a StandbyForwardEntry | ||
| // Returns: msg.ByteSizeLong() + sizeof(bool) + sizeof(uint64_t) | ||
| uint64_t CalculateEntryMemorySize(const StandbyForwardEntry *entry) const; | ||
|
|
There was a problem hiding this comment.
Correct the memory-size comment for StandbyForwardEntry.
The comment still mentions a removed bool; reflect only sequence_id_.
Apply:
-// Calculate memory size of a StandbyForwardEntry
-// Returns: msg.ByteSizeLong() + sizeof(bool) + sizeof(uint64_t)
+// Calculate memory size of a StandbyForwardEntry
+// Returns: msg.ByteSizeLong() + sizeof(uint64_t)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Calculate memory size of a StandbyForwardEntry | |
| // Returns: msg.ByteSizeLong() + sizeof(bool) + sizeof(uint64_t) | |
| uint64_t CalculateEntryMemorySize(const StandbyForwardEntry *entry) const; | |
| // Calculate memory size of a StandbyForwardEntry | |
| // Returns: msg.ByteSizeLong() + sizeof(uint64_t) | |
| uint64_t CalculateEntryMemorySize(const StandbyForwardEntry *entry) const; |
🤖 Prompt for AI Agents
In include/cc/cc_shard.h around lines 1075 to 1078, the comment for
CalculateEntryMemorySize incorrectly mentions a removed bool; update the comment
to reflect that StandbyForwardEntry memory size is msg.ByteSizeLong() plus only
the size of sequence_id_ (uint64_t). Replace the Returns line with something
like "Returns: msg.ByteSizeLong() + sizeof(uint64_t)" or "Returns:
msg.ByteSizeLong() + sizeof(sequence_id_)" to accurately describe the
calculation.
Replace the fixed-size vector-based standby message buffer with a memory-bounded queue that tracks memory usage and evicts oldest messages when the limit is reached. This change improves memory efficiency by allocating entries on-demand and freeing them immediately when no longer needed.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (4)
.cursor_knowledge/standby-message-buffer-memory-bounded.md (2)
50-68: Add language identifier to code fence.The file structure code block lacks a language specification.
Apply this diff:
**File Structure:** -``` +```text tx_service/include/cc/ - cc_shard.h (data structures, method declarations) ...--- `88-94`: **Update memory calculation comment to reflect actual struct members.** The comment references `in_use_` which was removed from `StandbyForwardEntry`. The struct only contains `sequence_id_` and `msg`. Apply this diff: ```diff uint64_t CalculateEntryMemorySize(const StandbyForwardEntry *entry) const { - // protobuf size + overhead (bool in_use_ + uint64_t sequence_id_) - return entry->Message().ByteSizeLong() + sizeof(bool) + sizeof(uint64_t); + // protobuf size + overhead (uint64_t sequence_id_) + return entry->Message().ByteSizeLong() + sizeof(uint64_t); }include/cc/cc_shard.h (1)
1075-1078: Correct the memory-size comment to match actual struct.The comment incorrectly references a removed
boolfield.StandbyForwardEntrycontains onlysequence_id_andmsg.Apply:
// Calculate memory size of a StandbyForwardEntry -// Returns: msg.ByteSizeLong() + sizeof(bool) + sizeof(uint64_t) +// Returns: msg.ByteSizeLong() + sizeof(uint64_t)src/cc/cc_shard.cpp (1)
2532-2539: Fix stale comment and size calculationThe comment references a removed
in_use_field. Also, accounting should exclude it.- // Overhead: bool in_use_ + uint64_t sequence_id_ = 9 bytes - return entry->Message().ByteSizeLong() + sizeof(bool) + sizeof(uint64_t); + // Overhead: uint64_t sequence_id_ (8 bytes) + return entry->Message().ByteSizeLong() + sizeof(uint64_t);#!/bin/bash # Ensure StandbyForwardEntry fields match the comment rg -nU "struct\s+StandbyForwardEntry\b" -n -A40 include | sed -n '1,80p'
🧹 Nitpick comments (3)
include/cc/object_cc_map.h (1)
953-956: Pass unique_ptr directly to ForwardStandbyMessage (optional)You release the ptr here and re-wrap inside ForwardStandbyMessage. Consider adding an overload that takes std::unique_ptr to avoid raw-pointer hops.
src/cc/cc_shard.cpp (2)
111-114: Avoid float math in limit calculationUse integer math to prevent truncation surprises.
- standby_buffer_memory_limit_ = - (uint64_t) MB(node_memory_limit_mb) * 0.1 / core_cnt_; + standby_buffer_memory_limit_ = + (uint64_t)(MB(node_memory_limit_mb) / core_cnt_ / 10);
2648-2653: Take unique_ptr directly (optional)Accept a
std::unique_ptr<StandbyForwardEntry>parameter to avoid raw→unique_ptr round-trip here. Keep current overload for compatibility.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
.cursor_knowledge/standby-message-buffer-memory-bounded.md(1 hunks)include/cc/catalog_cc_map.h(2 hunks)include/cc/cc_entry.h(1 hunks)include/cc/cc_shard.h(6 hunks)include/cc/non_blocking_lock.h(4 hunks)include/cc/object_cc_map.h(9 hunks)include/standby.h(1 hunks)src/cc/cc_shard.cpp(10 hunks)src/cc/non_blocking_lock.cpp(1 hunks)src/remote/cc_node_service.cpp(4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- include/cc/catalog_cc_map.h
- include/cc/cc_entry.h
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-11-11T07:10:40.346Z
Learnt from: lzxddz
Repo: eloqdata/tx_service PR: 199
File: include/cc/local_cc_shards.h:233-234
Timestamp: 2025-11-11T07:10:40.346Z
Learning: In the LocalCcShards class in include/cc/local_cc_shards.h, the EnqueueCcRequest methods use `shard_code & 0x3FF` followed by `% cc_shards_.size()` to distribute work across processor cores for load balancing. This is intentional and separate from partition ID calculation. The 0x3FF mask creates a consistent distribution range (0-1023) before modulo by actual core count.
Applied to files:
src/remote/cc_node_service.cppinclude/cc/cc_shard.hsrc/cc/cc_shard.cpp
📚 Learning: 2025-10-09T03:56:58.811Z
Learnt from: thweetkomputer
Repo: eloqdata/tx_service PR: 150
File: include/cc/local_cc_shards.h:626-631
Timestamp: 2025-10-09T03:56:58.811Z
Learning: For the LocalCcShards class in include/cc/local_cc_shards.h: Writer locks (unique_lock) should continue using the original meta_data_mux_ (std::shared_mutex) rather than fast_meta_data_mux_ (FastMetaDataMutex) at this stage. Only reader locks may use the FastMetaDataMutex wrapper.
Applied to files:
include/cc/cc_shard.hinclude/cc/object_cc_map.hsrc/cc/cc_shard.cppinclude/cc/non_blocking_lock.hsrc/cc/non_blocking_lock.cpp.cursor_knowledge/standby-message-buffer-memory-bounded.md
🧬 Code graph analysis (4)
include/cc/cc_shard.h (1)
src/cc/cc_shard.cpp (10)
CcShard(58-200)CcShard(202-210)AddCandidateStandby(2541-2547)AddCandidateStandby(2541-2541)RemoveCandidateStandby(2549-2558)RemoveCandidateStandby(2549-2549)CheckAndFreeUnneededEntries(2560-2642)CheckAndFreeUnneededEntries(2560-2560)CalculateEntryMemorySize(2532-2539)CalculateEntryMemorySize(2532-2533)
src/cc/cc_shard.cpp (1)
include/cc/cc_shard.h (3)
standby_buffer_memory_limit_(1171-1171)total_standby_buffer_memory_usage_(1169-1169)next_forward_sequence_id_(1172-1172)
include/cc/non_blocking_lock.h (2)
include/cc/cc_entry.h (1)
SetForwardEntry(383-387)src/cc/non_blocking_lock.cpp (4)
SetForwardEntry(753-757)SetForwardEntry(753-754)ReleaseForwardEntry(764-768)ReleaseForwardEntry(765-765)
src/cc/non_blocking_lock.cpp (1)
include/cc/cc_entry.h (2)
SetForwardEntry(383-387)ForwardEntry(377-381)
🪛 markdownlint-cli2 (0.18.1)
.cursor_knowledge/standby-message-buffer-memory-bounded.md
50-50: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (26)
include/cc/cc_shard.h (3)
238-238: LGTM: Destructor properly manages new data structures.The destructor appropriately cleans up the memory-bounded queue, lookup map, and candidate tracking structures introduced in this PR.
955-957: LGTM: New candidate standby lifecycle APIs are well-designed.The public methods provide clear lifecycle management for candidate followers, preventing premature eviction of messages needed during the two-phase subscription process.
1160-1171: LGTM: Memory-bounded data structures are well-designed.The ownership model is clear (deque owns via unique_ptr, map holds raw pointers), and the memory tracking fields enable proper budgeting and eviction.
include/standby.h (1)
42-43: LGTM: Proper initialization for new ownership model.Setting
out_of_synctofalseandsequence_id_toUINT64_MAXprovides appropriate defaults for the exclusive-ownership lifecycle.src/cc/non_blocking_lock.cpp (3)
753-757: LGTM: Proper ownership transfer into member.The method correctly accepts a
unique_ptrand transfers ownership viastd::move.
759-762: LGTM: Non-owning accessor maintains compatibility.Returns a raw pointer for read access without affecting ownership, maintaining the existing public API surface.
764-768: LGTM: Proper ownership transfer to caller.The method correctly transfers ownership via
std::move, allowing the caller to take exclusive ownership of the entry.src/remote/cc_node_service.cpp (2)
1601-1656: LGTM: Robust rollback ensures atomic candidate tracking.The rollback logic correctly removes candidate standbys from all previously successful shards when a later shard errors, preventing inconsistent partial-subscription state.
1863-1866: LGTM: Proper candidate-to-subscriber transition with cleanup.The code correctly removes the candidate before subscribing. Note that
AddSubscribedStandbyinternally callsCheckAndFreeUnneededEntries()(line 973 of cc_shard.h), ensuring entries no longer needed by this candidate are freed.include/cc/object_cc_map.h (4)
873-877: Ownership transfer LGTMUsing std::unique_ptr for StandbyForwardEntry and moving into SetForwardEntry prevents leaks/double-frees.
1116-1118: Correct: release on Block/Unlock pathsReleasing and letting RAII free the forward entry on Block/Unlock avoids dangling buffers.
Please verify no downstream code relies on forward_entry after these branches.Also applies to: 1135-1137
1192-1195: Commit path forwarding LGTMReleaseForwardEntry + forward call correctly hands ownership to shard sender.
1434-1438: Abort path cleanup LGTMReleasing the forward entry on abort avoids stale per-key forwarding state.
include/cc/non_blocking_lock.h (4)
421-423: Reset uses unique_ptr correctly
forward_entry_.reset();is correct and consistent with the new ownership model.
566-568: New API surface for ownership transfer is goodSetForwardEntry(std::unique_ptr<...>) + ReleaseForwardEntry() provide clear ownership semantics.
594-596: Member type change LGTMStoring
std::unique_ptr<StandbyForwardEntry>aligns storage with ownership;ForwardEntry()exposes a raw observer.
39-39: Forward-declaration approach requires additional refactoring for full feasibilityThe suggestion to forward-declare
struct StandbyForwardEntry;and move the destructor out-of-line is sound in principle for reducing rebuild surface. However, theReset()method at lines 410-423 is inline and callsforward_entry_.reset()(line 421), which requiresStandbyForwardEntryto be a complete type. Simply implementing the suggested changes would fail to compile.To make this refactoring work,
Reset()would also need to be moved out-of-line into the .cpp file alongside the destructor. This adds complexity beyond the suggested scope. The current state (with#include "standby.h") works correctly. Evaluate whether the rebuild reduction justifies the refactoring effort of moving multiple methods out-of-line.src/cc/cc_shard.cpp (9)
202-211: Destructor is safe and clearProperly clears history and maps to release memory on shutdown.
221-226: Init path reset LGTMResets standby history state and counters; good for clean startup.
2541-2548: Candidate add API LGTMLightweight map update and trace; fine.
2549-2558: Candidate remove API LGTMErases safely with logging.
2560-2642: Free only unneeded entries: logic looks correctComputes min needed seq across subscribers and candidates; prunes from front accordingly. Good guard for empty state.
2696-2705: any_send_failed handling is correctProperly tracks send failures to decide retention.
2738-2784: Eviction + candidate check LGTMUses >= comparison to drop only candidates that truly fell behind; fixes earlier bug.
Please confirm this replaced the prior<=condition everywhere.
2801-2805: Initialize stream sender before retryGood defensive init.
2906-2907: Post-retry trimming is goodRuns CheckAndFreeUnneededEntries after retries to reclaim memory.
Replace the fixed-size vector-based standby message buffer with a memory-bounded queue that tracks memory usage and evicts oldest messages when the limit is reached. This change improves memory efficiency by allocating entries on-demand and freeing them immediately when no longer needed.
Fixes bug in #206
Summary by CodeRabbit
Refactor
New Features