Fix FindSliceKeys crashed for data_sync_vec empty #172
Walkthrough
Added defensive empty-batch and post-merge guards: DataSyncForRangePartition now captures merged vector size, aborts and resets range-scan context when the merged …
Sequence Diagram(s)
sequenceDiagram
participant Caller
participant RangeCacheSender
Note right of RangeCacheSender #EEF2FF: FindSliceKeys (empty-batch guard)
Caller->>RangeCacheSender: FindSliceKeys(batch_records)
alt batch_records empty
RangeCacheSender-->>Caller: log warning and return
else batch_records non-empty
RangeCacheSender->>RangeCacheSender: compute slice keys
RangeCacheSender-->>Caller: continue flow
end
sequenceDiagram
participant Caller
participant DataSync
Note right of DataSync #F6FFF0: DataSyncForRangePartition (post-merge guard)
Caller->>DataSync: DataSyncForRangePartition()
DataSync->>DataSync: merge vectors -> data_sync_vec
DataSync->>DataSync: capture data_sync_vec_size
alt data_sync_vec empty
DataSync-->>Caller: log warning, reset range-scan context, abort
else not empty
DataSync->>DataSync: continue per-shard processing
DataSync-->>Caller: complete batch
end
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks and finishing touches
❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/cc/local_cc_shards.cpp (1)
3828-3887: Do not continue here; it breaks the batching pipeline
This early continue skips all of the clean-up that has to happen after each scan batch (scan_cc.Reset(), per-core vector clear()s, and the flush quota bookkeeping). On the next loop iteration we keep the old scan buffers and outstanding quota, so we double-count records, leak the reserved data_sync_mem_controller_ budget, and eventually deadlock the worker. Please restructure this block so that we still run the normal reset/clear path (and release the quota) even when the merged batch becomes empty, instead of jumping straight to the next iteration.
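One way to read the restructuring asked for above is to make the empty case skip only the processing step, so that control still falls through to the shared clean-up at the bottom of the loop body. The sketch below is a toy model of that shape, not the project's code: ScanState, LoopStats, RunBatches and the "size + 1" quota rule are all hypothetical stand-ins chosen for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins; none of these names come from the real code base.
struct ScanState {
    std::vector<int> buffer;   // records gathered by the current scan batch
    std::size_t reserved = 0;  // quota reserved for the current batch
    void Reset() { buffer.clear(); }
};

struct LoopStats {
    std::size_t processed = 0;    // records handed to the processing step
    std::size_t outstanding = 0;  // quota still reserved after the loop
};

// Runs one iteration per input batch. An empty batch skips processing but
// still reaches the shared clean-up at the bottom of the loop body.
LoopStats RunBatches(const std::vector<std::vector<int>> &batches) {
    ScanState scan;
    LoopStats stats;
    for (const auto &batch : batches) {
        scan.buffer = batch;
        scan.reserved = batch.size() + 1;  // assumption: every batch reserves quota
        stats.outstanding += scan.reserved;

        if (!scan.buffer.empty()) {
            stats.processed += scan.buffer.size();
        }

        // Shared clean-up: runs for empty and non-empty batches alike.
        stats.outstanding -= scan.reserved;
        scan.reserved = 0;
        scan.Reset();
    }
    return stats;
}
```

Because there is no early continue, an empty batch cannot leave buffers or reserved quota behind for the next iteration.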
src/cc/local_cc_shards.cpp
Outdated
if (batch_records.empty())
{
    LOG(WARNING)
        << "FindSliceKeys: Empty batch records. Skip sending cache.";
    return;
}
Empty batch guard leaves stale slice state
When batch_records is empty we return without clearing batch_slice_key_vec_, yet AppendSliceDataRequest is still called right after this, so it works with stale slice keys (and even hits its assert). Make the empty-batch path reset the slice vector and ensure the caller skips AppendSliceDataRequest in that case.
🤖 Prompt for AI Agents
In src/cc/local_cc_shards.cpp around lines 6158-6164, the early return on empty
batch_records leaves batch_slice_key_vec_ populated which causes
AppendSliceDataRequest to operate on stale slice keys; clear
batch_slice_key_vec_ before returning and update the caller to skip calling
AppendSliceDataRequest when batch_records is empty (or check
batch_slice_key_vec_.empty() after this function) so no stale keys are used and
the assert is not triggered.
11d59ce to 9c1785e
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/cc/local_cc_shards.cpp (1)
6164-6169: Empty batch still leaves stale slice state
This early return happens before you clear batch_slice_key_vec_ or reset paused_pos_, and the caller still proceeds to AppendSliceDataRequest. That recreates the stale-slice scenario from the previous review: append runs with the prior batch’s keys/pause state and can still trigger the same assert. Please reset the per-batch state here and signal the caller to skip the append (e.g. return a boolean and gate the call). As it stands the original bug is still present.
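A minimal sketch of the "return a boolean and gate the call" idea follows. It assumes a simplified sender class: SliceSender, SendBatch, appended() and pending_keys() are hypothetical, the string record type is a placeholder, and only the member names batch_slice_key_vec_ and paused_pos_ are taken from the comment above. The real FindSliceKeys signature and slice lookup are not shown in this review, so treat this as an illustration of the control flow only.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sender; the member names mirror the review comment, but the
// types and method bodies are illustrative only.
class SliceSender {
public:
    // Returns false when there is nothing to send. Per-batch state is
    // cleared first so that no keys from a previous batch survive.
    bool FindSliceKeys(const std::vector<std::string> &batch_records) {
        batch_slice_key_vec_.clear();
        paused_pos_ = 0;
        if (batch_records.empty()) {
            return false;
        }
        // Placeholder for the real slice lookup: one key per record.
        batch_slice_key_vec_ = batch_records;
        return true;
    }

    // Stand-in for the real append; it only counts the keys it was given.
    void AppendSliceDataRequest() { appended_ += batch_slice_key_vec_.size(); }

    // Caller-side gate: the append runs only when slice keys were found.
    void SendBatch(const std::vector<std::string> &batch_records) {
        if (FindSliceKeys(batch_records)) {
            AppendSliceDataRequest();
        }
    }

    std::size_t appended() const { return appended_; }
    std::size_t pending_keys() const { return batch_slice_key_vec_.size(); }

private:
    std::vector<std::string> batch_slice_key_vec_;
    std::size_t paused_pos_ = 0;
    std::size_t appended_ = 0;
};
```

Clearing the per-batch state before the emptiness check means the early return can never leave keys from an earlier batch behind, and the boolean lets the caller skip the append without inspecting the sender's internals.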
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/cc/local_cc_shards.cpp (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-09T03:56:58.811Z
Learnt from: thweetkomputer
PR: eloqdata/tx_service#150
File: include/cc/local_cc_shards.h:626-631
Timestamp: 2025-10-09T03:56:58.811Z
Learning: For the LocalCcShards class in include/cc/local_cc_shards.h: Writer locks (unique_lock) should continue using the original meta_data_mux_ (std::shared_mutex) rather than fast_meta_data_mux_ (FastMetaDataMutex) at this stage. Only reader locks may use the FastMetaDataMutex wrapper.
Applied to files:
src/cc/local_cc_shards.cpp
}

if (data_sync_vec->empty())
{
    LOG(WARNING) << "data_sync_vec becomes empty after erase, old "
                    "size of data_sync_vec_size: "
                 << data_sync_vec_size;
    // Reset
    scan_cc.Reset();
    continue;
Release flush quota before early continue
We reserve flush_data_size via AllocateFlushDataMemQuota(...) just above, but if this branch trips we continue without ever giving the quota back (no flush task is enqueued to do it later). After a few hits the controller stays pegged and subsequent scans block forever waiting for capacity. Please release the quota before the continue.
if (data_sync_vec->empty())
{
LOG(WARNING) << "data_sync_vec becomes empty after erase, old "
"size of data_sync_vec_size: "
<< data_sync_vec_size;
+ data_sync_mem_controller_.DeallocateFlushMemQuota(
+ flush_data_size);
// Reset
scan_cc.Reset();
continue;
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
if (data_sync_vec->empty())
{
    LOG(WARNING) << "data_sync_vec becomes empty after erase, old "
                    "size of data_sync_vec_size: "
                 << data_sync_vec_size;
    data_sync_mem_controller_.DeallocateFlushMemQuota(
        flush_data_size);
    // Reset
    scan_cc.Reset();
    continue;
}
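An alternative to adding an explicit deallocation on every early-exit path is to tie the reservation to a scope, so that any continue or return gives the quota back automatically. The sketch below assumes a toy controller: QuotaController, QuotaReservation and UsedAfterScan are hypothetical names and do not reflect the real data_sync_mem_controller_ interface, which this review does not show.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical quota controller with a fixed capacity.
class QuotaController {
public:
    explicit QuotaController(std::uint64_t capacity) : capacity_(capacity) {}

    // Reserves n units if they fit; returns false otherwise.
    bool TryAllocate(std::uint64_t n) {
        if (used_ + n > capacity_) {
            return false;
        }
        used_ += n;
        return true;
    }

    void Deallocate(std::uint64_t n) { used_ -= n; }
    std::uint64_t used() const { return used_; }

private:
    std::uint64_t capacity_;
    std::uint64_t used_ = 0;
};

// Scope guard: whatever was reserved is returned when the guard is destroyed.
class QuotaReservation {
public:
    QuotaReservation(QuotaController &ctrl, std::uint64_t n)
        : ctrl_(&ctrl), size_(ctrl.TryAllocate(n) ? n : 0) {}
    ~QuotaReservation() {
        if (size_ > 0) {
            ctrl_->Deallocate(size_);
        }
    }
    QuotaReservation(const QuotaReservation &) = delete;
    QuotaReservation &operator=(const QuotaReservation &) = delete;

private:
    QuotaController *ctrl_;
    std::uint64_t size_;
};

// Reserves quota once per batch; batches of size zero bail out early.
std::uint64_t UsedAfterScan(QuotaController &ctrl,
                            const std::vector<std::uint64_t> &batch_sizes,
                            std::uint64_t reserve_per_batch) {
    for (std::uint64_t size : batch_sizes) {
        QuotaReservation reservation(ctrl, reserve_per_batch);
        if (size == 0) {
            continue;  // the guard's destructor returns the quota here
        }
        // Non-empty batch: a real pipeline would pass the quota on to a
        // flush task; this sketch simply lets the guard release it.
    }
    return ctrl.used();
}
```

With a capacity of 10 and a reservation of 8 per batch, a single leaked reservation would make every later allocation fail; with the guard, the controller is back at zero after each iteration.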
Here are some reminders before you submit the pull request
fixes eloqdb/tx_service#issue_id
./mtr --suite=mono_main,mono_multi,mono_basic
Summary by CodeRabbit