refactor(storage): issue upload task to uploader to avoid buffering batches in uploader #2185
Conversation
Codecov Report
@@ Coverage Diff @@
## main #2185 +/- ##
==========================================
+ Coverage 71.02% 71.10% +0.07%
==========================================
Files 657 657
Lines 83592 83860 +268
==========================================
+ Hits 59373 59625 +252
- Misses 24219 24235 +16
```rust
) -> HummockResult<usize> {
    let sorted_items = Self::build_shared_buffer_item_batches(kv_pairs, epoch);

    let batch_size = SharedBufferBatch::measure_batch_size(&sorted_items);
```
`measure_batch_size` will be called again in `SharedBufferBatch::new`. Can we avoid the recalculation?
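One way to address this review comment is to measure the size once at the call site and pass it into the constructor. The sketch below is hypothetical: the struct fields and the `new_with_size` constructor are assumptions for illustration, not the actual signatures in the PR.

```rust
// Hypothetical sketch: measure the batch size once and thread it through,
// so the constructor does not need to recompute it.
struct SharedBufferBatch {
    payload: Vec<(Vec<u8>, Vec<u8>)>,
    size: usize,
}

impl SharedBufferBatch {
    // Sum of key and value lengths across all items (assumed metric).
    fn measure_batch_size(items: &[(Vec<u8>, Vec<u8>)]) -> usize {
        items.iter().map(|(k, v)| k.len() + v.len()).sum()
    }

    // Accept a precomputed size instead of calling `measure_batch_size` again.
    fn new_with_size(payload: Vec<(Vec<u8>, Vec<u8>)>, size: usize) -> Self {
        Self { payload, size }
    }
}

fn main() {
    let items = vec![(b"key".to_vec(), b"value".to_vec())];
    let batch_size = SharedBufferBatch::measure_batch_size(&items);
    // Reuse `batch_size` for both memory accounting and construction.
    let batch = SharedBufferBatch::new_with_size(items, batch_size);
    println!("{}", batch.size); // prints 8 (3-byte key + 5-byte value)
}
```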
```rust
Ok(ssts) => {
    guard.add_uncommitted_ssts(epoch, ssts);
    if let Some(shared_buffer) = guard.get_shared_buffer(epoch) {
        shared_buffer.write().success_upload_task(task_id);
```
nit: `success_upload_task` -> `succeed_upload_task`
What's changed and what's your intention?
Currently, when we want to upload the shared buffer batches to S3, we need to first send an `Arc` to the write batches to the shared buffer uploader, and then issue `Upload` commands to tell the uploader to compact the batches and upload the SSTs to S3. Therefore, besides the shared buffer, we also store the shared buffer batches in the shared buffer uploader, which leads to extra work to keep track of memory usage. This PR removes this two-step upload logic and instead only issues upload tasks, which include the write batches to upload, to the uploader, so that the uploader will not buffer any write batches.

In this PR, we mainly introduce the following changes:
- Introduce `SharedBufferBatchInner`, which acts as a guard to the shared buffer batch payload. Whenever the payload is dropped, it decrements a global memory usage counter by the batch size. In this way, memory usage tracking gets easier.
- In `LocalVersionManager`, we pin a hummock version as the initial version, so that we can avoid using `Option<LocalVersion>` in `LocalVersionManager`.
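The drop-guard idea behind `SharedBufferBatchInner` can be sketched as follows. This is a minimal illustration under assumptions, not the PR's actual implementation: the global `AtomicUsize` counter, the field names, and the constructor shape are all hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// Global counter tracking total shared buffer memory usage (assumed shape).
static GLOBAL_BUFFER_SIZE: AtomicUsize = AtomicUsize::new(0);

// Guard around the batch payload: registers its size on creation and
// releases the accounting automatically when the payload is dropped.
struct SharedBufferBatchInner {
    payload: Vec<(Vec<u8>, Vec<u8>)>,
    size: usize,
}

impl SharedBufferBatchInner {
    fn new(payload: Vec<(Vec<u8>, Vec<u8>)>, size: usize) -> Arc<Self> {
        GLOBAL_BUFFER_SIZE.fetch_add(size, Ordering::Relaxed);
        Arc::new(Self { payload, size })
    }
}

impl Drop for SharedBufferBatchInner {
    fn drop(&mut self) {
        // The payload is going away: decrement the global counter by the
        // batch size, so tracking needs no explicit bookkeeping elsewhere.
        GLOBAL_BUFFER_SIZE.fetch_sub(self.size, Ordering::Relaxed);
    }
}

fn main() {
    let inner = SharedBufferBatchInner::new(vec![(b"k".to_vec(), b"v".to_vec())], 2);
    assert_eq!(GLOBAL_BUFFER_SIZE.load(Ordering::Relaxed), 2);
    drop(inner); // last Arc dropped -> Drop runs -> counter decremented
    assert_eq!(GLOBAL_BUFFER_SIZE.load(Ordering::Relaxed), 0);
}
```

Because the decrement lives in `Drop`, every code path that releases the payload (upload success, failure, eviction) updates the counter without remembering to do so explicitly.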
.Checklist
Refer to a related PR or issue link (optional)