Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(storage): issue upload task to uploader to avoid buffering batches in uploader #2185

Merged
merged 4 commits into from
Apr 29, 2022

Conversation

wenym1
Copy link
Contributor

@wenym1 wenym1 commented Apr 28, 2022

What's changed and what's your intention?

Currently, when we want to upload the shared buffer batches to S3, we need to first send Arc to the write batches to the shared buffer uploader, and then issue Upload commands to tell the uploader to compact the batches and upload the SSTs to S3. Therefore, besides shared buffer, we also store the shared buffer batches in the shared buffer uploader, and this leads to extra work to keep track of the memory usage, and this PR will remove this two-step upload logic, and instead only issue upload tasks, which include the write batches to upload, to the uploader, so that the uploader will not buffer any write batches.

In this PR, we mainly introduce the following change:

  • Change the upload logic to issue upload tasks to the uploader and avoid buffering batches in uploader.
  • Implement SharedBufferBatchInner, which acts as a guard to the shared buffer batch payload. Whenever the payload is being dropped, it will decrement a global memory usage counter by the batch size. In this way, the memory usage tracking gets easier.
  • During initialization of LocalVersionManager, we will pin a hummock version as the initial version, so that we can avoid using Option<LocalVersion> in LocalVersionManager.

Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests

Refer to a related PR or issue link (optional)

@codecov
Copy link

codecov bot commented Apr 28, 2022

Codecov Report

Merging #2185 (0947a81) into main (1332556) will increase coverage by 0.07%.
The diff coverage is 87.61%.

@@            Coverage Diff             @@
##             main    #2185      +/-   ##
==========================================
+ Coverage   71.02%   71.10%   +0.07%     
==========================================
  Files         657      657              
  Lines       83592    83860     +268     
==========================================
+ Hits        59373    59625     +252     
- Misses      24219    24235      +16     
Flag Coverage Δ
rust 71.10% <87.61%> (+0.07%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/bench/ss_bench/operations/write_batch.rs 0.00% <0.00%> (ø)
src/storage/src/store.rs 100.00% <ø> (ø)
...rc/hummock/shared_buffer/shared_buffer_uploader.rs 81.73% <72.34%> (-5.77%) ⬇️
src/storage/src/hummock/state_store.rs 74.30% <83.33%> (+2.37%) ⬆️
src/storage/src/hummock/mod.rs 85.54% <85.71%> (-1.37%) ⬇️
src/storage/src/hummock/shared_buffer/mod.rs 90.18% <86.74%> (-4.56%) ⬇️
...e/src/hummock/shared_buffer/shared_buffer_batch.rs 94.16% <88.88%> (-1.47%) ⬇️
src/storage/src/hummock/local_version_manager.rs 89.84% <89.09%> (+0.79%) ⬆️
src/storage/src/hummock/compactor.rs 66.66% <100.00%> (ø)
src/storage/src/hummock/conflict_detector.rs 98.87% <100.00%> (+0.03%) ⬆️
... and 22 more

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

) -> HummockResult<usize> {
let sorted_items = Self::build_shared_buffer_item_batches(kv_pairs, epoch);

let batch_size = SharedBufferBatch::measure_batch_size(&sorted_items);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

measure_batch_size will be called again in SharedBufferBatch::new. Can we avoid the recalculation?

Ok(ssts) => {
guard.add_uncommitted_ssts(epoch, ssts);
if let Some(shared_buffer) = guard.get_shared_buffer(epoch) {
shared_buffer.write().success_upload_task(task_id);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits: success_upload_task -> succeed_upload_task

@wenym1 wenym1 merged commit 14f8192 into main Apr 29, 2022
@wenym1 wenym1 deleted the yiming/refactor_buffer_uploader branch April 29, 2022 09:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants