refactor(storage): issue upload task to uploader to avoid buffering batches in uploader #2185
Conversation
Codecov Report
@@ Coverage Diff @@
## main #2185 +/- ##
==========================================
+ Coverage 71.02% 71.10% +0.07%
==========================================
Files 657 657
Lines 83592 83860 +268
==========================================
+ Hits 59373 59625 +252
- Misses 24219 24235 +16
```rust
) -> HummockResult<usize> {
    let sorted_items = Self::build_shared_buffer_item_batches(kv_pairs, epoch);

    let batch_size = SharedBufferBatch::measure_batch_size(&sorted_items);
```
`measure_batch_size` will be called again in `SharedBufferBatch::new`. Can we avoid the recalculation?
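One way to address this review comment is to measure the size once at the call site and pass it into the constructor. The sketch below is hypothetical: the struct fields and the `new_with_size` constructor are assumptions for illustration, not the actual signatures in the PR.

```rust
// Hypothetical sketch: measure the batch size once and thread it through,
// so the constructor does not need to recompute it.
struct SharedBufferBatch {
    payload: Vec<(Vec<u8>, Vec<u8>)>,
    size: usize,
}

impl SharedBufferBatch {
    // Sum of key and value lengths across all items (assumed metric).
    fn measure_batch_size(items: &[(Vec<u8>, Vec<u8>)]) -> usize {
        items.iter().map(|(k, v)| k.len() + v.len()).sum()
    }

    // Accept a precomputed size instead of calling `measure_batch_size` again.
    fn new_with_size(payload: Vec<(Vec<u8>, Vec<u8>)>, size: usize) -> Self {
        Self { payload, size }
    }
}

fn main() {
    let items = vec![(b"key".to_vec(), b"value".to_vec())];
    let batch_size = SharedBufferBatch::measure_batch_size(&items);
    // Reuse `batch_size` for both memory accounting and construction.
    let batch = SharedBufferBatch::new_with_size(items, batch_size);
    println!("{}", batch.size); // prints 8 (3-byte key + 5-byte value)
}
```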
```rust
Ok(ssts) => {
    guard.add_uncommitted_ssts(epoch, ssts);
    if let Some(shared_buffer) = guard.get_shared_buffer(epoch) {
        shared_buffer.write().success_upload_task(task_id);
```
nit: `success_upload_task` -> `succeed_upload_task`
What's changed and what's your intention?
Currently, when we want to upload the shared buffer batches to S3, we need to first send an `Arc` to the write batches to the shared buffer uploader, and then issue `Upload` commands to tell the uploader to compact the batches and upload the SSTs to S3. Therefore, besides the shared buffer, we also store the shared buffer batches in the shared buffer uploader, which leads to extra work to keep track of memory usage. This PR removes this two-step upload logic and instead only issues upload tasks, which include the write batches to upload, to the uploader, so that the uploader will not buffer any write batches.

In this PR, we mainly introduce the following changes:
- Introduce `SharedBufferBatchInner`, which acts as a guard to the shared buffer batch payload. Whenever the payload is dropped, it decrements a global memory usage counter by the batch size. In this way, memory usage tracking gets easier.
- In `LocalVersionManager`, we pin a hummock version as the initial version, so that we can avoid using `Option<LocalVersion>` in `LocalVersionManager`.
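The drop-guard idea behind `SharedBufferBatchInner` can be sketched as follows. This is a minimal illustration under assumptions, not the PR's actual implementation: the global `AtomicUsize` counter, the field names, and the constructor shape are all hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// Global counter tracking total shared buffer memory usage (assumed shape).
static GLOBAL_BUFFER_SIZE: AtomicUsize = AtomicUsize::new(0);

// Guard around the batch payload: registers its size on creation and
// releases the accounting automatically when the payload is dropped.
struct SharedBufferBatchInner {
    payload: Vec<(Vec<u8>, Vec<u8>)>,
    size: usize,
}

impl SharedBufferBatchInner {
    fn new(payload: Vec<(Vec<u8>, Vec<u8>)>, size: usize) -> Arc<Self> {
        GLOBAL_BUFFER_SIZE.fetch_add(size, Ordering::Relaxed);
        Arc::new(Self { payload, size })
    }
}

impl Drop for SharedBufferBatchInner {
    fn drop(&mut self) {
        // The payload is going away: decrement the global counter by the
        // batch size, so tracking needs no explicit bookkeeping elsewhere.
        GLOBAL_BUFFER_SIZE.fetch_sub(self.size, Ordering::Relaxed);
    }
}

fn main() {
    let inner = SharedBufferBatchInner::new(vec![(b"k".to_vec(), b"v".to_vec())], 2);
    assert_eq!(GLOBAL_BUFFER_SIZE.load(Ordering::Relaxed), 2);
    drop(inner); // last Arc dropped -> Drop runs -> counter decremented
    assert_eq!(GLOBAL_BUFFER_SIZE.load(Ordering::Relaxed), 0);
}
```

Because the decrement lives in `Drop`, every code path that releases the payload (upload success, failure, eviction) updates the counter without remembering to do so explicitly.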
.Checklist
Refer to a related PR or issue link (optional)