
feat(storage): support concurrent shared buffer flush #3289

Merged
Merged 12 commits from yiming/concurrent-shared-buffer-flush into main on Jun 27, 2022

Conversation

wenym1 (Contributor) commented Jun 17, 2022

What's changed and what's your intention?

Currently, when we write data to the shared buffer and memory usage has already reached the threshold, we issue a flush task and wait until the upload task finishes. This design has two problems. First, a flush is only triggered when we reach the threshold, so memory usage always hovers near the threshold and writes are blocked frequently. Second, suppose multiple writes have been blocked because memory usage reached the threshold. As soon as the flush task frees memory, all of these writes are unblocked at once, and memory usage can suddenly rise far above the threshold, which may lead to OOM.

In this PR, we introduce a flush threshold and a block-write threshold. When memory usage reaches the flush threshold, we issue a flush task; only when it reaches the block-write threshold do we block the write. We spawn a buffer tracker worker thread, and all memory-related events, including buffer added, buffer released, task added, task finished, sync epoch started, and epoch synced, are sent to this worker. A write is allowed directly when memory usage is below the lower flush threshold. When memory usage rises above the flush threshold, a write request is sent to the buffer tracker worker through a channel, and the request is granted by sending an event back through a oneshot channel.

The block-write threshold is the original buffer tracker capacity, and the flush threshold is set to 0.8 × the block-write threshold.
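To make the design concrete, here is a minimal sketch of the two thresholds and the worker's write-request handling. It is illustrative only: the names (`BufferTracker`, `global_buffer_size`, `can_write`, `need_flush`, `handle_write_request`) are simplified stand-ins for the actual code, and the real worker also handles the task and epoch-sync events listed above.

use std::sync::atomic::{AtomicUsize, Ordering::Relaxed};
use tokio::sync::oneshot;

// Simplified tracker, not the actual implementation.
struct BufferTracker {
    global_buffer_size: AtomicUsize,
    flush_threshold: usize,       // 0.8 * block_write_threshold
    block_write_threshold: usize, // the original buffer tracker capacity
}

impl BufferTracker {
    // Above this, issue a flush task (but keep accepting writes).
    fn need_flush(&self) -> bool {
        self.global_buffer_size.load(Relaxed) > self.flush_threshold
    }
    // Below this, writes are granted; at or above it, they are parked.
    fn can_write(&self) -> bool {
        self.global_buffer_size.load(Relaxed) < self.block_write_threshold
    }
}

// One arm of the buffer tracker worker's event loop.
fn handle_write_request(
    tracker: &BufferTracker,
    size: usize,
    grant: oneshot::Sender<()>,
    pending_write_requests: &mut Vec<(usize, oneshot::Sender<()>)>,
) {
    if tracker.can_write() {
        // Under the block-write threshold: grant immediately; the writer
        // never waits for a flush to finish.
        tracker.global_buffer_size.fetch_add(size, Relaxed);
        let _ = grant.send(());
    } else {
        // Over the block-write threshold: park the writer until an upload
        // task releases memory.
        pending_write_requests.push((size, grant));
    }
    if tracker.need_flush() {
        // Over the flush threshold: issue an upload task (omitted here).
    }
}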

Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests
  • All checks passed in ./risedev check (or alias, ./risedev c)

Refer to a related PR or issue link (optional)

codecov bot commented Jun 17, 2022

Codecov Report

Merging #3289 (5200fca) into main (321c259) will decrease coverage by 0.01%.
The diff coverage is 67.05%.

@@            Coverage Diff             @@
##             main    #3289      +/-   ##
==========================================
- Coverage   74.40%   74.39%   -0.02%     
==========================================
  Files         768      768              
  Lines      107647   107784     +137     
==========================================
+ Hits        80098    80183      +85     
- Misses      27549    27601      +52     
Flag Coverage Δ
rust 74.39% <67.05%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
src/storage/src/hummock/local_version_manager.rs 79.27% <59.34%> (-5.27%) ⬇️
src/storage/src/hummock/shared_buffer/mod.rs 90.58% <94.23%> (+0.13%) ⬆️
src/storage/src/hummock/iterator/test_utils.rs 97.53% <100.00%> (+0.09%) ⬆️
src/storage/src/hummock/local_version.rs 98.91% <100.00%> (-0.06%) ⬇️
...e/src/hummock/shared_buffer/shared_buffer_batch.rs 93.35% <100.00%> (-0.26%) ⬇️
...rc/hummock/shared_buffer/shared_buffer_uploader.rs 88.37% <100.00%> (+0.26%) ⬆️
src/connector/src/filesystem/file_common.rs 80.35% <0.00%> (-0.45%) ⬇️
src/frontend/src/expr/utils.rs 98.99% <0.00%> (-0.26%) ⬇️
src/common/src/types/ordered_float.rs 24.70% <0.00%> (-0.20%) ⬇️
... and 3 more


wenym1 marked this pull request as ready for review Jun 17, 2022 04:00
match event {
    SharedBufferEvent::WriteRequest(size, sender) => {
        if local_version_manager.buffer_tracker.can_write() {
            grant_write_request(size, sender);
Collaborator:

IIUC, writes above the write_threshold will trigger a flush with the following steps:

  1. Caller sends SharedBufferEvent::WriteRequest to buffer tracker.
  2. Buffer tracker checks can_write and notifies caller.
  3. Caller adds a batch to shared buffer.
  4. Caller sends SharedBufferEvent::BatchAdd to buffer tracker.
  5. Buffer tracker finally triggers the flush.

This process looks too cumbersome to me. Can we send the batch along with SharedBufferEvent::WriteRequest and 1) trigger a flush immediately if can_write() == true, or 2) put the batch in pending_write_requests for future flushes? In this case, we can simplify the flushing process and remove one round of message passing.
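A sketch of what that suggestion might look like (hypothetical types and helpers, sketching the proposal rather than the merged design): the batch travels with the request, so the BatchAdd round-trip in step 4 disappears.

use tokio::sync::oneshot;

struct SharedBufferBatch; // stand-in for the real batch type

// Hypothetical combined event: the batch rides along with the request.
enum SharedBufferEvent {
    WriteRequest(SharedBufferBatch, oneshot::Sender<()>),
}

fn handle_event(
    event: SharedBufferEvent,
    can_write: bool,
    need_flush: bool,
    pending_write_requests: &mut Vec<(SharedBufferBatch, oneshot::Sender<()>)>,
) {
    match event {
        SharedBufferEvent::WriteRequest(batch, grant) => {
            if can_write {
                add_batch(batch); // the tracker adds the batch itself...
                let _ = grant.send(());
                if need_flush {
                    trigger_flush(); // ...and can flush immediately.
                }
            } else {
                // Park the batch itself for a future flush to pick up.
                pending_write_requests.push((batch, grant));
            }
        }
    }
}

fn add_batch(_batch: SharedBufferBatch) { /* add to shared buffer (omitted) */ }
fn trigger_flush() { /* spawn an upload task (omitted) */ }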

Comment on lines 685 to 686
// An upload task has finished. There is a possibility that we need a new
// upload task.
Collaborator:

Can you explain more about why we need a new upload task without waiting for a buffer release? Writes only look at the buffer size, not buffer size - upload task size.

hzxa21 (Collaborator) left a comment:

LGTM. You can run a bench to verify whether the two thresholds take effect before merging.

Little-Wallace (Contributor) commented:

Good job. Is there any test that can show that hummock can flush a write batch by itself without blocking write_batch?

    .global_buffer_size
    .fetch_sub(size, Relaxed);
while !pending_write_requests.is_empty()
    && local_version_manager.buffer_tracker.can_write()
Contributor:

Why must we check can_write before triggering a new flush task? If the memory exceeds the block threshold, we cannot flush data, can we?

wenym1 (Contributor, Author):

can_write is checked to decide whether to grant the pending write requests stored in pending_write_requests. When a write request is granted and new data is written to the shared buffer, global_buffer_size may exceed the threshold again, and we may trigger a flush.

When memory exceeds the block threshold, we need to flush data to free memory and unblock the writes.
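To illustrate this release path (a minimal sketch under the same simplifications as the one in the PR description; WriteRequest here is a hypothetical struct, and BufferTracker is the simplified one from that earlier sketch): freeing memory unparks writers while can_write() holds, and the granted writes may push usage back over the flush threshold.

use std::collections::VecDeque;
use std::sync::atomic::Ordering::Relaxed;
use tokio::sync::oneshot;

// Hypothetical parked request.
struct WriteRequest {
    size: usize,
    grant: oneshot::Sender<()>,
}

fn on_buffer_release(
    tracker: &BufferTracker,
    size: usize,
    pending_write_requests: &mut VecDeque<WriteRequest>,
) {
    // An upload task finished and released `size` bytes.
    tracker.global_buffer_size.fetch_sub(size, Relaxed);
    // Grant parked writes while we are back under the block-write threshold.
    while !pending_write_requests.is_empty() && tracker.can_write() {
        let req = pending_write_requests.pop_front().unwrap();
        tracker.global_buffer_size.fetch_add(req.size, Relaxed);
        let _ = req.grant.send(());
    }
    if tracker.need_flush() {
        // The granted writes may have pushed usage back over the flush
        // threshold, so a new flush task may be issued here (omitted).
    }
}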

wenym1 (Contributor, Author) commented Jun 27, 2022

You can run a bench to verify whether the two thresholds take effect before merging.

I just ran a benchmark. I set the threshold to a very small value (40 MB) and added some logs. The system ran well, and the logs show that the async flush and write blocking are taking effect.

The state between epochs is not too large (about 40 MB per epoch) and is not the main source of memory usage, so while running the benchmark, memory usage did not drop significantly compared to main.

wenym1 (Contributor, Author) commented Jun 27, 2022

Is there any test that can show that hummock can flush a write batch by itself without blocking write_batch?

Not yet. I will add one in a later PR.

The bench mentioned above shows that as long as memory usage has not reached the block-write threshold, batches can be flushed without writes being blocked. So the current implementation seems correct.

wenym1 added the mergify/can-merge label (indicates that the PR can be added to the merge queue) Jun 27, 2022
mergify bot merged commit 3a9e9ae into main Jun 27, 2022
mergify bot deleted the yiming/concurrent-shared-buffer-flush branch Jun 27, 2022 11:59
Labels
mergify/can-merge, type/feature