
feat(storage): support concurrent shared buffer flush #3289

Merged
Merged 12 commits from yiming/concurrent-shared-buffer-flush into main on Jun 27, 2022

Conversation

wenym1 (Contributor) commented Jun 17, 2022

What's changed and what's your intention?

Currently, when we write data to the shared buffer and memory usage has already reached the threshold, we issue a flush task and wait until the upload task finishes. This design has two problems. First, a flush is only triggered when we reach the threshold, so memory usage always hovers near the threshold and writes are blocked frequently. Second, suppose multiple writes have been blocked because memory usage reached the threshold. As soon as the flush task frees memory, all of these writes are unblocked at once, and memory usage can suddenly rise far above the threshold, which may lead to OOM.

In this PR, we introduce a flush threshold and a block-write threshold. When memory usage reaches the flush threshold, we issue a flush task; only when it reaches the block-write threshold do we block the write. We spawn a buffer tracker worker thread, and all memory-related events, including buffer added, buffer released, task added, task finished, sync epoch started, and epoch synced, are sent to this worker. A write is allowed directly when memory usage is below the lower flush threshold. When memory usage rises above the flush threshold, a write request is sent to the buffer tracker worker through a channel, and the request is granted by sending an event back through a oneshot channel.

The block-write threshold is the original buffer tracker capacity, and the flush threshold is set to 0.8 × the block-write threshold.
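To make the design concrete, here is a minimal sketch of the two thresholds and the worker's write-request handling. It is illustrative only: the names (`BufferTracker`, `global_buffer_size`, `can_write`, `need_flush`, `handle_write_request`) are simplified stand-ins for the actual code, and the real worker also handles the task and epoch-sync events listed above.

use std::sync::atomic::{AtomicUsize, Ordering::Relaxed};
use tokio::sync::oneshot;

// Simplified tracker, not the actual implementation.
struct BufferTracker {
    global_buffer_size: AtomicUsize,
    flush_threshold: usize,       // 0.8 * block_write_threshold
    block_write_threshold: usize, // the original buffer tracker capacity
}

impl BufferTracker {
    // Above this, issue a flush task (but keep accepting writes).
    fn need_flush(&self) -> bool {
        self.global_buffer_size.load(Relaxed) > self.flush_threshold
    }
    // Below this, writes are granted; at or above it, they are parked.
    fn can_write(&self) -> bool {
        self.global_buffer_size.load(Relaxed) < self.block_write_threshold
    }
}

// One arm of the buffer tracker worker's event loop.
fn handle_write_request(
    tracker: &BufferTracker,
    size: usize,
    grant: oneshot::Sender<()>,
    pending_write_requests: &mut Vec<(usize, oneshot::Sender<()>)>,
) {
    if tracker.can_write() {
        // Under the block-write threshold: grant immediately; the writer
        // never waits for a flush to finish.
        tracker.global_buffer_size.fetch_add(size, Relaxed);
        let _ = grant.send(());
    } else {
        // Over the block-write threshold: park the writer until an upload
        // task releases memory.
        pending_write_requests.push((size, grant));
    }
    if tracker.need_flush() {
        // Over the flush threshold: issue an upload task (omitted here).
    }
}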

Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests
  • All checks passed in ./risedev check (or alias, ./risedev c)

Refer to a related PR or issue link (optional)

codecov bot commented Jun 17, 2022

Codecov Report

Merging #3289 (5200fca) into main (321c259) will decrease coverage by 0.01%.
The diff coverage is 67.05%.

@@            Coverage Diff             @@
##             main    #3289      +/-   ##
==========================================
- Coverage   74.40%   74.39%   -0.02%     
==========================================
  Files         768      768              
  Lines      107647   107784     +137     
==========================================
+ Hits        80098    80183      +85     
- Misses      27549    27601      +52     
Flag Coverage Δ
rust 74.39% <67.05%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
src/storage/src/hummock/local_version_manager.rs 79.27% <59.34%> (-5.27%) ⬇️
src/storage/src/hummock/shared_buffer/mod.rs 90.58% <94.23%> (+0.13%) ⬆️
src/storage/src/hummock/iterator/test_utils.rs 97.53% <100.00%> (+0.09%) ⬆️
src/storage/src/hummock/local_version.rs 98.91% <100.00%> (-0.06%) ⬇️
...e/src/hummock/shared_buffer/shared_buffer_batch.rs 93.35% <100.00%> (-0.26%) ⬇️
...rc/hummock/shared_buffer/shared_buffer_uploader.rs 88.37% <100.00%> (+0.26%) ⬆️
src/connector/src/filesystem/file_common.rs 80.35% <0.00%> (-0.45%) ⬇️
src/frontend/src/expr/utils.rs 98.99% <0.00%> (-0.26%) ⬇️
src/common/src/types/ordered_float.rs 24.70% <0.00%> (-0.20%) ⬇️
... and 3 more


wenym1 marked this pull request as ready for review Jun 17, 2022 04:00
match event {
    SharedBufferEvent::WriteRequest(size, sender) => {
        if local_version_manager.buffer_tracker.can_write() {
            grant_write_request(size, sender);
Collaborator:

IIUC, writes above the write_threshold will trigger a flush with the following steps:

  1. Caller sends SharedBufferEvent::WriteRequest to buffer tracker.
  2. Buffer tracker checks can_write and notifies caller.
  3. Caller adds a batch to shared buffer.
  4. Caller sends SharedBufferEvent::BatchAdd to buffer tracker.
  5. Buffer tracker finally triggers the flush.

This process looks too cumbersome to me. Can we send the batch along with SharedBufferEvent::WriteRequest and 1) trigger a flush immediately if can_write() == true, or 2) put the batch in pending_write_requests for future flushes? In this case, we can simplify the flushing process and remove one round of message passing.
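A sketch of what that suggestion might look like (hypothetical types and helpers, sketching the proposal rather than the merged design): the batch travels with the request, so the BatchAdd round-trip in step 4 disappears.

use tokio::sync::oneshot;

struct SharedBufferBatch; // stand-in for the real batch type

// Hypothetical combined event: the batch rides along with the request.
enum SharedBufferEvent {
    WriteRequest(SharedBufferBatch, oneshot::Sender<()>),
}

fn handle_event(
    event: SharedBufferEvent,
    can_write: bool,
    need_flush: bool,
    pending_write_requests: &mut Vec<(SharedBufferBatch, oneshot::Sender<()>)>,
) {
    match event {
        SharedBufferEvent::WriteRequest(batch, grant) => {
            if can_write {
                add_batch(batch); // the tracker adds the batch itself...
                let _ = grant.send(());
                if need_flush {
                    trigger_flush(); // ...and can flush immediately.
                }
            } else {
                // Park the batch itself for a future flush to pick up.
                pending_write_requests.push((batch, grant));
            }
        }
    }
}

fn add_batch(_batch: SharedBufferBatch) { /* add to shared buffer (omitted) */ }
fn trigger_flush() { /* spawn an upload task (omitted) */ }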

Comment on lines 685 to 686
// An upload task has finished. There is a possibility that we need a new
// upload task.
Collaborator:

Can you explain more about why we need a new upload task without waiting for a buffer release? Writes only look at the buffer size, not buffer size - upload task size.

hzxa21 (Collaborator) left a comment:

LGTM. You can run a bench to verify whether the two thresholds take effect before merging.

Little-Wallace (Contributor) commented:

Good job. Is there any test that can show that hummock can flush a write batch by itself without blocking write_batch?

    .global_buffer_size
    .fetch_sub(size, Relaxed);
while !pending_write_requests.is_empty()
    && local_version_manager.buffer_tracker.can_write()
Contributor:

Why must we check can_write before triggering a new flush task? If the memory exceeds the block threshold, we cannot flush data, can we?

wenym1 (Contributor, Author):

can_write is checked to decide whether to grant the pending write requests stored in pending_write_requests. When a write request is granted and new data is written to the shared buffer, global_buffer_size may exceed the threshold again, and we may trigger a flush.

When memory exceeds the block threshold, we need to flush data to free memory and unblock the writes.
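To illustrate this release path (a minimal sketch under the same simplifications as the one in the PR description; WriteRequest here is a hypothetical struct, and BufferTracker is the simplified one from that earlier sketch): freeing memory unparks writers while can_write() holds, and the granted writes may push usage back over the flush threshold.

use std::collections::VecDeque;
use std::sync::atomic::Ordering::Relaxed;
use tokio::sync::oneshot;

// Hypothetical parked request.
struct WriteRequest {
    size: usize,
    grant: oneshot::Sender<()>,
}

fn on_buffer_release(
    tracker: &BufferTracker,
    size: usize,
    pending_write_requests: &mut VecDeque<WriteRequest>,
) {
    // An upload task finished and released `size` bytes.
    tracker.global_buffer_size.fetch_sub(size, Relaxed);
    // Grant parked writes while we are back under the block-write threshold.
    while !pending_write_requests.is_empty() && tracker.can_write() {
        let req = pending_write_requests.pop_front().unwrap();
        tracker.global_buffer_size.fetch_add(req.size, Relaxed);
        let _ = req.grant.send(());
    }
    if tracker.need_flush() {
        // The granted writes may have pushed usage back over the flush
        // threshold, so a new flush task may be issued here (omitted).
    }
}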

wenym1 (Contributor, Author) commented Jun 27, 2022

You can run a bench to verify whether the two thresholds take effect before merging.

I just ran a benchmark. I set the threshold to a very small value (40 MB) and added some logs. The system ran well, and the logs show that the async flush and write blocking are taking effect.

The state between epochs is not too large (about 40 MB per epoch) and is not the main source of memory usage, so while running the benchmark, memory usage did not drop significantly compared to main.

wenym1 (Contributor, Author) commented Jun 27, 2022

Is there any test that can show that hummock can flush a write batch by itself without blocking write_batch?

Not yet. I will add one in a later PR.

The bench mentioned above shows that as long as memory usage has not reached the block-write threshold, batches can be flushed without writes being blocked. So the current implementation seems correct.

wenym1 added the mergify/can-merge label (indicates that the PR can be added to the merge queue) Jun 27, 2022
mergify bot merged commit 3a9e9ae into main Jun 27, 2022
mergify bot deleted the yiming/concurrent-shared-buffer-flush branch Jun 27, 2022 11:59
Labels
mergify/can-merge, type/feature