feat(storage): support concurrent shared buffer flush #3289
Conversation
Codecov Report
@@ Coverage Diff @@
## main #3289 +/- ##
==========================================
- Coverage 74.40% 74.39% -0.02%
==========================================
Files 768 768
Lines 107647 107784 +137
==========================================
+ Hits 80098 80183 +85
- Misses 27549 27601 +52
match event {
    SharedBufferEvent::WriteRequest(size, sender) => {
        if local_version_manager.buffer_tracker.can_write() {
            grant_write_request(size, sender);
IIUC, writes above the `write_threshold` will trigger a flush with the following steps:
- Caller sends `SharedBufferEvent::WriteRequest` to the buffer tracker.
- Buffer tracker checks `can_write` and notifies the caller.
- Caller adds a batch to the shared buffer.
- Caller sends `SharedBufferEvent::BatchAdd` to the buffer tracker.
- Buffer tracker finally triggers the flush.

This process looks too cumbersome to me. Can we send the batch along with `SharedBufferEvent::WriteRequest` and 1) trigger a flush immediately if `can_write() == true`, or 2) put the batch in `pending_write_requests` for future flushes? In this case, we can simplify the flushing process and remove one round of message passing.
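For illustration, a minimal sketch of this proposed single-event flow. The types and helpers here (`SharedBufferBatch`, `BufferTracker`, `try_flush`) are simplified stand-ins for the PR's actual implementation, not the real code:

```rust
use std::collections::VecDeque;

struct SharedBufferBatch {
    size: usize,
}

struct BufferTracker {
    global_buffer_size: usize,
    block_write_threshold: usize,
}

impl BufferTracker {
    fn can_write(&self) -> bool {
        self.global_buffer_size < self.block_write_threshold
    }
}

enum SharedBufferEvent {
    // The batch rides along with the request, so the separate
    // `BatchAdd` round of message passing is no longer needed.
    WriteRequest(SharedBufferBatch),
}

fn try_flush(_tracker: &BufferTracker) {
    // Placeholder: the real worker would spawn an upload task here
    // when the flush threshold is exceeded.
}

fn handle_event(
    event: SharedBufferEvent,
    tracker: &mut BufferTracker,
    pending_write_requests: &mut VecDeque<SharedBufferBatch>,
) {
    match event {
        SharedBufferEvent::WriteRequest(batch) => {
            if tracker.can_write() {
                // 1) Account for the batch and trigger a flush immediately.
                tracker.global_buffer_size += batch.size;
                try_flush(tracker);
            } else {
                // 2) Park the batch for a future flush once memory is released.
                pending_write_requests.push_back(batch);
            }
        }
    }
}
```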
// An upload task has finished. There may be possibility that we need a new
// upload task.
Can you explain more about why we need a new upload task without waiting for buffer release? Writes only look at the buffer size, not the buffer size minus the upload task size.
LGTM. You can run a bench to verify whether the two thresholds take effect before merging.
Good job. Is there any test that shows hummock can flush a write batch by itself without blocking?
    .global_buffer_size
    .fetch_sub(size, Relaxed);
while !pending_write_requests.is_empty()
    && local_version_manager.buffer_tracker.can_write()
Why must we check `can_write` before triggering a new flush task? If the memory exceeds the block threshold, we cannot flush data, can we?
`can_write` is checked to decide whether to grant the pending write requests stored in `pending_write_requests`. When a write request is granted and new data is written to the shared buffer, `global_buffer_size` may exceed the threshold again, and we may trigger the flush.

When memory exceeds the block threshold, we need to flush data to free some memory and unblock the writes.
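A self-contained sketch of this release path, under assumed field names and with a tokio oneshot channel standing in for the PR's actual types:

```rust
use std::collections::VecDeque;
use tokio::sync::oneshot;

struct BufferTracker {
    global_buffer_size: usize,
    flush_threshold: usize,
    block_write_threshold: usize,
}

impl BufferTracker {
    fn can_write(&self) -> bool {
        self.global_buffer_size < self.block_write_threshold
    }
    fn need_flush(&self) -> bool {
        self.global_buffer_size > self.flush_threshold
    }
}

/// Handle a buffer-release event: free memory, then grant pending write
/// requests while the block write threshold still permits.
fn on_buffer_release(
    freed: usize,
    tracker: &mut BufferTracker,
    pending_write_requests: &mut VecDeque<(usize, oneshot::Sender<()>)>,
) {
    tracker.global_buffer_size -= freed;
    // Each grant grows the buffer again, which is why `can_write` is
    // re-checked per grant: granting may push us back over the flush
    // threshold and justify another flush.
    while !pending_write_requests.is_empty() && tracker.can_write() {
        let (size, sender) = pending_write_requests.pop_front().unwrap();
        tracker.global_buffer_size += size;
        let _ = sender.send(()); // unblock the waiting writer
    }
    if tracker.need_flush() {
        // Placeholder: spawn a new flush (upload) task here.
    }
}
```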
I just ran a benchmark. I set the threshold very small (40 MB) and added some logs. The system runs well, and the logs show that the async flush and write blocking are taking effect. The state between epochs is not too large (about 40 MB per epoch) and is not the main memory usage, so while running the benchmark, the memory usage did not drop significantly compared to main.
Not yet. I will add it in a later PR. The bench mentioned above shows that as long as the memory usage has not reached the block write threshold, the batch can be flushed without being blocked. So the current implementation seems correct.
What's changed and what's your intention?
Currently, when we write data to the shared buffer and the memory usage has already reached the threshold, we issue a flush task and wait until the upload task finishes. This design has the following problems. First, a flush is only triggered when we reach the threshold, so memory usage always stays near the threshold and writes are blocked frequently. Second, suppose multiple writes have been blocked because memory usage reached the threshold. As soon as memory is freed by the flush task, all of these writes are unblocked at once, and memory usage can suddenly rise far above the threshold, which may lead to OOM.
In this PR, we introduce a flush threshold and a block write threshold. When memory reaches the flush threshold, we issue a flush task; only when memory reaches the block write threshold do we block the write. We spawn a buffer tracker worker thread that handles all memory-related events, including buffer added, buffer released, task added, task finished, start sync epoch, and epoch synced. Writes are allowed directly when memory is below the lower flush threshold. When memory rises above the flush threshold, a write request is sent to the buffer tracker worker through a channel, and the request is granted by sending an event back over a oneshot channel.
The block write threshold is the original buffer tracker capacity. The flush threshold is set to 0.8 * the block write threshold.
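A hedged sketch of the worker loop described above. The event names follow the list in this description, while the channel choice, the 64 MB capacity, and the handler bodies are illustrative assumptions rather than the PR's actual code:

```rust
use tokio::sync::{mpsc, oneshot};

/// Memory-related events handled by the buffer tracker worker.
enum SharedBufferEvent {
    BufferAdd(usize),
    BufferRelease(usize),
    TaskAdd(usize),
    TaskFinish(usize),
    SyncEpoch(u64),
    EpochSynced(u64),
    /// Sent when memory is above the flush threshold; the grant is
    /// delivered back over the oneshot channel.
    WriteRequest(usize, oneshot::Sender<()>),
}

async fn buffer_tracker_worker(mut rx: mpsc::UnboundedReceiver<SharedBufferEvent>) {
    // Assumed capacity; the block write threshold is the original buffer
    // tracker capacity and the flush threshold is 0.8x of it.
    let block_write_threshold: usize = 64 << 20;
    let flush_threshold = (block_write_threshold as f64 * 0.8) as usize;
    let mut global_buffer_size: usize = 0;

    while let Some(event) = rx.recv().await {
        match event {
            SharedBufferEvent::BufferAdd(size) => {
                global_buffer_size += size;
                if global_buffer_size > flush_threshold {
                    // Spawn a flush (upload) task here.
                }
            }
            SharedBufferEvent::BufferRelease(size) => {
                global_buffer_size -= size;
                // Grant pending write requests while below the block threshold.
            }
            SharedBufferEvent::WriteRequest(size, sender) => {
                if global_buffer_size < block_write_threshold {
                    global_buffer_size += size;
                    let _ = sender.send(()); // grant the write immediately
                } else {
                    // Queue in pending_write_requests until memory is released.
                }
            }
            // Upload-task and epoch-sync bookkeeping elided for brevity.
            _ => {}
        }
    }
}
```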
Checklist
- `./risedev check` (or alias, `./risedev c`)

Refer to a related PR or issue link (optional)