-
Notifications
You must be signed in to change notification settings - Fork 6.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve Write Stalling System #1562
Conversation
Summary: Current write stalling system has the problem of lacking of positive feedback if the restricted rate is already too low. Users sometimes stack in very low slowdown value. With the diff, we add a positive feedback (increasing the slowdown value) if we recover from slowdown state back to normal. To avoid the positive feedback to keep the slowdown value to be to high, we add issue a negative feedback every time we are close to the stop condition. Experiments show it is easier to reach a relative balance than before. Test Plan: Run ./db_bench --benchmarks=fillrandom --num=10000000 --write_buffer_size=4000000 --level0_slowdown_writes_trigger=16 -max_write_buffer_number=8 --max_background_flushes=8 --level0_stop_writes_trigger=24 --max_bytes_for_level_base=10000000000 Before we'll stuck in very low slowdown value. Now we can reach a balance with a much higher slowdown value
@siying has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments inline. And random thought: maybe we can have a single formula taking the following into account:
- max_write_buffer_number and pending flush memtable
- level0_slowdown_writes_trigger and L0 files
- pending compaction bytes and soft/hard pending compaction bytes limit.
- user input delayed write rate
- for how long we have delayed.
and return a delayed write rate.
uint64_t write_rate = write_controller->delayed_write_rate(); | ||
write_rate = static_cast<uint64_t>(static_cast<double>(write_rate) * | ||
kSlowdownRatio); | ||
if (write_rate > write_controller->max_delayed_write_rate()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move the logic into WriteController::set_delayed_write_rate() ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
// If the DB recovers from delay conditions, we reward with reducing | ||
// double the slowdown ratio. This is to balance the long term slowdown | ||
// increase signal. | ||
if (needed_delay) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we recover only when we are not adding delay?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here seems we resume the write rate regardless whether we call SetupDelay above. Why not just resume write rate only when SetupDelay is not called? I ask just to understand the logic here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we only call it if SetupDelay is NOT called. This is included in the else starting from 710.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, okay, I misread it.
// condition. | ||
write_rate = static_cast<uint64_t>( | ||
static_cast<double>(write_rate) / | ||
(kSlowdownRatio * kSlowdownRatio * kSlowdownRatio)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make it another constant? maybe it is not necessary to be kSlowdownRatio^3.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
write_rate = kMinWriteRate; | ||
} | ||
} | ||
if (was_stopped || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why we want to slowdown after we recover from stop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't reduce write rate when we fall into stop, because we don't determine delay value if stop condition hits. Here we pay the debt. Also I want to penalize going to stop condition. This is something we want to avoid.
@yiwu-arbug good suggestion. We should follow up with it later. I'm trying to addressing a concrete problem here. |
@siying definitely. |
@siying updated the pull request - view changes - changes since last import |
@siying updated the pull request - view changes - changes since last import |
Test failures are not related. Landing it. |
Summary:
Current write stalling system has the problem of lacking of positive feedback if the restricted rate is already too low. Users sometimes stack in very low slowdown value. With the diff, we add a positive feedback (increasing the slowdown value) if we recover from slowdown state back to normal. To avoid the positive feedback to keep the slowdown value to be to high, we add issue a negative feedback every time we are close to the stop condition. Experiments show it is easier to reach a relative balance than before.
Also increase level0_stop_writes_trigger default from 24 to 32. Since level0_slowdown_writes_trigger default is 20, stop trigger 24 only gives four files as the buffer time to slowdown writes. In order to avoid stop in four files while 20 files have been accumulated, the slowdown value must be very low, which is amost the same as stop. It also doesn't give enough time for the slowdown value to converge. Increase it to 32 will smooth out the system.
Test Plan: Run
./db_bench --benchmarks=fillrandom --num=10000000 --write_buffer_size=4000000 --level0_slowdown_writes_trigger=16 -max_write_buffer_number=8 --max_background_flushes=8 --level0_stop_writes_trigger=24 --max_bytes_for_level_base=10000000000
Before we'll stuck in very low slowdown value. Now we can reach a balance with a much higher slowdown value