Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Write Stalling System #1562

Closed
wants to merge 3 commits into from
Closed

Conversation

siying
Copy link
Contributor

@siying siying commented Nov 22, 2016

Summary:
Current write stalling system has the problem of lacking of positive feedback if the restricted rate is already too low. Users sometimes stack in very low slowdown value. With the diff, we add a positive feedback (increasing the slowdown value) if we recover from slowdown state back to normal. To avoid the positive feedback to keep the slowdown value to be to high, we add issue a negative feedback every time we are close to the stop condition. Experiments show it is easier to reach a relative balance than before.

Also increase level0_stop_writes_trigger default from 24 to 32. Since level0_slowdown_writes_trigger default is 20, stop trigger 24 only gives four files as the buffer time to slowdown writes. In order to avoid stop in four files while 20 files have been accumulated, the slowdown value must be very low, which is amost the same as stop. It also doesn't give enough time for the slowdown value to converge. Increase it to 32 will smooth out the system.

Test Plan: Run

./db_bench --benchmarks=fillrandom --num=10000000 --write_buffer_size=4000000 --level0_slowdown_writes_trigger=16 -max_write_buffer_number=8 --max_background_flushes=8 --level0_stop_writes_trigger=24 --max_bytes_for_level_base=10000000000

Before we'll stuck in very low slowdown value. Now we can reach a balance with a much higher slowdown value

Summary:
Current write stalling system has the problem of lacking of positive feedback if the restricted rate is already too low. Users sometimes stack in very low slowdown value. With the diff, we add a positive feedback (increasing the slowdown value) if we recover from slowdown state back to normal. To avoid the positive feedback to keep the slowdown value to be to high, we add issue a negative feedback every time we are close to the stop condition. Experiments show it is easier to reach a relative balance than before.

Test Plan: Run

./db_bench --benchmarks=fillrandom --num=10000000 --write_buffer_size=4000000 --level0_slowdown_writes_trigger=16 -max_write_buffer_number=8 --max_background_flushes=8 --level0_stop_writes_trigger=24 --max_bytes_for_level_base=10000000000

Before we'll stuck in very low slowdown value. Now we can reach a balance with a much higher slowdown value
@facebook-github-bot
Copy link
Contributor

@siying has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@yiwu-arbug yiwu-arbug self-assigned this Nov 22, 2016
Copy link
Contributor

@yiwu-arbug yiwu-arbug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments inline. And random thought: maybe we can have a single formula taking the following into account:

  • max_write_buffer_number and pending flush memtable
  • level0_slowdown_writes_trigger and L0 files
  • pending compaction bytes and soft/hard pending compaction bytes limit.
  • user input delayed write rate
  • for how long we have delayed.
    and return a delayed write rate.

uint64_t write_rate = write_controller->delayed_write_rate();
write_rate = static_cast<uint64_t>(static_cast<double>(write_rate) *
kSlowdownRatio);
if (write_rate > write_controller->max_delayed_write_rate()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move the logic into WriteController::set_delayed_write_rate() ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

// If the DB recovers from delay conditions, we reward with reducing
// double the slowdown ratio. This is to balance the long term slowdown
// increase signal.
if (needed_delay) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we recover only when we are not adding delay?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the question.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here seems we resume the write rate regardless whether we call SetupDelay above. Why not just resume write rate only when SetupDelay is not called? I ask just to understand the logic here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we only call it if SetupDelay is NOT called. This is included in the else starting from 710.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, okay, I misread it.

// condition.
write_rate = static_cast<uint64_t>(
static_cast<double>(write_rate) /
(kSlowdownRatio * kSlowdownRatio * kSlowdownRatio));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make it another constant? maybe it is not necessary to be kSlowdownRatio^3.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

write_rate = kMinWriteRate;
}
}
if (was_stopped ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why we want to slowdown after we recover from stop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't reduce write rate when we fall into stop, because we don't determine delay value if stop condition hits. Here we pay the debt. Also I want to penalize going to stop condition. This is something we want to avoid.

@siying
Copy link
Contributor Author

siying commented Nov 22, 2016

@yiwu-arbug good suggestion. We should follow up with it later. I'm trying to addressing a concrete problem here.

@yiwu-arbug
Copy link
Contributor

@siying definitely.

@facebook-github-bot
Copy link
Contributor

@siying updated the pull request - view changes - changes since last import

@facebook-github-bot
Copy link
Contributor

@siying updated the pull request - view changes - changes since last import

@siying
Copy link
Contributor Author

siying commented Nov 23, 2016

Test failures are not related. Landing it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants