Improve Write Stalling System #1562

siying · 2016-11-22T02:42:27Z

Summary:
The current write stalling system lacks positive feedback when the restricted rate is already too low, so users sometimes get stuck at a very low slowdown value. With this diff, we add positive feedback (increasing the slowdown value) when we recover from the slowdown state back to normal. To keep the positive feedback from pushing the slowdown value too high, we issue negative feedback every time we are close to the stop condition. Experiments show it is easier to reach a relative balance than before.

Also increase the level0_stop_writes_trigger default from 24 to 32. Since the level0_slowdown_writes_trigger default is 20, a stop trigger of 24 leaves only four files of headroom in which to slow down writes. To avoid hitting the stop condition within four files when 20 files have already accumulated, the slowdown value must be very low, which is almost the same as a stop. It also doesn't give the slowdown value enough time to converge. Increasing it to 32 will smooth out the system.
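
The feedback loop described in the summary can be sketched roughly as follows. This is an illustration, not the code in the diff: it reads "slowdown value" as the delayed write rate, and `StallSignal`, `AdjustRate`, `kRatio` and `kMinRate` are hypothetical names with assumed values. The exponents (reward by the ratio squared, penalty by the ratio cubed) loosely follow the snippets quoted in the review and should be treated as assumptions too.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical signals. The real diff derives its conditions from L0 file
// counts, memtable counts and pending compaction bytes.
enum class StallSignal {
  kRecoveredToNormal,  // left the slowdown state: positive feedback
  kNearStop,           // close to the stop condition: negative feedback
  kStillDelayed        // no adjustment in this sketch
};

constexpr double kRatio = 1.2;       // assumed step size, illustration only
constexpr uint64_t kMinRate = 1024;  // assumed floor, in bytes per second

// Returns the next delayed write rate, kept within [kMinRate, max_rate].
uint64_t AdjustRate(uint64_t rate, uint64_t max_rate, StallSignal signal) {
  assert(max_rate >= kMinRate);
  double next = static_cast<double>(rate);
  switch (signal) {
    case StallSignal::kRecoveredToNormal:
      next *= kRatio * kRatio;  // reward: allow a higher rate next time
      break;
    case StallSignal::kNearStop:
      next /= kRatio * kRatio * kRatio;  // penalty is larger than the reward
      break;
    case StallSignal::kStillDelayed:
      break;
  }
  const uint64_t truncated = static_cast<uint64_t>(next);
  return std::min(std::max(truncated, kMinRate), max_rate);
}
```

Making the penalty larger than the reward is what keeps the rate from drifting upward indefinitely: a single near-stop event cancels more than one recovery.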

Test Plan: Run

./db_bench --benchmarks=fillrandom --num=10000000 --write_buffer_size=4000000 --level0_slowdown_writes_trigger=16 -max_write_buffer_number=8 --max_background_flushes=8 --level0_stop_writes_trigger=24 --max_bytes_for_level_base=10000000000

Before, we would get stuck at a very low slowdown value. Now we can reach a balance with a much higher slowdown value.

facebook-github-bot · 2016-11-22T02:42:49Z

@siying has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

yiwu-arbug

Some comments inline. And random thought: maybe we can have a single formula taking the following into account:

max_write_buffer_number and pending flush memtable
level0_slowdown_writes_trigger and L0 files
pending compaction bytes and soft/hard pending compaction bytes limit.
user input delayed write rate
for how long we have delayed.
and return a delayed write rate.

yiwu-arbug · 2016-11-22T19:10:39Z

db/column_family.cc

+        uint64_t write_rate = write_controller->delayed_write_rate();
+        write_rate = static_cast<uint64_t>(static_cast<double>(write_rate) *
+                                           kSlowdownRatio);
+        if (write_rate > write_controller->max_delayed_write_rate()) {


move the logic into WriteController::set_delayed_write_rate() ?
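
A rough sketch of what this suggestion could look like, assuming the cap is enforced inside the setter so that call sites no longer compare against the maximum themselves. `SketchWriteController` is a hypothetical stand-in written for this example and is not RocksDB's `WriteController`; only the accessor names are borrowed from the quoted diff.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for illustration; not RocksDB's WriteController.
class SketchWriteController {
 public:
  explicit SketchWriteController(uint64_t max_rate)
      : max_delayed_write_rate_(max_rate), delayed_write_rate_(max_rate) {
    assert(max_rate > 0);
  }

  uint64_t delayed_write_rate() const { return delayed_write_rate_; }
  uint64_t max_delayed_write_rate() const { return max_delayed_write_rate_; }

  // Callers pass whatever rate they computed. The cap is applied here, so
  // every call site gets the same upper bound.
  void set_delayed_write_rate(uint64_t write_rate) {
    if (write_rate > max_delayed_write_rate_) {
      write_rate = max_delayed_write_rate_;
    }
    delayed_write_rate_ = write_rate;
  }

 private:
  uint64_t max_delayed_write_rate_;
  uint64_t delayed_write_rate_;
};
```

With the clamp in one place, the caller in column_family.cc would only need to compute the scaled rate and hand it to the setter.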

yiwu-arbug · 2016-11-22T19:11:16Z

db/column_family.cc

+      // If the DB recovers from delay conditions, we reward with reducing
+      // double the slowdown ratio. This is to balance the long term slowdown
+      // increase signal.
+      if (needed_delay) {


should we recover only when we are not adding delay?

I don't understand the question.

It seems we resume the write rate here regardless of whether we call SetupDelay above. Why not resume the write rate only when SetupDelay is not called? I ask just to understand the logic here.

Here we only call it if SetupDelay is NOT called. This is included in the else starting from 710.

ah, okay, I misread it.

yiwu-arbug · 2016-11-22T19:14:49Z

db/column_family.cc

+      // condition.
+      write_rate = static_cast<uint64_t>(
+          static_cast<double>(write_rate) /
+          (kSlowdownRatio * kSlowdownRatio * kSlowdownRatio));


make it another constant? maybe it is not necessary to be kSlowdownRatio^3.
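
One way to read this suggestion, sketched under assumptions: keep the form quoted in the diff next to a variant that takes the near-stop penalty as its own value, so it can be tuned without changing kSlowdownRatio. `PenalizeCubed` and `Penalize` are hypothetical helpers, and 1.2 is an assumed value for kSlowdownRatio used only to make the example concrete.

```cpp
#include <cassert>
#include <cstdint>

constexpr double kSlowdownRatio = 1.2;  // assumed value, illustration only

// Form quoted in the diff: the penalty is tied to kSlowdownRatio cubed.
uint64_t PenalizeCubed(uint64_t write_rate) {
  return static_cast<uint64_t>(
      static_cast<double>(write_rate) /
      (kSlowdownRatio * kSlowdownRatio * kSlowdownRatio));
}

// Suggested form: the penalty is a separate value, independent of
// kSlowdownRatio. In real code this would likely be a named constant.
uint64_t Penalize(uint64_t write_rate, double penalty) {
  assert(penalty > 1.0);  // a penalty must reduce the rate
  return static_cast<uint64_t>(static_cast<double>(write_rate) / penalty);
}
```

Passing 1.728 as the penalty reproduces the cubed form approximately, while any other value greater than 1 changes the near-stop behaviour without affecting the ordinary slowdown step.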

yiwu-arbug · 2016-11-22T19:19:13Z

db/column_family.cc

+        write_rate = kMinWriteRate;
+      }
+    }
+    if (was_stopped ||


Why do we want to slow down after we recover from stop?

We don't reduce the write rate when we fall into stop, because we don't determine a delay value when the stop condition hits. Here we pay that debt. I also want to penalize reaching the stop condition, since it is something we want to avoid.

siying · 2016-11-22T22:10:35Z

@yiwu-arbug good suggestion. We should follow up on it later. I'm trying to address a concrete problem here.

yiwu-arbug · 2016-11-22T22:31:45Z

@siying definitely.

facebook-github-bot · 2016-11-22T22:44:07Z

@siying updated the pull request - view changes - changes since last import

facebook-github-bot · 2016-11-23T01:14:59Z

@siying updated the pull request - view changes - changes since last import

siying · 2016-11-23T17:16:51Z

Test failures are not related. Landing it.

yiwu-arbug self-assigned this Nov 22, 2016

yiwu-arbug reviewed Nov 22, 2016

Address comment and increase default stop L0 trigger to 32

90ea859

yiwu-arbug approved these changes Nov 22, 2016

Fix unit tests

9e8f563

facebook-github-bot closed this in cd7c414 Nov 23, 2016
