From ea51912c6b43085496658cbab39bde0ac0eebb97 Mon Sep 17 00:00:00 2001
From: CalvinNeo
Date: Thu, 31 Aug 2023 11:20:47 +0800
Subject: [PATCH 1/6] add

Signed-off-by: CalvinNeo
---
 ...08-31-reduce-wait-index-in-random-write.md | 27 +++++++++++++++++++
 1 file changed, 27 insertions(+)
 create mode 100644 docs/design/2023-08-31-reduce-wait-index-in-random-write.md

diff --git a/docs/design/2023-08-31-reduce-wait-index-in-random-write.md b/docs/design/2023-08-31-reduce-wait-index-in-random-write.md
new file mode 100644
index 00000000000..7c996b7253b
--- /dev/null
+++ b/docs/design/2023-08-31-reduce-wait-index-in-random-write.md
@@ -0,0 +1,27 @@
+# Reduce raft wait index in random write
+
+- Author: [Rongzhen Luo](https://github.com/CalvinNeo)
+
+## Introduction
+
+This RFC introduces a new flush condition for every Region in TiFlash's KVStore, which will be triggered when the count of applied raft entries since the Region's last flush time has reached a certain threshold.
+
+Meanwhile, we will also deprecate the flush condition based on random timeout.
+
+## Background
+
+After received a CompactLog command, there are three conditions that determine whether an actual flush will take place. Two of them are associated with the size of the actual data in the Region. When the number of rows or the size of the data in the Region exceeds the corresponding threshold, flushing to disk will be triggered. The other condition is a random timeout. If the time since the last flush exceeds this timeout, a flush will be performed.
+
+In scenarios where small transactions are replicated, the first two size-related conditions are not easily triggered. The flush frequency is determined only by the random timeout. Furthermore, if the writes are highly random, when a flush is triggered, only a small amount of data is to be written, which is a waste to our write bandwidth.
+
+Therefore, we need to replace this random timeout condition.
+
+## Detailed Design
+
+When flushing each Region, the current `applied_index` is recorded into `last_flushed_applied_index`. Different from using random timeout, after the flush operation triggered by Split/Merge commands, `last_flushed_applied_index` also needs to be updated. After receiving the CompactLog command, the difference between the corresponding `applied_index` and `last_flushed_applied_index` is checked to see if it exceeds the log gap threshold configured. If it does, a flush is triggered; otherwise, the CompactLog is treated as an empty raft log entry.
+
+After TiFlash restarts, the `applied_index` recorded in the disk will be stored into `last_flushed_applied_index`. Because after restarted, TiFlash needs to catch up on logs from TiKV, which could result in a significant number of active Regions even for highly random writes. Therefore, after startup, the threshold for triggering a flush based on log gap will be reduced to half.
+
+In the current architecture, reducing the frequency of flushes often increases memory overhead. This is because we need to cache more entries in the raft entry cache and more KV data in the KVStore. Choosing `applied_index` can more accurately reflect the size of memory, thus avoiding additional flushes.
+
+The original random timeout based configuration will be ignored after upgrading.
\ No newline at end of file

From b44102949500cb39a6a7195aac0c104f19f9408e Mon Sep 17 00:00:00 2001
From: Calvin Neo
Date: Mon, 4 Sep 2023 16:17:25 +0800
Subject: [PATCH 2/6] Update docs/design/2023-08-31-reduce-wait-index-in-random-write.md

Co-authored-by: Lloyd-Pottiger <60744015+Lloyd-Pottiger@users.noreply.github.com>
---
 docs/design/2023-08-31-reduce-wait-index-in-random-write.md | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/docs/design/2023-08-31-reduce-wait-index-in-random-write.md b/docs/design/2023-08-31-reduce-wait-index-in-random-write.md
index 7c996b7253b..36548aa2bb1 100644
--- a/docs/design/2023-08-31-reduce-wait-index-in-random-write.md
+++ b/docs/design/2023-08-31-reduce-wait-index-in-random-write.md
@@ -10,7 +10,11 @@ Meanwhile, we will also deprecate the flush condition based on random timeout.
 
 ## Background
 
-After received a CompactLog command, there are three conditions that determine whether an actual flush will take place. Two of them are associated with the size of the actual data in the Region. When the number of rows or the size of the data in the Region exceeds the corresponding threshold, flushing to disk will be triggered. The other condition is a random timeout. If the time since the last flush exceeds this timeout, a flush will be performed.
+After receiving a CompactLog command, there are three conditions that determine whether an actual flush will take place:
+
+1. The number of rows in the Region exceeds the corresponding threshold.
+2. The size of the data in the Region exceeds the corresponding threshold.
+3. The time since the last flush exceeds a random timeout.
 
 In scenarios where small transactions are replicated, the first two size-related conditions are not easily triggered. The flush frequency is determined only by the random timeout. Furthermore, if the writes are highly random, when a flush is triggered, only a small amount of data is to be written, which is a waste to our write bandwidth.
 
From c78a1f6cbc8ce8057473e6d29314037dc5a8af7d Mon Sep 17 00:00:00 2001
From: Calvin Neo
Date: Mon, 4 Sep 2023 16:17:37 +0800
Subject: [PATCH 3/6] Update docs/design/2023-08-31-reduce-wait-index-in-random-write.md

Co-authored-by: Lloyd-Pottiger <60744015+Lloyd-Pottiger@users.noreply.github.com>
---
 docs/design/2023-08-31-reduce-wait-index-in-random-write.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/design/2023-08-31-reduce-wait-index-in-random-write.md b/docs/design/2023-08-31-reduce-wait-index-in-random-write.md
index 36548aa2bb1..2b9de23ef90 100644
--- a/docs/design/2023-08-31-reduce-wait-index-in-random-write.md
+++ b/docs/design/2023-08-31-reduce-wait-index-in-random-write.md
@@ -16,7 +16,7 @@ After receiving a CompactLog command, there are three conditions that determine
 2. The size of the data in the Region exceeds the corresponding threshold.
 3. The time since the last flush exceeds a random timeout.
 
-In scenarios where small transactions are replicated, the first two size-related conditions are not easily triggered. The flush frequency is determined only by the random timeout. Furthermore, if the writes are highly random, when a flush is triggered, only a small amount of data is to be written, which is a waste to our write bandwidth.
+In scenarios where small transactions are replicated, the first two size-related conditions can not be easily triggered. The flush frequency is only determined by the random timeout. Furthermore, if the writes are highly random, when a flush is triggered, only a small amount of data in a Region is to be written, which is a waste of write bandwidth.
 
 Therefore, we need to replace this random timeout condition.
 
From 52faba237894791f38b0eb8b15c95cfa6f305151 Mon Sep 17 00:00:00 2001
From: Calvin Neo
Date: Mon, 4 Sep 2023 16:17:43 +0800
Subject: [PATCH 4/6] Update docs/design/2023-08-31-reduce-wait-index-in-random-write.md

Co-authored-by: Lloyd-Pottiger <60744015+Lloyd-Pottiger@users.noreply.github.com>
---
 docs/design/2023-08-31-reduce-wait-index-in-random-write.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/design/2023-08-31-reduce-wait-index-in-random-write.md b/docs/design/2023-08-31-reduce-wait-index-in-random-write.md
index 2b9de23ef90..56423a70d28 100644
--- a/docs/design/2023-08-31-reduce-wait-index-in-random-write.md
+++ b/docs/design/2023-08-31-reduce-wait-index-in-random-write.md
@@ -24,7 +24,7 @@ Therefore, we need to replace this random timeout condition.
 
 When flushing each Region, the current `applied_index` is recorded into `last_flushed_applied_index`. Different from using random timeout, after the flush operation triggered by Split/Merge commands, `last_flushed_applied_index` also needs to be updated. After receiving the CompactLog command, the difference between the corresponding `applied_index` and `last_flushed_applied_index` is checked to see if it exceeds the log gap threshold configured. If it does, a flush is triggered; otherwise, the CompactLog is treated as an empty raft log entry.
 
-After TiFlash restarts, the `applied_index` recorded in the disk will be stored into `last_flushed_applied_index`. Because after restarted, TiFlash needs to catch up on logs from TiKV, which could result in a significant number of active Regions even for highly random writes. Therefore, after startup, the threshold for triggering a flush based on log gap will be reduced to half.
+After TiFlash restarts, the `applied_index` recorded in the disk will be stored into `last_flushed_applied_index`. And because TiFlash needs to catch up on logs from TiKV after restarting, which could result in a significant number of active Regions even for highly random writes. Therefore, after startup, the threshold for triggering a flush based on the log gap will be reduced to half.
 
 In the current architecture, reducing the frequency of flushes often increases memory overhead. This is because we need to cache more entries in the raft entry cache and more KV data in the KVStore. Choosing `applied_index` can more accurately reflect the size of memory, thus avoiding additional flushes.
 

From 681ddd507e12598c73c5ae220fbaef5b52b9bbb7 Mon Sep 17 00:00:00 2001
From: Calvin Neo
Date: Mon, 4 Sep 2023 16:17:50 +0800
Subject: [PATCH 5/6] Update docs/design/2023-08-31-reduce-wait-index-in-random-write.md

Co-authored-by: Lloyd-Pottiger <60744015+Lloyd-Pottiger@users.noreply.github.com>
---
 docs/design/2023-08-31-reduce-wait-index-in-random-write.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/design/2023-08-31-reduce-wait-index-in-random-write.md b/docs/design/2023-08-31-reduce-wait-index-in-random-write.md
index 56423a70d28..d34c732a7ce 100644
--- a/docs/design/2023-08-31-reduce-wait-index-in-random-write.md
+++ b/docs/design/2023-08-31-reduce-wait-index-in-random-write.md
@@ -22,7 +22,7 @@ Therefore, we need to replace this random timeout condition.
 
 ## Detailed Design
 
-When flushing each Region, the current `applied_index` is recorded into `last_flushed_applied_index`. Different from using random timeout, after the flush operation triggered by Split/Merge commands, `last_flushed_applied_index` also needs to be updated. After receiving the CompactLog command, the difference between the corresponding `applied_index` and `last_flushed_applied_index` is checked to see if it exceeds the log gap threshold configured. If it does, a flush is triggered; otherwise, the CompactLog is treated as an empty raft log entry.
+When flushing each Region, the current `applied_index` is recorded as `last_flushed_applied_index`. Each time the flush operation is triggered by Split/Merge commands, `last_flushed_applied_index` will be updated. Different from using a random timeout, after receiving the CompactLog command, the difference between the corresponding `applied_index` and `last_flushed_applied_index` is checked to see if it exceeds the log gap threshold configured. If it does, a flush is triggered; otherwise, the CompactLog is treated as an empty raft log entry.
 
 After TiFlash restarts, the `applied_index` recorded in the disk will be stored into `last_flushed_applied_index`. And because TiFlash needs to catch up on logs from TiKV after restarting, which could result in a significant number of active Regions even for highly random writes. Therefore, after startup, the threshold for triggering a flush based on the log gap will be reduced to half.
 

From 5ced840860ff58fc068c0820717f6ad2c18d23e5 Mon Sep 17 00:00:00 2001
From: Calvin Neo
Date: Tue, 5 Sep 2023 14:14:07 +0800
Subject: [PATCH 6/6] Update docs/design/2023-08-31-reduce-wait-index-in-random-write.md

Co-authored-by: JaySon
---
 docs/design/2023-08-31-reduce-wait-index-in-random-write.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/design/2023-08-31-reduce-wait-index-in-random-write.md b/docs/design/2023-08-31-reduce-wait-index-in-random-write.md
index d34c732a7ce..ac190cdc880 100644
--- a/docs/design/2023-08-31-reduce-wait-index-in-random-write.md
+++ b/docs/design/2023-08-31-reduce-wait-index-in-random-write.md
@@ -26,6 +26,6 @@ When flushing each Region, the current `applied_index` is recorded as `last_flus
 
 After TiFlash restarts, the `applied_index` recorded in the disk will be stored into `last_flushed_applied_index`. And because TiFlash needs to catch up on logs from TiKV after restarting, which could result in a significant number of active Regions even for highly random writes. Therefore, after startup, the threshold for triggering a flush based on the log gap will be reduced to half.
 
-In the current architecture, reducing the frequency of flushes often increases memory overhead. This is because we need to cache more entries in the raft entry cache and more KV data in the KVStore. Choosing `applied_index` can more accurately reflect the size of memory, thus avoiding additional flushes.
+In the current architecture, reducing the frequency of flushes often increases memory overhead. This is because we need to cache more entries in the raft entry cache and more KV data in the KVStore. Instead of the random timeout mechanism, the gap from `last_flushed_applied_index` can more accurately reflect the memory overhead, thus avoiding additional flushes.
 
 The original random timeout based configuration will be ignored after upgrading.
\ No newline at end of file
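
Reviewer note: the log-gap condition described under "Detailed Design" can be sketched roughly as below. This is only an illustrative Python model, not TiFlash code. The names `applied_index` and `last_flushed_applied_index` come from the RFC; `RegionFlushState`, `should_flush_on_compact_log`, `log_gap_threshold`, and `catching_up_after_restart` are hypothetical names chosen for this sketch. The strict comparison follows the word "exceeds" in the text, and the RFC does not say how long the halved threshold stays in effect after startup, so the sketch leaves that decision to the caller.

```python
from dataclasses import dataclass


@dataclass
class RegionFlushState:
    """Per-Region bookkeeping (hypothetical type; field names follow the RFC)."""

    applied_index: int = 0
    last_flushed_applied_index: int = 0

    def on_flushed(self) -> None:
        # Record the index after any flush, including flushes triggered
        # by Split/Merge commands.
        self.last_flushed_applied_index = self.applied_index


def should_flush_on_compact_log(
    state: RegionFlushState,
    log_gap_threshold: int,                   # hypothetical config name
    catching_up_after_restart: bool = False,  # hypothetical flag
) -> bool:
    """Return True if a CompactLog command should trigger a real flush."""
    # After startup the RFC halves the threshold; integer halving is an
    # assumption of this sketch.
    threshold = log_gap_threshold // 2 if catching_up_after_restart else log_gap_threshold
    gap = state.applied_index - state.last_flushed_applied_index
    # Otherwise the CompactLog is treated like an empty raft log entry.
    return gap > threshold
```

With a gap of 100 entries and a threshold of 150, this model skips the flush in steady state but performs it while catching up after a restart, because the effective threshold drops to 75.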