Add doc for reducing wait index at random write #8044

Merged
merged 10 commits into from
Sep 5, 2023
27 changes: 27 additions & 0 deletions docs/design/2023-08-31-reduce-wait-index-in-random-write.md
@@ -0,0 +1,27 @@
# Reduce raft wait index in random write

- Author: [Rongzhen Luo](https://github.com/CalvinNeo)

## Introduction

This RFC introduces a new flush condition for every Region in TiFlash's KVStore, which will be triggered when the count of applied raft entries since the Region's last flush time has reached a certain threshold.

Meanwhile, we will also deprecate the flush condition based on random timeout.

## Background

After receiving a CompactLog command, three conditions determine whether an actual flush takes place. Two of them depend on the amount of data held in the Region: when the number of rows or the size of the data exceeds the corresponding threshold, the data is flushed to disk. The third condition is a random timeout: if the time since the last flush exceeds this timeout, a flush is performed.
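
The three legacy conditions can be sketched as follows. This is an illustrative model only, not TiFlash code: the names `RegionState`, `should_flush_legacy`, `rows_threshold`, `size_threshold`, and `flush_timeout` are hypothetical, and the strict `>` comparisons are an assumption based on the word "exceeds" above.

```python
from dataclasses import dataclass


@dataclass
class RegionState:
    # All fields are hypothetical stand-ins for per-Region bookkeeping.
    rows: int               # rows currently held for this Region
    size_bytes: int         # bytes currently held for this Region
    last_flush_time: float  # time of the last flush, in seconds
    flush_timeout: float    # randomized timeout for this Region, in seconds


def should_flush_legacy(region, now, rows_threshold, size_threshold):
    """Return True if any of the three legacy conditions holds."""
    # Conditions 1 and 2: the amount of data in the Region.
    if region.rows > rows_threshold:
        return True
    if region.size_bytes > size_threshold:
        return True
    # Condition 3: the random timeout since the last flush.
    return now - region.last_flush_time > region.flush_timeout
```

With small transactions, the first two checks rarely succeed, so in this model the outcome is decided almost entirely by the timeout check.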

In scenarios where small transactions are replicated, the two size-related conditions are rarely triggered, so the flush frequency is determined only by the random timeout. Furthermore, if the writes are highly random, each flush writes only a small amount of data, which wastes write bandwidth.

Therefore, we need to replace this random timeout condition.

## Detailed Design

Whenever a Region is flushed, the current `applied_index` is recorded in `last_flushed_applied_index`. Unlike the random timeout approach, `last_flushed_applied_index` must also be updated after a flush triggered by Split/Merge commands. When a CompactLog command is received, the difference between its `applied_index` and `last_flushed_applied_index` is compared with the configured log gap threshold. If the difference exceeds the threshold, a flush is triggered; otherwise, the CompactLog is treated as an empty raft log entry.
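
A minimal sketch of this bookkeeping is shown below, assuming a simplified in-memory model. The class `Region`, the function `on_compact_log`, and the parameter `log_gap_threshold` are hypothetical names chosen for illustration; actual persistence and the two size-related conditions are omitted.

```python
from dataclasses import dataclass


@dataclass
class Region:
    applied_index: int = 0
    last_flushed_applied_index: int = 0

    def flush(self):
        # Persisting data is elided; only the index bookkeeping is shown.
        # This must run for every flush, including Split/Merge flushes.
        self.last_flushed_applied_index = self.applied_index


def on_compact_log(region, compact_log_index, log_gap_threshold):
    """Handle a CompactLog command; return True if a flush was triggered."""
    region.applied_index = compact_log_index
    gap = region.applied_index - region.last_flushed_applied_index
    if gap > log_gap_threshold:
        region.flush()
        return True
    # Below the threshold: treat the command like an empty raft log entry.
    return False
```

For example, with a threshold of 200, a CompactLog at index 100 on a fresh `Region` does not flush, while one at index 301 does and resets the gap to zero.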

After TiFlash restarts, `last_flushed_applied_index` is initialized from the `applied_index` persisted on disk. After a restart, TiFlash needs to catch up on logs from TiKV, which can result in a large number of active Regions even for highly random writes. Therefore, after startup, the log gap threshold for triggering a flush is reduced to half.
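
The restart behavior might look like the sketch below. The function names and the `catching_up_after_restart` flag are hypothetical, and how long the halved threshold stays in effect is not specified here, so the flag is left for the caller to decide. Integer halving is also an assumption.

```python
def restore_last_flushed_applied_index(persisted_applied_index):
    """On restart, start from the applied_index that was persisted on disk."""
    return persisted_applied_index


def effective_log_gap_threshold(configured_threshold, catching_up_after_restart):
    """Return the log gap threshold to use for the flush check."""
    if catching_up_after_restart:
        # Halve the threshold so flushes happen sooner while catching up.
        return configured_threshold // 2
    return configured_threshold
```

Under these assumptions, a configured threshold of 200 becomes 100 while TiFlash is catching up after a restart.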

In the current architecture, reducing the flush frequency often increases memory overhead, because more entries must be cached in the raft entry cache and more KV data must be held in the KVStore. Using the `applied_index` gap as the trigger reflects memory usage more accurately, thus avoiding additional flushes.

The original random timeout based configuration will be ignored after upgrading.