Add doc for reducing wait index at random write #8044 (merged Sep 5, 2023), which adds `docs/design/2023-08-31-reduce-wait-index-in-random-write.md`.
# Reduce raft wait index under random writes

- Author: [Rongzhen Luo](https://github.com/CalvinNeo)

## Introduction

This RFC introduces a new flush condition for every Region in TiFlash's KVStore: a flush is triggered once the number of raft entries applied since the Region's last flush reaches a configurable threshold.

It also deprecates the existing flush condition based on a random timeout.

## Background

When a CompactLog command is received, an actual flush takes place if any of the following three conditions holds:

1. The number of rows in the Region exceeds the corresponding threshold.
2. The size of the data in the Region exceeds the corresponding threshold.
3. The time since the last flush exceeds a random timeout.
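As a rough illustration, the legacy decision made on CompactLog can be sketched as follows. This is a minimal Python sketch under stated assumptions, not TiFlash's actual code: `LegacyFlushConfig`, `should_flush_legacy`, and the threshold fields are hypothetical names, and because the text does not say how the timeout is randomized, the sketch takes an already-drawn timeout as input.

```python
"""Sketch of the legacy CompactLog flush decision (hypothetical names)."""
from dataclasses import dataclass


@dataclass
class LegacyFlushConfig:
    # Hypothetical thresholds; real config names and defaults are not given in the text.
    rows_threshold: int
    bytes_threshold: int
    # Assumed to be a timeout (in seconds) that was already drawn at random for this Region.
    random_timeout_s: float


def should_flush_legacy(rows: int, data_bytes: int,
                        seconds_since_last_flush: float,
                        cfg: LegacyFlushConfig) -> bool:
    """Return True if any of the three legacy conditions holds."""
    if rows > cfg.rows_threshold:  # condition 1: row count
        return True
    if data_bytes > cfg.bytes_threshold:  # condition 2: data size
        return True
    # condition 3: time since the last flush exceeds the random timeout
    return seconds_since_last_flush > cfg.random_timeout_s
```

With small transactions, `rows` and `data_bytes` stay far below their thresholds, so in practice only the last comparison decides whether a flush happens.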

When small transactions are replicated, the first two size-based conditions are rarely met, so the flush frequency is determined solely by the random timeout. Moreover, if writes are highly random, each Region has only a small amount of data to write when a flush is finally triggered, which wastes write bandwidth.

Therefore, we need to replace this random timeout condition.

## Detailed Design

Whenever a Region is flushed, its current `applied_index` is recorded as `last_flushed_applied_index`; this includes flushes triggered by Split/Merge commands. Instead of relying on a random timeout, when a CompactLog command is received, TiFlash checks whether the difference between the command's `applied_index` and `last_flushed_applied_index` exceeds the configured log gap threshold. If it does, a flush is triggered; otherwise, the CompactLog is treated as an empty raft log entry.
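The per-Region bookkeeping described above can be sketched as follows. This is a minimal sketch with hypothetical names (`RegionFlushState`, `on_flush`, `on_compact_log`, `gap_threshold`), not TiFlash's implementation; it follows the "exceeds" wording, so a gap exactly equal to the threshold does not flush.

```python
"""Sketch of the log-gap flush condition checked on CompactLog (hypothetical names)."""


class RegionFlushState:
    """Per-Region bookkeeping for the log-gap condition."""

    def __init__(self, applied_index: int = 0) -> None:
        # Applied index covered by the most recent flush of this Region.
        self.last_flushed_applied_index = applied_index

    def on_flush(self, applied_index: int) -> None:
        # Every flush, including one triggered by Split/Merge, records the applied index.
        self.last_flushed_applied_index = applied_index

    def on_compact_log(self, applied_index: int, gap_threshold: int) -> bool:
        """Return True if this CompactLog triggers a flush.

        Returning False means the CompactLog is treated as an empty raft log entry.
        """
        gap = applied_index - self.last_flushed_applied_index
        if gap > gap_threshold:
            self.on_flush(applied_index)  # the flush advances the checkpoint
            return True
        return False
```

For example, after a Split flushes a Region at index 180, a threshold of 200 means a CompactLog at index 380 is treated as an empty entry, while one at index 381 triggers a flush.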

After TiFlash restarts, `last_flushed_applied_index` is initialized from the `applied_index` persisted on disk. Because TiFlash must catch up on raft logs from TiKV after restarting, a large number of Regions can become active at the same time, even under highly random writes. Therefore, after startup, the log gap threshold for triggering a flush is halved.
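The restart behavior can be sketched in the same way. This is a hypothetical Python sketch: the class and method names are illustrative, integer halving is an assumption, and so is applying the halved threshold only until the Region's first flush after restart, since the text does not say how long the reduced threshold stays in effect.

```python
"""Sketch of the log-gap condition right after a TiFlash restart (hypothetical names)."""


class RestartedRegion:
    def __init__(self, persisted_applied_index: int) -> None:
        # The applied_index persisted on disk seeds last_flushed_applied_index.
        self.last_flushed_applied_index = persisted_applied_index
        # Assumption: the reduced threshold lasts until the first flush after restart.
        self.flushed_since_restart = False

    def effective_gap_threshold(self, gap_threshold: int) -> int:
        if self.flushed_since_restart:
            return gap_threshold
        # Halved while the Region may still be catching up on raft logs from TiKV.
        return gap_threshold // 2

    def on_flush(self, applied_index: int) -> None:
        # Any flush (CompactLog, Split, Merge) advances the checkpoint.
        self.last_flushed_applied_index = applied_index
        self.flushed_since_restart = True

    def on_compact_log(self, applied_index: int, gap_threshold: int) -> bool:
        """Return True if this CompactLog triggers a flush."""
        gap = applied_index - self.last_flushed_applied_index
        if gap > self.effective_gap_threshold(gap_threshold):
            self.on_flush(applied_index)
            return True
        return False
```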

In the current architecture, flushing less often tends to increase memory overhead, because more entries must be kept in the raft entry cache and more KV data must be held in the KVStore. Compared with a random timeout, the gap from `last_flushed_applied_index` reflects this memory overhead more accurately, which avoids unnecessary flushes.
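A back-of-the-envelope model illustrates why the log gap tracks memory better than a timer. The functions below are illustrative only: their names are hypothetical, and the memory model (cached bytes approximately equal to pending entries times average entry size) is a simplifying assumption, not a measurement of TiFlash.

```python
"""Simplified model of how many applied-but-unflushed raft entries a Region holds."""


def pending_entries_at_timeout_flush(entries_per_s: float, timeout_s: float) -> float:
    # Timeout-based: the backlog at flush time scales with the Region's write rate,
    # so a hot Region caches many entries while a cold Region flushes very few.
    return entries_per_s * timeout_s


def min_pending_entries_at_gap_flush(gap_threshold: int) -> int:
    # Gap-based: a flush fires once the gap exceeds the threshold, so the backlog is
    # at least gap_threshold + 1 regardless of write rate. It can be larger, because
    # the check only runs when a CompactLog command arrives.
    return gap_threshold + 1


def estimated_cache_bytes(pending_entries: float, avg_entry_bytes: int) -> float:
    # Assumed memory model: cached bytes ~= pending entries * average entry size.
    return pending_entries * avg_entry_bytes
```

With hypothetical numbers and a 60 s timeout, a Region applying 1000 entries per second accumulates 60000 entries before flushing, while one applying 0.5 entries per second flushes after only 30; under the gap condition, both flush only once more than `gap_threshold` entries are pending.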

After upgrading, the original configuration for the random timeout is ignored.