From d90f002f6af10da17ba86713ac9bff3eecc13d95 Mon Sep 17 00:00:00 2001
From: Ran
Date: Wed, 17 Jun 2020 15:45:56 +0800
Subject: [PATCH 1/4] tutorial: add load-base-split

---
 TOC.md                       |  1 +
 configure-load-base-split.md | 74 ++++++++++++++++++++++++++++++++++++
 2 files changed, 75 insertions(+)
 create mode 100644 configure-load-base-split.md

diff --git a/TOC.md b/TOC.md
index 31442104ad26c..29fbf03b7335c 100644
--- a/TOC.md
+++ b/TOC.md
@@ -116,6 +116,7 @@
 + [PD Scheduling](/best-practices/pd-scheduling-best-practices.md)
 + [TiKV Performance Tuning with Massive Regions](/best-practices/massive-regions-best-practices.md)
 + [Use Placement Rules](/configure-placement-rules.md)
++ [Use Load Base Split](/configure-load-base-split.md)
 + TiDB Ecosystem Tools
 + [Overview](/ecosystem-tool-user-guide.md)
 + [Use Cases](/ecosystem-tool-user-case.md)

diff --git a/configure-load-base-split.md b/configure-load-base-split.md
new file mode 100644
index 0000000000000..792bc2a9dc35a
--- /dev/null
+++ b/configure-load-base-split.md
@@ -0,0 +1,74 @@
+---
+title: Load Base Split
+summary: Learn the feature of Load Base Split.
+category: how-to
+---
+
+# Load Base Split
+
+Load Base Split is a new feature introduced in TiDB 4.0. It aims to solve the hotspot issue caused by unbalanced access between Regions, such as full table scans for small tables.
+
+## Scenarios
+
+In TiDB, it is easy to create hotspots when traffic is concentrated on certain nodes. PD tries to schedule the Hot Regions so that they are distributed as evenly as possible across all nodes for better performance.
+
+However, the minimum unit for PD scheduling is Region. If the number of hotspots in a cluster is smaller than the number of nodes, or if a few hotspots have far more traffic than other Regions, PD can only move the hotspot from one node to another, but not make the entire cluster share the load.
+
+This scenario is especially common with workloads that are mostly read requests, such as full table scans and index lookups for small tables, or frequent access to some fields.
+
+Previously, the solution to this problem was to manually execute a command to split one or more hotspot Regions, but this approach has two problems:
+
+- Evenly splitting a Region is not always the best choice, because requests might be concentrated on a few keys. In such cases, hotspots might still be on one of the Regions after evenly splitting, and it might take multiple even splits to realize the goal.
+- Human intervention is not timely or simple.
+
+## Implementation principles
+
+Load Base Split automatically splits the Region based on statistics. It identifies the Regions whose read traffic consistently exceeds the threshold for 10 seconds, and splits these Regions at proper positions. When choosing the split position, Load Base Split tries to balance the access traffic of both Regions after the split and avoid access across Regions.
+
+The Region split by Load Base Split will not be merged quickly. On the one hand, PD's `MergeChecker` skips the hot Region; on the other hand, PD also determines whether to merge two Regions according to `QPS` in the heartbeat information, to avoid the merge of two Regions with high `QPS`.
+
+## Usage
+
+The Load Base Split feature is currently controlled by the `split.qps-threshold` parameter. If the sum of all types of read requests per second for a Region exceeds the value of `split.qps-threshold` for 10 seconds on end, split the Region.
+
+Load Base Split is enabled by default, but the parameter is set to a rather low value, defaulting to `3000`. If you want to disable this feature, set the threshold high enough.
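+
+For example, to effectively turn the feature off, you can raise the threshold far above any per-Region QPS your workload can reach (the value below is only an illustration, not a recommended setting):
+
+{{< copyable "sql" >}}
+
+```sql
+set config tikv split.qps-threshold=100000000
+```
+
+Setting the threshold back to a smaller value re-enables the feature.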
+
+To modify the parameter, take either of the following two methods:
+
+- Use a SQL statement:
+
+    {{< copyable "sql" >}}
+
+    ```sql
+    set config tikv split.qps-threshold=3000
+    ```
+
+- Use TiKV:
+
+    {{< copyable "shell-regular" >}}
+
+    ```shell
+    curl -X POST "http://ip:status_port/config" -H "accept: application/json" -d '{"split.qps-threshold":"3000"}'
+    ```
+
+Accordingly, you can view the configuration by either of the following two methods:
+
+- Use a SQL statement:
+
+    {{< copyable "sql" >}}
+
+    ```sql
+    show config where type='tikv' and name like '%split.qps-threshold%'
+    ```
+
+- Use TiKV:
+
+    {{< copyable "shell-regular" >}}
+
+    ```shell
+    curl "http://ip:status_port/config"
+    ```
+
+> **Note:**
+>
+> Starting from v4.0.0-rc.2, you can modify and view the configuration using SQL statements.

From b5d70e8cb8b6da1b34e58432817389218685af59 Mon Sep 17 00:00:00 2001
From: Ran
Date: Wed, 17 Jun 2020 17:21:17 +0800
Subject: [PATCH 2/4] Update configure-load-base-split.md

Co-authored-by: lhy1024
---
 configure-load-base-split.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure-load-base-split.md b/configure-load-base-split.md
index 792bc2a9dc35a..3a3a6ddf0649c 100644
--- a/configure-load-base-split.md
+++ b/configure-load-base-split.md
@@ -31,7 +31,7 @@
 The Load Base Split feature is currently controlled by the `split.qps-threshold` parameter. If the sum of all types of read requests per second for a Region exceeds the value of `split.qps-threshold` for 10 seconds on end, split the Region.
 
-Load Base Split is enabled by default, but the parameter is set to a rather low value, defaulting to `3000`. If you want to disable this feature, set the threshold high enough.
+Load Base Split is enabled by default, but the parameter is set to a rather high value, defaulting to `3000`. If you want to disable this feature, set the threshold high enough.
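+
+When choosing a threshold, it can help to first see how hot the hottest Regions currently are. One way is to query PD for the read hotspots, for example with pd-ctl (the PD address below is a placeholder, and the exact invocation may differ depending on how pd-ctl is deployed in your cluster):
+
+{{< copyable "shell-regular" >}}
+
+```shell
+pd-ctl -u http://pd_ip:2379 hot read
+```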
 
 To modify the parameter, take either of the following two methods:

From 7564efa56658e24decc53eb8cc6ccc193bca Mon Sep 17 00:00:00 2001
From: Ran
Date: Wed, 17 Jun 2020 17:27:57 +0800
Subject: [PATCH 3/4] replace traffic with load

---
 configure-load-base-split.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/configure-load-base-split.md b/configure-load-base-split.md
index 3a3a6ddf0649c..4aae3d949ff8f 100644
--- a/configure-load-base-split.md
+++ b/configure-load-base-split.md
@@ -10,9 +10,9 @@ Load Base Split is a new feature introduced in TiDB 4.0. It aims to solve the ho
 
 ## Scenarios
 
-In TiDB, it is easy to create hotspots when traffic is concentrated on certain nodes. PD tries to schedule the Hot Regions so that they are distributed as evenly as possible across all nodes for better performance.
+In TiDB, it is easy to create hotspots when the load is concentrated on certain nodes. PD tries to schedule the Hot Regions so that they are distributed as evenly as possible across all nodes for better performance.
 
-However, the minimum unit for PD scheduling is Region. If the number of hotspots in a cluster is smaller than the number of nodes, or if a few hotspots have far more traffic than other Regions, PD can only move the hotspot from one node to another, but not make the entire cluster share the load.
+However, the minimum unit for PD scheduling is Region. If the number of hotspots in a cluster is smaller than the number of nodes, or if a few hotspots have far more load than other Regions, PD can only move the hotspot from one node to another, but not make the entire cluster share the load.
 
 This scenario is especially common with workloads that are mostly read requests, such as full table scans and index lookups for small tables, or frequent access to some fields.
@@ -23,7 +23,7 @@ Previously, the solution to this problem was to manually execute a command to sp
 
 ## Implementation principles
 
-Load Base Split automatically splits the Region based on statistics. It identifies the Regions whose read traffic consistently exceeds the threshold for 10 seconds, and splits these Regions at proper positions. When choosing the split position, Load Base Split tries to balance the access traffic of both Regions after the split and avoid access across Regions.
+Load Base Split automatically splits the Region based on statistics. It identifies the Regions whose read load consistently exceeds the threshold for 10 seconds, and splits these Regions at proper positions. When choosing the split position, Load Base Split tries to balance the access load of both Regions after the split and avoid access across Regions.
 
 The Region split by Load Base Split will not be merged quickly. On the one hand, PD's `MergeChecker` skips the hot Region; on the other hand, PD also determines whether to merge two Regions according to `QPS` in the heartbeat information, to avoid the merge of two Regions with high `QPS`.

From c63dcc4351a7fe4fd6bca2241790d014bf15e7be Mon Sep 17 00:00:00 2001
From: Ran
Date: Tue, 30 Jun 2020 14:55:59 +0800
Subject: [PATCH 4/4] Apply suggestions from code review

Co-authored-by: Lilian Lee
Co-authored-by: lhy1024
---
 configure-load-base-split.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/configure-load-base-split.md b/configure-load-base-split.md
index 4aae3d949ff8f..4e28b4cdc76f8 100644
--- a/configure-load-base-split.md
+++ b/configure-load-base-split.md
@@ -1,7 +1,7 @@
 ---
 title: Load Base Split
 summary: Learn the feature of Load Base Split.
-category: how-to
+category: tutorials
 ---
 
 # Load Base Split
@@ -10,7 +10,7 @@ Load Base Split is a new feature introduced in TiDB 4.0. It aims to solve the ho
 
 ## Scenarios
 
-In TiDB, it is easy to create hotspots when the load is concentrated on certain nodes. PD tries to schedule the Hot Regions so that they are distributed as evenly as possible across all nodes for better performance.
+In TiDB, it is easy to generate hotspots when the load is concentrated on certain nodes. PD tries to schedule the hot Regions so that they are distributed as evenly as possible across all nodes for better performance.
 
 However, the minimum unit for PD scheduling is Region. If the number of hotspots in a cluster is smaller than the number of nodes, or if a few hotspots have far more load than other Regions, PD can only move the hotspot from one node to another, but not make the entire cluster share the load.
@@ -23,9 +23,9 @@ Previously, the solution to this problem was to manually execute a command to sp
 
 ## Implementation principles
 
-Load Base Split automatically splits the Region based on statistics. It identifies the Regions whose read load consistently exceeds the threshold for 10 seconds, and splits these Regions at proper positions. When choosing the split position, Load Base Split tries to balance the access load of both Regions after the split and avoid access across Regions.
+Load Base Split automatically splits the Region based on statistics. It identifies the Regions whose read load consistently exceeds the threshold for 10 seconds, and splits these Regions at a proper position. When choosing the split position, Load Base Split tries to balance the access load of both Regions after the split and avoid access across Regions.
 
-The Region split by Load Base Split will not be merged quickly. On the one hand, PD's `MergeChecker` skips the hot Region; on the other hand, PD also determines whether to merge two Regions according to `QPS` in the heartbeat information, to avoid the merge of two Regions with high `QPS`.
+The Region split by Load Base Split will not be merged quickly. On the one hand, PD's `MergeChecker` skips the hot Regions; on the other hand, PD also determines whether to merge two Regions according to `QPS` in the heartbeat information, to avoid the merging of two Regions with high `QPS`.
 
 ## Usage
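+
+After Load Base Split takes effect, you can observe whether a hot table has been split into more Regions, for example with the following statement (the table name below is only an illustration):
+
+{{< copyable "sql" >}}
+
+```sql
+show table small_table regions
+```
+
+If the feature has split the Region, the output contains more rows than before, and the new split keys are not necessarily evenly spaced, because the split position follows the access load.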