From 573a267e2aa8cba2483d60c710497ea02039ed61 Mon Sep 17 00:00:00 2001
From: toutdesuite
Date: Tue, 7 Apr 2020 20:07:41 +0800
Subject: [PATCH 01/10] tiflash: add the maintain.md doc

---
 TOC.md                        |   1 +
 reference/tiflash/maintain.md | 150 ++++++++++++++++++++++++++++++++++
 2 files changed, 151 insertions(+)
 create mode 100644 reference/tiflash/maintain.md

diff --git a/TOC.md b/TOC.md
index 4d3b48547e1d1..cf3a3b5455507 100644
--- a/TOC.md
+++ b/TOC.md
@@ -307,6 +307,7 @@
   - [Overview](/reference/tiflash/overview.md)
   - [Deploy a TiFlash Cluster](/reference/tiflash/deploy.md)
   - [Use TiFlash](/reference/tiflash/use-tiflash.md)
+  - [Maintain a TiFlash Cluster](/reference/tiflash/maintain.md)
 + TiDB Binlog
   - [Overview](/reference/tidb-binlog/overview.md)
   - [Deploy](/reference/tidb-binlog/deploy.md)
diff --git a/reference/tiflash/maintain.md b/reference/tiflash/maintain.md
new file mode 100644
index 0000000000000..eb2d79252db2d
--- /dev/null
+++ b/reference/tiflash/maintain.md
@@ -0,0 +1,150 @@
+---
+title: Maintain a TiFlash Cluster
+summary: Learn common operations when you maintain a TiFlash cluster.
+category: reference
+---
+
+# Maintain a TiFlash Cluster
+
+This document describes common operations when you maintain a TiFlash cluster, including checking the version, node logout, troubleshooting, critical logs, and a system table.
+
+## Check the TiFlash version
+
+There are two ways to check the TiFlash version:
+
+- If the binary file name of TiFlash is `tiflash`, you can check the version by executing the `./tiflash version` command.
+
+    However, to execute the above command, you need to add the directory that contains the `libtiflash_proxy.so` dynamic library to the `LD_LIBRARY_PATH` environment variable, because TiFlash relies on this dynamic library at runtime.
+
+    For example, when `tiflash` and `libtiflash_proxy.so` are in the same directory, you can first switch to this directory, and then use the following command to check the TiFlash version:
+
+    {{< copyable "shell-regular" >}}
+
+    ```shell
+    LD_LIBRARY_PATH=./ ./tiflash version
+    ```
+
+- Check the TiFlash version by referring to the TiFlash log. For the log path, see [[logger] in the tiflash.toml configuration file](/reference/tiflash/configuration.md#configuration-file-tiflashtoml). For example:
+
+    ```
+    : TiFlash version: TiFlash 0.2.0 master-375035282451103999f3863c691e2fc2
+    ```
+
+## Logout a TiFlash node
+
+Logouting a TiFlash node differs from [Scaling in the TiFlash node](/reference/tiflash/scale.md#scale-in-tiflash-node) in that the logout doesn't remove the node from TiDB Ansible; instead, it just safely shutdown the process.
+
+Take the following steps to logout a TiFlash node:
+
+> **Note:**
+>
+> After you logout the TiFlash node, if the number of the remaining nodes in the TiFlash cluster is greater than or equal to the maximum replicas of all data tables, you can go directly to step 3.
+
+1. For a TiDB server, if the number of replicas of tables is greater than or equal to that of the remaining TiFlash nodes in the cluster, execute the following command:
+
+    {{< copyable "sql" >}}
+
+    ```sql
+    alter table <db-name>.<table-name> set tiflash replica 0;
+    ```
+
+2. To ensure TiFlash replicas of related tables are removed, see [View the Table Replication Progress](/reference/tiflash/use-tiflash.md#view-the-table-replication-progress). If you cannot view the replication progress of the related tables, it means that the replicas are removed.
+
+3. Input the `store` command into [pd-ctl](/reference/tools/pd-control.md) (the binary file in `resources/bin` in the tidb-ansible directory) to view the `store id` of the TiFlash node.
+
+4. Input `store delete <store_id>` into `pd-ctl`. Here `<store_id>` refers to the `store id` in step 3.
+
+5. When the corresponding `store` of the node disappeared, or when `state_name` is changed to `Tomestone`, shutdown the TiFlash process.
+
+> **Note:**
+>
+> If you don't cancel all tables replicated to TiFlash before all TiFlash nodes in a cluster stop running, you need to manually delete the replication rule in PD. Or you cannot successfully logout the TiFlash node.
+>
+> To manually delete the replication rule in PD, send the `DELETE` request `http://<pd_ip>:<pd_port>/pd/api/v1/config/rule/tiflash/<rule_id>`. `rule_id` refers to the `id` of the `rule` to be deleted.
+
+## TiFlash troubleshooting
+
+This section describes some common questions of TiFlash, the reasons, and the solutions.
+
+### TiFlash replica is always in an unusable state
+
+This is because TiFlash is in the exception status caused by the configuration error or the environment problems. You can take the following steps to identify the problem component:
+
+1. Check whether PD enables the `Placement Rules` feature (to enable the feature, see the step 2 of [Add a TiFlash component in an existing TiDB Cluster](/reference/tiflash/deploy.md#add-a-TiFlash-component-in-an-existing-TiDB-cluster):
+
+    {{< copyable "shell-regular" >}}
+
+    ```shell
+    echo 'config show replication' | /path/to/pd-ctl -u http://<pd_ip>:<pd_port>
+    ```
+
+    The expected result is `"enable-placement-rules": "true"`.
+
+2. Check whether the TiFlash process in the operation system is working correctly using `UpTime` of the TiFlash-Summary monitor panel.
+
+3. Check whether the TiFlash proxy status is normal through `pd-ctl`.
+
+    {{< copyable "shell-regular" >}}
+
+    ```shell
+    echo "store" | /path/to/pd-ctl -u http://<pd_ip>:<pd_port>
+    ```
+
+    If `store.labels` includes information such as `{"key": "engine", "value": "tiflash"}`, it refers to the TiFlash proxy.
+
+4. Check whether `pd buddy` can print the logs correctly (the value of `log` in the [flash.flash_cluster] configuration item of the log path, is by default the `tmp` directory configured by the TiFlash configuration file).
+
+5. Check whether the value of `max-replicas` in PD is less than or equal to the number of TiKV nodes in the cluster. If not, PD cannot replicate data to TiFlash:
+
+    {{< copyable "shell-regular" >}}
+
+    ```shell
+    echo 'config show replication' | /path/to/pd-ctl -u http://<pd_ip>:<pd_port>
+    ```
+
+    Reconfirm the value of `max-replicas`.
+
+6. Check whether the remaining disk space of the machine (where `store` of the TiFlash node is) is sufficient. By default, when the remaining disk space is less than 20% of the `store` capacity (which is controlled by the `low-space-ratio` parameter), PD cannot schedule data to TiFlash.
+
+### TiFlash query time is unstable, and error log prints many `Lock Exception` messages
+
+This is because large amounts of data are written to the cluster, which leads to the situation that the TiFlash query encounters a lock and requires query retry.
+
+You can set the query timestamp to one second earlier in TiDB (for example, `set @@tidb_snapshot=412881237115666555;`), to reduce the possibility that TiFlash query encounters a lock; thereby mitigating the risk of unstable query time.
+
+### Partial queries return `Region Unavailable`
+
+If the load pressure in TiFlash is so heavy that TiFlash data replication falls behind. Some queries might return error message `Region Unavailable`.
+
+In this case, you can share the pressure by adding TiFlash nodes.
+
+### Data file corruption
+
+Take the following steps to handle the data file corruption:
+
+1. Refer to [Logout a TiFlash node](/reference/tiflash/maintain.md#logout-a-tiflash-node) to logout the corresponding TiFlash node.
+2. Delete the related data of the TiFlash node.
+3. Redeploy the TiFlash node in the cluster.
+
+## TiFlash critical logs
+
+| Log Information | Log Description |
+|---------------|-------------------|
+| [ 23 ] <Information> KVStore: Start to persist [region 47, applied: term 6 index 10] | Data starts to be replicated (the number in the square brackets at the start of the log refers to the thread ID) |
+| [ 30 ] <Debug> CoprocessorHandler: grpc::Status DB::CoprocessorHandler::execute(): Handling DAG request | `Handling DAG request` refers to that TiFlash starts to handle a Coprocessor request |
+| [ 30 ] <Debug> CoprocessorHandler: grpc::Status DB::CoprocessorHandler::execute(): Handle DAG request done | `Handle DAG request done` refers to that TiFlash finishes a Coprocessor request |
+
+You can find the beginning or the end of a Coprocessor request, and then locate the related logs of the Coprocessor request through the thread ID printed at the start of the log.
+
+## TiFlash system table
+
+The column names and their descriptions of the `information_schema.tiflash_replica` system table are as follows:
+
+| Column Name | Description |
+|---------------|-----------|
+| TABLE_SCHEMA | database name |
+| TABLE_NAME | table name |
+| TABLE_ID | table ID |
+| REPLICA_COUNT | number of TiFlash replicas |
+| AVAILABLE | available or not (0/1) |
+| PROGRESS | replication progress [0.0~1.0] |

From eff18a836b7976d43fee5622b796cddba5354f5b Mon Sep 17 00:00:00 2001
From: toutdesuite
Date: Wed, 8 Apr 2020 17:41:19 +0800
Subject: [PATCH 02/10] remove dead link

---
 reference/tiflash/maintain.md | 2 --
 1 file changed, 2 deletions(-)

diff --git a/reference/tiflash/maintain.md b/reference/tiflash/maintain.md
index eb2d79252db2d..a570916a92654 100644
--- a/reference/tiflash/maintain.md
+++ b/reference/tiflash/maintain.md
@@ -32,8 +32,6 @@ There are two ways to check the TiFlash version:

 ## Logout a TiFlash node

-Logouting a TiFlash node differs from [Scaling in the TiFlash node](/reference/tiflash/scale.md#scale-in-tiflash-node) in that the logout doesn't remove the node from TiDB Ansible; instead, it just safely shutdown the process.
-
 Take the following steps to logout a TiFlash node:

 > **Note:**

From 8225471e7cda07f443347a6423557da3592ecd3c Mon Sep 17 00:00:00 2001
From: toutdesuite
Date: Wed, 8 Apr 2020 17:51:35 +0800
Subject: [PATCH 03/10] modify anchor

---
 reference/tiflash/maintain.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/reference/tiflash/maintain.md b/reference/tiflash/maintain.md
index a570916a92654..7d0d9e36c1c47 100644
--- a/reference/tiflash/maintain.md
+++ b/reference/tiflash/maintain.md
@@ -120,7 +120,7 @@ In this case, you can share the pressure by adding TiFlash nodes.

 Take the following steps to handle the data file corruption:

-1. Refer to [Logout a TiFlash node](/reference/tiflash/maintain.md#logout-a-tiflash-node) to logout the corresponding TiFlash node.
+1. Refer to [Logout a TiFlash node](#logout-a-tiflash-node) to logout the corresponding TiFlash node.
 2. Delete the related data of the TiFlash node.
 3. Redeploy the TiFlash node in the cluster.
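Taken together, patches 01–03 define the whole logout flow: `store` and `store delete` in pd-ctl, then the manual deletion of the PD replication rule. The shell sketch below is illustrative only — the PD address, `<store_id>`, and `<rule_id>` are placeholders you must read from your own cluster, and the use of `curl` is just one possible HTTP client, not part of the documented procedure:

```shell
# Illustrative sketch of the logout flow; replace <pd_ip>:<pd_port>,
# <store_id>, and <rule_id> with values from your own cluster.

# Step 3: list stores and note the TiFlash node's store id
# (TiFlash stores carry the {"key": "engine", "value": "tiflash"} label).
echo "store" | /path/to/pd-ctl -u http://<pd_ip>:<pd_port>

# Step 4: ask PD to remove the store; PD then migrates its data away.
echo "store delete <store_id>" | /path/to/pd-ctl -u http://<pd_ip>:<pd_port>

# Step 5: re-check the store until it disappears or its state_name
# becomes Tombstone, then stop the TiFlash process.
echo "store <store_id>" | /path/to/pd-ctl -u http://<pd_ip>:<pd_port>

# Only if all TiFlash nodes are stopping while some tables still have
# replicas: delete the replication rule manually, as the note describes.
curl -X DELETE http://<pd_ip>:<pd_port>/pd/api/v1/config/rule/tiflash/<rule_id>
```

The `DELETE` call is simply one way to send the request quoted in the note above; any HTTP client works.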
From e7ff63c2b7fc2ba6cd4be963ec6ecb69388a8d4e Mon Sep 17 00:00:00 2001 From: toutdesuite Date: Wed, 8 Apr 2020 18:04:20 +0800 Subject: [PATCH 04/10] modify anchor --- reference/tiflash/maintain.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/reference/tiflash/maintain.md b/reference/tiflash/maintain.md index 7d0d9e36c1c47..a20fa94f99cd0 100644 --- a/reference/tiflash/maintain.md +++ b/reference/tiflash/maintain.md @@ -24,7 +24,7 @@ There are two ways to check the TiFlash version: LD_LIBRARY_PATH=./ ./tiflash version ``` -- Check the TiFlash version by referring to the TiFlash log. For the log path, see [[logger] in the tiflash.toml configuration file](/reference/tiflash/configuration.md#configuration-file-tiflashtoml). For example: +- Check the TiFlash version by referring to the TiFlash log. For the log path, see the [logger] part in [Configure the `tiflash.toml` file](/reference/tiflash/configuration.md#configure-the-`tiflash.toml`-file). For example: ``` : TiFlash version: TiFlash 0.2.0 master-375035282451103999f3863c691e2fc2 From 7b97ba9d1caad3c9d386b26ed0338e17713801df Mon Sep 17 00:00:00 2001 From: toutdesuite Date: Wed, 8 Apr 2020 18:25:33 +0800 Subject: [PATCH 05/10] modify anchor --- reference/tiflash/maintain.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/reference/tiflash/maintain.md b/reference/tiflash/maintain.md index a20fa94f99cd0..0473725baa215 100644 --- a/reference/tiflash/maintain.md +++ b/reference/tiflash/maintain.md @@ -24,7 +24,7 @@ There are two ways to check the TiFlash version: LD_LIBRARY_PATH=./ ./tiflash version ``` -- Check the TiFlash version by referring to the TiFlash log. For the log path, see the [logger] part in [Configure the `tiflash.toml` file](/reference/tiflash/configuration.md#configure-the-`tiflash.toml`-file). For example: +- Check the TiFlash version by referring to the TiFlash log. For the log path, see the `[logger]` part in [configure the `tiflash.toml` file](/reference/tiflash/configuration.md#configure-the-tiflashtoml-file). For example: ``` : TiFlash version: TiFlash 0.2.0 master-375035282451103999f3863c691e2fc2 From a765c01a3c7e1d0f2dd3a6331069c52347f31667 Mon Sep 17 00:00:00 2001 From: toutdesuite Date: Wed, 8 Apr 2020 18:49:35 +0800 Subject: [PATCH 06/10] address comments --- reference/tiflash/maintain.md | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/reference/tiflash/maintain.md b/reference/tiflash/maintain.md index 0473725baa215..80c81c3e4c62c 100644 --- a/reference/tiflash/maintain.md +++ b/reference/tiflash/maintain.md @@ -6,7 +6,7 @@ category: reference # Maintain a TiFlash Cluster -This document describes common operations when you maintain a TiFlash cluster, including checking the version, node logout, troubleshooting, critical logs, and a system table. +This document describes how to perform common operations when you maintain a TiFlash cluster, including checking the TiFlash version, taking TiFlash nodes down, and troubleshooting TiFlash. This document also introduces critical logs and a system table of TiFlash. 
## Check the TiFlash version @@ -30,13 +30,15 @@ There are two ways to check the TiFlash version: : TiFlash version: TiFlash 0.2.0 master-375035282451103999f3863c691e2fc2 ``` -## Logout a TiFlash node +## Take a TiFlash node down -Take the following steps to logout a TiFlash node: +Taking a TiFlash node down differs from [Scaling in a TiFlash node](/reference/tiflash/scale.md#scale-in-a-tiflash-node) in that the former doesn't remove the node from TiDB Ansible; instead, it just safely shutdown the process. + +Follow the steps below to take a TiFlash node down: > **Note:** > -> After you logout the TiFlash node, if the number of the remaining nodes in the TiFlash cluster is greater than or equal to the maximum replicas of all data tables, you can go directly to step 3. +> After you take the TiFlash node down, if the number of the remaining nodes in the TiFlash cluster is greater than or equal to the maximum replicas of all data tables, you can go directly to step 3. 1. For a TiDB server, if the number of replicas of tables is greater than or equal to that of the remaining TiFlash nodes in the cluster, execute the following command: @@ -56,7 +58,7 @@ Take the following steps to logout a TiFlash node: > **Note:** > -> If you don't cancel all tables replicated to TiFlash before all TiFlash nodes in a cluster stop running, you need to manually delete the replication rule in PD. Or you cannot successfully logout the TiFlash node. +> If you don't cancel all tables replicated to TiFlash before all TiFlash nodes in a cluster stop running, you need to manually delete the replication rule in PD. Or you cannot successfully take the TiFlash node down. > > To manually delete the replication rule in PD, send the `DELETE` request `http://:/pd/api/v1/config/rule/tiflash/`. `rule_id` refers to the `id` of the `rule` to be deleted. @@ -120,7 +122,7 @@ In this case, you can share the pressure by adding TiFlash nodes. Take the following steps to handle the data file corruption: -1. Refer to [Logout a TiFlash node](#logout-a-tiflash-node) to logout the corresponding TiFlash node. +1. Refer to [Take a TiFlash node down](#take-a-tiflash-node-down) to take the corresponding TiFlash node down. 2. Delete the related data of the TiFlash node. 3. Redeploy the TiFlash node in the cluster. From e339babac1b996a3cf14d8ab0fc6e9eb37ac5136 Mon Sep 17 00:00:00 2001 From: toutdesuite Date: Wed, 8 Apr 2020 19:55:54 +0800 Subject: [PATCH 07/10] address comments --- reference/tiflash/maintain.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/reference/tiflash/maintain.md b/reference/tiflash/maintain.md index 80c81c3e4c62c..ee3e12bca6d8c 100644 --- a/reference/tiflash/maintain.md +++ b/reference/tiflash/maintain.md @@ -32,7 +32,7 @@ There are two ways to check the TiFlash version: ## Take a TiFlash node down -Taking a TiFlash node down differs from [Scaling in a TiFlash node](/reference/tiflash/scale.md#scale-in-a-tiflash-node) in that the former doesn't remove the node from TiDB Ansible; instead, it just safely shutdown the process. +Taking a TiFlash node down differs from [Scaling in a TiFlash node](/reference/tiflash/scale.md#scale-in-a-tiflash-node) in that the former doesn't remove the node in TiDB Ansible; instead, it just safely shuts down the TiFlash process. 
 Follow the steps below to take a TiFlash node down:

 > **Note:**
 >
 > After you take the TiFlash node down, if the number of the remaining nodes in the TiFlash cluster is greater than or equal to the maximum replicas of all data tables, you can go directly to step 3.

-1. For a TiDB server, if the number of replicas of tables is greater than or equal to that of the remaining TiFlash nodes in the cluster, execute the following command:
+1. If the number of replicas of tables is greater than or equal to that of the remaining TiFlash nodes in the cluster, execute the following command on those tables in the TiDB client:

     {{< copyable "sql" >}}

     ```sql
     alter table <db-name>.<table-name> set tiflash replica 0;
     ```

-2. To ensure TiFlash replicas of related tables are removed, see [View the Table Replication Progress](/reference/tiflash/use-tiflash.md#view-the-table-replication-progress). If you cannot view the replication progress of the related tables, it means that the replicas are removed.
+2. To ensure that the TiFlash replicas of these tables are removed, see [View the Table Replication Progress](/reference/tiflash/use-tiflash.md#view-the-table-replication-progress). If you cannot view the replication progress of the related tables, it means that the replicas are removed.

-3. Input the `store` command into [pd-ctl](/reference/tools/pd-control.md) (the binary file in `resources/bin` in the tidb-ansible directory) to view the `store id` of the TiFlash node.
+3. Input the `store` command into [pd-ctl](/reference/tools/pd-control.md) (the binary file is in `resources/bin` of the tidb-ansible directory) to view the `store id` of the TiFlash node.

 4. Input `store delete <store_id>` into `pd-ctl`. Here `<store_id>` refers to the `store id` in step 3.

-5. When the corresponding `store` of the node disappeared, or when `state_name` is changed to `Tomestone`, shutdown the TiFlash process.
+5. When the corresponding `store` of the node disappears, or when `state_name` is changed to `Tomestone`, stop the TiFlash process.

 > **Note:**
 >
-> If you don't cancel all tables replicated to TiFlash before all TiFlash nodes in a cluster stop running, you need to manually delete the replication rule in PD. Or you cannot successfully take the TiFlash node down.
+> If you don't cancel all tables replicated to TiFlash before all TiFlash nodes stop running, you need to manually delete the replication rule in PD. Or you cannot successfully take the TiFlash node down.
 >
 > To manually delete the replication rule in PD, send the `DELETE` request `http://<pd_ip>:<pd_port>/pd/api/v1/config/rule/tiflash/<rule_id>`. `rule_id` refers to the `id` of the `rule` to be deleted.

 ## TiFlash troubleshooting

-This section describes some common questions of TiFlash, the reasons, and the solutions.
+This section describes some commonly encountered issues when using TiFlash, the reasons, and the solutions.

-### TiFlash replica is always in an unusable state
+### TiFlash replica is always unavailable

-This is because TiFlash is in the exception status caused by the configuration error or the environment problems. You can take the following steps to identify the problem component:
+This is because TiFlash is in an abnormal state caused by configuration errors or environment issues. Take the following steps to identify the faulty component:

 1. Check whether PD enables the `Placement Rules` feature (to enable the feature, see the step 2 of [Add a TiFlash component in an existing TiDB Cluster](/reference/tiflash/deploy.md#add-a-TiFlash-component-in-an-existing-TiDB-cluster):

From 5439f32fb1401d2cf688c58987bc22a1e33ee0b6 Mon Sep 17 00:00:00 2001
From: toutdesuite
Date: Wed, 8 Apr 2020 21:24:01 +0800
Subject: [PATCH 08/10] address comments, esp anchor links

---
 reference/tiflash/maintain.md | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/reference/tiflash/maintain.md b/reference/tiflash/maintain.md
index ee3e12bca6d8c..efbc812b6f5c0 100644
--- a/reference/tiflash/maintain.md
+++ b/reference/tiflash/maintain.md
@@ -24,7 +24,7 @@ There are two ways to check the TiFlash version:
     LD_LIBRARY_PATH=./ ./tiflash version
     ```

-- Check the TiFlash version by referring to the TiFlash log. For the log path, see the `[logger]` part in [configure the `tiflash.toml` file](/reference/tiflash/configuration.md#configure-the-tiflashtoml-file). For example:
+- Check the TiFlash version by referring to the TiFlash log. For the log path, see the `[logger]` part in [the `tiflash.toml` file](/reference/tiflash/configuration.md#configure-the-tiflashtoml-file). For example:

     ```
     : TiFlash version: TiFlash 0.2.0 master-375035282451103999f3863c691e2fc2
@@ -48,7 +48,7 @@ Follow the steps below to take a TiFlash node down:
     alter table <db-name>.<table-name> set tiflash replica 0;
     ```

-2. To ensure that the TiFlash replicas of these tables are removed, see [View the Table Replication Progress](/reference/tiflash/use-tiflash.md#view-the-table-replication-progress). If you cannot view the replication progress of the related tables, it means that the replicas are removed.
+2. To ensure that the TiFlash replicas of these tables are removed, see [Check the Replication Progress](/reference/tiflash/use-tiflash.md#check-the-replication-progress). If you cannot view the replication progress of the related tables, it means that the replicas are removed.

 3. Input the `store` command into [pd-ctl](/reference/tools/pd-control.md) (the binary file is in `resources/bin` of the tidb-ansible directory) to view the `store id` of the TiFlash node.
@@ -70,7 +70,7 @@ This is because TiFlash is in an abnormal state caused by configuration errors o
-1. Check whether PD enables the `Placement Rules` feature (to enable the feature, see the step 2 of [Add a TiFlash component in an existing TiDB Cluster](/reference/tiflash/deploy.md#add-a-TiFlash-component-in-an-existing-TiDB-cluster):
+1. Check whether PD enables the `Placement Rules` feature (to enable the feature, see step 2 of [Add TiFlash component to an existing TiDB cluster](/reference/tiflash/deploy.md#add-tiflash-component-to-an-existing-tidb-cluster)):

     {{< copyable "shell-regular" >}}

     ```shell
     echo 'config show replication' | /path/to/pd-ctl -u http://<pd_ip>:<pd_port>
     ```

@@ -80,7 +80,7 @@ This is because TiFlash is in an abnormal state caused by configuration errors o
     The expected result is `"enable-placement-rules": "true"`.

-2. Check whether the TiFlash process in the operation system is working correctly using `UpTime` of the TiFlash-Summary monitor panel.
+2. Check whether the TiFlash process is working correctly by viewing `UpTime` on the TiFlash-Summary monitoring panel.

 3. Check whether the TiFlash proxy status is normal through `pd-ctl`.
@@ -90,9 +90,9 @@ This is because TiFlash is in an abnormal state caused by configuration errors o echo "store" | /path/to/pd-ctl -u http://: ``` - If `store.labels` includes information such as `{"key": "engine", "value": "tiflash"}`, it refers to the TiFlash proxy. + The TiFlash proxy's `store.labels` includes information such as `{"key": "engine", "value": "tiflash"}`. You can check this information to confirm a TiFlash proxy. -4. Check whether `pd buddy` can print the logs correctly (the value of `log` in the [flash.flash_cluster] configuration item of the log path, is by default the `tmp` directory configured by the TiFlash configuration file). +4. Check whether `pd buddy` can correctly print the logs (the log path is the value of `log` in the [flash.flash_cluster] configuration item; the default log path is under the `tmp` directory configured in the TiFlash configuration file). 5. Check whether the value of `max-replicas` in PD is less than or equal to the number of TiKV nodes in the cluster. If not, PD cannot replicate data to TiFlash: @@ -104,19 +104,19 @@ This is because TiFlash is in an abnormal state caused by configuration errors o Reconfirm the value of `max-replicas`. -6. Check whether the remaining disk space of the machine (where `store` of the TiFlash node is) is sufficient. By default, when the remaining disk space is less than 20% of the `store` capacity (which is controlled by the `low-space-ratio` parameter), PD cannot schedule data to TiFlash. +6. Check whether the remaining disk space of the machine (where `store` of the TiFlash node is) is sufficient. By default, when the remaining disk space is less than 20% of the `store` capacity (which is controlled by the `low-space-ratio` parameter), PD cannot schedule data to this TiFlash node. ### TiFlash query time is unstable, and error log prints many `Lock Exception` messages -This is because large amounts of data are written to the cluster, which leads to the situation that the TiFlash query encounters a lock and requires query retry. +This is because large amounts of data are written to the cluster, which causes that the TiFlash query encounters a lock and requires query retry. -You can set the query timestamp to one second earlier in TiDB (for example, `set @@tidb_snapshot=412881237115666555;`), to reduce the possibility that TiFlash query encounters a lock; thereby mitigating the risk of unstable query time. +You can set the query timestamp to one second earlier in TiDB (for example, `set @@tidb_snapshot=412881237115666555;`). This makes less TiFlash queries encounter a lock and mitigate the risk of unstable query time. -### Partial queries return `Region Unavailable` +### Some queries return the `Region Unavailable` error -If the load pressure in TiFlash is so heavy that TiFlash data replication falls behind. Some queries might return error message `Region Unavailable`. +If the load pressure on TiFlash is too heavy and it causes that TiFlash data replication falls behind, some queries might return the `Region Unavailable` error. -In this case, you can share the pressure by adding TiFlash nodes. +In this case, you can balance the load pressure by adding more TiFlash nodes. 
 ### Data file corruption

 Take the following steps to handle the data file corruption:

 1. Refer to [Take a TiFlash node down](#take-a-tiflash-node-down) to take the corresponding TiFlash node down.
 2. Delete the related data of the TiFlash node.
 3. Redeploy the TiFlash node in the cluster.

 ## TiFlash critical logs

 | Log Information | Log Description |
 |---------------|-------------------|
 | [ 23 ] <Information> KVStore: Start to persist [region 47, applied: term 6 index 10] | Data starts to be replicated (the number in the square brackets at the start of the log refers to the thread ID) |
-| [ 30 ] <Debug> CoprocessorHandler: grpc::Status DB::CoprocessorHandler::execute(): Handling DAG request | `Handling DAG request` refers to that TiFlash starts to handle a Coprocessor request |
-| [ 30 ] <Debug> CoprocessorHandler: grpc::Status DB::CoprocessorHandler::execute(): Handle DAG request done | `Handle DAG request done` refers to that TiFlash finishes a Coprocessor request |
+| [ 30 ] <Debug> CoprocessorHandler: grpc::Status DB::CoprocessorHandler::execute(): Handling DAG request | Handling DAG request, that is, TiFlash starts to handle a Coprocessor request |
+| [ 30 ] <Debug> CoprocessorHandler: grpc::Status DB::CoprocessorHandler::execute(): Handle DAG request done | Handling DAG request done, that is, TiFlash finishes handling a Coprocessor request |

 You can find the beginning or the end of a Coprocessor request, and then locate the related logs of the Coprocessor request through the thread ID printed at the start of the log.

From dba7d76d592c590c498986e716c4c21c2c59f4d5 Mon Sep 17 00:00:00 2001
From: Keke Yi <40977455+yikeke@users.noreply.github.com>
Date: Wed, 8 Apr 2020 21:53:44 +0800
Subject: [PATCH 09/10] Update reference/tiflash/maintain.md

---
 reference/tiflash/maintain.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/reference/tiflash/maintain.md b/reference/tiflash/maintain.md
index efbc812b6f5c0..9abbdf59d0784 100644
--- a/reference/tiflash/maintain.md
+++ b/reference/tiflash/maintain.md
@@ -110,7 +110,7 @@ This is because TiFlash is in an abnormal state caused by configuration errors o

 This is because large amounts of data are written to the cluster, which causes TiFlash queries to encounter locks and require query retries.

-You can set the query timestamp to one second earlier in TiDB (for example, `set @@tidb_snapshot=412881237115666555;`). This makes fewer TiFlash queries encounter a lock and mitigate the risk of unstable query time.
+You can set the query timestamp to one second earlier in TiDB (for example, `set @@tidb_snapshot=412881237115666555;`). This makes fewer TiFlash queries encounter a lock and mitigates the risk of unstable query time.

From 046055be04f1cf5aad8b91af98b7a2c940e19ccb Mon Sep 17 00:00:00 2001
From: yikeke
Date: Wed, 8 Apr 2020 22:54:14 +0800
Subject: [PATCH 10/10] minor edits

---
 reference/tiflash/maintain.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/reference/tiflash/maintain.md b/reference/tiflash/maintain.md
index 9abbdf59d0784..d949a2f74a93d 100644
--- a/reference/tiflash/maintain.md
+++ b/reference/tiflash/maintain.md
@@ -40,7 +40,7 @@ Follow the steps below to take a TiFlash node down:
 > **Note:**
 >
 > After you take the TiFlash node down, if the number of the remaining nodes in the TiFlash cluster is greater than or equal to the maximum replicas of all data tables, you can go directly to step 3.

-1. If the number of replicas of tables is greater than or equal to that of the remaining TiFlash nodes in the cluster, execute the following command on those tables in the TiDB client:
+1. If the number of replicas of tables is greater than or equal to that of the remaining TiFlash nodes in the cluster, execute the following command on these tables in the TiDB client:

     {{< copyable "sql" >}}

     ```sql
     alter table <db-name>.<table-name> set tiflash replica 0;
     ```

 2. To ensure that the TiFlash replicas of these tables are removed, see [Check the Replication Progress](/reference/tiflash/use-tiflash.md#check-the-replication-progress). If you cannot view the replication progress of the related tables, it means that the replicas are removed.

 3. Input the `store` command into [pd-ctl](/reference/tools/pd-control.md) (the binary file is in `resources/bin` of the tidb-ansible directory) to view the `store id` of the TiFlash node.

 4. Input `store delete <store_id>` into `pd-ctl`. Here `<store_id>` refers to the `store id` in step 3.

-5. When the corresponding `store` of the node disappears, or when `state_name` is changed to `Tomestone`, stop the TiFlash process.
+5. When the corresponding `store` of the node disappears, or when `state_name` is changed to `Tombstone`, stop the TiFlash process.

 > **Note:**
 >
@@ -106,7 +106,7 @@ This is because TiFlash is in an abnormal state caused by configuration errors o

 6. Check whether the remaining disk space of the machine (where `store` of the TiFlash node is) is sufficient. By default, when the remaining disk space is less than 20% of the `store` capacity (which is controlled by the `low-space-ratio` parameter), PD cannot schedule data to this TiFlash node.

-### TiFlash query time is unstable, and error log prints many `Lock Exception` messages
+### TiFlash query time is unstable, and the error log prints many `Lock Exception` messages

 This is because large amounts of data are written to the cluster, which causes TiFlash queries to encounter locks and require query retries.
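A closing note on the `tidb_snapshot` workaround this last patch touches: a TiDB TSO packs the physical time in milliseconds into its high bits, with the low 18 bits reserved for a logical counter, so the "one second earlier" timestamp can be derived rather than hard-coded. A minimal sketch, assuming GNU `date` (the `%3N` millisecond format is not available on all platforms):

```shell
# Derive a TSO for "one second ago": (milliseconds since epoch - 1000) << 18.
now_ms=$(date +%s%3N)
tso=$(( (now_ms - 1000) << 18 ))
echo "set @@tidb_snapshot=${tso};"
```

Feeding the emitted statement to your SQL client before running the query reproduces the workaround without guessing at a raw TSO value.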