diff --git a/scale-tidb-using-tiup.md b/scale-tidb-using-tiup.md index 3cf7eed6c53b8..ffab8eff57736 100644 --- a/scale-tidb-using-tiup.md +++ b/scale-tidb-using-tiup.md @@ -7,9 +7,9 @@ aliases: ['/docs/dev/how-to/scale/with-tiup/'] # Scale the TiDB Cluster Using TiUP -The capacity of a TiDB cluster can be increased or decreased without affecting the online services. +The capacity of a TiDB cluster can be increased or decreased without interrupting the online services. -This document describes how to scale the TiDB, TiKV, PD, TiCDC, or TiFlash nodes using TiUP. If you have not installed TiUP, refer to the steps in [Install TiUP on the control machine](/upgrade-tidb-using-tiup.md#install-tiup-on-the-control-machine) and import the cluster into TiUP before you scale the TiDB cluster. +This document describes how to scale the TiDB, TiKV, PD, TiCDC, or TiFlash nodes using TiUP. If you have not installed TiUP, refer to the steps in [Install TiUP on the control machine](/upgrade-tidb-using-tiup.md#install-tiup-on-the-control-machine) and import the cluster into TiUP before you use TiUP to scale the TiDB cluster. To view the current cluster name list, run `tiup cluster list`. @@ -23,20 +23,20 @@ For example, if the original topology of the cluster is as follows: | 10.0.1.1 | TiKV | | 10.0.1.2 | TiKV | -## Scale out a TiDB/TiKV/PD/TiCDC node +## Scale out a TiDB/PD/TiKV node If you want to add a TiDB node to the `10.0.1.5` host, take the following steps. > **Note:** > -> You can take similar steps to add the TiKV, PD, or TiCDC node. +> You can take similar steps to add the PD node. Before you add the TiKV node, it is recommended that you adjust the PD scheduling parameters in advance according to the cluster load. 1. Configure the scale-out topology: > **Note:** > - > * The port information is not required by default. - > * If multiple instances are deployed on a single machine, you need to allocate different ports for them. If the ports or directories have conflicts, you will receive a notification during deployment or scaling. + > * The port and directory information is not required by default. + > * If multiple instances are deployed on a single machine, you need to allocate different ports and directories for them. If the ports or directories have conflicts, you will receive a notification during deployment or scaling. Add the scale-out topology configuration in the `scale-out.yaml` file: @@ -46,15 +46,50 @@ If you want to add a TiDB node to the `10.0.1.5` host, take the following steps. vi scale-out.yaml ``` - ``` + {{< copyable "" >}} + + ```ini tidb_servers: - host: 10.0.1.5 ssh_port: 22 port: 4000 status_port: 10080 + deploy_dir: /data/deploy/install/deploy/tidb-4000 + log_dir: /data/deploy/install/log/tidb-4000 + ``` + + Here is a TiKV configuration file template: + + {{< copyable "" >}} + + ```ini + tikv_servers: + - host: 10.0.1.5 + ssh_port: 22 + port: 20160 + status_port: 20180 + deploy_dir: /data/deploy/install/deploy/tikv-20160 + data_dir: /data/deploy/install/data/tikv-20160 + log_dir: /data/deploy/install/log/tikv-20160 + ``` + + Here is a PD configuration file template: + + {{< copyable "" >}} + + ```ini + pd_servers: + - host: 10.0.1.5 + ssh_port: 22 + name: pd-1 + client_port: 2379 + peer_port: 2380 + deploy_dir: /data/deploy/install/deploy/pd-2379 + data_dir: /data/deploy/install/data/pd-2379 + log_dir: /data/deploy/install/log/pd-2379 ``` - To view the whole configuration of the current cluster, run `tiup cluster edit-config `. 
The global configuration of `global` and `server_configs` also takes effect in `scale-out.yaml`.
+    To view the configuration of the current cluster, run `tiup cluster edit-config <cluster-name>`. The parameter configuration of `global` and `server_configs` is inherited by `scale-out.yaml` and thus also takes effect in `scale-out.yaml`.
 
     After the configuration, the current topology of the cluster is as follows:
 
@@ -84,12 +119,29 @@ If you want to add a TiDB node to the `10.0.1.5` host, take the following steps.
     tiup cluster display <cluster-name>
     ```
 
-    Access the monitoring platform at <http://10.0.1.5:3000> using your browser to monitor the status of the cluster and the new node.
+    Access the monitoring platform at <http://10.0.1.5:3000> using your browser to monitor the status of the cluster and the new node.
+
+After the scale-out, the cluster topology is as follows:
+
+| Host IP | Service |
+|:----|:----|
+| 10.0.1.3 | TiDB + TiFlash |
+| 10.0.1.4 | TiDB + PD |
+| 10.0.1.5 | **TiDB** + TiKV + Monitor |
+| 10.0.1.1 | TiKV |
+| 10.0.1.2 | TiKV |
 
 ## Scale out a TiFlash node
 
 If you want to add a TiFlash node to the `10.0.1.4` host, take the following steps.
 
+> **Note:**
+>
+> When adding a TiFlash node to an existing TiDB cluster, you need to note the following:
+>
+> 1. Confirm that the current TiDB version supports using TiFlash; otherwise, upgrade your TiDB cluster to v4.0.0-rc or higher.
+> 2. Download [pd-ctl](https://download.pingcap.org/tidb-v4.0.0-rc.2-linux-amd64.tar.gz) and execute the `config set enable-placement-rules true` command to enable PD's Placement Rules.
+
 1. Add the node information to the `scale-out.yaml` file:
 
     Create the `scale-out.yaml` file to add the TiFlash node information.
 
@@ -98,7 +150,7 @@ If you want to add a TiFlash node to the `10.0.1.4` host, take the following ste
     ```ini
     tiflash_servers:
-    - host: 10.0.1.4
+    - host: 10.0.1.4
     ```
 
     Currently, you can only add IP but not domain name.
 
@@ -119,15 +171,69 @@ If you want to add a TiFlash node to the `10.0.1.4` host, take the following ste
     tiup cluster display <cluster-name>
     ```
 
-    Access the monitoring platform at <http://10.0.1.5:3000> using your browser, and view the status of the cluster and the new node.
+    Access the monitoring platform at <http://10.0.1.5:3000> using your browser, and view the status of the cluster and the new node.
 
-## Scale in a TiDB/TiKV/PD/TiCDC node
+After the scale-out, the cluster topology is as follows:
+
+| Host IP | Service |
+|:----|:----|
+| 10.0.1.3 | TiDB + TiFlash |
+| 10.0.1.4 | TiDB + PD + **TiFlash** |
+| 10.0.1.5 | TiDB + TiKV + Monitor |
+| 10.0.1.1 | TiKV |
+| 10.0.1.2 | TiKV |
+
+## Scale out a TiCDC node
+
+If you want to add two TiCDC nodes to the `10.0.1.3` and `10.0.1.4` hosts, take the following steps.
+
+1. Add the node information to the `scale-out.yaml` file:
+
+    Create the `scale-out.yaml` file to add the TiCDC node information.
+
+    {{< copyable "" >}}
+
+    ```ini
+    cdc_servers:
+    - host: 10.0.1.3
+    - host: 10.0.1.4
+    ```
+
+2. Run the scale-out command:
+
+    {{< copyable "shell-regular" >}}
+
+    ```shell
+    tiup cluster scale-out <cluster-name> scale-out.yaml
+    ```
+
+3. View the cluster status:
+
+    {{< copyable "shell-regular" >}}
+
+    ```shell
+    tiup cluster display <cluster-name>
+    ```
+
+    Access the monitoring platform at <http://10.0.1.5:3000> using your browser, and view the status of the cluster and the new nodes.
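+
+    To additionally confirm that the new TiCDC nodes have registered with the cluster, you can list the TiCDC captures. The following is a minimal check that assumes the `cdc` binary is available on a TiCDC host and that PD is reachable at `10.0.1.4:2379` (the PD address in this example topology); replace the address with that of your own PD node:
+
+    {{< copyable "shell-regular" >}}
+
+    ```shell
+    cdc cli capture list --pd=http://10.0.1.4:2379
+    ```
+
+    Each scaled-out TiCDC node should appear in the output as a separate capture.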
+ +After the scale-out, the cluster topology is as follows: + +| Host IP | Service | +|:----|:----| +| 10.0.1.3 | TiDB + TiFlash + **TiCDC** | +| 10.0.1.4 | TiDB + PD + TiFlash + **TiCDC** | +| 10.0.1.5 | TiDB+ TiKV + Monitor | +| 10.0.1.1 | TiKV | +| 10.0.1.2 | TiKV | + +## Scale in a TiDB/PD/TiKV node If you want to remove a TiKV node from the `10.0.1.5` host, take the following steps. > **Note:** > -> You can take similar steps to remove the TiDB, PD, or TiCDC node. +> You can take similar steps to remove the TiDB and PD node. 1. View the node ID information: @@ -138,11 +244,13 @@ If you want to remove a TiKV node from the `10.0.1.5` host, take the following s ``` ``` - Starting /root/.tiup/components/cluster/v0.4.6/cluster display + Starting /root/.tiup/components/cluster/v0.4.6/cluster display TiDB Cluster: TiDB Version: v4.0.0-rc ID Role Host Ports Status Data Dir Deploy Dir -- ---- ---- ----- ------ -------- ---------- + 10.0.1.3:8300 cdc 10.0.1.3 8300 Up - deploy/cdc-8300 + 10.0.1.4:8300 cdc 10.0.1.4 8300 Up - deploy/cdc-8300 10.0.1.4:2379 pd 10.0.1.4 2379/2380 Healthy data/pd-2379 deploy/pd-2379 10.0.1.1:20160 tikv 10.0.1.1 20160/20180 Up data/tikv-20160 deploy/tikv-20160 10.0.1.2:20160 tikv 10.0.1.2 20160/20180 Up data/tikv-20160 deploy/tikv-20160 @@ -152,9 +260,9 @@ If you want to remove a TiKV node from the `10.0.1.5` host, take the following s 10.0.1.5:4000 tidb 10.0.1.5 4000/10080 Up - deploy/tidb-4000 10.0.1.3:9000 tiflash 10.0.1.3 9000/8123/3930/20170/20292/8234 Up data/tiflash-9000 deploy/tiflash-9000 10.0.1.4:9000 tiflash 10.0.1.4 9000/8123/3930/20170/20292/8234 Up data/tiflash-9000 deploy/tiflash-9000 - 10.0.1.5:9290 prometheus 10.0.1.5 9290 Up data/prometheus-9290 deploy/prometheus-9290 - 10.0.1.5:3200 grafana 10.0.1.5 3200 Up - deploy/grafana-3200 - 10.0.1.5:9293 alertmanager 10.0.1.5 9293/9294 Up data/alertmanager-9293 deploy/alertmanager-9293 + 10.0.1.5:9090 prometheus 10.0.1.5 9090 Up data/prometheus-9090 deploy/prometheus-9090 + 10.0.1.5:3000 grafana 10.0.1.5 3000 Up - deploy/grafana-3000 + 10.0.1.5:9093 alertmanager 10.0.1.5 9093/9294 Up data/alertmanager-9093 deploy/alertmanager-9093 ``` 2. Run the scale-in command: @@ -181,47 +289,166 @@ If you want to remove a TiKV node from the `10.0.1.5` host, take the following s tiup cluster display ``` - The current topology is as follows: + Access the monitoring platform at using your browser, and view the status of the cluster. - | Host IP | Service | - |:----|:----| - | 10.0.1.3 | TiDB + TiFlash | - | 10.0.1.4 | TiDB + PD + TiFlash | - | 10.0.1.5 | TiDB + Monitor **(TiKV is deleted)** | - | 10.0.1.1 | TiKV | - | 10.0.1.2 | TiKV | +The current topology is as follows: - Access the monitoring platform at using your browser to monitor the status of the cluster. +| Host IP | Service | +|:----|:----| +| 10.0.1.3 | TiDB + TiFlash + TiCDC | +| 10.0.1.4 | TiDB + PD + TiFlash + TiCDC | +| 10.0.1.5 | TiDB + Monitor **(TiKV is deleted)** | +| 10.0.1.1 | TiKV | +| 10.0.1.2 | TiKV | ## Scale in a TiFlash node -If you want to remove the TiFlash node from the `10.0.1.4` host, take the following steps. +If you want to remove a TiFlash node from the `10.0.1.4` host, take the following steps. + +### 1. Adjust the number of replicas of the tables according to the number of remaining TiFlash nodes + +Before the node goes down, make sure that the number of remaining nodes in the TiFlash cluster is no smaller than the maximum number of replicas of all tables. Otherwise, modify the number of TiFlash replicas of the related tables. + +1. 
For all tables whose replicas are greater than the number of remaining TiFlash nodes in the cluster, execute the following command in the TiDB client:
+
+    {{< copyable "sql" >}}
+
+    ```sql
+    alter table <db-name>.<table-name> set tiflash replica 0;
+    ```
+
+2. Wait for the TiFlash replicas of the related tables to be deleted. [Check the table replication progress](/tiflash/use-tiflash.md#check-the-replication-progress); the replicas are deleted if the replication information of the related tables can no longer be found.
+
+### 2. Scale in the TiFlash node
+
+Next, perform the scale-in operation with one of the following solutions.
+
+#### Solution 1: Using TiUP to scale in the TiFlash node
+
+1. First, confirm the name of the node to be taken down:
+
+    {{< copyable "shell-regular" >}}
+
+    ```shell
+    tiup cluster display <cluster-name>
+    ```
+
+2. Scale in the TiFlash node (assume that the node name is `10.0.1.4:9000` from Step 1):
+
+    {{< copyable "shell-regular" >}}
+
+    ```shell
+    tiup cluster scale-in <cluster-name> --node 10.0.1.4:9000
+    ```
+
+#### Solution 2: Manually scale in the TiFlash node
+
+In special cases (such as when a node needs to be forcibly taken down), or if the TiUP scale-in operation fails, you can manually scale in a TiFlash node with the following steps.
+
+1. Use the `store` command of pd-ctl to view the store ID corresponding to this TiFlash node.
+
+    * Enter the `store` command in [pd-ctl](/pd-control.md) (the binary file is under `resources/bin` in the tidb-ansible directory).
+
+    * If you use TiUP deployment, replace `pd-ctl` with `tiup ctl pd`:
+
+        {{< copyable "shell-regular" >}}
+
+        ```shell
+        tiup ctl pd -u http://<pd_ip>:<pd_port> store
+        ```
+
+2. Scale in the TiFlash node in pd-ctl:
+
+    * Enter `store delete <store_id>` in pd-ctl (`<store_id>` is the store ID of the TiFlash node found in the previous step).
+
+    * If you use TiUP deployment, replace `pd-ctl` with `tiup ctl pd`:
+
+        {{< copyable "shell-regular" >}}
+
+        ```shell
+        tiup ctl pd -u http://<pd_ip>:<pd_port> store delete <store_id>
+        ```
+
+3. Wait for the store of the TiFlash node to disappear or for the `state_name` to become `Tombstone` before you stop the TiFlash process.
+
+    If, after waiting for a long time, the node still fails to disappear or the `state_name` fails to become `Tombstone`, consider using the following command to force the node out of the cluster.
+
+    **Note that the command will directly discard the replicas on the TiFlash node, which might cause the query to fail.**
+
+    {{< copyable "shell-regular" >}}
+
+    ```shell
+    curl -X POST 'http://<pd_ip>:<pd_port>/pd/api/v1/store/<store_id>/state?state=Tombstone'
+    ```
+
+4. Manually delete TiFlash data files (whose location can be found in the `data_dir` directory under the TiFlash configuration of the cluster topology file).
+
+5. Manually update TiUP's cluster configuration file (in edit mode, delete the information of the TiFlash node that goes down).
+
+    {{< copyable "shell-regular" >}}
+
+    ```shell
+    tiup cluster edit-config <cluster-name>
+    ```
 
 > **Note:**
 >
-> The scale-in process described in this section does not delete the data on the node that goes offline. If you need to bring the node back again, delete the data manually.
+> Before all TiFlash nodes in the cluster stop running, if not all tables replicated to TiFlash are canceled, you need to manually clean up the replication rules in PD; otherwise, the TiFlash node cannot be taken down successfully.
 
-1. Take the node offline:
+The steps to manually clean up the replication rules in PD are as follows:
 
-    To take offline the node to be scaled in, refer to [Take a TiFlash node down](/tiflash/maintain-tiflash.md#take-a-tiflash-node-down).
+1.
View all data replication rules related to TiFlash in the current PD instance: -2. Check the node status: + {{< copyable "shell-regular" >}} - The scale-in process takes some time. + ```shell + curl http://:/pd/api/v1/config/rules/group/tiflash + ``` + + ``` + [ + { + "group_id": "tiflash", + "id": "table-45-r", + "override": true, + "start_key": "7480000000000000FF2D5F720000000000FA", + "end_key": "7480000000000000FF2E00000000000000F8", + "role": "learner", + "count": 1, + "label_constraints": [ + { + "key": "engine", + "op": "in", + "values": [ + "tiflash" + ] + } + ] + } + ] + ``` - You can use Grafana or pd-ctl to check whether the node has been successfully taken offline. +2. Remove all data replication rules related to TiFlash. Take the rule whose `id` is `table-45-r` as an example. Delete it by the following command: -3. Stop the TiFlash process: + {{< copyable "shell-regular" >}} - After the `store` corresponding to TiFlash disappears, or the `state_name` becomes `Tombstone`, execute the following command to stop the TiFlash process: + ```shell + curl -v -X DELETE http://:/pd/api/v1/config/rule/tiflash/table-45-r + ``` + +## Scale in a TiCDC node + +If you want to remove the TiCDC node from the `10.0.1.4` host, take the following steps. + +1. Take the node offline: {{< copyable "shell-regular" >}} ```shell - tiup cluster scale-in --node 10.0.1.4:9000 + tiup cluster scale-in --node 10.0.1.4:8300 ``` -4. View the cluster status: +2. View the cluster status: {{< copyable "shell-regular" >}} @@ -229,4 +456,14 @@ If you want to remove the TiFlash node from the `10.0.1.4` host, take the follow tiup cluster display ``` - Access the monitoring platform at using your browser, and view the status of the cluster. + Access the monitoring platform at using your browser, and view the status of the cluster. + +The current topology is as follows: + +| Host IP | Service | +|:----|:----| +| 10.0.1.3 | TiDB + TiFlash + TiCDC | +| 10.0.1.4 | TiDB + PD + **(TiCDC is deleted)** | +| 10.0.1.5 | TiDB + Monitor | +| 10.0.1.1 | TiKV | +| 10.0.1.2 | TiKV | diff --git a/ticdc/deploy-ticdc.md b/ticdc/deploy-ticdc.md index ccca72666889e..41775158d7709 100644 --- a/ticdc/deploy-ticdc.md +++ b/ticdc/deploy-ticdc.md @@ -91,7 +91,7 @@ To deploy TiCDC, take the following steps: 1. Check if your TiDB version supports TiCDC. If not, upgrade the TiDB cluster to 4.0.0 rc.1 or later versions. -2. Refer to [Scale out a TiDB/TiKV/PD/TiCDC node](/scale-tidb-using-tiup.md#scale-out-a-tidbtikvpdticdc-node) and deploy TiCDC. +2. Refer to [Scale out a TiDB/TiKV/PD/TiCDC node](/scale-tidb-using-tiup.md#scale-out-a-tidbpdtikv-node) and deploy TiCDC. This is an example of the scale-out configuration file: diff --git a/tiflash/maintain-tiflash.md b/tiflash/maintain-tiflash.md index ce601cc2fc3bb..061a47f20694b 100644 --- a/tiflash/maintain-tiflash.md +++ b/tiflash/maintain-tiflash.md @@ -7,7 +7,7 @@ aliases: ['/docs/dev/reference/tiflash/maintain/'] # Maintain a TiFlash Cluster -This document describes how to perform common operations when you maintain a TiFlash cluster, including checking the TiFlash version, and taking TiFlash nodes down. This document also introduces critical logs and a system table of TiFlash. +This document describes how to perform common operations when you maintain a TiFlash cluster, including checking the TiFlash version. This document also introduces critical logs and a system table of TiFlash. 
## Check the TiFlash version @@ -31,77 +31,6 @@ There are two ways to check the TiFlash version: : TiFlash version: TiFlash 0.2.0 master-375035282451103999f3863c691e2fc2 ``` -## Take a TiFlash node down - -Taking a TiFlash node down differs from [Scaling in a TiFlash node](/scale-tidb-using-tiup.md#scale-in-a-tiflash-node) in that the former doesn't remove the node in TiDB Ansible; instead, it just safely shuts down the TiFlash process. - -Follow the steps below to take a TiFlash node down: - -> **Note:** -> -> After you take the TiFlash node down, if the number of the remaining nodes in the TiFlash cluster is greater than or equal to the maximum replicas of all data tables, you can go directly to step 3. - -1. If the number of replicas of tables is greater than or equal to that of the remaining TiFlash nodes in the cluster, execute the following command on these tables in the TiDB client: - - {{< copyable "sql" >}} - - ```sql - alter table . set tiflash replica 0; - ``` - -2. To ensure that the TiFlash replicas of these tables are removed, see [Check the Replication Progress](/tiflash/use-tiflash.md#check-the-replication-progress). If you cannot view the replication progress of the related tables, it means that the replicas are removed. - -3. Input the `store` command into [pd-ctl](/pd-control.md) (the binary file is in `resources/bin` of the tidb-ansible directory) to view the `store id` of the TiFlash node. - -4. Input `store delete ` into `pd-ctl`. Here `` refers to the `store id` in step 3. - -5. When the corresponding `store` of the node disappears, or when `state_name` is changed to `Tombstone`, stop the TiFlash process. - -> **Note:** -> -> If you don't cancel all tables replicated to TiFlash before all TiFlash nodes stop running, you need to manually delete the replication rules in PD. Or you cannot successfully take the TiFlash node down. - -To manually delete the replication rules in PD, take the following steps: - -1. Query all the data replication rules related to TiFlash in the current PD instance: - - {{< copyable "shell-regular" >}} - - ```shell - curl http://:/pd/api/v1/config/rules/group/tiflash - ``` - - ``` - [ - { - "group_id": "tiflash", - "id": "table-45-r", - "override": true, - "start_key": "7480000000000000FF2D5F720000000000FA", - "end_key": "7480000000000000FF2E00000000000000F8", - "role": "learner", - "count": 1, - "label_constraints": [ - { - "key": "engine", - "op": "in", - "values": [ - "tiflash" - ] - } - ] - } - ] - ``` - -2. Delete all the data replication rules related to TiFlash. The following example command deletes the rule whose `id` is `table-45-r`: - - {{< copyable "shell-regular" >}} - - ```shell - curl -v -X DELETE http://:/pd/api/v1/config/rule/tiflash/table-45-r - ``` - ## TiFlash critical logs | Log Information | Log Description |