From c21e17ab5032ed7df9e12e6a1fdf968ec553fbb1 Mon Sep 17 00:00:00 2001
From: toutdesuite
Date: Thu, 4 Jun 2020 09:07:25 +0800
Subject: [PATCH 1/5] Update manage-ticdc.md

---
 ticdc/manage-ticdc.md | 301 +++++++++++++++++++++++++++---------------
 1 file changed, 192 insertions(+), 109 deletions(-)

diff --git a/ticdc/manage-ticdc.md b/ticdc/manage-ticdc.md
index 99bdccf1e5dd7..c2ff879d98169 100644
--- a/ticdc/manage-ticdc.md
+++ b/ticdc/manage-ticdc.md
@@ -7,11 +7,47 @@ aliases: ['/docs/dev/reference/tools/ticdc/manage/']

# Manage TiCDC Cluster and Replication Tasks

-Currently, you can manage a TiCDC cluster and replication tasks using the `cdc cli` command-line tool or the HTTP interface.
+This document describes how to deploy a TiCDC cluster and how to manage the TiCDC cluster and replication tasks through the command-line tool `cdc cli` and the HTTP interface.
+
+## Deploy TiCDC
+
+### Using TiUP to deploy TiCDC
+
+#### Deploy a TiDB cluster with the TiCDC component
+
+For details, refer to [Deploy a TiDB Cluster Using TiUP](/production-deployment-using-tiup.md#step-3-edit-the-initialization-configuration-file).
+
+#### Deploy a TiCDC component on an existing TiDB cluster
+
+1. First, make sure that the current TiDB version supports TiCDC; otherwise, you need to upgrade the TiDB cluster to `v4.0.0-rc.1` or later versions.
+
+2. To deploy TiCDC, refer to [Scale out a TiDB/TiKV/PD/TiCDC node](/scale-tidb-using-tiup.md#scale-out-a-tidbtikvpdticdc-node).
+
+### Deploy a TiCDC component on an existing TiDB cluster using Binary
+
+Suppose that in the PD cluster, there is a PD node (the client URL is `10.0.10.25:2379`) that can provide services. If you want to deploy three TiCDC nodes, start the TiCDC cluster by executing the following commands. Note that you only need to specify the same PD address; the newly started nodes automatically join the TiCDC cluster.
+
+{{< copyable "shell-regular" >}}
+
+```shell
+cdc server --pd=http://10.0.10.25:2379 --log-file=ticdc_1.log --addr=0.0.0.0:8301 --advertise-addr=127.0.0.1:8301
+cdc server --pd=http://10.0.10.25:2379 --log-file=ticdc_2.log --addr=0.0.0.0:8302 --advertise-addr=127.0.0.1:8302
+cdc server --pd=http://10.0.10.25:2379 --log-file=ticdc_3.log --addr=0.0.0.0:8303 --advertise-addr=127.0.0.1:8303
+```
+
+The following are descriptions of options available in the `cdc server` command:
+
+- `gc-ttl`: The TTL (Time To Live) of the service level `GC safepoint` in PD set by TiCDC, specified in seconds. The default value is `86400`, which means 24 hours.
+- `pd`: URL of the PD client.
+- `addr`: The listening address of TiCDC, the HTTP API address, and the Prometheus address of the service.
+- `advertise-addr`: The access address of TiCDC to the outside world.
+- `tz`: Time zone used by the TiCDC service. TiCDC uses this time zone when time data types such as `TIMESTAMP` are converted internally or when data are replicated to the downstream. The default is the local time zone in which the process runs.
+- `log-file`: The path of the log file of the TiCDC process. The default is `cdc.log`.
+- `log-level`: The log level when the TiCDC process is running. The default is `info`.

## Use `cdc cli` to manage cluster status and data replication task

-This section introduces how to use `cdc cli` to manage a TiCDC cluster and data replication tasks.
+This section introduces how to use `cdc cli` to manage a TiCDC cluster and data replication tasks. In the following interface description, it is assumed that PD listens on `127.0.0.1` and the port is `2379`.
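+
+All of the `cdc cli` examples below pass the PD address through the `--pd` flag. As a purely optional convenience (a sketch, not a step required by `cdc cli` itself), you can store the address in a shell variable once and reuse it:
+
+{{< copyable "shell-regular" >}}
+
+```shell
+# Keep the PD client URL in a variable and reuse it in the cdc cli commands below.
+PD="http://127.0.0.1:2379"
+cdc cli capture list --pd=${PD}
+```
+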
### Manage TiCDC service progress (`capture`)

@@ -20,7 +56,7 @@ This section introduces how to use `cdc cli` to manage a TiCDC cluster and data
    {{< copyable "shell-regular" >}}

    ```shell
-    cdc cli capture list
+    cdc cli capture list --pd=http://127.0.0.1:2379
    ```

    ```
@@ -38,61 +74,172 @@ This section introduces how to use `cdc cli` to manage a TiCDC cluster and data

### Manage replication tasks (`changefeed`)

-- Create `changefeed`:
+#### Create a replication task

-    {{< copyable "shell-regular" >}}
+Execute the following command to create a replication task:

-    ```shell
-    cdc cli changefeed create --sink-uri="mysql://root:123456@127.0.0.1:3306/"
-    create changefeed ID: 28c43ffc-2316-4f4f-a70b-d1a7c59ba79f info {"sink-uri":"mysql://root:123456@127.0.0.1:3306/","opts":{},"create-time":"2020-03-12T22:04:08.103600025+08:00","start-ts":415241823337054209,"target-ts":0,"admin-job-type":0,"config":{"filter-case-sensitive":false,"filter-rules":null,"ignore-txn-commit-ts":null}}
-    ```
+{{< copyable "shell-regular" >}}
+
+```shell
+cdc cli changefeed create --pd=http://127.0.0.1:2379 --sink-uri="mysql://root:123456@127.0.0.1:3306/"
+create changefeed ID: 28c43ffc-2316-4f4f-a70b-d1a7c59ba79f info {"sink-uri":"mysql://root:123456@127.0.0.1:3306/","opts":{},"create-time":"2020-03-12T22:04:08.103600025+08:00","start-ts":415241823337054209,"target-ts":0,"admin-job-type":0,"config":{"filter-case-sensitive":false,"filter-rules":null,"ignore-txn-start-ts":null}}
+```
+
+Configure `--sink-uri` according to the following format. Currently, the scheme supports `mysql`/`tidb`/`kafka`.
+
+{{< copyable "" >}}
+
+```
+[scheme]://[userinfo@][host]:[port][/path]?[query_parameters]
+```
+
+- Configure sink URI with `mysql`/`tidb`

-- Query the `changefeed` list:
+    Sample configuration:

    {{< copyable "shell-regular" >}}

    ```shell
-    cdc cli changefeed list
+    --sink-uri="mysql://root:123456@127.0.0.1:3306/?worker-count=16&max-txn-row=5000"
    ```

-    ```
-    [
-        {
-            "id": "28c43ffc-2316-4f4f-a70b-d1a7c59ba79f"
-        }
-    ]
-    ```
+    The following are descriptions of parameters and parameter values in the sample configuration:

-- Query a specific `changefeed` which corresponds to the status of a specific replication task:
+    | Parameter/Parameter Value | Description |
+    | :------------ | :------------------------------------------------ |
+    | `root` | The username of the downstream database |
+    | `123456` | The password of the downstream database |
+    | `127.0.0.1` | The IP address of the downstream database |
+    | `3306` | The port for the downstream database |
+    | `worker-count` | The number of SQL statements that can be concurrently executed to the downstream (optional, `16` by default) |
+    | `max-txn-row` | The size of a transaction batch that can be executed to the downstream (optional, `256` by default) |
+
+- Configure sink URI with `kafka`
+
+    Sample configuration:

    {{< copyable "shell-regular" >}}

    ```shell
-    cdc cli changefeed query --changefeed-id=28c43ffc-2316-4f4f-a70b-d1a7c59ba79f
+    --sink-uri="kafka://127.0.0.1:9092/cdc-test?kafka-version=2.4.0&partition-num=6&max-message-bytes=67108864&replication-factor=1"
    ```

-    ```
-    {
-        "info": {
-            "sink-uri": "mysql://root:123456@127.0.0.1:3306/",
-            "opts": {},
-            "create-time": "2020-03-12T22:04:08.103600025+08:00",
-            "start-ts": 415241823337054209,
-            "target-ts": 0,
-            "admin-job-type": 0,
-            "config": {
-                "filter-case-sensitive": false,
-                "filter-rules": null,
-                "ignore-txn-commit-ts": null
-            }
-        },
-        "status": {
-            "resolved-ts": 415241860902289409,
-            "checkpoint-ts": 415241860640145409,
"admin-job-type": 0 - } - } - ``` + The following are descriptions of parameters and parameter values in the sample configuration: + + | Parameter/Parameter Value | Description | + | :------------------ | :------------------------------------------------------------ | + | `127.0.0.1` | The IP address of the downstream Kafka services | + | `9092` | The port for the downstream Kafka | + | `cdc-test` | The name of the Kafka topic | + | `kafka-version` | The version of the downstream Kafka (optional, `2.4.0` by default) | + | `partition-num` | The number of the downstream Kafka partitions (Optional. The value must be **no greater than** the actual number of partitions. If you do not configure this parameter, the partition number is obtained automatically.) | + | `max-message-bytes` | The maximum size of data that is sent to Kafka broker each time (optional, `64MB` by default) | + | `replication-factor` | The number of Kafka message replicas that can be saved (optional, `1` by default) | + +#### Query the replication task list + +Execute the following command to query the replication task list: + +{{< copyable "shell-regular" >}} + +```shell +cdc cli changefeed list --pd=http://127.0.0.1:2379 +``` + +``` +[ + { + "id": "28c43ffc-2316-4f4f-a70b-d1a7c59ba79f" + } +] +``` + +#### Query a specific replication task + +Execute the following command to query a specific replication task: + +{{< copyable "shell-regular" >}} + +```shell +cdc cli changefeed query --pd=http://127.0.0.1:2379 --changefeed-id=28c43ffc-2316-4f4f-a70b-d1a7c59ba79f +``` + +The information returned consists of `"info"` and `"status"` of the replication task. + +``` +{ + "info": { + "sink-uri": "mysql://root:123456@127.0.0.1:3306/", + "opts": {}, + "create-time": "2020-03-12T22:04:08.103600025+08:00", + "start-ts": 415241823337054209, + "target-ts": 0, + "admin-job-type": 0, + "config": { + "filter-case-sensitive": false, + "filter-rules": null, + "ignore-txn-start-ts": null + } + }, + "status": { + "resolved-ts": 415241860902289409, + "checkpoint-ts": 415241860640145409, + "admin-job-type": 0 + } +} +``` + +In the above command: + +- `resolved-ts`: The largest transaction `TS` in the current `changfeed`. Note that this `TS` has been successfully sent from TiKV to TiCDC. +- `checkpoint-ts`: The largest transaction `TS` in the current `changefeed` that has been successfully written to the downstream. +- `admin-job-type`: The status of a `changefeed`: + - `0`: The state is normal. It is the initial status. + - `1`: The task is paused. When the task is paused, all replicated `processor`s exit, while the configuration and the replication status of the task are retained. + - `2`: The task is resumed. The replication task resumes from `checkpoint-ts`. + - `3`: The task is removed. When the task is removed, all replicated `processor`s are ended, and the configuration information of the replication task is cleared up. Only the replication status is retained for later queries. + +#### Pause a replication task + +Execute the following command to pause a replication task: + +{{< copyable "shell-regular" >}} + +```shell +cdc cli changefeed pause --pd=http://127.0.0.1:2379 --changefeed-id 28c43ffc-2316-4f4f-a70b-d1a7c59ba79f +``` + +In the above command: + +- `--changefeed=uuid` represents the ID of the `changefeed` that corresponds to the replication task you want to pause. 
+
+#### Resume a replication task
+
+Execute the following command to resume a paused replication task:
+
+{{< copyable "shell-regular" >}}
+
+```shell
+cdc cli changefeed resume --pd=http://127.0.0.1:2379 --changefeed-id 28c43ffc-2316-4f4f-a70b-d1a7c59ba79f
+```
+
+In the above command:
+
+- `--changefeed-id=uuid` represents the ID of the `changefeed` that corresponds to the replication task you want to resume.
+
+#### Remove a replication task
+
+Execute the following command to remove a replication task:
+
+{{< copyable "shell-regular" >}}
+
+```shell
+cdc cli changefeed remove --pd=http://127.0.0.1:2379 --changefeed-id 28c43ffc-2316-4f4f-a70b-d1a7c59ba79f
+```
+
+In the above command:
+
+- `--changefeed-id=uuid` represents the ID of the `changefeed` that corresponds to the replication task you want to remove.

### Manage processing units of replication sub-tasks (`processor`)

@@ -101,7 +248,7 @@ This section introduces how to use `cdc cli` to manage a TiCDC cluster and data
    {{< copyable "shell-regular" >}}

    ```shell
-    cdc cli processor list
+    cdc cli processor list --pd=http://127.0.0.1:2379
    ```

    ```
@@ -119,7 +266,7 @@ This section introduces how to use `cdc cli` to manage a TiCDC cluster and data
    {{< copyable "shell-regular" >}}

    ```shell
-    cdc cli processor query --changefeed-id=28c43ffc-2316-4f4f-a70b-d1a7c59ba79f --capture-id=b293999a-4168-4988-a4f4-35d9589b226b
+    cdc cli processor query --pd=http://127.0.0.1:2379 --changefeed-id=28c43ffc-2316-4f4f-a70b-d1a7c59ba79f
    ```

    ```
@@ -146,7 +293,7 @@ This section introduces how to use `cdc cli` to manage a TiCDC cluster and data

Currently, the HTTP interface provides some basic features for query and maintenance.

-In the following examples, suppose that the interface IP address for querying the TiCDC server status is `127.0.0.1`, and the port address is `8300` (you can specify the IP and port in `--status-addr=ip:port` when starting the TiCDC server). In later TiCDC versions, these features will be integrated to `cdc cli`.
+In the following examples, suppose that the TiCDC server listens on `127.0.0.1`, and the port is `8300` (you can specify the IP and port in `--addr=ip:port` when starting the TiCDC server).

### Get the TiCDC server status

@@ -196,70 +343,6 @@ For nodes other than owner nodes, executing the above command will return the fo
election: not leader
```

-### Stop replication task
-
-Use the following command to stop a replication task:
-
-{{< copyable "shell-regular" >}}
-
-```shell
-curl -X POST -d "admin-job=1&cf-id=28c43ffc-2316-4f4f-a70b-d1a7c59ba79f" http://127.0.0.1:8301/capture/owner/admin
-```
-
-```
-{
-    "status": true,
-    "message": ""
-}
-```
-
-In the above command:
-
-- `admin-job=1` means to stop the replication task. After the task is stopped, all `processor`s are stopped and exit but the configuration and status of the task are saved and can be recovered from `checkpoint-ts`.
-- `cf-id=xxx` is the ID of `changefeed` that needs operation.
-
-### Resume replication task
-
-Use the following command to resume a replication task:
-
-{{< copyable "shell-regular" >}}
-
-```shell
-curl -X POST -d "admin-job=2&cf-id=28c43ffc-2316-4f4f-a70b-d1a7c59ba79f" http://127.0.0.1:8301/capture/owner/admin
-```
-
-```
-{
-    "status": true,
-    "message": ""
-}
-```
-
-In the above command:
-
-- `admin-job=2` means to resume the replication task from `checkpoint-ts`.
-- `cf-id=xxx` is the ID of `changefeed` that needs operation.
-
-### Delete replication task
-
-Use the following command to delete a replication task:
-
-{{< copyable "shell-regular" >}}
-
-```shell
-curl -X POST -d "admin-job=3&cf-id=28c43ffc-2316-4f4f-a70b-d1a7c59ba79f" http://127.0.0.1:8301/capture/owner/admin
-```
-
-```
-{
-    "status": true,
-    "message": ""
-}
-```
-
-- `admin-job=3` means to delete the replication task. After the TiCDC server receives the request, all `processor`s are stopped and the configuration information of the task is cleared. The replication status is reserved. No service is available except for the query.
-- `cf-id=xxx` is the ID of `changefeed` that needs operation.
-
## Error handling

This section introduces how to handle the error occurred when using TiCDC to replicate data.

From 5a9acd41c0a3c31d0e1be635bcc3f523bd2efd75 Mon Sep 17 00:00:00 2001
From: toutdesuite
Date: Fri, 5 Jun 2020 15:15:49 +0800
Subject: [PATCH 2/5] Apply suggestions from code review

Co-authored-by: Ran
---
 ticdc/manage-ticdc.md | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/ticdc/manage-ticdc.md b/ticdc/manage-ticdc.md
index c2ff879d98169..06bc09430136a 100644
--- a/ticdc/manage-ticdc.md
+++ b/ticdc/manage-ticdc.md
@@ -11,7 +11,7 @@ This document describes how to deploy a TiCDC cluster and how to manage the TiCD

## Deploy TiCDC

-### Using TiUP to deploy TiCDC
+### Use TiUP

#### Deploy a TiDB cluster with the TiCDC component

@@ -23,9 +23,11 @@ For details, refer to [Deploy a TiDB Cluster Using TiUP](/production-deployment-

2. To deploy TiCDC, refer to [Scale out a TiDB/TiKV/PD/TiCDC node](/scale-tidb-using-tiup.md#scale-out-a-tidbtikvpdticdc-node).

-### Deploy a TiCDC component on an existing TiDB cluster using Binary
+### Use Binary

+Binary only supports deploying the TiCDC component on an existing TiDB cluster.
+
-Suppose that in the PD cluster, there is a PD node (the client URL is `10.0.10.25:2379`) that can provide services. If you want to deploy three TiCDC nodes, start the TiCDC cluster by executing the following commands. Note that you only need to specify the same PD address; the newly started nodes automatically join the TiCDC cluster.
+Suppose that the PD cluster has a PD node (the client URL is `10.0.10.25:2379`) that can provide services. If you want to deploy three TiCDC nodes, start the TiCDC cluster by executing the following commands. Note that you only need to specify the same PD address; the newly started nodes automatically join the TiCDC cluster.

{{< copyable "shell-regular" >}}

@@ -37,8 +39,8 @@ cdc server --pd=http://10.0.10.25:2379 --log-file=ticdc_3.log --addr=0.0.0.0:830

The following are descriptions of options available in the `cdc server` command:

-- `gc-ttl`: The TTL (Time To Live) of the service level `GC safepoint` in PD set by TiCDC, specified in seconds. The default value is `86400`, which means 24 hours.
-- `pd`: URL of the PD client.
+- `gc-ttl`: The TTL (Time To Live) of the service level `GC safepoint` in PD set by TiCDC, in seconds. The default value is `86400`, which means 24 hours.
+- `pd`: The URL of the PD client.
- `addr`: The listening address of TiCDC, the HTTP API address, and the Prometheus address of the service.
- `advertise-addr`: The access address of TiCDC to the outside world.
- `tz`: Time zone used by the TiCDC service. TiCDC uses this time zone when time data types such as `TIMESTAMP` are converted internally or when data are replicated to the downstream. The default is the local time zone in which the process runs.
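+
+Putting the options above together, the following single-node start command spells out the optional flags explicitly. This is only an illustrative sketch: the `gc-ttl` value repeats the documented default, and the `tz` value is an example time zone rather than a recommendation.
+
+{{< copyable "shell-regular" >}}
+
+```shell
+# Start one TiCDC server with the optional flags set explicitly.
+cdc server --pd=http://10.0.10.25:2379 \
+    --addr=0.0.0.0:8301 \
+    --advertise-addr=127.0.0.1:8301 \
+    --gc-ttl=86400 \
+    --tz="Asia/Shanghai" \
+    --log-file=ticdc_1.log \
+    --log-level=info
+```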
@@ -47,7 +49,7 @@ The following are descriptions of options available in the `cdc server` command:

## Use `cdc cli` to manage cluster status and data replication task

-This section introduces how to use `cdc cli` to manage a TiCDC cluster and data replication tasks. In the following interface description, it is assumed that PD listens on `127.0.0.1` and the port is `2379`.
+This section introduces how to use `cdc cli` to manage a TiCDC cluster and data replication tasks. The following interface description assumes that PD listens on `127.0.0.1` and the port is `2379`.

### Manage TiCDC service progress (`capture`)

@@ -191,11 +193,11 @@ The information returned consists of `"info"` and `"status"` of the replication

In the above output:

-- `resolved-ts`: The largest transaction `TS` in the current `changfeed`. Note that this `TS` has been successfully sent from TiKV to TiCDC.
+- `resolved-ts`: The largest transaction `TS` in the current `changefeed`. Note that this `TS` has been successfully sent from TiKV to TiCDC.
- `checkpoint-ts`: The largest transaction `TS` in the current `changefeed` that has been successfully written to the downstream.
- `admin-job-type`: The status of a `changefeed`:
    - `0`: The state is normal. It is the initial status.
-    - `1`: The task is paused. When the task is paused, all replicated `processor`s exit, while the configuration and the replication status of the task are retained.
+    - `1`: The task is paused. When the task is paused, all replicated `processor`s exit. The configuration and the replication status of the task are retained, so you can resume the task from `checkpoint-ts`.
    - `2`: The task is resumed. The replication task resumes from `checkpoint-ts`.
    - `3`: The task is removed. When the task is removed, all replicated `processor`s are ended, and the configuration information of the replication task is cleared up. Only the replication status is retained for later queries.

#### Pause a replication task

From b1504de2eacae458b44492de7ba3568883eb7a50 Mon Sep 17 00:00:00 2001
From: Ran
Date: Fri, 5 Jun 2020 21:09:43 +0800
Subject: [PATCH 3/5] add sentences between headings

---
 ticdc/manage-ticdc.md | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/ticdc/manage-ticdc.md b/ticdc/manage-ticdc.md
index 06bc09430136a..6108580e98d1e 100644
--- a/ticdc/manage-ticdc.md
+++ b/ticdc/manage-ticdc.md
@@ -11,8 +11,15 @@ This document describes how to deploy a TiCDC cluster and how to manage the TiCD

## Deploy TiCDC

+You can deploy TiCDC using either TiUP or Binary.
+
### Use TiUP

+If you use TiUP to deploy TiCDC, you can choose one of the following ways:
+
+- Deploy TiCDC when deploying a TiDB cluster
+- Deploy a TiCDC component on an existing TiDB cluster
+
#### Deploy a TiDB cluster with the TiCDC component

For details, refer to [Deploy a TiDB Cluster Using TiUP](/production-deployment-using-tiup.md#step-3-edit-the-initialization-configuration-file).
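+
+Once the initialization configuration file lists the TiCDC nodes, the deployment itself follows the standard TiUP flow. The following one-liner is only a hedged sketch: the cluster name `tidb-test`, the version `v4.0.0`, and the file name `topology.yaml` are placeholders, not values taken from this document.
+
+{{< copyable "shell-regular" >}}
+
+```shell
+# Deploy a new cluster whose topology file includes TiCDC nodes.
+tiup cluster deploy tidb-test v4.0.0 topology.yaml --user root
+```
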
From d1d4114842627ad829c6d6516162b7894d563df0 Mon Sep 17 00:00:00 2001 From: Ran Date: Sat, 6 Jun 2020 10:19:13 +0800 Subject: [PATCH 4/5] Update ticdc/manage-ticdc.md Co-authored-by: toutdesuite --- ticdc/manage-ticdc.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ticdc/manage-ticdc.md b/ticdc/manage-ticdc.md index 6108580e98d1e..9a6255f9d93ab 100644 --- a/ticdc/manage-ticdc.md +++ b/ticdc/manage-ticdc.md @@ -20,7 +20,7 @@ If you use TiUP to deploy TiCDC, you can choose one of the following ways: - Deploy TiCDC when deploying a TiDB cluster - Deploy a TiCDC component on an existing TiDB cluster -#### Deploy a TiDB cluster with the TiCDC component +#### Deploy TiCDC when deploying a TiDB cluster For details, refer to [Deploy a TiDB Cluster Using TiUP](/production-deployment-using-tiup.md#step-3-edit-the-initialization-configuration-file). From 26d18ec595a4890ed699b0f1f838f7b5f78ceb9c Mon Sep 17 00:00:00 2001 From: toutdesuite Date: Mon, 8 Jun 2020 14:11:04 +0800 Subject: [PATCH 5/5] remove error handling --- ticdc/manage-ticdc.md | 11 ----------- 1 file changed, 11 deletions(-) diff --git a/ticdc/manage-ticdc.md b/ticdc/manage-ticdc.md index 9a6255f9d93ab..8ced4e29a4dff 100644 --- a/ticdc/manage-ticdc.md +++ b/ticdc/manage-ticdc.md @@ -351,14 +351,3 @@ For nodes other than owner nodes, executing the above command will return the fo ``` election: not leader ``` - -## Error handling - -This section introduces how to handle the error occurred when using TiCDC to replicate data. - -### An error occurs when TiCDC replicates statements downstream - -When an error occurs when TiCDC executes DDL or DML statements downstream, the replication task is stopped. - -- If the error occurs because of downstream anomalies or network jitter, directly resume the replication task; -- If the error occurs because the downstream is incompatible with the SQL statement, resuming the task will fail. In this situation, you can configure the `ignore-txn-commit-ts` parameter in the replication configuration to skip the transaction at `commit-ts` and resume the task.