diff --git a/migrate-from-tidb-to-mysql.md b/migrate-from-tidb-to-mysql.md index c897c1ef04186..e12657daf75c0 100644 --- a/migrate-from-tidb-to-mysql.md +++ b/migrate-from-tidb-to-mysql.md @@ -158,12 +158,12 @@ After setting up the environment, you can use [Dumpling](/dumpling-overview.md) In the upstream cluster, run the following command to create a changefeed from the upstream to the downstream clusters: ```shell - tiup ctl: cdc changefeed create --pd=http://127.0.0.1:2379 --sink-uri="mysql://root:@127.0.0.1:3306" --changefeed-id="upstream-to-downstream" --start-ts="434217889191428107" + tiup ctl: cdc changefeed create --server=http://127.0.0.1:8300 --sink-uri="mysql://root:@127.0.0.1:3306" --changefeed-id="upstream-to-downstream" --start-ts="434217889191428107" ``` In this command, the parameters are as follows: - - `--pd`: PD address of the upstream cluster + - `--server`: IP address of any node in the TiCDC cluster - `--sink-uri`: URI of the downstream cluster - `--changefeed-id`: changefeed ID, must be in the format of a regular expression, `^[a-zA-Z0-9]+(\-[a-zA-Z0-9]+)*$` - `--start-ts`: start timestamp of the changefeed, must be the backup time (or BackupTS in the "Back up data" section in [Step 2. Migrate full data](#step-2-migrate-full-data)) diff --git a/migrate-from-tidb-to-tidb.md b/migrate-from-tidb-to-tidb.md index 721aa6b19a184..18b9f7bad2bfe 100644 --- a/migrate-from-tidb-to-tidb.md +++ b/migrate-from-tidb-to-tidb.md @@ -219,12 +219,12 @@ After setting up the environment, you can use the backup and restore functions o {{< copyable "shell-regular" >}} ```shell - tiup cdc cli changefeed create --pd=http://172.16.6.122:2379 --sink-uri="mysql://root:@172.16.6.125:4000" --changefeed-id="upstream-to-downstream" --start-ts="431434047157698561" + tiup cdc cli changefeed create --server=http://172.16.6.122:8300 --sink-uri="mysql://root:@172.16.6.125:4000" --changefeed-id="upstream-to-downstream" --start-ts="431434047157698561" ``` In this command, the parameters are as follows: - - `--pd`: PD address of the upstream cluster + - `--server`: IP address of any node in the TiCDC cluster - `--sink-uri`: URI of the downstream cluster - `--changefeed-id`: changefeed ID, must be in the format of a regular expression, ^[a-zA-Z0-9]+(\-[a-zA-Z0-9]+)*$ - `--start-ts`: start timestamp of the changefeed, must be the backup time (or BackupTS in the "Back up data" section in [Step 2. Migrate full data](#step-2-migrate-full-data)) @@ -268,7 +268,7 @@ After creating a changefeed, data written to the upstream cluster is replicated ```shell # Stop the changefeed from the upstream cluster to the downstream cluster - tiup cdc cli changefeed pause -c "upstream-to-downstream" --pd=http://172.16.6.122:2379 + tiup cdc cli changefeed pause -c "upstream-to-downstream" --server=http://172.16.6.122:8300 # View the changefeed status tiup cdc cli changefeed list @@ -291,7 +291,7 @@ After creating a changefeed, data written to the upstream cluster is replicated 2. Create a changefeed from downstream to upstream. You can leave `start-ts` unspecified so as to use the default setting, because the upstream and downstream data are consistent and there is no new data written to the cluster. ```shell - tiup cdc cli changefeed create --pd=http://172.16.6.125:2379 --sink-uri="mysql://root:@172.16.6.122:4000" --changefeed-id="downstream -to-upstream" + tiup cdc cli changefeed create --server=http://172.16.6.125:8300 --sink-uri="mysql://root:@172.16.6.122:4000" --changefeed-id="downstream -to-upstream" ``` 3. 
After migrating writing services to the downstream cluster, observe for a period of time. If the downstream cluster is stable, you can discard the upstream cluster. diff --git a/replicate-between-primary-and-secondary-clusters.md b/replicate-between-primary-and-secondary-clusters.md index 43c4135ff9171..c4ccb346ef1b2 100644 --- a/replicate-between-primary-and-secondary-clusters.md +++ b/replicate-between-primary-and-secondary-clusters.md @@ -233,12 +233,12 @@ After setting up the environment, you can use the backup and restore functions o In the upstream cluster, run the following command to create a changefeed from the upstream to the downstream clusters: ```shell - tiup cdc cli changefeed create --pd=http://172.16.6.122:2379 --sink-uri="mysql://root:@172.16.6.125:4000" --changefeed-id="primary-to-secondary" --start-ts="431434047157698561" + tiup cdc cli changefeed create --server=http://172.16.6.122:8300 --sink-uri="mysql://root:@172.16.6.125:4000" --changefeed-id="primary-to-secondary" --start-ts="431434047157698561" ``` In this command, the parameters are as follows: - - `--pd`: PD address of the upstream cluster + - `--server`: IP address of any node in the TiCDC cluster - `--sink-uri`: URI of the downstream cluster - `--start-ts`: start timestamp of the changefeed, must be the backup time (or BackupTS mentioned in [Step 2. Migrate full data](#step-2-migrate-full-data)) @@ -312,5 +312,5 @@ After the previous step, the downstream (secondary) cluster has data that is con ```shell # Create a changefeed - tiup cdc cli changefeed create --pd=http://172.16.6.122:2379 --sink-uri="mysql://root:@172.16.6.125:4000" --changefeed-id="primary-to-secondary" + tiup cdc cli changefeed create --server=http://172.16.6.122:8300 --sink-uri="mysql://root:@172.16.6.125:4000" --changefeed-id="primary-to-secondary" ``` diff --git a/replicate-data-to-kafka.md b/replicate-data-to-kafka.md index 1487ec03afbc9..bc737a4f5d8c8 100644 --- a/replicate-data-to-kafka.md +++ b/replicate-data-to-kafka.md @@ -57,7 +57,7 @@ The preceding steps are performed in a lab environment. You can also deploy a cl 2. Create a changefeed to replicate incremental data to Kafka: ```shell - tiup ctl: cdc changefeed create --pd="http://127.0.0.1:2379" --sink-uri="kafka://127.0.0.1:9092/kafka-topic-name?protocol=canal-json" --changefeed-id="kafka-changefeed" --config="changefeed.conf" + tiup ctl: cdc changefeed create --server="http://127.0.0.1:8300" --sink-uri="kafka://127.0.0.1:9092/kafka-topic-name?protocol=canal-json" --changefeed-id="kafka-changefeed" --config="changefeed.conf" ``` - If the changefeed is successfully created, changefeed information, such as changefeed ID, is displayed, as shown below: @@ -73,13 +73,13 @@ The preceding steps are performed in a lab environment. You can also deploy a cl In a production environment, a Kafka cluster has multiple broker nodes. Therefore, you can add the addresses of multiple brokers to the sink URI. This ensures stable access to the Kafka cluster. Even if some Kafka broker nodes are unavailable, the changefeed still works. Suppose that a Kafka cluster has three broker nodes, with IP addresses being 127.0.0.1:9092, 127.0.0.2:9092, and 127.0.0.3:9092, respectively. You can create a changefeed with the following sink URI.
```shell - tiup ctl: cdc changefeed create --pd="http://127.0.0.1:2379" --sink-uri="kafka://127.0.0.1:9092,127.0.0.2:9092,127.0.0.3:9092/kafka-topic-name?protocol=canal-json&partition-num=3&replication-factor=1&max-message-bytes=1048576" --config="changefeed.conf" + tiup ctl: cdc changefeed create --server="http://127.0.0.1:8300" --sink-uri="kafka://127.0.0.1:9092,127.0.0.2:9092,127.0.0.3:9092/kafka-topic-name?protocol=canal-json&partition-num=3&replication-factor=1&max-message-bytes=1048576" --config="changefeed.conf" ``` 3. After creating the changefeed, run the following command to check the changefeed status: ```shell - tiup ctl: cdc changefeed list --pd="http://127.0.0.1:2379" + tiup ctl: cdc changefeed list --server="http://127.0.0.1:8300" ``` You can refer to [Manage TiCDC Changefeeds](/ticdc/ticdc-manage-changefeed.md) to manage the changefeed. diff --git a/ticdc/deploy-ticdc.md b/ticdc/deploy-ticdc.md index 321323842fd60..4e18bef5e2d1a 100644 --- a/ticdc/deploy-ticdc.md +++ b/ticdc/deploy-ticdc.md @@ -111,7 +111,7 @@ When you upgrade a TiCDC cluster, you need to pay attention to the following: ## Modify TiCDC cluster configurations using TiUP -This section describes how to use the [`tiup cluster edit-config`](/tiup/tiup-component-cluster-edit-config.md) command to modify the configurations of TiCDC. In the following example, it is assumed that you need to change the default value of `gc-ttl` from `86400` to `3600` (1 hour). +This section describes how to use the [`tiup cluster edit-config`](/tiup/tiup-component-cluster-edit-config.md) command to modify the configurations of TiCDC. In the following example, it is assumed that you need to change the default value of `gc-ttl` from `86400` to `172800` (48 hours). 1. Run the `tiup cluster edit-config` command. Replace `` with the actual cluster name: @@ -131,9 +131,11 @@ This section describes how to use the [`tiup cluster edit-config`](/tiup/tiup-co pump: {} drainer: {} cdc: - gc-ttl: 3600 + gc-ttl: 172800 ``` + In the preceding command, `gc-ttl` is set to 48 hours. + 3. Run the `tiup cluster reload -R cdc` command to reload the configuration. ## Stop and start TiCDC using TiUP @@ -161,12 +163,14 @@ tiup ctl: cdc capture list --server=http://10.0.10.25:8300 { "id": "806e3a1b-0e31-477f-9dd6-f3f2c570abdd", "is-owner": true, - "address": "127.0.0.1:8300" + "address": "127.0.0.1:8300", + "cluster-id": "default" }, { "id": "ea2a4203-56fe-43a6-b442-7b295f458ebc", "is-owner": false, - "address": "127.0.0.1:8301" + "address": "127.0.0.1:8301", + "cluster-id": "default" } ] ``` @@ -174,3 +178,4 @@ tiup ctl: cdc capture list --server=http://10.0.10.25:8300 - `id`: Indicates the ID of the service process. - `is-owner`: Indicates whether the service process is the owner node. - `address`: Indicates the address via which the service process provides interface to the outside. +- `cluster-id`: Indicates the ID of the TiCDC cluster. The default value is `default`. diff --git a/ticdc/integrate-confluent-using-ticdc.md b/ticdc/integrate-confluent-using-ticdc.md index 133b36151709d..4974acb861b80 100644 --- a/ticdc/integrate-confluent-using-ticdc.md +++ b/ticdc/integrate-confluent-using-ticdc.md @@ -99,7 +99,7 @@ The preceding steps are performed in a lab environment. You can also deploy a cl 2. 
Create a changefeed to replicate incremental data to Confluent Cloud: ```shell - tiup ctl: cdc changefeed create --pd="http://127.0.0.1:2379" --sink-uri="kafka:///ticdc-meta?protocol=avro&replication-factor=3&enable-tls=true&auto-create-topic=true&sasl-mechanism=plain&sasl-user=&sasl-password=" --schema-registry="https://:@" --changefeed-id="confluent-changefeed" --config changefeed.conf + tiup ctl: cdc changefeed create --server="http://127.0.0.1:8300" --sink-uri="kafka:///ticdc-meta?protocol=avro&replication-factor=3&enable-tls=true&auto-create-topic=true&sasl-mechanism=plain&sasl-user=&sasl-password=" --schema-registry="https://:@" --changefeed-id="confluent-changefeed" --config changefeed.conf ``` You need to replace the values of the following fields with those created or recorded in [Step 2. Create an access key pair](#step-2-create-an-access-key-pair): @@ -114,7 +114,7 @@ The preceding steps are performed in a lab environment. You can also deploy a cl Note that you should encode `` based on [HTML URL Encoding Reference](https://www.w3schools.com/tags/ref_urlencode.asp) before replacing its value. After you replace all the preceding fields, the configuration file is as follows: ```shell - tiup ctl: cdc changefeed create --pd="http://127.0.0.1:2379" --sink-uri="kafka://xxx-xxxxx.ap-east-1.aws.confluent.cloud:9092/ticdc-meta?protocol=avro&replication-factor=3&enable-tls=true&auto-create-topic=true&sasl-mechanism=plain&sasl-user=L5WWA4GK4NAT2EQV&sasl-password=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" --schema-registry="https://7NBH2CAFM2LMGTH7:xxxxxxxxxxxxxxxxxx@yyy-yyyyy.us-east-2.aws.confluent.cloud" --changefeed-id="confluent-changefeed" --config changefeed.conf + tiup ctl: cdc changefeed create --server="http://127.0.0.1:8300" --sink-uri="kafka://xxx-xxxxx.ap-east-1.aws.confluent.cloud:9092/ticdc-meta?protocol=avro&replication-factor=3&enable-tls=true&auto-create-topic=true&sasl-mechanism=plain&sasl-user=L5WWA4GK4NAT2EQV&sasl-password=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" --schema-registry="https://7NBH2CAFM2LMGTH7:xxxxxxxxxxxxxxxxxx@yyy-yyyyy.us-east-2.aws.confluent.cloud" --changefeed-id="confluent-changefeed" --config changefeed.conf ``` - Run the command to create a changefeed. @@ -132,7 +132,7 @@ The preceding steps are performed in a lab environment. You can also deploy a cl 3. After creating the changefeed, run the following command to check the changefeed status: ```shell - tiup ctl: cdc changefeed list --pd="http://127.0.0.1:2379" + tiup ctl: cdc changefeed list --server="http://127.0.0.1:8300" ``` You can refer to [Manage TiCDC Changefeeds](/ticdc/ticdc-manage-changefeed.md) to manage the changefeed. diff --git a/ticdc/monitor-ticdc.md b/ticdc/monitor-ticdc.md index d1227dcb45cc2..550efca4079b5 100644 --- a/ticdc/monitor-ticdc.md +++ b/ticdc/monitor-ticdc.md @@ -10,7 +10,7 @@ If you use TiUP to deploy the TiDB cluster, you can see a sub-dashboard for TiCD The metric description in this document is based on the following replication task example, which replicates data to MySQL using the default configuration. ```shell -cdc cli changefeed create --pd=http://10.0.10.25:2379 --sink-uri="mysql://root:123456@127.0.0.1:3306/" --changefeed-id="simple-replication-task" +cdc cli changefeed create --server=http://10.0.10.25:8300 --sink-uri="mysql://root:123456@127.0.0.1:3306/" --changefeed-id="simple-replication-task" ``` The TiCDC dashboard contains four monitoring panels. 
See the following screenshot: diff --git a/ticdc/ticdc-architecture.md b/ticdc/ticdc-architecture.md index f25df1427ab10..3d4831a09216d 100644 --- a/ticdc/ticdc-architecture.md +++ b/ticdc/ticdc-architecture.md @@ -42,7 +42,7 @@ Changefeed and Task in TiCDC are two logical concepts. The specific description For example: ``` -cdc cli changefeed create --pd=http://10.0.10.25:2379 --sink-uri="kafka://127.0.0.1:9092/cdc-test?kafka-version=2.4.0&partition-num=6&max-message-bytes=67108864&replication-factor=1" +cdc cli changefeed create --server="http://127.0.0.1:8300" --sink-uri="kafka://127.0.0.1:9092/cdc-test?kafka-version=2.4.0&partition-num=6&max-message-bytes=67108864&replication-factor=1" cat changefeed.toml ...... [sink] @@ -139,17 +139,19 @@ The preceding sections only cover data changes of DML statements and do not incl #### Barrier TS -Barrier TS is generated when a DDL statement is executed or a Syncpoint is used. +Barrier TS is generated when there are DDL change events or a Syncpoint is used. -- This timestamp ensures that all changes before this DDL statement are replicated to the downstream. After this DDL statement is executed and replicated, TiCDC starts replicating other data changes. Because DDL statements are processed by the Capture Owner, the Barrier TS corresponding to a DDL statement is only generated by the Processor thread of the owner node. -- Syncpoint Barrier TS is also a timestamp. When you enable the Syncpoint feature of TiCDC, a Barrier TS is generated by TiCDC according to the `sync-point-interval` you specified. When all table changes before this Barrier TS are replicated, TiCDC records the global Checkpoint in downstream, from which data replication continues next time. +- DDL change events: Barrier TS ensures that all changes before the DDL statement are replicated to the downstream. After this DDL statement is executed and replicated, TiCDC starts replicating other data changes. Because DDL statements are processed by the Capture Owner, the Barrier TS corresponding to a DDL statement is only generated by the owner node. +- Syncpoint: When you enable the Syncpoint feature of TiCDC, a Barrier TS is generated by TiCDC according to the `sync-point-interval` you specified. When all table changes before this Barrier TS are replicated, TiCDC inserts the current global CheckpointTS as the primary TS to the table recording tsMap in downstream. Then TiCDC continues data replication. -After a Barrier TS is generated, TiCDC only replicates data changes that occur before this Barrier TS to downstream. Then TiCDC checks whether all target data has been replicated by comparing the global CheckpointTS and Barrier TS. If global CheckpointTS equals to Barrier TS, TiCDC continues replication after performing a designated operation (such as executing a DDL statement or recording the global CheckpointTS downstream). Otherwise, TiCDC waits for all data changes that occur before Barrier TS to be replicated to the downstream. +After a Barrier TS is generated, TiCDC ensures that only data changes that occur before this Barrier TS are replicated to downstream. Before these data changes are replicated to downstream, the replication task does not proceed. The owner TiCDC checks whether all target data has been replicated by continuously comparing the global CheckpointTS and the Barrier TS. 
If the global CheckpointTS equals to the Barrier TS, TiCDC continues replication after performing a designated operation (such as executing a DDL statement or recording the global CheckpointTS downstream). Otherwise, TiCDC waits for all data changes that occur before the Barrier TS to be replicated to the downstream. ## Major processes This section describes the major processes of TiCDC to help you better understand its working principles. +Note that the following processes occur only within TiCDC and are transparent to users. Therefore, you do not need to care about which TiCDC node you are starting. + ### Start TiCDC - For a TiCDC node that is not an owner, it works as follows: diff --git a/ticdc/ticdc-avro-protocol.md b/ticdc/ticdc-avro-protocol.md index 74f002550cf41..31c56bca18761 100644 --- a/ticdc/ticdc-avro-protocol.md +++ b/ticdc/ticdc-avro-protocol.md @@ -16,7 +16,7 @@ The following is a configuration example using Avro: {{< copyable "shell-regular" >}} ```shell -cdc cli changefeed create --pd=http://127.0.0.1:2379 --changefeed-id="kafka-avro" --sink-uri="kafka://127.0.0.1:9092/topic-name?protocol=avro" --schema-registry=http://127.0.0.1:8081 --config changefeed_config.toml +cdc cli changefeed create --server=http://127.0.0.1:8300 --changefeed-id="kafka-avro" --sink-uri="kafka://127.0.0.1:9092/topic-name?protocol=avro" --schema-registry=http://127.0.0.1:8081 --config changefeed_config.toml ``` ```shell @@ -41,7 +41,7 @@ The following is a configuration example: {{< copyable "shell-regular" >}} ```shell -cdc cli changefeed create --pd=http://127.0.0.1:2379 --changefeed-id="kafka-avro-enable-extension" --sink-uri="kafka://127.0.0.1:9092/topic-name?protocol=avro&enable-tidb-extension=true" --schema-registry=http://127.0.0.1:8081 --config changefeed_config.toml +cdc cli changefeed create --server=http://127.0.0.1:8300 --changefeed-id="kafka-avro-enable-extension" --sink-uri="kafka://127.0.0.1:9092/topic-name?protocol=avro&enable-tidb-extension=true" --schema-registry=http://127.0.0.1:8081 --config changefeed_config.toml ``` ```shell @@ -207,7 +207,7 @@ The following is a configuration example: {{< copyable "shell-regular" >}} ```shell -cdc cli changefeed create --pd=http://127.0.0.1:2379 --changefeed-id="kafka-avro-string-option" --sink-uri="kafka://127.0.0.1:9092/topic-name?protocol=avro&avro-decimal-handling-mode=string&avro-bigint-unsigned-handling-mode=string" --schema-registry=http://127.0.0.1:8081 --config changefeed_config.toml +cdc cli changefeed create --server=http://127.0.0.1:8300 --changefeed-id="kafka-avro-string-option" --sink-uri="kafka://127.0.0.1:9092/topic-name?protocol=avro&avro-decimal-handling-mode=string&avro-bigint-unsigned-handling-mode=string" --schema-registry=http://127.0.0.1:8081 --config changefeed_config.toml ``` ```shell diff --git a/ticdc/ticdc-canal-json.md b/ticdc/ticdc-canal-json.md index 1bb9a5db69a1a..fd0c816a7131a 100644 --- a/ticdc/ticdc-canal-json.md +++ b/ticdc/ticdc-canal-json.md @@ -22,7 +22,7 @@ The following is an example of using `Canal-JSON`: {{< copyable "shell-regular" >}} ```shell -cdc cli changefeed create --pd=http://127.0.0.1:2379 --changefeed-id="kafka-canal-json" --sink-uri="kafka://127.0.0.1:9092/topic-name?kafka-version=2.4.0&protocol=canal-json" +cdc cli changefeed create --server=http://127.0.0.1:8300 --changefeed-id="kafka-canal-json" --sink-uri="kafka://127.0.0.1:9092/topic-name?kafka-version=2.4.0&protocol=canal-json" ``` ## TiDB extension field @@ -37,7 +37,7 @@ The following is an example: {{< copyable "shell-regular" 
>}} ```shell -cdc cli changefeed create --pd=http://127.0.0.1:2379 --changefeed-id="kafka-canal-json-enable-tidb-extension" --sink-uri="kafka://127.0.0.1:9092/topic-name?kafka-version=2.4.0&protocol=canal-json&enable-tidb-extension=true" +cdc cli changefeed create --server=http://127.0.0.1:8300 --changefeed-id="kafka-canal-json-enable-tidb-extension" --sink-uri="kafka://127.0.0.1:9092/topic-name?kafka-version=2.4.0&protocol=canal-json&enable-tidb-extension=true" ``` ## Definitions of message formats diff --git a/ticdc/ticdc-changefeed-config.md b/ticdc/ticdc-changefeed-config.md index e15669273aff1..06de5a81469ec 100644 --- a/ticdc/ticdc-changefeed-config.md +++ b/ticdc/ticdc-changefeed-config.md @@ -16,7 +16,7 @@ cdc cli changefeed create --server=http://10.0.10.25:8300 --sink-uri="mysql://ro ```shell Create changefeed successfully! ID: simple-replication-task -Info: {"sink-uri":"mysql://root:123456@127.0.0.1:3306/","opts":{},"create-time":"2020-03-12T22:04:08.103600025+08:00","start-ts":415241823337054209,"target-ts":0,"admin-job-type":0,"sort-engine":"unified","sort-dir":".","config":{"case-sensitive":true,"filter":{"rules":["*.*"],"ignore-txn-start-ts":null,"ddl-allow-list":null},"mounter":{"worker-num":16},"sink":{"dispatchers":null},"scheduler":{"type":"table-number","polling-time":-1}},"state":"normal","history":null,"error":null} +Info: {"upstream_id":7178706266519722477,"namespace":"default","id":"simple-replication-task","sink_uri":"mysql://root:xxxxx@127.0.0.1:4000/?time-zone=","create_time":"2022-12-19T15:05:46.679218+08:00","start_ts":438156275634929669,"engine":"unified","config":{"case_sensitive":true,"enable_old_value":true,"force_replicate":false,"ignore_ineligible_table":false,"check_gc_safe_point":true,"enable_sync_point":true,"bdr_mode":false,"sync_point_interval":30000000000,"sync_point_retention":3600000000000,"filter":{"rules":["test.*"],"event_filters":null},"mounter":{"worker_num":16},"sink":{"protocol":"","schema_registry":"","csv":{"delimiter":",","quote":"\"","null":"\\N","include_commit_ts":false},"column_selectors":null,"transaction_atomicity":"none","encoder_concurrency":16,"terminator":"\r\n","date_separator":"none","enable_partition_separator":false},"consistent":{"level":"none","max_log_size":64,"flush_interval":2000,"storage":""}},"state":"normal","creator_version":"v6.5.0"} ``` - `--changefeed-id`: The ID of the replication task. The format must match the `^[a-zA-Z0-9]+(\-[a-zA-Z0-9]+)*$` regular expression. If this ID is not specified, TiCDC automatically generates a UUID (the version 4 format) as the ID. diff --git a/ticdc/ticdc-compatibility.md b/ticdc/ticdc-compatibility.md index 0f8fe87912506..f5e37e75a20f7 100644 --- a/ticdc/ticdc-compatibility.md +++ b/ticdc/ticdc-compatibility.md @@ -21,8 +21,7 @@ TODO * In TiCDC v4.0.0, `ignore-txn-commit-ts` is removed and `ignore-txn-start-ts` is added, which uses start_ts to filter transactions. * In TiCDC v4.0.2, `db-dbs`/`db-tables`/`ignore-dbs`/`ignore-tables` are removed and `rules` is added, which uses new filter rules for databases and tables. For detailed filter syntax, see [Table Filter](/table-filter.md). -* In TiCDC v6.1.0, `mounter` is removed. If you configure `mounter`, TiCDC does not report an error, but the configuration does not take effect. -* Starting from TiCDC v6.2.0, `cdc cli` can directly interact with TiCDC server via TiCDC Open API. You can specify the address of TiCDC server using the `--server` parameter. `--pd` is deprecated and no longer recommended. 
+* Starting from TiCDC v6.2.0, `cdc cli` can directly interact with TiCDC server via TiCDC Open API. You can specify the address of the TiCDC server using the `--server` parameter. `--pd` is deprecated. * Since v6.4.0, only the changefeed with the `SYSTEM_VARIABLES_ADMIN` or `SUPER` privilege can use the TiCDC Syncpoint feature. ## Handle compatibility issues @@ -58,6 +57,7 @@ The `sort-dir` configuration is used to specify the temporary file directory for | v4.0.11 or an earlier v4.0 version, v5.0.0-rc | It is a changefeed configuration item and specifies temporary file directory for the `file` sorter and `unified` sorter. | In these versions, `file` sorter and `unified` sorter are **experimental features** and **NOT** recommended for the production environment.<br/><br/>If multiple changefeeds use the `unified` sorter as its `sort-engine`, the actual temporary file directory might be the `sort-dir` configuration of any changefeed, and the directory used for each TiCDC node might be different. | It is not recommended to use `unified` sorter in the production environment. | | v4.0.12, v4.0.13, v5.0.0, and v5.0.1 | It is a configuration item of changefeed or of `cdc server`. | By default, the `sort-dir` configuration of a changefeed does not take effect, and the `sort-dir` configuration of `cdc server` defaults to `/tmp/cdc_sort`. It is recommended to only configure `cdc server` in the production environment.<br/><br/>If you use TiUP to deploy TiCDC, it is recommended to use the latest TiUP version and set `sorter.sort-dir` in the TiCDC server configuration.<br/><br/>The `unified` sorter is enabled by default in v4.0.13, v5.0.0, and v5.0.1. If you want to upgrade your cluster to these versions, make sure that you have correctly configured `sorter.sort-dir` in the TiCDC server configuration. | You need to configure `sort-dir` using the `cdc server` command-line parameter (or TiUP). | | v4.0.14 and later v4.0 versions, v5.0.3 and later v5.0 versions, later TiDB versions | `sort-dir` is deprecated. It is recommended to configure `data-dir`. | You can configure `data-dir` using the latest version of TiUP. In these TiDB versions, `unified` sorter is enabled by default. Make sure that `data-dir` has been configured correctly when you upgrade your cluster. Otherwise, `/tmp/cdc_data` will be used by default as the temporary file directory.<br/><br/>If the storage capacity of the device where the directory is located is insufficient, the problem of insufficient hard disk space might occur. In this situation, the previous `sort-dir` configuration of changefeed will become invalid. | You need to configure `data-dir` using the `cdc server` command-line parameter (or TiUP). | +| v6.0.0 and later versions | `data-dir` is used for saving the temporary files generated by TiCDC. | Starting from v6.0.0, TiCDC uses `db sorter` as the sort engine by default. `data-dir` is the disk directory for this engine. | You need to configure `data-dir` using the `cdc server` command-line parameter (or TiUP). | ### Compatibility with temporary tables diff --git a/ticdc/ticdc-faq.md b/ticdc/ticdc-faq.md index c51cc25ccbe4a..e9abf26e33601 100644 --- a/ticdc/ticdc-faq.md +++ b/ticdc/ticdc-faq.md @@ -9,7 +9,7 @@ This document introduces the common questions that you might encounter when usin > **Note:** > -> In this document, the PD address specified in `cdc cli` commands is `--pd=http://10.0.10.25:2379`. When you use the command, replace the address with your actual PD address. +> In this document, the server address specified in `cdc cli` commands is `--server=http://127.0.0.1:8300`. When you use the command, replace the address with the actual address of your TiCDC server. ## How do I choose `start-ts` when creating a task in TiCDC? @@ -31,7 +31,7 @@ To view the status of TiCDC replication tasks, use `cdc cli`. For example: {{< copyable "shell-regular" >}} ```shell -cdc cli changefeed list --pd=http://10.0.10.25:2379 +cdc cli changefeed list --server=http://127.0.0.1:8300 ``` The expected output is as follows: @@ -113,7 +113,7 @@ Yes. To enable Canal output, specify the protocol as `canal` in the `--sink-uri` {{< copyable "shell-regular" >}} ```shell -cdc cli changefeed create --pd=http://10.0.10.25:2379 --sink-uri="kafka://127.0.0.1:9092/cdc-test?kafka-version=2.4.0&protocol=canal" --config changefeed.toml +cdc cli changefeed create --server=http://127.0.0.1:8300 --sink-uri="kafka://127.0.0.1:9092/cdc-test?kafka-version=2.4.0&protocol=canal" --config changefeed.toml ``` > **Note:** @@ -187,7 +187,7 @@ For more information, refer to [Open protocol Row Changed Event format](/ticdc/t ## How much PD storage does TiCDC use? -TiCDC uses etcd in PD to store and regularly update the metadata. Because the time interval between the MVCC of etcd and PD's default compaction is one hour, the amount of PD storage that TiCDC uses is proportional to the amount of metadata versions generated within this hour. However, in v4.0.5, v4.0.6, and v4.0.7, TiCDC has a problem of frequent writing, so if there are 1000 tables created or scheduled in an hour, it then takes up all the etcd storage and returns the `etcdserver: mvcc: database space exceeded` error. You need to clean up the etcd storage after getting this error. See [etcd maintaince space-quota](https://etcd.io/docs/v3.4.0/op-guide/maintenance/#space-quota) for details. It is recommended to upgrade your cluster to v4.0.9 or later versions. +TiCDC uses etcd in PD to store and regularly update the metadata. Because the time interval between the MVCC of etcd and PD's default compaction is one hour, the amount of PD storage that TiCDC uses is proportional to the amount of metadata versions generated within this hour.
However, in v4.0.5, v4.0.6, and v4.0.7, TiCDC has a problem of frequent writing, so if there are 1000 tables created or scheduled in an hour, it then takes up all the etcd storage and returns the `etcdserver: mvcc: database space exceeded` error. You need to clean up the etcd storage after getting this error. See [etcd maintenance space-quota](https://etcd.io/docs/v3.4.0/op-guide/maintenance/#space-quota) for details. It is recommended to upgrade your cluster to v4.0.9 or later versions. ## Does TiCDC support replicating large transactions? Is there any risk? diff --git a/ticdc/ticdc-manage-changefeed.md b/ticdc/ticdc-manage-changefeed.md index 1987bd84c4ab0..c605046ba8b27 100644 --- a/ticdc/ticdc-manage-changefeed.md +++ b/ticdc/ticdc-manage-changefeed.md @@ -19,7 +19,7 @@ cdc cli changefeed create --server=http://10.0.10.25:8300 --sink-uri="mysql://ro ```shell Create changefeed successfully! ID: simple-replication-task -Info: {"sink-uri":"mysql://root:123456@127.0.0.1:3306/","opts":{},"create-time":"2020-03-12T22:04:08.103600025+08:00","start-ts":415241823337054209,"target-ts":0,"admin-job-type":0,"sort-engine":"unified","sort-dir":".","config":{"case-sensitive":true,"filter":{"rules":["*.*"],"ignore-txn-start-ts":null,"ddl-allow-list":null},"mounter":{"worker-num":16},"sink":{"dispatchers":null},"scheduler":{"type":"table-number","polling-time":-1}},"state":"normal","history":null,"error":null} +Info: {"upstream_id":7178706266519722477,"namespace":"default","id":"simple-replication-task","sink_uri":"mysql://root:xxxxx@127.0.0.1:4000/?time-zone=","create_time":"2022-12-19T15:05:46.679218+08:00","start_ts":438156275634929669,"engine":"unified","config":{"case_sensitive":true,"enable_old_value":true,"force_replicate":false,"ignore_ineligible_table":false,"check_gc_safe_point":true,"enable_sync_point":true,"bdr_mode":false,"sync_point_interval":30000000000,"sync_point_retention":3600000000000,"filter":{"rules":["test.*"],"event_filters":null},"mounter":{"worker_num":16},"sink":{"protocol":"","schema_registry":"","csv":{"delimiter":",","quote":"\"","null":"\\N","include_commit_ts":false},"column_selectors":null,"transaction_atomicity":"none","encoder_concurrency":16,"terminator":"\r\n","date_separator":"none","enable_partition_separator":false},"consistent":{"level":"none","max_log_size":64,"flush_interval":2000,"storage":""}},"state":"normal","creator_version":"v6.5.0"} ``` ## Query the replication task list @@ -288,6 +288,10 @@ force-replicate = true ## Unified Sorter +> **Note:** +> +> Starting from v6.0.0, TiCDC uses the DB Sorter engine by default, and no longer uses the Unified Sorter. It is recommended that you do not configure the `sort engine` item. + Unified sorter is the sorting engine in TiCDC. It can mitigate OOM problems caused by the following scenarios: + The data replication task in TiCDC is paused for a long time, during which a large amount of incremental data is accumulated and needs to be replicated. @@ -306,4 +310,4 @@ In the output of the above command, if the value of `sort-engine` is "unified", > **Note:** > > + If your servers use mechanical hard drives or other storage devices that have high latency or limited bandwidth, the performance of Unified Sorter will be affected significantly. -> + By default, Unified Sorter uses `data_dir` to store temporary files. It is recommended to ensure that the free disk space is greater than or equal to 500 GiB. 
For production environments, it is recommended to ensure that the free disk space on each node is greater than (the maximum `checkpoint-ts` delay allowed by the business) * (upstream write traffic at business peak hours). In addition, if you plan to replicate a large amount of historical data after `changefeed` is created, make sure that the free space on each node is greater than the amount of replicated data. +> + By default, Unified Sorter uses `data_dir` to store temporary files. It is recommended to ensure that the free disk space is greater than or equal to 500 GiB. For production environments, it is recommended to ensure that the free disk space on each node is greater than (the maximum `checkpoint-ts` delay allowed by the business) * (upstream write traffic at business peak hours). In addition, if you plan to replicate a large amount of historical data after `changefeed` is created, make sure that the free space on each node is greater than the amount of the replicated data. diff --git a/ticdc/ticdc-server-config.md b/ticdc/ticdc-server-config.md index 54bd4233ab2ba..3135a7cee30aa 100644 --- a/ticdc/ticdc-server-config.md +++ b/ticdc/ticdc-server-config.md @@ -15,7 +15,7 @@ The following are descriptions of options available in a `cdc server` command: - `advertise-addr`: The advertised address via which clients access TiCDC. If unspecified, the value is the same as that of `addr`. - `pd`: A comma-separated list of PD endpoints. - `config`: The address of the configuration file that TiCDC uses (optional). This option is supported since TiCDC v5.0.0. This option can be used in the TiCDC deployment since TiUP v1.4.0. For detailed configuration description, see [TiCDC Changefeed Configurations](/ticdc/ticdc-changefeed-config.md) -- `data-dir`: Specifies the directory that TiCDC uses when it needs to use disks to store files. Unified Sorter uses this directory to store temporary files. It is recommended to ensure that the free disk space for this directory is greater than or equal to 500 GiB. For more details, see [Unified Sorter](/ticdc/ticdc-manage-changefeed.md#unified-sorter). If you are using TiUP, you can configure `data_dir` in the [`cdc_servers`](/tiup/tiup-cluster-topology-reference.md#cdc_servers) section, or directly use the default `data_dir` path in `global`. +- `data-dir`: Specifies the directory that TiCDC uses when it needs to use disks to store files. The sort engine used by TiCDC and redo logs use this directory to store temporary files. It is recommended to ensure that the free disk space for this directory is greater than or equal to 500 GiB. If you are using TiUP, you can configure `data_dir` in the [`cdc_servers`](/tiup/tiup-cluster-topology-reference.md#cdc_servers) section, or directly use the default `data_dir` path in `global`. - `gc-ttl`: The TTL (Time To Live) of the service level `GC safepoint` in PD set by TiCDC, and the duration that the replication task can suspend, in seconds. The default value is `86400`, which means 24 hours. Note: Suspending of the TiCDC replication task affects the progress of TiCDC GC safepoint, which means that it affects the progress of upstream TiDB GC, as detailed in [Complete Behavior of TiCDC GC safepoint](/ticdc/ticdc-faq.md#what-is-the-complete-behavior-of-ticdc-garbage-collection-gc-safepoint). - `log-file`: The path to which logs are output when the TiCDC process is running. If this parameter is not specified, logs are written to the standard output (stdout). - `log-level`: The log level when the TiCDC process is running. 
The default value is `"info"`. @@ -30,13 +30,13 @@ The following are descriptions of options available in a `cdc server` command: The following are configurations in the configuration file of `cdc server`: -``` -addr = "192.155.22.33:8887" +```toml +addr = "127.0.0.1:8300" advertise-addr = "" log-file = "" log-level = "info" data-dir = "" -gc-ttl = 86400 +gc-ttl = 86400 # 24 h tz = "System" cluster-id = "default" @@ -46,15 +46,15 @@ cluster-id = "default" key-path = "" -capture-session-ttl = 10 -owner-flush-interval = 50000000 -processor-flush-interval = 50000000 -per-table-memory-quota = 10485760 +capture-session-ttl = 10 # 10s +owner-flush-interval = 50000000 # 50 ms +processor-flush-interval = 50000000 # 50 ms +per-table-memory-quota = 10485760 # 10 MiB [log] error-output = "stderr" [log.file] - max-size = 300 + max-size = 300 # 300 MiB max-days = 0 max-backups = 0 @@ -66,9 +66,9 @@ per-table-memory-quota = 10485760 num-workerpool-goroutine = 16 sort-dir = "/tmp/sorter" -[kv-client] - worker-concurrent = 8 - worker-pool-size = 0 - region-scan-limit = 40 - region-retry-duration = 60000000000 +# [kv-client] +# worker-concurrent = 8 +# worker-pool-size = 0 +# region-scan-limit = 40 +# region-retry-duration = 60000000000 ``` diff --git a/ticdc/ticdc-sink-to-mysql.md b/ticdc/ticdc-sink-to-mysql.md index b90d83bfc6bcd..091b538c16852 100644 --- a/ticdc/ticdc-sink-to-mysql.md +++ b/ticdc/ticdc-sink-to-mysql.md @@ -41,7 +41,7 @@ Sink URI is used to specify the connection information of the TiCDC target syste Sample configuration for MySQL: ```shell ---sink-uri="mysql://root:123456@127.0.0.1:3306/?worker-count=16&max-txn-row=5000&transaction-atomicity=table" +--sink-uri="mysql://root:123456@127.0.0.1:3306" ``` The following are descriptions of sink URI parameters and parameter values that can be configured for MySQL or TiDB: diff --git a/ticdc/troubleshoot-ticdc.md b/ticdc/troubleshoot-ticdc.md index fe2f07a2d14e3..cfc39b51876b7 100644 --- a/ticdc/troubleshoot-ticdc.md +++ b/ticdc/troubleshoot-ticdc.md @@ -10,7 +10,7 @@ This document introduces the common errors you might encounter when using TiCDC, > **Note:** > -> In this document, the PD address specified in `cdc cli` commands is `--pd=http://10.0.10.25:2379`. When you use the command, replace the address with your actual PD address. +> In this document, the server address specified in `cdc cli` commands is `--server=http://127.0.0.1:8300`. When you use the command, replace the address with the actual address of your TiCDC server.
## TiCDC replication interruptions @@ -28,7 +28,7 @@ You can know whether the replication task is stopped manually by executing `cdc {{< copyable "shell-regular" >}} ```shell -cdc cli changefeed query --pd=http://10.0.10.25:2379 --changefeed-id 28c43ffc-2316-4f4f-a70b-d1a7c59ba79f +cdc cli changefeed query --server=http://127.0.0.1:8300 --changefeed-id 28c43ffc-2316-4f4f-a70b-d1a7c59ba79f ``` In the output of the above command, `admin-job-type` shows the state of this replication task: @@ -111,7 +111,7 @@ If the downstream is a special MySQL environment (a public cloud RDS or some MyS {{< copyable "shell-regular" >}} ```shell - cdc cli changefeed create --sink-uri="mysql://root@127.0.0.1:3306/?time-zone=CST" --pd=http://10.0.10.25:2379 + cdc cli changefeed create --sink-uri="mysql://root@127.0.0.1:3306/?time-zone=CST" --server=http://127.0.0.1:8300 ``` > **Note:** @@ -153,7 +153,7 @@ To fix the error, take the following steps: {{< copyable "shell-regular" >}} ```shell - cdc cli changefeed pause -c test-cf --pd=http://10.0.10.25:2379 + cdc cli changefeed pause -c test-cf --server=http://127.0.0.1:8300 ``` 3. Execute `cdc cli changefeed update` to update the original changefeed configuration. @@ -161,7 +161,7 @@ To fix the error, take the following steps: {{< copyable "shell-regular" >}} ```shell - cdc cli changefeed update -c test-cf --pd=http://10.0.10.25:2379 --sink-uri="mysql://127.0.0.1:3306/?max-txn-row=20&worker-number=8" --config=changefeed.toml + cdc cli changefeed update -c test-cf --server=http://127.0.0.1:8300 --sink-uri="mysql://127.0.0.1:3306/?max-txn-row=20&worker-number=8" --config=changefeed.toml ``` 4. Execute `cdc cli changfeed resume` to resume the replication task. @@ -169,7 +169,7 @@ To fix the error, take the following steps: {{< copyable "shell-regular" >}} ```shell - cdc cli changefeed resume -c test-cf --pd=http://10.0.10.25:2379 + cdc cli changefeed resume -c test-cf --server=http://127.0.0.1:8300 ``` ## The `[tikv:9006]GC life time is shorter than transaction duration, transaction starts at xx, GC safe point is yy` error is reported when I use TiCDC to create a changefeed. What should I do? @@ -201,7 +201,7 @@ If a DDL statement fails to execute, the replication task (changefeed) automatic {{< copyable "shell-regular" >}} ```shell -cdc cli changefeed resume -c test-cf --pd=http://10.0.10.25:2379 +cdc cli changefeed resume -c test-cf --server=http://127.0.0.1:8300 ``` If you want to skip this DDL statement that goes wrong, set the start-ts of the changefeed to the checkpoint-ts (the timestamp at which the DDL statement goes wrong) plus one, and then run the `cdc cli changefeed create` command to create a new changefeed task. 
For example, if the checkpoint-ts at which the DDL statement goes wrong is `415241823337054209`, run the following commands to skip this DDL statement: @@ -209,6 +209,6 @@ If you want to skip this DDL statement that goes wrong, set the start-ts of the {{< copyable "shell-regular" >}} ```shell -cdc cli changefeed remove --pd=http://10.0.10.25:2379 --changefeed-id simple-replication-task -cdc cli changefeed create --pd=http://10.0.10.25:2379 --sink-uri="mysql://root:123456@127.0.0.1:3306/" --changefeed-id="simple-replication-task" --sort-engine="unified" --start-ts 415241823337054210 +cdc cli changefeed remove --server=http://127.0.0.1:8300 --changefeed-id simple-replication-task +cdc cli changefeed create --server=http://127.0.0.1:8300 --sink-uri="mysql://root:123456@127.0.0.1:3306/" --changefeed-id="simple-replication-task" --sort-engine="unified" --start-ts 415241823337054210 ```
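After recreating the changefeed, it can help to confirm that the new task is running before moving on. The following check is a suggested follow-up rather than part of the diff above; it assumes the example server address and the `simple-replication-task` changefeed ID used in this document:

```shell
# Query the recreated changefeed and check its state in the output
cdc cli changefeed query --server=http://127.0.0.1:8300 --changefeed-id simple-replication-task
```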