From b8a4b780c7a2fc77d12ebe17ae1ce35f57c53d85 Mon Sep 17 00:00:00 2001 From: Ran Date: Fri, 31 Jul 2020 19:29:33 +0800 Subject: [PATCH 1/7] ticdc: add more faqs Signed-off-by: Ran --- ticdc/manage-ticdc.md | 41 ++++++---- ticdc/ticdc-overview.md | 26 ++++++- ticdc/troubleshoot-ticdc.md | 149 +++++++++++++++++++++++++++++++++++- 3 files changed, 194 insertions(+), 22 deletions(-) diff --git a/ticdc/manage-ticdc.md b/ticdc/manage-ticdc.md index 951489054461b..f99cdfa3da59c 100644 --- a/ticdc/manage-ticdc.md +++ b/ticdc/manage-ticdc.md @@ -230,7 +230,7 @@ In the above command: - `resolved-ts`: The largest transaction `TS` in the current `changefeed`. Note that this `TS` has been successfully sent from TiKV to TiCDC. - `checkpoint-ts`: The largest transaction `TS` in the current `changefeed` that has been successfully written to the downstream. - `admin-job-type`: The status of a `changefeed`: - - `0`: The state is normal. It is the initial status. + - `0`: The state is normal. - `1`: The task is paused. When the task is paused, all replicated `processor`s exit. The configuration and the replication status of the task are retained, so you can resume the task from `checkpiont-ts`. - `2`: The task is resumed. The replication task resumes from `checkpoint-ts`. - `3`: The task is removed. When the task is removed, all replicated `processor`s are ended, and the configuration information of the replication task is cleared up. Only the replication status is retained for later queries. 
@@ -302,29 +302,36 @@ In the above command: {{< copyable "shell-regular" >}} ```shell - cdc cli processor query --pd=http://10.0.10.25:2379 --changefeed-id=28c43ffc-2316-4f4f-a70b-d1a7c59ba79f + cdc cli processor query --pd=http://10.0.10.25:2379 --changefeed-id=28c43ffc-2316-4f4f-a70b-d1a7c59ba79f --capture-id=b293999a-4168-4988-a4f4-35d9589b226b ``` ``` { - "status": { - "table-infos": [ - { - "id": 45, - "start-ts": 415241823337054209 - } - ], - "table-p-lock": null, - "table-c-lock": null, - "admin-job-type": 0 - }, - "position": { - "checkpoint-ts": 415241893447467009, - "resolved-ts": 415241893971492865 - } + "status": { + "tables": { + "56": { # ID of the replication table, corresponding to tidb_table_id of a table in TiDB + "start-ts": 417474117955485702, + "mark-table-id": 0 # ID of mark tables in the cyclic replication, corresponding to tidb_table_id of mark tables in TiDB + } + }, + "operation": null, + "admin-job-type": 0 + }, + "position": { + "checkpoint-ts": 417474143881789441, + "resolved-ts": 417474143881789441, + "count": 0 + } } ``` + In the command above: + + - `status.tables`: Each key number represents the ID of the replication table, corresponding to `tidb_table_id` of a table in TiDB. + - `mark-table-id`: The ID of mark tables in the cyclic replication, corresponding to `tidb_table_id` of mark tables in TiDB. + - `resolved-ts`: The largest TSO among the sorted data in the current processor. + - `checkpoint-ts`: The largest TSO that has been successfully written to downstream in the current processor. + ## Use HTTP interface to manage cluster status and data replication task Currently, the HTTP interface provides some basic features for query and maintenance. diff --git a/ticdc/ticdc-overview.md b/ticdc/ticdc-overview.md index 9bae8240d53ec..00b94ef44dc62 100644 --- a/ticdc/ticdc-overview.md +++ b/ticdc/ticdc-overview.md @@ -30,7 +30,7 @@ The architecture of TiCDC is shown in the following figure: - `capture`: The operating process of TiCDC. 
Multiple `capture`s form a TiCDC cluster that replicates KV change logs. - Each `capture` pulls a part of KV change logs. - - Sorts the pulled the KV change log(s). + - Sorts the pulled KV change log(s). - Restores the transaction to downstream or outputs the log based on the TiCDC open protocol. ## Replication features @@ -44,6 +44,30 @@ Currently, the TiCDC sink component supports replicating data to the following d - Databases compatible with MySQL protocol. The sink component provides the final consistency support. - Kafka based on the TiCDC Open Protocol. The sink component ensures the row-level order, final consistency or strict transactional consistency. +### Ensure replication order and consistency + +#### Data replication order + +- For all DDL and DML statements, TiCDC outputs **at least once**. +- When the TiKV or TiCDC cluster encounters failure, TiCDC might send the same DDL/DML statement repeatedly. For duplicated DDL/DML statements: + + - MySQL sink can execute DDL statements repeatedly. For DDL statements that can be executed repeatedly in downstream, such as `truncate table`, the statement is executed successfully. For those that cannot be executed repeatedly, such as `create table`, the execution fails, and TiCDC ignores the error and continues the replication. + - Kafka sink sends messages repeatedly, but the duplicate messages do not affect the constraints fo `Resolved Ts`. Users can filter the duplicated messages from Kafka consumers. + +#### Data replication consistency + +- MySQL sink + + - TiCDC does not split in-table transactions. This is to ensure the transaction consistency within a single table. However, TiCDC does not ensure the transaction order in the upstream table. + - TiCDC splits cross-table transactions in the unit of tables. TiCDC does not ensure that cross-table transactions are always consistent. + - TiCDC ensures that the order of single-row updates are consistent with that in the upstream. 
+ +- Kafka sink + + - TiCDC provides different strategies for data distribution. You can distribute data to different Kafka partitions based on the table, primary key, or ts. + - For different distribution strategies, the different implementation of consumers can achieve different levels of consistency, including row-level consistency, eventual consistency, or cross-table transaction consistency. + - TiCDC does not has an implementation of Kafka consumers, but only offers [TiCDC open protocol](/ticdc/ticdc-open-protocol.md). You can implement the Kafka consumer according to the protocol. + ## Restrictions To replicate data to TiDB or MySQL, you must ensure that the following requirements are satisfied to guarantee data correctness: diff --git a/ticdc/troubleshoot-ticdc.md b/ticdc/troubleshoot-ticdc.md index ff8555c25fedc..be3bfcc73c48e 100644 --- a/ticdc/troubleshoot-ticdc.md +++ b/ticdc/troubleshoot-ticdc.md @@ -8,7 +8,7 @@ aliases: ['/docs/dev/ticdc/troubleshoot-ticdc/'] This document introduces the common issues and errors that you might encounter when using TiCDC, and the corresponding maintenance and troubleshooting methods. -## How to choose `start-ts` when starting a task +## How do I choose `start-ts` when creating a task in TiCDC? The `start-ts` of a replication task corresponds to a Timestamp Oracle (TSO) in the upstream TiDB cluster. TiCDC requests data from this TSO in a replication task. Therefore, the `start-ts` of the replication task must meet the following requirements: @@ -17,11 +17,11 @@ The `start-ts` of a replication task corresponds to a Timestamp Oracle (TSO) in If you do not specify `start-ts`, or specify `start-ts` as `0`, when a replication task is started, TiCDC gets a current TSO and starts the task from this TSO. -## Some tables cannot be replicated when you start a task +## Why can't I replicate some tables when I create a task in TiCDC? 
When you execute `cdc cli changefeed create` to create a replication task, TiCDC checks whether the upstream tables meet the [replication restrictions](/ticdc/ticdc-overview.md#restrictions). If some tables do not meet the restrictions, `some tables are not eligible to replicate` is returned with a list of ineligible tables. You can choose `Y` or `y` to continue creating the task, and all updates on these tables are automatically ignored during the replication. If you choose an input other than `Y` or `y`, the replication task is not created. -## How to handle replication interruption +## How do I handle replication interruption? A replication task might be interrupted in the following known scenarios: @@ -38,7 +38,7 @@ A replication task might be interrupted in the following known scenarios: 2. Use the new task configuration file and add the `ignore-txn-commit-ts` parameter to skip the transaction corresponding to the specified `commit-ts`. 3. Stop the old replication task via HTTP API. Execute `cdc cli changefeed create` to create a new task and specify the new task configuration file. Specify `checkpoint-ts` recorded in step 1 as the `start-ts` and start a new task to resume the replication. -## `gc-ttl` and file sorting +## What is `gc-ttl` and file sorting in TiCDC? Since v4.0.0-rc.1, PD supports external services in setting the service-level GC safepoint. Any service can register and update its GC safepoint. PD ensures that the key-value data smaller than this GC safepoint is not cleaned by GC. Enabling this feature in TiCDC ensures that the data to be consumed by TiCDC is retained in TiKV without being cleaned by GC when the replication task is unavailable or interrupted. @@ -55,3 +55,144 @@ cdc cli changefeed create --pd=http://10.0.10.25:2379 --start-ts=415238226621235 > **Note:** > > TiCDC (the 4.0 version) does not support dynamically modifying the file sorting and memory sorting yet. 
+ +## How do I handle the `Error 1298: Unknown or incorrect time zone: 'UTC'` error when creating the replication task or replicating data to MySQL? + +This error is returned when the downstream MySQL does not load the time zone. You can load the time zone by running [`mysql_tzinfo_to_sql`](https://dev.mysql.com/doc/refman/8.0/en/mysql-tzinfo-to-sql.html). After loading the time zone, you can create tasks and migrate data normally. + +{{< copyable "shell-regular" >}} + +```shell +mysql_tzinfo_to_sql /usr/share/zoneinfo | mysql -u root mysql -p +``` + +``` +Enter password: +Warning: Unable to load '/usr/share/zoneinfo/iso3166.tab' as time zone. Skipping it. +Warning: Unable to load '/usr/share/zoneinfo/leap-seconds.list' as time zone. Skipping it. +Warning: Unable to load '/usr/share/zoneinfo/zone.tab' as time zone. Skipping it. +Warning: Unable to load '/usr/share/zoneinfo/zone1970.tab' as time zone. Skipping it. +``` + +If you use MySQL in a special cloud environment, such Aliyun RDS, and you do not have the permission to modify MySQL, you need to specify the time zone using the `--tz` parameter. + +First, query the time zone used by MySQL: + +{{< copyable "shell-regular" >}} + +```shell +show variables like '%time_zone%'; +``` + +``` ++------------------+--------+ +| Variable_name | Value | ++------------------+--------+ +| system_time_zone | CST | +| time_zone | SYSTEM | ++------------------+--------+ +``` + +Specify the time zone when you create the replication task and create the TiCDC service: + +{{< copyable "shell-regular" >}} + +```shell +cdc cli changefeed create --sink-uri="mysql://root@127.0.0.1:3306/" --tz=Asia/Shanghai +``` + +> **Note:** +> +> In MySQL, CST refers to the China Standard Time (UTC+08:00). Usually you cannot use `CST` directly in your system, but use `Asia/Shanghai` instead. + +Be cautious when you set the time zone of the TiCDC server, because the time zone will be used for the conversion of time type. 
It is recommended that you use the same time zone in the upstream and downstream databases, and specify the time zone using `--tz` when you start the TiCDC server. + +The TiCDC server chooses its time zone in the following priority: + +- TiCDC first uses the time zone specified by `--tz`. +- When the above parameter is not available, TiCDC tries to read the timezone set by the `TZ` environment variable. +- When the above environment variable is not available, TiCDC uses the default time zone of the machine. + +## How do I handle the incompatibility of configuration files caused by TiCDC upgrade? + +Refer to [Notes for compatibility](/ticdc/manage-ticdc.md#notes-for-compatibility). + +## Does TiCDC support outputting data changes in the Canal format? + +Yes. To enable Canal output, specify the protocol as `canal` in the `--sink-uri` parameter. For example: + +{{< copyable "shell-regular" >}} + +```shell +cdc cli changefeed create --pd=http://10.0.10.25:2379 --sink-uri="kafka://127.0.0.1:9092/cdc-test?kafka-version=2.4.0&protocol=canal" --config changefeed.toml +``` + +> **Note:** +> +> * This feature is introduced in TiCDC 4.0.2. +> * TiCDC currently only supports outputting data changes in the Canal format to Kafka. + +For more information, refer to [Create a replication task](/ticdc/manage-ticdc.md#create-a-replication-task). + +## How do I view the latency of TiCDC replication tasks? + +To view the latency of TiCDC replication tasks, use `cdc cli`. For example: + +{{< copyable "shell-regular" >}} + +```shell +cdc cli changefeed list --pd=http://10.0.10.25:2379 +``` + +The expected output is as follows: + +```json +[{ + "id": "4e24dde6-53c1-40b6-badf-63620e4940dc", + "summary": { + "state": "normal", + "tso": 417886179132964865, + "checkpoint": "2020-07-07 16:07:44.881", + "error": null + } +}] +``` + +* `checkpoint`: TiCDC has replicated all data before this timestamp to downstream. 
+* `state`: The state of the replication task: + + * `normal`: The task runs normally. + * `stopped`: The task is stopped manually or encounters an error. + * `removed`: The task is deleted. + +> **Note:** +> +> This feature is introduced in TiCDC 4.0.3. + +## How do I know whether the replication task runs normally? + +You can view the state of the replication tasks by using `cdc cli`. For example: + +{{< copyable "shell-regular" >}} + +```shell +cdc cli changefeed query --pd=http://10.0.10.25:2379 --changefeed-id 28c43ffc-2316-4f4f-a70b-d1a7c59ba79f +``` + +In the output of this command, `admin-job-type` shows the state of the replication task: + +* `0`: Normal. +* `1`: Paused. When the task is paused, all replicated `processor`s exit. The configuration and the replication status of the task are retained, so you can resume the task from `checkpoint-ts`. +* `2`: Resumed. The replication task resumes from `checkpoint-ts`. +* `3`: Removed. When the task is removed, all replicated `processor`s are ended, and the configuration information of the replication task is cleared up. Only the replication status is retained for later queries. + +## Why does the latency from TiCDC to Kafka become larger and larger? + +* Check [whether the status of the replication task is normal](#how-do-i-know-whether-the-replication-task-runs-normally). +* Adjust the following parameters of Kafka: + + * Increase `message.max.bytes` in `server.properties` to `1073741824` (1 GB). + * Increase `replica.fetch.max.bytes` in `server.properties` to `1073741824` (1 GB). + * Increase `fetch.message.max.bytes` in `consumer.properties` to make it larger than `message.max.bytes`. + +## When TiCDC replicates data to Kafka, are all the changes in one transaction written into a single message? If not, how are the changes divided?
From fcb8d4e9af2aa1daa2e8eec43bcf7cc12f6bbbf4 Mon Sep 17 00:00:00 2001 From: Ran Date: Mon, 3 Aug 2020 11:15:45 +0800 Subject: [PATCH 2/7] add more faqs Signed-off-by: Ran --- ticdc/troubleshoot-ticdc.md | 43 ++++++++++++++++++++++++++++++++++++- 1 file changed, 42 insertions(+), 1 deletion(-) diff --git a/ticdc/troubleshoot-ticdc.md b/ticdc/troubleshoot-ticdc.md index be3bfcc73c48e..96d39d480c8bd 100644 --- a/ticdc/troubleshoot-ticdc.md +++ b/ticdc/troubleshoot-ticdc.md @@ -195,4 +195,45 @@ In the output of this command, `admin-job-type` shows the state of the replicati * Increase `replica.fetch.max.bytes` in `server.properties` to `1073741824` (1 GB). * Increase `fetch.message.max.bytes` in `consumer.properties` to make it larger than `message.max.bytes`. -## When TiCDC replicates data to Kafka, are all the changes in one transaction written into a single message? If not, how are the changes divided? +## When TiCDC replicates data to Kafka, does it write all the changes in a transaction into one message? If not, on what basis does it divide the changes? + +No. According to the different distribution strategies configured, TiCDC divides the changes on different bases, including `default`, `row id`, `table`, and `ts`. + +For more information, refer to [Replication task configuration file](/ticdc/manage-ticdc.md#task-configuration-file). + +## When TiCDC replicates data to Kafka, can I control the maximum size of a single message in TiDB? + +No. Currently TiCDC sets the maximum size of batch messages to 512 MB, and that of a single message to 4 MB. + +## When TiCDC replicates data to Kafka, does a message contain multiple types of data changes? + +Yes. A single might contain multiple `update`s or `delete`s, and `update` and `delete` might co-exist. + +## When TiCDC replicates data to Kafka, how do I view the timestamp, table name, and schema name in the output of TiCDC Open Protocol? + +The information is included in the key of Kafka messages.
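The `admin-job-type` codes listed in the FAQ above can be decoded with a small lookup. The Go sketch below is illustrative only. `jobState` and its labels are hypothetical names chosen for this example; they are not types exported by TiCDC. The sketch also assumes that any code outside `0` to `3` should be reported as unknown instead of being treated as an error.

```go
package main

import "fmt"

// jobState is a hypothetical type mirroring the admin-job-type codes
// documented for `cdc cli changefeed query`; it is not TiCDC's own API.
type jobState int

const (
	jobNormal  jobState = iota // 0: the task runs normally
	jobPaused                  // 1: processors exit; config and status are kept
	jobResumed                 // 2: replication resumes from checkpoint-ts
	jobRemoved                 // 3: config is cleared; only status is kept
)

// String returns a readable label for the code, so fmt prints it directly.
func (s jobState) String() string {
	switch s {
	case jobNormal:
		return "normal"
	case jobPaused:
		return "paused"
	case jobResumed:
		return "resumed"
	case jobRemoved:
		return "removed"
	default:
		// int(s) avoids calling String recursively through %d formatting.
		return fmt.Sprintf("unknown(%d)", int(s))
	}
}

func main() {
	for _, code := range []int{0, 1, 2, 3, 7} {
		fmt.Printf("admin-job-type %d: %s\n", code, jobState(code))
	}
}
```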
For example: + +```json +{ + "ts":, + "scm":, + "tbl":, + "t":1 +} +``` + +For more information, refer to [TiCDC Open Protocol event format](/ticdc/ticdc-open-protocol.md#event-format). + +## When TiCDC replicates data to Kafka, how do I know the timestamp of the data changes in a message? + +You can get the Unix timestamp (in milliseconds) by shifting `ts` in the key of the Kafka message 18 bits to the right. + +## How does TiCDC Open Protocol represent `null`? + +In TiCDC Open Protocol, the type code `6` represents `null`. + +| Type | Code | Output Example | Note | +|:--|:--|:--|:--| +| Null | 6 | `{"t":6,"v":null}` | | + +For more information, refer to [TiCDC Open Protocol column type code](ticdc/ticdc-open-protocol.md#column-type-code). From de66a2d59ece6eb28ee3ba719e4a967db1859d86 Mon Sep 17 00:00:00 2001 From: Ran Date: Mon, 3 Aug 2020 11:40:01 +0800 Subject: [PATCH 3/7] update wording Signed-off-by: Ran --- ticdc/manage-ticdc.md | 2 +- ticdc/ticdc-overview.md | 12 ++++----- ticdc/troubleshoot-ticdc.md | 50 ++++++++++++++++++------------------- 3 files changed, 32 insertions(+), 32 deletions(-) diff --git a/ticdc/manage-ticdc.md b/ticdc/manage-ticdc.md index f99cdfa3da59c..a5b953ec0db68 100644 --- a/ticdc/manage-ticdc.md +++ b/ticdc/manage-ticdc.md @@ -330,7 +330,7 @@ In the above command: - `status.tables`: Each key number represents the ID of the replication table, corresponding to `tidb_table_id` of a table in TiDB. - `mark-table-id`: The ID of mark tables in the cyclic replication, corresponding to `tidb_table_id` of mark tables in TiDB. - `resolved-ts`: The largest TSO among the sorted data in the current processor. - - `checkpoint-ts`: The largest TSO that has been successfully written to downstream in the current processor. + - `checkpoint-ts`: The largest TSO that has been successfully written to the downstream in the current processor.
## Use HTTP interface to manage cluster status and data replication task diff --git a/ticdc/ticdc-overview.md b/ticdc/ticdc-overview.md index 00b94ef44dc62..22e41ffa7310c 100644 --- a/ticdc/ticdc-overview.md +++ b/ticdc/ticdc-overview.md @@ -51,22 +51,22 @@ Currently, the TiCDC sink component supports replicating data to the following d - For all DDL and DML statements, TiCDC outputs **at least once**. - When the TiKV or TiCDC cluster encounters failure, TiCDC might send the same DDL/DML statement repeatedly. For duplicated DDL/DML statements: - - MySQL sink can execute DDL statements repeatedly. For DDL statements that can be executed repeatedly in downstream, such as `truncate table`, the statement is executed successfully. For those that cannot be executed repeatedly, such as `create table`, the execution fails, and TiCDC ignores the error and continues the replication. - - Kafka sink sends messages repeatedly, but the duplicate messages do not affect the constraints fo `Resolved Ts`. Users can filter the duplicated messages from Kafka consumers. + - MySQL sink can execute DDL statements repeatedly. For DDL statements that can be executed repeatedly in the downstream, such as `truncate table`, the statement is executed successfully. For those that cannot be executed repeatedly, such as `create table`, the execution fails, and TiCDC ignores the error and continues the replication. + - Kafka sink sends messages repeatedly, but the duplicate messages do not affect the constraints of `Resolved Ts`. Users can filter the duplicated messages from Kafka consumers. #### Data replication consistency - MySQL sink - - TiCDC does not split in-table transactions. This is to ensure the transaction consistency within a single table. However, TiCDC does not ensure the transaction order in the upstream table. - - TiCDC splits cross-table transactions in the unit of tables. TiCDC does not ensure that cross-table transactions are always consistent. 
- - TiCDC ensures that the order of single-row updates are consistent with that in the upstream. + - TiCDC does not split in-table transactions. This is to **ensure** the transaction consistency within a single table. However, TiCDC does **not ensure** the transaction order in the upstream table. + - TiCDC splits cross-table transactions in the unit of tables. TiCDC does **not ensure** that cross-table transactions are always consistent. + - TiCDC **ensures** that the order of single-row updates are consistent with that in the upstream. - Kafka sink - TiCDC provides different strategies for data distribution. You can distribute data to different Kafka partitions based on the table, primary key, or ts. - For different distribution strategies, the different implementation of consumers can achieve different levels of consistency, including row-level consistency, eventual consistency, or cross-table transaction consistency. - - TiCDC does not has an implementation of Kafka consumers, but only offers [TiCDC open protocol](/ticdc/ticdc-open-protocol.md). You can implement the Kafka consumer according to the protocol. + - TiCDC does not have an implementation of Kafka consumers, but only provides [TiCDC Open Protocol](/ticdc/ticdc-open-protocol.md). You can implement the Kafka consumer according to the protocol. ## Restrictions diff --git a/ticdc/troubleshoot-ticdc.md b/ticdc/troubleshoot-ticdc.md index 96d39d480c8bd..2aab8cc6e710d 100644 --- a/ticdc/troubleshoot-ticdc.md +++ b/ticdc/troubleshoot-ticdc.md @@ -74,44 +74,44 @@ Warning: Unable to load '/usr/share/zoneinfo/zone.tab' as time zone. Skipping it Warning: Unable to load '/usr/share/zoneinfo/zone1970.tab' as time zone. Skipping it. ``` -If you use MySQL in a special cloud environment, such Aliyun RDS, and you do not have the permission to modify MySQL, you need to specify the time zone using the `--tz` parameter. 
+If you use MySQL in a special cloud environment, such Aliyun RDS, and if you do not have the permission to modify MySQL, you need to specify the time zone using the `--tz` parameter: -First, query the time zone used by MySQL: +1. Query the time zone used by MySQL: -{{< copyable "shell-regular" >}} + {{< copyable "shell-regular" >}} -```shell -show variables like '%time_zone%'; -``` + ```shell + show variables like '%time_zone%'; + ``` -``` -+------------------+--------+ -| Variable_name | Value | -+------------------+--------+ -| system_time_zone | CST | -| time_zone | SYSTEM | -+------------------+--------+ -``` + ``` + +------------------+--------+ + | Variable_name | Value | + +------------------+--------+ + | system_time_zone | CST | + | time_zone | SYSTEM | + +------------------+--------+ + ``` -Specify the time zone when you create the replication task and create the TiCDC service: +2. Specify the time zone when you create the replication task and create the TiCDC service: -{{< copyable "shell-regular" >}} + {{< copyable "shell-regular" >}} -```shell -cdc cli changefeed create --sink-uri="mysql://root@127.0.0.1:3306/" --tz=Asia/Shanghai -``` + ```shell + cdc cli changefeed create --sink-uri="mysql://root@127.0.0.1:3306/" --tz=Asia/Shanghai + ``` -> **Note:** -> -> In MySQL, CST refers to the China Standard Time (UTC+08:00). Usually you cannot use `CST` directly in your system, but use `Asia/Shanghai` instead. + > **Note:** + > + > In MySQL, CST refers to the China Standard Time (UTC+08:00). Usually you cannot use `CST` directly in your system, but use `Asia/Shanghai` instead. Be cautious when you set the time zone of the TiCDC server, because the time zone will be used for the conversion of time type. It is recommended that you use the same time zone in the upstream and downstream databases, and specify the time zone using `--tz` when you start the TiCDC server. 
The TiCDC server chooses its time zone in the following priority: -- TiCDC first uses the time zone specified by `--tz`. -- When the above parameter is not available, TiCDC tries to read the timezone set by the `TZ` environment variable. -- When the above environment variable is not available, TiCDC uses the default time zone of the machine. +1. TiCDC first uses the time zone specified by `--tz`. +2. When the above parameter is not available, TiCDC tries to read the timezone set by the `TZ` environment variable. +3. When the above environment variable is not available, TiCDC uses the default time zone of the machine. ## How do I handle the incompatibility of configuration files caused by TiCDC upgrade? From 6df6ee1695dd2d7f507bd3018ae6483a5e3cb02c Mon Sep 17 00:00:00 2001 From: Ran Date: Mon, 3 Aug 2020 11:41:39 +0800 Subject: [PATCH 4/7] fix lint Signed-off-by: Ran --- ticdc/troubleshoot-ticdc.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ticdc/troubleshoot-ticdc.md b/ticdc/troubleshoot-ticdc.md index 2aab8cc6e710d..e7ee5fd6bfc7a 100644 --- a/ticdc/troubleshoot-ticdc.md +++ b/ticdc/troubleshoot-ticdc.md @@ -236,4 +236,4 @@ In TiCDC Open Protocol, the type code `6` represents `null`. |:--|:--|:--|:--| | Null | 6 | `{"t":6,"v":null}` | | -For more information, refer to [TiCDC Open Protocol column type code](ticdc/ticdc-open-protocol.md#column-type-code). +For more information, refer to [TiCDC Open Protocol column type code](/ticdc/ticdc-open-protocol.md#column-type-code). 
From 91e8aef4741ba15a1254046a5135a0e61f670764 Mon Sep 17 00:00:00 2001 From: Ran Date: Wed, 5 Aug 2020 10:35:35 +0800 Subject: [PATCH 5/7] Apply suggestions from code review Co-authored-by: TomShawn <41534398+TomShawn@users.noreply.github.com> --- ticdc/ticdc-overview.md | 14 +++++++------- ticdc/troubleshoot-ticdc.md | 26 +++++++++++++------------- 2 files changed, 20 insertions(+), 20 deletions(-) diff --git a/ticdc/ticdc-overview.md b/ticdc/ticdc-overview.md index 22e41ffa7310c..4c477092583b3 100644 --- a/ticdc/ticdc-overview.md +++ b/ticdc/ticdc-overview.md @@ -46,27 +46,27 @@ Currently, the TiCDC sink component supports replicating data to the following d ### Ensure replication order and consistency -#### Data replication order +#### Replication order -- For all DDL and DML statements, TiCDC outputs **at least once**. +- For all DDL or DML statements, TiCDC outputs them **at least once**. - When the TiKV or TiCDC cluster encounters failure, TiCDC might send the same DDL/DML statement repeatedly. For duplicated DDL/DML statements: - MySQL sink can execute DDL statements repeatedly. For DDL statements that can be executed repeatedly in the downstream, such as `truncate table`, the statement is executed successfully. For those that cannot be executed repeatedly, such as `create table`, the execution fails, and TiCDC ignores the error and continues the replication. - Kafka sink sends messages repeatedly, but the duplicate messages do not affect the constraints of `Resolved Ts`. Users can filter the duplicated messages from Kafka consumers. -#### Data replication consistency +#### Replication consistency - MySQL sink - - TiCDC does not split in-table transactions. This is to **ensure** the transaction consistency within a single table. However, TiCDC does **not ensure** the transaction order in the upstream table. + - TiCDC does not split in-table transactions. This is to **ensure** the transactional consistency within a single table. 
However, TiCDC does **not ensure** that the transactional order in the upstream table is consistent. - TiCDC splits cross-table transactions in the unit of tables. TiCDC does **not ensure** that cross-table transactions are always consistent. - TiCDC **ensures** that the order of single-row updates are consistent with that in the upstream. - Kafka sink - - TiCDC provides different strategies for data distribution. You can distribute data to different Kafka partitions based on the table, primary key, or ts. - - For different distribution strategies, the different implementation of consumers can achieve different levels of consistency, including row-level consistency, eventual consistency, or cross-table transaction consistency. - - TiCDC does not have an implementation of Kafka consumers, but only provides [TiCDC Open Protocol](/ticdc/ticdc-open-protocol.md). You can implement the Kafka consumer according to the protocol. + - TiCDC provides different strategies for data distribution. You can distribute data to different Kafka partitions based on the table, primary key, or timestamp. + - For different distribution strategies, the different consumer implementations can achieve different levels of consistency, including row-level consistency, eventual consistency, or cross-table transactional consistency. + - TiCDC does not have an implementation of Kafka consumers, but only provides [TiCDC Open Protocol](/ticdc/ticdc-open-protocol.md). You can implement the Kafka consumer according to this protocol. ## Restrictions diff --git a/ticdc/troubleshoot-ticdc.md b/ticdc/troubleshoot-ticdc.md index e7ee5fd6bfc7a..d02f54bb86b51 100644 --- a/ticdc/troubleshoot-ticdc.md +++ b/ticdc/troubleshoot-ticdc.md @@ -17,7 +17,7 @@ The `start-ts` of a replication task corresponds to a Timestamp Oracle (TSO) in If you do not specify `start-ts`, or specify `start-ts` as `0`, when a replication task is started, TiCDC gets a current TSO and starts the task from this TSO. 
-## Why can't I replicate some tables when I create a task in TiCDC? +## Why can't some tables be replicated when I create a task in TiCDC? When you execute `cdc cli changefeed create` to create a replication task, TiCDC checks whether the upstream tables meet the [replication restrictions](/ticdc/ticdc-overview.md#restrictions). If some tables do not meet the restrictions, `some tables are not eligible to replicate` is returned with a list of ineligible tables. You can choose `Y` or `y` to continue creating the task, and all updates on these tables are automatically ignored during the replication. If you choose an input other than `Y` or `y`, the replication task is not created. @@ -58,7 +58,7 @@ cdc cli changefeed create --pd=http://10.0.10.25:2379 --start-ts=415238226621235 ## How do I handle the `Error 1298: Unknown or incorrect time zone: 'UTC'` error when creating the replication task or replicating data to MySQL? -This error is returned when the downstream MySQL does not load the time zone. You can load the time zone by running [`mysql_tzinfo_to_sql`](https://dev.mysql.com/doc/refman/8.0/en/mysql-tzinfo-to-sql.html). After loading the time zone, you can create tasks and migrate data normally. +This error is returned when the downstream MySQL does not load the time zone. You can load the time zone by running [`mysql_tzinfo_to_sql`](https://dev.mysql.com/doc/refman/8.0/en/mysql-tzinfo-to-sql.html). After loading the time zone, you can create tasks and replicate data normally. {{< copyable "shell-regular" >}} @@ -74,7 +74,7 @@ Warning: Unable to load '/usr/share/zoneinfo/zone.tab' as time zone. Skipping it Warning: Unable to load '/usr/share/zoneinfo/zone1970.tab' as time zone. Skipping it. 
``` -If you use MySQL in a special cloud environment, such Aliyun RDS, and if you do not have the permission to modify MySQL, you need to specify the time zone using the `--tz` parameter: +If you use MySQL in a special public cloud environment, such as Alibaba Cloud RDS, and if you do not have the permission to modify MySQL, you need to specify the time zone using the `--tz` parameter: 1. Query the time zone used by MySQL: @@ -110,10 +110,10 @@ Be cautious when you set the time zone of the TiCDC server, because the time zon The TiCDC server chooses its time zone in the following priority: 1. TiCDC first uses the time zone specified by `--tz`. -2. When the above parameter is not available, TiCDC tries to read the timezone set by the `TZ` environment variable. -3. When the above environment variable is not available, TiCDC uses the default time zone of the machine. +2. When `--tz` is not available, TiCDC tries to read the time zone set by the `TZ` environment variable. +3. When the `TZ` environment variable is not available, TiCDC uses the default time zone of the machine. -## How do I handle the incompatibility of configuration files caused by TiCDC upgrade? +## How do I handle the incompatibility issue of configuration files caused by TiCDC upgrade? Refer to [Notes for compatibility](/ticdc/manage-ticdc.md#notes-for-compatibility). @@ -130,7 +130,7 @@ cdc cli changefeed create --pd=http://10.0.10.25:2379 --sink-uri="kafka://127.0. > **Note:** > > * This feature is introduced in TiCDC 4.0.2. -> * TiCDC currently only supports outputting data changes in the Canal format to Kafka. +> * TiCDC currently supports outputting data changes in the Canal format only to Kafka. For more information, refer to [Create a replication task](/ticdc/manage-ticdc.md#create-a-replication-task). @@ -163,7 +163,7 @@ The expected output is as follows: * `normal`: The task runs normally. * `stopped`: The task is stopped manually or encounters an error. - * `removed`: The task is deleted.
+ * `removed`: The task is removed. > **Note:** > @@ -186,14 +186,14 @@ In the output of this command, `admin-job-type` shows the state of the replicati * `2`: Resumed. The replication task resumes from `checkpoint-ts`. * `3`: Removed. When the task is removed, all replicated `processor`s are ended, and the configuration information of the replication task is cleared up. Only the replication status is retained for later queries. -## Why does the latency from TiCDC to Kafka become larger and larger? +## Why does the latency from TiCDC to Kafka become higher and higher? * Check [whether the status of the replication task is normal](#how-do-i-know-whether-the-replication-task-runs-normally). * Adjust the following parameters of Kafka: - * Increase `message.max.bytes` in `server.properties` to `1073741824` (1 GB). - * Increase `replica.fetch.max.bytes` in `server.properties` to `1073741824` (1 GB). - * Increase `fetch.message.max.bytes` in `consumer.properties` to make it larger than `message.max.bytes`. + * Increase the `message.max.bytes` value in `server.properties` to `1073741824` (1 GB). + * Increase the `replica.fetch.max.bytes` value in `server.properties` to `1073741824` (1 GB). + * Increase the `fetch.message.max.bytes` value in `consumer.properties` to make it larger than the `message.max.bytes` value. ## When TiCDC replicates data to Kafka, does it write all the changes in a transaction into one message? If not, on what basis does it divide the changes? @@ -207,7 +207,7 @@ No. Currently TiCDC sets the maximum size of batch messages to 512 MB, and that ## When TiCDC replicates data to Kafka, does a message contain multiple types of data changes? -Yes. A single might contain multiple `update`s or `delete`s, and `update` and `delete` might co-exist. +Yes. A single message might contain multiple `update`s or `delete`s, and `update` and `delete` might co-exist. 
## When TiCDC replicates data to Kafka, how do I view the timestamp, table name, and schema name in the output of TiCDC Open Protocol? From d4b4f9bc24dfea5446ea642b931e525f0b0cee77 Mon Sep 17 00:00:00 2001 From: Ran Date: Fri, 7 Aug 2020 18:32:07 +0800 Subject: [PATCH 6/7] Update ticdc/troubleshoot-ticdc.md Co-authored-by: amyangfei --- ticdc/troubleshoot-ticdc.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ticdc/troubleshoot-ticdc.md b/ticdc/troubleshoot-ticdc.md index d02f54bb86b51..11379e36143ba 100644 --- a/ticdc/troubleshoot-ticdc.md +++ b/ticdc/troubleshoot-ticdc.md @@ -80,7 +80,7 @@ If you use MySQL in a special public cloud environment, such Alibaba Cloud RDS, {{< copyable "shell-regular" >}} - ```shell + ```sql show variables like '%time_zone%'; ``` From c3d58ad87d6d5630eeec7b0cd386affff1a96f47 Mon Sep 17 00:00:00 2001 From: TomShawn <41534398+TomShawn@users.noreply.github.com> Date: Fri, 7 Aug 2020 18:35:10 +0800 Subject: [PATCH 7/7] Update ticdc/troubleshoot-ticdc.md --- ticdc/troubleshoot-ticdc.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ticdc/troubleshoot-ticdc.md b/ticdc/troubleshoot-ticdc.md index 11379e36143ba..3e75603abf2e7 100644 --- a/ticdc/troubleshoot-ticdc.md +++ b/ticdc/troubleshoot-ticdc.md @@ -78,7 +78,7 @@ If you use MySQL in a special public cloud environment, such Alibaba Cloud RDS, 1. Query the time zone used by MySQL: - {{< copyable "shell-regular" >}} + {{< copyable "sql" >}} ```sql show variables like '%time_zone%';