41 changes: 24 additions & 17 deletions ticdc/manage-ticdc.md
@@ -230,7 +230,7 @@ In the above command:
- `resolved-ts`: The largest transaction `TS` in the current `changefeed`. Note that this `TS` has been successfully sent from TiKV to TiCDC.
- `checkpoint-ts`: The largest transaction `TS` in the current `changefeed` that has been successfully written to the downstream.
- `admin-job-type`: The status of a `changefeed`:
- `0`: The state is normal. It is the initial status.
- `0`: The state is normal.
- `1`: The task is paused. When the task is paused, all replication `processor`s exit. The configuration and the replication status of the task are retained, so you can resume the task from `checkpoint-ts`.
- `2`: The task is resumed. The replication task resumes from `checkpoint-ts`.
- `3`: The task is removed. When the task is removed, all replication `processor`s are ended, and the configuration information of the replication task is cleared up. Only the replication status is retained for later queries.
@@ -302,29 +302,36 @@ In the above command:
{{< copyable "shell-regular" >}}

```shell
cdc cli processor query --pd=http://10.0.10.25:2379 --changefeed-id=28c43ffc-2316-4f4f-a70b-d1a7c59ba79f
cdc cli processor query --pd=http://10.0.10.25:2379 --changefeed-id=28c43ffc-2316-4f4f-a70b-d1a7c59ba79f --capture-id=b293999a-4168-4988-a4f4-35d9589b226b
```

```
{
  "status": {
    "table-infos": [
      {
        "id": 45,
        "start-ts": 415241823337054209
      }
    ],
    "table-p-lock": null,
    "table-c-lock": null,
    "admin-job-type": 0
  },
  "position": {
    "checkpoint-ts": 415241893447467009,
    "resolved-ts": 415241893971492865
  }
  "status": {
    "tables": {
      "56": {    # ID of the replication table, corresponding to tidb_table_id of a table in TiDB
        "start-ts": 417474117955485702,
        "mark-table-id": 0    # ID of mark tables in the cyclic replication, corresponding to tidb_table_id of mark tables in TiDB
      }
    },
    "operation": null,
    "admin-job-type": 0
  },
  "position": {
    "checkpoint-ts": 417474143881789441,
    "resolved-ts": 417474143881789441,
    "count": 0
  }
}
```

In the command above:

- `status.tables`: Each key number represents the ID of the replication table, corresponding to `tidb_table_id` of a table in TiDB.
- `mark-table-id`: The ID of mark tables in the cyclic replication, corresponding to `tidb_table_id` of mark tables in TiDB.
- `resolved-ts`: The largest TSO among the sorted data in the current processor.
- `checkpoint-ts`: The largest TSO that has been successfully written to the downstream in the current processor.

## Use HTTP interface to manage cluster status and data replication task

Currently, the HTTP interface provides some basic features for query and maintenance.
26 changes: 25 additions & 1 deletion ticdc/ticdc-overview.md
@@ -30,7 +30,7 @@ The architecture of TiCDC is shown in the following figure:
- `capture`: The operating process of TiCDC. Multiple `capture`s form a TiCDC cluster that replicates KV change logs.

- Each `capture` pulls a part of KV change logs.
- Sorts the pulled the KV change log(s).
- Sorts the pulled KV change log(s).
- Restores the transaction to downstream or outputs the log based on the TiCDC open protocol.

## Replication features
@@ -44,6 +44,30 @@ Currently, the TiCDC sink component supports replicating data to the following d
- Databases compatible with the MySQL protocol. The sink component provides eventual consistency support.
- Kafka based on the TiCDC Open Protocol. The sink component ensures row-level order, eventual consistency, or strict transactional consistency.

### Ensure replication order and consistency

#### Replication order

- For all DDL or DML statements, TiCDC outputs them **at least once**.
- When the TiKV or TiCDC cluster encounters a failure, TiCDC might send the same DDL/DML statement repeatedly. For duplicated DDL/DML statements:

    - The MySQL sink can execute DDL statements repeatedly. For DDL statements that can be executed repeatedly in the downstream, such as `truncate table`, the statement is executed successfully. For those that cannot be executed repeatedly, such as `create table`, the execution fails, and TiCDC ignores the error and continues the replication.
    - The Kafka sink sends messages repeatedly, but the duplicate messages do not affect the constraints of `Resolved Ts`. Users can filter out the duplicated messages in their Kafka consumers, for example as sketched below.
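
The following Go sketch shows one way such filtering can be done on the consumer side: it remembers, per Kafka partition, the largest `Resolved Ts` that has been fully processed and skips any row change whose commit `ts` is not newer than that watermark. The `Event` type, the sample values, and the absence of a real Kafka client are simplifications made for illustration only.

```go
package main

import "fmt"

// Event is a simplified stand-in for a decoded TiCDC Open Protocol message.
// A real consumer would decode the Kafka key/value instead.
type Event struct {
	Partition int32
	CommitTs  uint64 // commit ts of a row change, or the ts of a Resolved event
	Resolved  bool   // true if this is a Resolved event
}

// dedupFilter keeps, per partition, the largest Resolved ts that has already
// been fully processed, and drops row events at or below that watermark.
type dedupFilter struct {
	watermark map[int32]uint64
}

func newDedupFilter() *dedupFilter {
	return &dedupFilter{watermark: make(map[int32]uint64)}
}

// Keep reports whether the event should be processed (true) or skipped as a
// duplicate (false), and advances the watermark on Resolved events.
func (f *dedupFilter) Keep(e Event) bool {
	if e.Resolved {
		if e.CommitTs > f.watermark[e.Partition] {
			f.watermark[e.Partition] = e.CommitTs
		}
		return true
	}
	return e.CommitTs > f.watermark[e.Partition]
}

func main() {
	f := newDedupFilter()
	events := []Event{
		{Partition: 0, CommitTs: 100},
		{Partition: 0, CommitTs: 105, Resolved: true},
		{Partition: 0, CommitTs: 100}, // re-sent after a retry; filtered out
		{Partition: 0, CommitTs: 110},
	}
	for _, e := range events {
		fmt.Println(e.CommitTs, "keep:", f.Keep(e))
	}
}
```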

#### Replication consistency

- MySQL sink

    - TiCDC does not split in-table transactions. This is to **ensure** transactional consistency within a single table. However, TiCDC does **not ensure** that transactions are executed in the downstream in the same order as in the upstream.
    - TiCDC splits cross-table transactions in units of tables. TiCDC does **not ensure** that cross-table transactions are always consistent.
    - TiCDC **ensures** that the order of single-row updates is consistent with that in the upstream.

- Kafka sink

    - TiCDC provides different strategies for data distribution. You can distribute data to different Kafka partitions based on the table, primary key, or timestamp; the sketch after this list illustrates the table-based strategy.
    - For different distribution strategies, different consumer implementations can achieve different levels of consistency, including row-level consistency, eventual consistency, or cross-table transactional consistency.
    - TiCDC does not provide an implementation of Kafka consumers; it only provides the [TiCDC Open Protocol](/ticdc/ticdc-open-protocol.md). You can implement a Kafka consumer according to this protocol.
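
As an illustration of the table-based strategy only (not TiCDC's actual dispatcher code), the following Go sketch hashes the schema and table names so that all changes of one table land in the same Kafka partition, which is what makes it possible for a consumer to preserve the order of changes within a table. The function name and the partition count are placeholders for this sketch.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionForTable returns a stable partition index for a table so that all
// row changes of that table go to the same Kafka partition. This only sketches
// the idea behind a table-based distribution strategy, not TiCDC's actual code.
func partitionForTable(schema, table string, partitionNum int32) int32 {
	h := fnv.New32a()
	h.Write([]byte(schema))
	h.Write([]byte(table))
	return int32(h.Sum32() % uint32(partitionNum))
}

func main() {
	fmt.Println(partitionForTable("test", "orders", 6))
	fmt.Println(partitionForTable("test", "users", 6))
}
```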

## Restrictions

To replicate data to TiDB or MySQL, you must ensure that the following requirements are satisfied to guarantee data correctness:
190 changes: 186 additions & 4 deletions ticdc/troubleshoot-ticdc.md
@@ -8,7 +8,7 @@ aliases: ['/docs/dev/ticdc/troubleshoot-ticdc/']

This document introduces the common issues and errors that you might encounter when using TiCDC, and the corresponding maintenance and troubleshooting methods.

## How to choose `start-ts` when starting a task
## How do I choose `start-ts` when creating a task in TiCDC?

The `start-ts` of a replication task corresponds to a Timestamp Oracle (TSO) in the upstream TiDB cluster. TiCDC requests data from this TSO in a replication task. Therefore, the `start-ts` of the replication task must meet the following requirements:

@@ -17,11 +17,11 @@ The `start-ts` of a replication task corresponds to a Timestamp Oracle (TSO) in

If you do not specify `start-ts`, or specify `start-ts` as `0`, when a replication task is started, TiCDC gets a current TSO and starts the task from this TSO.
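
If you want to start a replication task from a specific wall-clock time rather than from the current TSO, you can build an approximate TSO from that time, because the upper bits of a TiDB TSO encode the physical time in milliseconds. The following Go sketch shows this calculation; the chosen time is a placeholder, and the resulting value is only usable as `start-ts` if it still satisfies the GC-related requirements mentioned above.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Placeholder: the wall-clock time you want the replication task to start from.
	// It must still be retained by GC in the upstream cluster.
	t := time.Date(2020, 7, 7, 16, 0, 0, 0, time.UTC)

	// The upper bits of a TiDB TSO encode physical time in milliseconds, so
	// shifting the millisecond timestamp left by 18 bits gives an approximate
	// TSO that can be passed to `cdc cli changefeed create --start-ts`.
	startTs := uint64(t.UnixMilli()) << 18
	fmt.Println(startTs)
}
```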

## Some tables cannot be replicated when you start a task
## Why can't some tables be replicated when I create a task in TiCDC?

When you execute `cdc cli changefeed create` to create a replication task, TiCDC checks whether the upstream tables meet the [replication restrictions](/ticdc/ticdc-overview.md#restrictions). If some tables do not meet the restrictions, `some tables are not eligible to replicate` is returned with a list of ineligible tables. You can choose `Y` or `y` to continue creating the task, and all updates on these tables are automatically ignored during the replication. If you choose an input other than `Y` or `y`, the replication task is not created.

## How to handle replication interruption
## How do I handle replication interruption?

A replication task might be interrupted in the following known scenarios:

@@ -38,7 +38,7 @@ A replication task might be interrupted in the following known scenarios:
2. Use the new task configuration file and add the `ignore-txn-start-ts` parameter to skip the transaction corresponding to the specified `start-ts`.
3. Stop the old replication task via HTTP API. Execute `cdc cli changefeed create` to create a new task and specify the new task configuration file. Specify `checkpoint-ts` recorded in step 1 as the `start-ts` and start a new task to resume the replication.

## `gc-ttl` and file sorting
## What is `gc-ttl` and file sorting in TiCDC?

Since v4.0.0-rc.1, PD supports external services in setting the service-level GC safepoint. Any service can register and update its GC safepoint. PD ensures that the key-value data smaller than this GC safepoint is not cleaned by GC. Enabling this feature in TiCDC ensures that the data to be consumed by TiCDC is retained in TiKV without being cleaned by GC when the replication task is unavailable or interrupted.

@@ -55,3 +55,185 @@ cdc cli changefeed create --pd=http://10.0.10.25:2379 --start-ts=415238226621235
> **Note:**
>
> TiCDC (the 4.0 version) does not support dynamically modifying the file sorting and memory sorting yet.

## How do I handle the `Error 1298: Unknown or incorrect time zone: 'UTC'` error when creating the replication task or replicating data to MySQL?

This error is returned when the downstream MySQL does not load the time zone. You can load the time zone by running [`mysql_tzinfo_to_sql`](https://dev.mysql.com/doc/refman/8.0/en/mysql-tzinfo-to-sql.html). After loading the time zone, you can create tasks and replicate data normally.

{{< copyable "shell-regular" >}}

```shell
mysql_tzinfo_to_sql /usr/share/zoneinfo | mysql -u root mysql -p
```

```
Enter password:
Warning: Unable to load '/usr/share/zoneinfo/iso3166.tab' as time zone. Skipping it.
Warning: Unable to load '/usr/share/zoneinfo/leap-seconds.list' as time zone. Skipping it.
Warning: Unable to load '/usr/share/zoneinfo/zone.tab' as time zone. Skipping it.
Warning: Unable to load '/usr/share/zoneinfo/zone1970.tab' as time zone. Skipping it.
```

If you use MySQL in a special public cloud environment, such as Alibaba Cloud RDS, and you do not have the permission to modify MySQL, you need to specify the time zone using the `--tz` parameter:

1. Query the time zone used by MySQL:

{{< copyable "sql" >}}

```sql
show variables like '%time_zone%';
```

```
+------------------+--------+
| Variable_name    | Value  |
+------------------+--------+
| system_time_zone | CST    |
| time_zone        | SYSTEM |
+------------------+--------+
```

2. Specify the time zone when you create the replication task and create the TiCDC service:

{{< copyable "shell-regular" >}}

```shell
cdc cli changefeed create --sink-uri="mysql://root@127.0.0.1:3306/" --tz=Asia/Shanghai
```

> **Note:**
>
> In MySQL, CST refers to China Standard Time (UTC+08:00). Usually you cannot use `CST` directly in your system; use `Asia/Shanghai` instead.

Be cautious when you set the time zone of the TiCDC server, because this time zone is used when converting time data types. It is recommended that you use the same time zone in the upstream and downstream databases, and specify the time zone using `--tz` when you start the TiCDC server.

The TiCDC server chooses its time zone in the following order of priority, as also illustrated in the sketch after this list:

1. TiCDC first uses the time zone specified by `--tz`.
2. When `--tz` is not available, TiCDC tries to read the time zone set by the `TZ` environment variable.
3. When the `TZ` environment variable is not available, TiCDC uses the default time zone of the machine.
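
The following Go sketch only restates this priority order; it is not TiCDC's actual implementation, and the function name is made up for illustration.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// chooseTimeZone mirrors the priority described above: the --tz flag first,
// then the TZ environment variable, then the machine's local time zone.
// This is only an illustration, not TiCDC's actual code.
func chooseTimeZone(tzFlag string) (*time.Location, error) {
	if tzFlag != "" {
		return time.LoadLocation(tzFlag)
	}
	if tz := os.Getenv("TZ"); tz != "" {
		return time.LoadLocation(tz)
	}
	return time.Local, nil
}

func main() {
	loc, err := chooseTimeZone("Asia/Shanghai")
	fmt.Println(loc, err)
}
```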

## How do I handle the incompatibility issue of configuration files caused by TiCDC upgrade?

Refer to [Notes for compatibility](/ticdc/manage-ticdc.md#notes-for-compatibility).

## Does TiCDC support outputting data changes in the Canal format?

Yes. To enable Canal output, specify the protocol as `canal` in the `--sink-uri` parameter. For example:

{{< copyable "shell-regular" >}}

```shell
cdc cli changefeed create --pd=http://10.0.10.25:2379 --sink-uri="kafka://127.0.0.1:9092/cdc-test?kafka-version=2.4.0&protocol=canal" --config changefeed.toml
```

> **Note:**
>
> * This feature is introduced in TiCDC 4.0.2.
> * TiCDC currently supports outputting data changes in the Canal format only to Kafka.

For more information, refer to [Create a replication task](/ticdc/manage-ticdc.md#create-a-replication-task).

## How do I view the latency of TiCDC replication tasks?

To view the latency of TiCDC replication tasks, use `cdc cli`. For example:

{{< copyable "shell-regular" >}}

```shell
cdc cli changefeed list --pd=http://10.0.10.25:2379
```

The expected output is as follows:

```json
[{
"id": "4e24dde6-53c1-40b6-badf-63620e4940dc",
"summary": {
"state": "normal",
"tso": 417886179132964865,
"checkpoint": "2020-07-07 16:07:44.881",
"error": null
}
}]
```

* `checkpoint`: TiCDC has replicated all data before this timestamp to the downstream.
* `state`: The state of the replication task:

* `normal`: The task runs normally.
* `stopped`: The task is stopped manually or encounters an error.
* `removed`: The task is removed.

> **Note:**
>
> This feature is introduced in TiCDC 4.0.3.

## How do I know whether the replication task runs normally?

You can view the state of the replication tasks by using `cdc cli`. For example:

{{< copyable "shell-regular" >}}

```shell
cdc cli changefeed query --pd=http://10.0.10.25:2379 --changefeed-id 28c43ffc-2316-4f4f-a70b-d1a7c59ba79f
```

In the output of this command, `admin-job-type` shows the state of the replication task:

* `0`: Normal.
* `1`: Paused. When the task is paused, all replication `processor`s exit. The configuration and the replication status of the task are retained, so you can resume the task from `checkpoint-ts`.
* `2`: Resumed. The replication task resumes from `checkpoint-ts`.
* `3`: Removed. When the task is removed, all replication `processor`s are ended, and the configuration information of the replication task is cleared up. Only the replication status is retained for later queries.

## Why does the latency from TiCDC to Kafka become higher and higher?

* Check [whether the status of the replication task is normal](#how-do-i-know-whether-the-replication-task-runs-normally).
* Adjust the following parameters of Kafka:

* Increase the `message.max.bytes` value in `server.properties` to `1073741824` (1 GB).
* Increase the `replica.fetch.max.bytes` value in `server.properties` to `1073741824` (1 GB).
* Increase the `fetch.message.max.bytes` value in `consumer.properties` to make it larger than the `message.max.bytes` value.

## When TiCDC replicates data to Kafka, does it write all the changes in a transaction into one message? If not, on what basis does it divide the changes?

No. According to the different distribution strategies configured, TiCDC divides the changes on different bases, including `default`, `row id`, `table`, and `ts`.

For more information, refer to [Replication task configuration file](/ticdc/manage-ticdc.md#task-configuration-file).

## When TiCDC replicates data to Kafka, can I control the maximum size of a single message in TiDB?

No. Currently TiCDC sets the maximum size of batch messages to 512 MB, and that of a single message to 4 MB.

## When TiCDC replicates data to Kafka, does a message contain multiple types of data changes?

Yes. A single message might contain multiple `update`s or `delete`s, and `update` and `delete` might co-exist.

## When TiCDC replicates data to Kafka, how do I view the timestamp, table name, and schema name in the output of TiCDC Open Protocol?

The information is included in the key of Kafka messages. For example:

```json
{
"ts":<TS>,
"scm":<Schema Name>,
"tbl":<Table Name>,
"t":1
}
```
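
For reference, a consumer written in Go could decode these fields from one such JSON key as shown below. This is a minimal sketch: the field names follow the key format above, the sample key value is hypothetical, and the raw Kafka key may additionally carry a protocol version and length prefixes around the JSON; see the TiCDC Open Protocol document for the exact message format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MessageKey holds the fields of a TiCDC Open Protocol Kafka message key.
type MessageKey struct {
	Ts     uint64 `json:"ts"`  // commit or resolved timestamp (TSO)
	Schema string `json:"scm"` // schema name
	Table  string `json:"tbl"` // table name
	Type   int    `json:"t"`   // event type
}

func main() {
	// A hypothetical key taken from a Kafka message.
	raw := []byte(`{"ts":417886179132964865,"scm":"test","tbl":"orders","t":1}`)

	var key MessageKey
	if err := json.Unmarshal(raw, &key); err != nil {
		panic(err)
	}
	fmt.Printf("ts=%d schema=%s table=%s type=%d\n", key.Ts, key.Schema, key.Table, key.Type)
}
```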

For more information, refer to [TiCDC Open Protocol event format](/ticdc/ticdc-open-protocol.md#event-format).

## When TiCDC replicates data to Kafka, how do I know the timestamp of the data changes in a message?

You can get the Unix timestamp (in milliseconds) by right-shifting the `ts` in the key of the Kafka message by 18 bits.
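
For example, using a hypothetical `ts` value, the conversion looks like this in Go:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// `ts` taken from the key of a Kafka message (hypothetical value).
	const ts uint64 = 417886179132964865

	// The upper bits of a TiDB TSO encode the physical time in milliseconds,
	// so right-shifting by 18 bits recovers the Unix timestamp in milliseconds.
	unixMilli := int64(ts >> 18)
	fmt.Println(time.UnixMilli(unixMilli).UTC())
}
```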

## How does TiCDC Open Protocol represent `null`?

In TiCDC Open Protocol, the type code `6` represents `null`.

| Type | Code | Output Example | Note |
|:--|:--|:--|:--|
| Null | 6 | `{"t":6,"v":null}` | |
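
As a small illustration, a Go consumer could detect a `null` column by checking the type code. The `Column` type and the sample snippet below are assumptions made for this sketch, not an official API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Column is a simplified view of a column in a TiCDC Open Protocol row change.
type Column struct {
	Type  int             `json:"t"` // column type code; 6 means null
	Value json.RawMessage `json:"v"`
}

func main() {
	raw := []byte(`{"t":6,"v":null}`)

	var col Column
	if err := json.Unmarshal(raw, &col); err != nil {
		panic(err)
	}
	// Type code 6 represents null in TiCDC Open Protocol.
	fmt.Println("type code:", col.Type, "is null:", col.Type == 6)
}
```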

For more information, refer to [TiCDC Open Protocol column type code](/ticdc/ticdc-open-protocol.md#column-type-code).