cdc add docs about handle-key only #14393

Merged 4 commits on Aug 3, 2023
65 additions and 0 deletions in ticdc/ticdc-sink-to-kafka.md

You can query the number of Regions a table contains by the following SQL statement:
```sql
SELECT COUNT(*) FROM INFORMATION_SCHEMA.TIKV_REGION_STATUS WHERE DB_NAME="database1" AND TABLE_NAME="table1" AND IS_INDEX=0;
```

## Handle messages that exceed the Kafka topic limit

A Kafka topic limits the size of messages it can receive. This limit is controlled by the [`max.message.bytes`](https://kafka.apache.org/documentation/#topicconfigs_max.message.bytes) parameter. If the data that the TiCDC Kafka sink sends exceeds this limit, the changefeed reports an error and cannot continue replicating data. To solve this problem, TiCDC provides the following solution.
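The size check itself is straightforward. The following Go sketch illustrates it (`maxMessageBytes` and `exceedsLimit` are hypothetical names for this example, not TiCDC APIs):

```go
package main

import "fmt"

// maxMessageBytes mirrors the Kafka topic's max.message.bytes setting,
// which defaults to about 1 MB. The value here is illustrative only.
const maxMessageBytes = 1048576

// exceedsLimit reports whether an encoded message would be rejected by
// the broker for exceeding max.message.bytes.
func exceedsLimit(encoded []byte) bool {
	return len(encoded) > maxMessageBytes
}

func main() {
	fmt.Println(exceedsLimit(make([]byte, 512)))       // false
	fmt.Println(exceedsLimit(make([]byte, 2*1048576))) // true
}
```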

### Send handle keys only

Starting from v7.3.0, the TiCDC Kafka sink supports sending only the handle keys when the message size exceeds the limit. This can significantly reduce the message size and avoid changefeed errors and task failures caused by messages exceeding the Kafka topic limit. The handle key refers to the following:

* If the table to be replicated has a primary key, the primary key is the handle key.
* If the table does not have a primary key but has a NOT NULL unique key, the NOT NULL unique key is the handle key.
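The selection rules above can be sketched as follows. This is a simplified Go illustration that assumes single-column keys; the `Column` type and `handleKey` function are hypothetical, not part of TiCDC:

```go
package main

import "fmt"

// Column is a simplified, single-column view of table metadata for this sketch.
type Column struct {
	Name       string
	PrimaryKey bool
	UniqueKey  bool
	NotNull    bool
}

// handleKey picks the handle key: the primary key if one exists,
// otherwise the first NOT NULL unique key; "" if neither exists.
func handleKey(cols []Column) string {
	uk := ""
	for _, c := range cols {
		if c.PrimaryKey {
			return c.Name
		}
		if c.UniqueKey && c.NotNull && uk == "" {
			uk = c.Name
		}
	}
	return uk
}

func main() {
	withPK := []Column{{Name: "id", PrimaryKey: true, NotNull: true}, {Name: "uid", UniqueKey: true, NotNull: true}}
	noPK := []Column{{Name: "a"}, {Name: "uid", UniqueKey: true, NotNull: true}}
	fmt.Println(handleKey(withPK)) // id
	fmt.Println(handleKey(noPK))   // uid
}
```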

Currently, this feature supports two encoding protocols: Canal-JSON and Open Protocol. When you use the Canal-JSON protocol, you must specify `enable-tidb-extension=true` in `sink-uri`.

The sample configuration is as follows:

```toml
[sink.kafka-config.large-message-handle]
# This configuration is introduced in v7.3.0.
# Empty by default, which means when the message size exceeds the limit, the changefeed fails.
# If the configuration is set to "handle-key-only", when the message size exceeds the limit, only the handle key is sent in the data field. If the message size still exceeds the limit, the changefeed fails.
large-message-handle-option = "handle-key-only"
```
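The fallback behavior described in the comments above can be sketched as follows. This is an illustrative Go sketch, not TiCDC's actual implementation; the encoder functions are stand-ins that just produce byte slices of different sizes:

```go
package main

import (
	"errors"
	"fmt"
)

const maxMessageBytes = 1 << 20 // mirrors the topic's max.message.bytes

// Stand-ins for the protocol encoders: the full encoding scales with the
// row size, while the handle-key-only encoding stays small.
func encodeFull(rowBytes int) []byte    { return make([]byte, rowBytes) }
func encodeHandleKeyOnly(_ int) []byte  { return make([]byte, 128) }

// buildMessage sketches the "handle-key-only" fallback: send the full row
// if it fits, fall back to handle keys only, and fail the changefeed if
// even that exceeds the limit.
func buildMessage(rowBytes int) ([]byte, error) {
	msg := encodeFull(rowBytes)
	if len(msg) <= maxMessageBytes {
		return msg, nil
	}
	msg = encodeHandleKeyOnly(rowBytes)
	if len(msg) <= maxMessageBytes {
		return msg, nil
	}
	return nil, errors.New("message exceeds max.message.bytes even with handle keys only")
}

func main() {
	m, _ := buildMessage(512)
	fmt.Println(len(m)) // 512: the full row fits
	m, _ = buildMessage(4 << 20)
	fmt.Println(len(m)) // 128: fell back to handle keys only
}
```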

### Consume messages with handle keys only

The message format with handle keys only is as follows:

```json
{
    "id": 0,
    "database": "test",
    "table": "tp_int",
    "pkNames": [
        "id"
    ],
    "isDdl": false,
    "type": "INSERT",
    "es": 1639633141221,
    "ts": 1639633142960,
    "sql": "",
    "sqlType": {
        "id": 4
    },
    "mysqlType": {
        "id": "int"
    },
    "data": [
        {
            "id": "2"
        }
    ],
    "old": null,
    "_tidb": {     // TiDB extension fields
        "commitTs": 163963314122145239,
        "onlyHandleKey": true
    }
}
```

When a Kafka consumer receives a message, it first checks the `onlyHandleKey` field. If this field exists and is `true`, the message contains only the handle key of the complete data. In this case, to get the complete data, you need to query the upstream TiDB cluster and [use `tidb_snapshot` to read historical data](/read-historical-data.md).
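A consumer-side sketch of this check, in Go. The `canalJSONMessage` struct models only the fields used here, and `backfillQuery` is a hypothetical helper that builds the back-fill SQL; before running the query, the consumer should also set `@@tidb_snapshot` to the message's `commitTs` so the read happens at the commit time:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// canalJSONMessage models only the fields this sketch needs from the
// Canal-JSON message shown above.
type canalJSONMessage struct {
	Database string              `json:"database"`
	Table    string              `json:"table"`
	PKNames  []string            `json:"pkNames"`
	Data     []map[string]string `json:"data"`
	TiDB     *struct {
		CommitTs      uint64 `json:"commitTs"`
		OnlyHandleKey bool   `json:"onlyHandleKey"`
	} `json:"_tidb"`
}

// backfillQuery returns the SQL a consumer could run against upstream TiDB
// to fetch the complete row for a handle-key-only message, and whether a
// back-fill is needed at all. The query shape is illustrative; run it
// after setting @@tidb_snapshot to the message's commitTs.
func backfillQuery(raw []byte) (string, bool, error) {
	var m canalJSONMessage
	if err := json.Unmarshal(raw, &m); err != nil {
		return "", false, err
	}
	if m.TiDB == nil || !m.TiDB.OnlyHandleKey {
		return "", false, nil // complete message, nothing to back-fill
	}
	q := fmt.Sprintf("SELECT * FROM `%s`.`%s` WHERE `%s` = '%s'",
		m.Database, m.Table, m.PKNames[0], m.Data[0][m.PKNames[0]])
	return q, true, nil
}

func main() {
	raw := []byte(`{"database":"test","table":"tp_int","pkNames":["id"],` +
		`"data":[{"id":"2"}],"_tidb":{"commitTs":163963314122145239,"onlyHandleKey":true}}`)
	q, needsBackfill, err := backfillQuery(raw)
	fmt.Println(needsBackfill, err == nil) // true true
	fmt.Println(q)                         // SELECT * FROM `test`.`tp_int` WHERE `id` = '2'
}
```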

> **Warning:**
>
> When the Kafka consumer processes data and queries TiDB, the data might have already been deleted by GC. To avoid this situation, you need to [modify the GC lifetime of the TiDB cluster](/system-variables.md#tidb_gc_life_time-new-in-v50) to a larger value.