Refactor ticdc (#11570)
shichun-0415 committed Dec 13, 2022
1 parent d633f2b commit d654c62
Showing 31 changed files with 1,327 additions and 1,073 deletions.
40 changes: 25 additions & 15 deletions TOC.md
@@ -491,6 +491,31 @@
- [FAQ](/dm/dm-faq.md)
- [Handle Errors](/dm/dm-error-handling.md)
- [Release Notes](/dm/dm-release-notes.md)
- TiCDC
- [Deploy and Maintain](/ticdc/deploy-ticdc.md)
- Changefeed
- [Overview](/ticdc/ticdc-changefeed-overview.md)
- Create Changefeeds
- [Replicate Data to MySQL-compatible Databases](/ticdc/ticdc-sink-to-mysql.md)
- [Replicate Data to Kafka](/ticdc/ticdc-sink-to-kafka.md)
- [Manage Changefeeds](/ticdc/ticdc-manage-changefeed.md)
- [Log Filter](/ticdc/ticdc-filter.md)
- [Bidirectional Replication](/ticdc/ticdc-bidirectional-replication.md)
- Monitor and Alert
- [Monitoring Metrics](/ticdc/monitor-ticdc.md)
- [Alert Rules](/ticdc/ticdc-alert-rules.md)
- Reference
- [TiCDC Server Configurations](/ticdc/ticdc-server-config.md)
- [TiCDC Changefeed Configurations](/ticdc/ticdc-changefeed-config.md)
- Output Protocols
- [TiCDC Avro Protocol](/ticdc/ticdc-avro-protocol.md)
- [TiCDC Canal-JSON Protocol](/ticdc/ticdc-canal-json.md)
- [TiCDC Open Protocol](/ticdc/ticdc-open-protocol.md)
- [TiCDC Open API](/ticdc/ticdc-open-api.md)
- [Compatibility](/ticdc/ticdc-compatibility.md)
- [Troubleshoot](/ticdc/troubleshoot-ticdc.md)
- [FAQs](/ticdc/ticdc-faq.md)
- [Glossary](/ticdc/ticdc-glossary.md)
- TiDB Binlog
- [Overview](/tidb-binlog/tidb-binlog-overview.md)
- [Quick Start](/tidb-binlog/get-started-with-tidb-binlog.md)
@@ -511,21 +536,6 @@
- [Troubleshoot](/tidb-binlog/troubleshoot-tidb-binlog.md)
- [Handle Errors](/tidb-binlog/handle-tidb-binlog-errors.md)
- [FAQ](/tidb-binlog/tidb-binlog-faq.md)
- TiCDC
- [Overview](/ticdc/ticdc-overview.md)
- [Deploy](/ticdc/deploy-ticdc.md)
- [Maintain](/ticdc/manage-ticdc.md)
- Monitor and Alert
- [Monitoring Metrics](/ticdc/monitor-ticdc.md)
- [Alert Rules](/ticdc/ticdc-alert-rules.md)
- [Troubleshoot](/ticdc/troubleshoot-ticdc.md)
- Reference
- [TiCDC OpenAPI](/ticdc/ticdc-open-api.md)
- [TiCDC Open Protocol](/ticdc/ticdc-open-protocol.md)
- [TiCDC Avro Protocol](/ticdc/ticdc-avro-protocol.md)
- [TiCDC Canal-JSON Protocol](/ticdc/ticdc-canal-json.md)
- [FAQs](/ticdc/ticdc-faq.md)
- [Glossary](/ticdc/ticdc-glossary.md)
- sync-diff-inspector
- [Overview](/sync-diff-inspector/sync-diff-inspector-overview.md)
- [Data Check for Tables with Different Schema/Table Names](/sync-diff-inspector/route-diff.md)
23 changes: 11 additions & 12 deletions enable-tls-between-components.md
@@ -112,7 +112,16 @@ Currently, it is not supported to only enable encrypted transmission of some spe

- TiCDC

Configure in the command-line arguments and set the corresponding URL to `https`:
Configure in the configuration file:

```toml
[security]
ca-path = "/path/to/ca.pem"
cert-path = "/path/to/cdc-server.pem"
key-path = "/path/to/cdc-server-key.pem"
```

Alternatively, configure in the command-line arguments and set the corresponding URL to `https`:

{{< copyable "shell-regular" >}}

@@ -182,16 +191,6 @@ To verify component caller's identity, you need to mark the certificate user ide
cert-allowed-cn = ["TiKV-Server", "TiDB-Server", "PD-Control"]
```

- TiCDC

Configure in the command-line arguments:

{{< copyable "shell-regular" >}}

```bash
cdc server --pd=https://127.0.0.1:2379 --log-file=ticdc.log --addr=0.0.0.0:8301 --advertise-addr=127.0.0.1:8301 --ca=/path/to/ca.pem --cert=/path/to/ticdc-cert.pem --key=/path/to/ticdc-key.pem --cert-allowed-cn="client1,client2"
```

- TiFlash (New in v4.0.5)

Configure in the `tiflash.toml` file or command-line arguments:
@@ -207,7 +206,7 @@ To verify component caller's identity, you need to mark the certificate user ide
[security]
cert-allowed-cn = ["PD-Server", "TiKV-Server", "TiFlash-Server"]
```

### Reload certificates

To reload the certificates and the keys, TiDB, PD, TiKV, and all kinds of clients reread the current certificates and the key files each time a new connection is created. Currently, you cannot reload the CA certificate.
2 changes: 1 addition & 1 deletion migrate-from-tidb-to-mysql.md
@@ -168,7 +168,7 @@ After setting up the environment, you can use [Dumpling](/dumpling-overview.md)
- `--changefeed-id`: changefeed ID, must be in the format of a regular expression, `^[a-zA-Z0-9]+(\-[a-zA-Z0-9]+)*$`
- `--start-ts`: start timestamp of the changefeed, must be the backup time (or BackupTS in the "Back up data" section in [Step 2. Migrate full data](#step-2-migrate-full-data))

For more information about the changefeed configurations, see [Task configuration file](/ticdc/manage-ticdc.md#task-configuration-file).
For more information about the changefeed configurations, see [Task configuration file](/ticdc/ticdc-changefeed-config.md).

3. Enable GC.

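The `--changefeed-id` format quoted in this hunk can be checked before a changefeed is created. The following is a minimal sketch, not part of the commit: the regular expression is copied verbatim from the documentation line above, while the helper name `is_valid_changefeed_id` and the sample IDs are hypothetical and chosen only for illustration.

```python
import re

# Pattern copied verbatim from the --changefeed-id description above.
CHANGEFEED_ID_PATTERN = re.compile(r"^[a-zA-Z0-9]+(\-[a-zA-Z0-9]+)*$")


def is_valid_changefeed_id(changefeed_id: str) -> bool:
    """Hypothetical helper: report whether an ID matches the documented format."""
    return CHANGEFEED_ID_PATTERN.fullmatch(changefeed_id) is not None


if __name__ == "__main__":
    # Sample IDs are illustrative assumptions, not taken from the documentation.
    for candidate in ["simple-replication-task", "task_1", "-leading", "trailing-", "a--b"]:
        print(candidate, is_valid_changefeed_id(candidate))
```

Under this pattern, an ID is one or more alphanumeric groups joined by single hyphens, so underscores, leading or trailing hyphens, and doubled hyphens are rejected.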
2 changes: 1 addition & 1 deletion migrate-from-tidb-to-tidb.md
@@ -229,7 +229,7 @@ After setting up the environment, you can use the backup and restore functions o
- `--changefeed-id`: changefeed ID, must be in the format of a regular expression, ^[a-zA-Z0-9]+(\-[a-zA-Z0-9]+)*$
- `--start-ts`: start timestamp of the changefeed, must be the backup time (or BackupTS in the "Back up data" section in [Step 2. Migrate full data](#step-2-migrate-full-data))

For more information about the changefeed configurations, see [Task configuration file](/ticdc/manage-ticdc.md#task-configuration-file).
For more information about the changefeed configurations, see [Task configuration file](/ticdc/ticdc-changefeed-config.md).

3. Enable GC.

3 changes: 2 additions & 1 deletion production-deployment-using-tiup.md
@@ -420,6 +420,7 @@ If you have deployed [TiFlash](/tiflash/tiflash-overview.md) along with the TiDB

If you have deployed [TiCDC](/ticdc/ticdc-overview.md) along with the TiDB cluster, see the following documents:

- [Manage TiCDC Cluster and Replication Tasks](/ticdc/manage-ticdc.md)
- [Changefeed Overview](/ticdc/ticdc-changefeed-overview.md)
- [Manage Changefeed](/ticdc/ticdc-manage-changefeed.md)
- [Troubleshoot TiCDC](/ticdc/troubleshoot-ticdc.md)
- [TiCDC FAQs](/ticdc/ticdc-faq.md)
2 changes: 1 addition & 1 deletion releases/release-5.0.0.md
@@ -341,7 +341,7 @@ Add a system variable [`tidb_allow_fallback_to_tikv`](/system-variables.md#tidb_

### Improve TiCDC stability and alleviate the OOM issue caused by replicating too much incremental data

[User document](/ticdc/manage-ticdc.md#unified-sorter), [#1150](https://github.com/pingcap/tiflow/issues/1150)
[User document](/ticdc/ticdc-manage-changefeed.md#unified-sorter), [#1150](https://github.com/pingcap/tiflow/issues/1150)

In TiCDC v4.0.9 or earlier versions, replicating too much data change might cause OOM. In v5.0, the Unified Sorter feature is enabled by default to mitigate OOM issues caused by the following scenarios:

2 changes: 1 addition & 1 deletion releases/release-5.3.0.md
@@ -227,7 +227,7 @@ In v5.3, the key new features or improvements are as follows:

This feature supports TiCDC to replicate incremental data from a TiDB cluster to the secondary relational database TiDB/Aurora/MySQL/MariaDB. In case the primary cluster crashes, TiCDC can recover the secondary cluster to a certain snapshot in the primary cluster within 5 minutes, given the condition that before disaster the replication status of TiCDC is normal and replication lag is small. It allows data loss of less than 30 minutes, that is, RTO <= 5min, and RPO <= 30min.

[User document](/ticdc/manage-ticdc.md)
[User document](/ticdc/ticdc-sink-to-mysql.md#eventually-consistent-replication-in-disaster-scenarios)

- **TiCDC supports the HTTP protocol OpenAPI for managing TiCDC tasks**

6 changes: 3 additions & 3 deletions releases/release-6.0.0-dmr.md
@@ -322,9 +322,9 @@ TiDB v6.0.0 is a DMR, and its version is 6.0.0-DMR.
| TiFlash | [`profiles.default.dt_compression_level`](/tiflash/tiflash-configuration.md#configure-the-tiflashtoml-file) | Newly added | Specifies the compression level of TiFlash. The default value is `1`. |
| DM | [`loaders.<name>.import-mode`](/dm/task-configuration-file-full.md#task-configuration-file-template-advanced) | Newly added | The import mode during the full import phase. Since v6.0, DM uses TiDB Lightning's TiDB-backend mode to import data during the full import phase; the previous Loader component is no longer used. This is an internal replacement and has no obvious impact on daily operations.<br/>The default value is set to `sql`, which means using `tidb-backend` mode. In some rare cases, `tidb-backend` might not be fully compatible. You can fall back to Loader mode by configuring this parameter to `loader`. |
| DM | [`loaders.<name>.on-duplicate`](/dm/task-configuration-file-full.md#task-configuration-file-template-advanced) | Newly added | Specifies the methods to resolve conflicts during the full import phase. The default value is `replace`, which means using the new data to replace the existing data. |
| TiCDC | [`dial-timeout`](/ticdc/manage-ticdc.md#configure-sink-uri-with-kafka) | Newly added | The timeout in establishing a connection with the downstream Kafka. The default value is `10s`. |
| TiCDC | [`read-timeout`](/ticdc/manage-ticdc.md#configure-sink-uri-with-kafka) | Newly added | The timeout in getting a response returned by the downstream Kafka. The default value is `10s`. |
| TiCDC | [`write-timeout`](/ticdc/manage-ticdc.md#configure-sink-uri-with-kafka) | Newly added | The timeout in sending a request to the downstream Kafka. The default value is `10s`. |
| TiCDC | [`dial-timeout`](/ticdc/ticdc-sink-to-kafka.md#configure-sink-uri-for-kafka) | Newly added | The timeout in establishing a connection with the downstream Kafka. The default value is `10s`. |
| TiCDC | [`read-timeout`](/ticdc/ticdc-sink-to-kafka.md#configure-sink-uri-for-kafka) | Newly added | The timeout in getting a response returned by the downstream Kafka. The default value is `10s`. |
| TiCDC | [`write-timeout`](/ticdc/ticdc-sink-to-kafka.md#configure-sink-uri-for-kafka) | Newly added | The timeout in sending a request to the downstream Kafka. The default value is `10s`. |

### Others

16 changes: 8 additions & 8 deletions releases/release-6.1.0.md
@@ -208,11 +208,11 @@ In 6.1.0, the key new features or improvements are as follows:

* TiCDC supports dispatching incremental data from TiDB to different Kafka topics by table, which, combined with the Canal-json format, allows sharing data directly with Flink.

[User document](/ticdc/manage-ticdc.md#customize-the-rules-for-topic-and-partition-dispatchers-of-kafka-sink), [#4423](https://github.com/pingcap/tiflow/issues/4423)
[User document](/ticdc/ticdc-sink-to-kafka.md#customize-the-rules-for-topic-and-partition-dispatchers-of-kafka-sink), [#4423](https://github.com/pingcap/tiflow/issues/4423)

* TiCDC supports SASL GSSAPI authentication types and adds SASL authentication examples using Kafka.

[User document](/ticdc/manage-ticdc.md#ticdc-uses-the-authentication-and-authorization-of-kafka), [#4423](https://github.com/pingcap/tiflow/issues/4423)
[User document](/ticdc/ticdc-sink-to-kafka.md#ticdc-uses-the-authentication-and-authorization-of-kafka), [#4423](https://github.com/pingcap/tiflow/issues/4423)

* TiCDC supports replicating `charset=GBK` tables.

@@ -270,12 +270,12 @@ In 6.1.0, the key new features or improvements are as follows:
| TiKV | [`storage.background-error-recovery-window`](/tikv-configuration-file.md#background-error-recovery-window-new-in-v610) | Newly added | The maximum recovery time is allowed after RocksDB detects a recoverable background error. |
| TiKV | [`storage.api-version`](/tikv-configuration-file.md#api-version-new-in-v610) | Newly added | The storage format and interface version used by TiKV when TiKV serves as the raw key-value store. |
| PD | [`schedule.max-store-preparing-time`](/pd-configuration-file.md#max-store-preparing-time-new-in-v610) | Newly added | Controls the maximum waiting time for the store to go online. |
| TiCDC | [`enable-tls`](/ticdc/manage-ticdc.md#configure-sink-uri-with-kafka) | Newly added | Whether to use TLS to connect to the downstream Kafka instance. |
| TiCDC | `sasl-gssapi-user`<br/>`sasl-gssapi-password`<br/>`sasl-gssapi-auth-type`<br/>`sasl-gssapi-service-name`<br/>`sasl-gssapi-realm`<br/>`sasl-gssapi-key-tab-path`<br/>`sasl-gssapi-kerberos-config-path` | Newly added | Used to support SASL/GSSAPI authentication for Kafka. For details, see [Configure sink URI with `kafka`](/ticdc/manage-ticdc.md#configure-sink-uri-with-kafka). |
| TiCDC | [`avro-decimal-handling-mode`](/ticdc/manage-ticdc.md#configure-sink-uri-with-kafka)<br/>[`avro-bigint-unsigned-handling-mode`](/ticdc/manage-ticdc.md#configure-sink-uri-with-kafka) | Newly added | Determines the output details of Avro format. |
| TiCDC | [`dispatchers.topic`](/ticdc/manage-ticdc.md#customize-the-rules-for-topic-and-partition-dispatchers-of-kafka-sink) | Newly added | Controls how TiCDC dispatches incremental data to different Kafka topics. |
| TiCDC | [`dispatchers.partition`](/ticdc/manage-ticdc.md#customize-the-rules-for-topic-and-partition-dispatchers-of-kafka-sink) | Newly added | `dispatchers.partition` is an alias for `dispatchers.dispatcher`. Controls how TiCDC dispatches incremental data to Kafka partitions. |
| TiCDC | [`schema-registry`](/ticdc/manage-ticdc.md#integrate-ticdc-with-kafka-connect-confluent-platform) | Newly added | Specifies the schema registry endpoint that stores Avro schema. |
| TiCDC | [`enable-tls`](/ticdc/ticdc-sink-to-kafka.md#configure-sink-uri-for-kafka) | Newly added | Whether to use TLS to connect to the downstream Kafka instance. |
| TiCDC | `sasl-gssapi-user`<br/>`sasl-gssapi-password`<br/>`sasl-gssapi-auth-type`<br/>`sasl-gssapi-service-name`<br/>`sasl-gssapi-realm`<br/>`sasl-gssapi-key-tab-path`<br/>`sasl-gssapi-kerberos-config-path` | Newly added | Used to support SASL/GSSAPI authentication for Kafka. For details, see [Configure sink URI with `kafka`](/ticdc/ticdc-sink-to-kafka.md#configure-sink-uri-for-kafka). |
| TiCDC | [`avro-decimal-handling-mode`](/ticdc/ticdc-sink-to-kafka.md#configure-sink-uri-for-kafka)<br/>[`avro-bigint-unsigned-handling-mode`](/ticdc/ticdc-sink-to-kafka.md#configure-sink-uri-for-kafka) | Newly added | Determines the output details of Avro format. |
| TiCDC | [`dispatchers.topic`](/ticdc/ticdc-sink-to-kafka.md#customize-the-rules-for-topic-and-partition-dispatchers-of-kafka-sink) | Newly added | Controls how TiCDC dispatches incremental data to different Kafka topics. |
| TiCDC | [`dispatchers.partition`](/ticdc/ticdc-sink-to-kafka.md#customize-the-rules-for-topic-and-partition-dispatchers-of-kafka-sink) | Newly added | `dispatchers.partition` is an alias for `dispatchers.dispatcher`. Controls how TiCDC dispatches incremental data to Kafka partitions. |
| TiCDC | [`schema-registry`](/ticdc/ticdc-sink-to-kafka.md#integrate-ticdc-with-kafka-connect-confluent-platform) | Newly added | Specifies the schema registry endpoint that stores Avro schema. |
| DM | `worker` in the `dmctl start-relay` command | Deleted | This parameter is not recommended for use. Will provide a simpler implementation. |
| DM | `relay-dir` in the source configuration file | Deleted | Replaced by the same configuration item in the worker configuration file. |
| DM | `is-sharding` in the task configuration file | Deleted | Replaced by the `shard-mode` configuration item. |
8 changes: 4 additions & 4 deletions releases/release-6.2.0.md
@@ -22,7 +22,7 @@ In v6.2.0-DMR, the key new features and improvements are as follows:
* [Point-in-time recovery (PITR)](/br/backup-and-restore-overview.md) is introduced to restore a snapshot of a TiDB cluster to a new cluster from any given time point in the past.
* TiDB Lightning supports [importing data to production clusters in the physical import mode](/tidb-lightning/tidb-lightning-physical-import-mode-usage.md#import-data-into-a-cluster-in-production).
* BR supports [restoring user and privilege data](/br/br-snapshot-guide.md#restore-tables-in-the-mysql-schema), making backup and restore smoother.
* TiCDC unlocks more data replication scenarios by supporting [filtering specific types of DDL events](/ticdc/manage-ticdc.md).
* TiCDC unlocks more data replication scenarios by supporting [filtering specific types of DDL events](/ticdc/ticdc-filter.md).
* The [`SAVEPOINT` mechanism](/sql-statements/sql-statement-savepoint.md) is supported, with which you can flexibly control the rollback points within a transaction.
* TiDB supports [adding, dropping, and modifying multiple columns or indexes with only one `ALTER TABLE` statement](/sql-statements/sql-statement-alter-table.md).
* [Cross-cluster RawKV replication](/tikv-configuration-file.md#api-version-new-in-v610) is now supported.
@@ -238,7 +238,7 @@ In v6.2.0-DMR, the key new features and improvements are as follows:

In some special occasions, you might want to set filter rules for incremental data change logs. For example, filtering high risk DDL events such as DROP TABLE. Starting from v6.2.0, TiCDC supports filtering DDL events of specified types and filtering DML events based on SQL expressions. This makes TiCDC applicable to more data replication scenarios.

[User document](/ticdc/manage-ticdc.md) [#6160](https://github.com/pingcap/tiflow/issues/6160) @[asddongmen](https://github.com/asddongmen)
[User document](/ticdc/ticdc-filter.md) [#6160](https://github.com/pingcap/tiflow/issues/6160) @[asddongmen](https://github.com/asddongmen)

## Compatibility changes

@@ -286,8 +286,8 @@ In v6.2.0-DMR, the key new features and improvements are as follows:
| TiFlash | [`storage.format_version`](/tiflash/tiflash-configuration.md#configure-the-tiflashtoml-file) | Modified | The default value of `format_version` changes to `4`, the default format for v6.2.0 and later versions, which reduces write amplification and background task resource consumption. |
| TiFlash | [profiles.default.dt_enable_read_thread](/tiflash/tiflash-configuration.md#configure-the-tiflashtoml-file) | Newly added | This configuration controls whether to use the thread pool to handle read requests from the storage engine. The default value is `false`. |
| TiFlash | [profiles.default.dt_page_gc_threshold](/tiflash/tiflash-configuration.md#configure-the-tiflashtoml-file) | Newly added | This configuration specifies the minimum ratio of valid data in a PageStorage data file. |
| TiCDC | [--overwrite-checkpoint-ts](/ticdc/manage-ticdc.md#resume-a-replication-task) | Newly added | This configuration is added to the `cdc cli changefeed resume` sub-command. |
| TiCDC | [--no-confirm](/ticdc/manage-ticdc.md#resume-a-replication-task) | Newly added | This configuration is added to the `cdc cli changefeed resume` sub-command.|
| TiCDC | [--overwrite-checkpoint-ts](/ticdc/ticdc-manage-changefeed.md#resume-a-replication-task) | Newly added | This configuration is added to the `cdc cli changefeed resume` sub-command. |
| TiCDC | [--no-confirm](/ticdc/ticdc-manage-changefeed.md#resume-a-replication-task) | Newly added | This configuration is added to the `cdc cli changefeed resume` sub-command.|
| DM | [mode](/dm/task-configuration-file-full.md#task-configuration-file-template-advanced) | Newly added | This configuration is a validator parameter. Optional values are `full`, `fast`, and `none`. The default value is `none`, which does not validate the data. |
| DM | [worker-count](/dm/task-configuration-file-full.md#task-configuration-file-template-advanced) | Newly added | This configuration is a validator parameter and specifies the number of validation workers in the background. The default value is `4`. |
| DM | [row-error-delay](/dm/task-configuration-file-full.md#task-configuration-file-template-advanced) | Newly added | This configuration is a validator parameter. If a row is not validated within the specified time, it will be marked as an error row. The default value is 30m, which means 30 minutes. |