docs: changefeed supports to sink to cloud storage (s3) #13771
Merged: ti-chi-bot merged 17 commits into pingcap:release-6.5 from benmaoer:changefeed-sink-to-s3 on Jun 13, 2023.
Commits (17):

- 0cffa21 docs: changefeed supports to sink to cloud storage (s3) (benmaoer)
- 8829de1 Update tidb-cloud/changefeed-sink-to-cloud-storage.md (benmaoer)
- 6498d0f Apply suggestions from code review (ran-huang)
- d8a573a clarify steps; add notes; use consistent description as other docs (ran-huang)
- 00b3caa Update changefeed-overview.md (benmaoer)
- 82592bd Update tidb-cloud/changefeed-sink-to-cloud-storage.md (benmaoer)
- 948c874 Update tidb-cloud/changefeed-sink-to-cloud-storage.md (benmaoer)
- 1a1df31 Update tidb-cloud/changefeed-sink-to-cloud-storage.md (benmaoer)
- 55f16c5 Update tidb-cloud/changefeed-sink-to-cloud-storage.md (benmaoer)
- 09374f9 Update tidb-cloud/changefeed-sink-to-cloud-storage.md (benmaoer)
- 6c90090 Update tidb-cloud/changefeed-sink-to-cloud-storage.md (benmaoer)
- 52aed34 Apply suggestions from code review (benmaoer)
- 9ca65e0 Apply suggestions from code review (Oreoxmt)
- b75f52f Apply suggestions from code review (ran-huang)
- da2bc52 Apply suggestions from code review (ran-huang)
- b20a445 Merge branch 'release-6.5' into changefeed-sink-to-s3 (Oreoxmt)
- 2f9e9c5 fix format (Oreoxmt)
---
title: Sink to Cloud Storage
summary: Learn how to create a changefeed to stream data from a TiDB Dedicated cluster to cloud storage, such as Amazon S3.
---

# Sink to Cloud Storage

This document describes how to create a changefeed to stream data from TiDB Cloud to cloud storage. Currently, only Amazon S3 is supported.

> **Note:**
>
> - To stream data to cloud storage, make sure that your TiDB cluster version is v7.1.0 or later. To upgrade your TiDB Dedicated cluster to v7.1.0 or later, [contact TiDB Cloud Support](/tidb-cloud/tidb-cloud-support.md).
> - For [TiDB Serverless](/tidb-cloud/select-cluster-tier.md#tidb-serverless-beta) clusters, the changefeed feature is unavailable.

## Restrictions

- For each TiDB Cloud cluster, you can create up to 5 changefeeds.
- Because TiDB Cloud uses TiCDC to establish changefeeds, they have the same [restrictions as TiCDC](https://docs.pingcap.com/tidb/stable/ticdc-overview#unsupported-scenarios).
- If a table to be replicated has neither a primary key nor a non-null unique index, the lack of a unique constraint might cause duplicate data to be inserted downstream in some retry scenarios.
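
The key restriction above can be expressed as a simple check. The following Python sketch is illustrative only: the `Table` and `UniqueIndex` types and the `has_valid_key` helper are hypothetical and are not part of any TiDB Cloud or TiCDC API. It treats a table as having a valid key if it has a primary key, or a unique index whose columns are all declared `NOT NULL`.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class UniqueIndex:
    """Hypothetical model of a unique index: its name and the columns it covers."""
    name: str
    columns: list[str]


@dataclass
class Table:
    """Hypothetical model of the table metadata that the key check needs."""
    name: str
    primary_key: list[str] = field(default_factory=list)
    unique_indexes: list[UniqueIndex] = field(default_factory=list)
    nullable_columns: set[str] = field(default_factory=set)


def has_valid_key(table: Table) -> bool:
    """Return True if the table has a primary key or a non-null unique index."""
    if table.primary_key:
        return True
    # A unique index allows several rows with NULL in its columns, so it only
    # identifies rows reliably when none of its columns is nullable.
    return any(
        all(col not in table.nullable_columns for col in idx.columns)
        for idx in table.unique_indexes
    )


# Example: `logs` has a unique index, but on a nullable column, so it has no valid key.
orders = Table("test.orders", primary_key=["id"])
logs = Table(
    "test.logs",
    unique_indexes=[UniqueIndex("uk_request", ["request_id"])],
    nullable_columns={"request_id"},
)
print(has_valid_key(orders), has_valid_key(logs))  # prints: True False
```

In practice, you would populate such objects from TiDB's `INFORMATION_SCHEMA` rather than by hand.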

## Create a changefeed

1. Navigate to the cluster overview page of the target TiDB cluster, and then click **Changefeed** in the left navigation pane.

2. Click **Create Changefeed**, and then select **Amazon S3** as the destination.

3. Fill in the fields in the **S3 Endpoint** area: `S3 URI`, `Access Key ID`, and `Secret Access Key`.

    

4. Click **Next** to establish the connection from the TiDB Dedicated cluster to Amazon S3. TiDB Cloud automatically tests whether the connection is successful.

    - If yes, you are directed to the next step of configuration.
    - If not, a connectivity error is displayed. Resolve the error, and then click **Next** to retry the connection.

5. Customize **Table Filter** to filter the tables that you want to replicate. For the rule syntax, refer to [table filter rules](https://docs.pingcap.com/tidb/stable/ticdc-filter#changefeed-log-filters).

    

    - **Filter Rules**: you can set filter rules in this column. By default, there is a rule `*.*`, which stands for replicating all tables. When you add a new rule, TiDB Cloud queries all the tables in TiDB and displays only the tables that match the rules in the box on the right.
    - **Tables with valid keys**: this column displays the tables that have valid keys, including primary keys or unique indexes.
    - **Tables without valid keys**: this column shows tables that lack primary keys or unique keys. Because these tables have no unique identifier, handling duplicate events downstream during replication can result in inconsistent data. To ensure data consistency, add unique keys or primary keys to these tables before starting replication, or use filter rules to exclude them. For example, you can exclude the table `test.tbl1` by using the rule `"!test.tbl1"`.

6. In the **Start Replication Position** area, select one of the following replication positions:

    - Start replication from now on
    - Start replication from a specific [TSO](https://docs.pingcap.com/tidb/stable/glossary#tso)
    - Start replication from a specific time

7. In the **Data Format** area, select either the **CSV** or **Canal-JSON** format.

<SimpleTab>
<div label="Configure CSV format">

To configure the **CSV** format, fill in the following fields:



- **Date Separator**: Specify whether to rotate data by year, month, or day, or not to rotate it at all.
- **Delimiter**: Specify the character used to separate values in the CSV file. The comma (`,`) is the most commonly used delimiter.
- **Quote**: Specify the character used to enclose values that contain the delimiter character or special characters. Typically, double quotes (`"`) are used as the quote character.
- **Null/Empty Values**: Specify how null or empty values are represented in the CSV file, so that downstream consumers can interpret them correctly.
- **Include Commit Ts**: Specify whether to include [`commit-ts`](https://docs.pingcap.com/tidb/stable/ticdc-sink-to-cloud-storage#replicate-change-data-to-storage-services) in each CSV row.
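
To show how these CSV options interact, here is a minimal Python sketch that encodes one change event as a CSV line. It assumes the column layout described in the TiCDC CSV protocol documentation (operation type, table name, schema name, an optional `commit-ts`, then the column values), a default null marker of `\N`, that string values are quoted, and that embedded quote characters are doubled. The `encode_csv_row` function is hypothetical, so verify the exact output against the TiCDC documentation before relying on it.

```python
from typing import Optional


def encode_csv_row(
    op: str,
    table: str,
    schema: str,
    values: list,
    *,
    delimiter: str = ",",
    quote: str = '"',
    null: str = "\\N",
    commit_ts: Optional[int] = None,
) -> str:
    """Encode one change event as a CSV line (illustrative sketch, not TiCDC code)."""

    def quote_str(s: str) -> str:
        # Double any embedded quote characters, then wrap the value in quotes.
        return f"{quote}{s.replace(quote, quote * 2)}{quote}"

    def encode_value(v) -> str:
        if v is None:
            # Null values are written as the configured null marker, unquoted.
            return null
        if isinstance(v, (int, float)):
            # Numeric values are written as-is, without quotes.
            return str(v)
        return quote_str(str(v))

    # Operation type (I/U/D), table name, and schema name come first.
    fields = [quote_str(op), quote_str(table), quote_str(schema)]
    if commit_ts is not None:
        # Only present when "Include Commit Ts" is enabled.
        fields.append(str(commit_ts))
    fields.extend(encode_value(v) for v in values)
    return delimiter.join(fields)


print(encode_csv_row("I", "employee", "hr", [101, "Smith", None], commit_ts=433305438660591626))
# prints: "I","employee","hr",433305438660591626,101,"Smith",\N
```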

</div>
<div label="Configure Canal-JSON format">

Canal-JSON is a plain JSON text format. To configure it, fill in the following fields:



- **Date Separator**: Specify whether to rotate data by year, month, or day, or not to rotate it at all.
- **Enable TiDB Extension**: When you enable this option, TiCDC sends [WATERMARK events](https://docs.pingcap.com/tidb/stable/ticdc-canal-json#watermark-event) and adds the [TiDB extension field](https://docs.pingcap.com/tidb/stable/ticdc-canal-json#tidb-extension-field) to Canal-JSON messages.
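
As an illustration of what a downstream consumer might do with these messages, the following Python sketch classifies Canal-JSON messages read from storage. It assumes the field names described in the TiCDC Canal-JSON documentation: a `type` field such as `INSERT`, `UPDATE`, `DELETE`, or `TIDB_WATERMARK`, an `isDdl` flag, and, when the TiDB extension is enabled, a `_tidb` object that carries `commitTs` on data events or `watermarkTs` on WATERMARK events. The `classify` function is hypothetical; check the field names against the linked documentation.

```python
from __future__ import annotations

import json
from typing import Optional


def classify(line: str) -> tuple[str, Optional[int]]:
    """Return (event kind, TiDB timestamp or None) for one Canal-JSON message."""
    msg = json.loads(line)
    # The "_tidb" object is only present when the TiDB extension is enabled.
    ext = msg.get("_tidb") or {}
    if msg.get("type") == "TIDB_WATERMARK":
        # WATERMARK events carry no row data, only a watermark timestamp.
        return "watermark", ext.get("watermarkTs")
    if msg.get("isDdl"):
        return "ddl", ext.get("commitTs")
    # For DML events, "type" is INSERT, UPDATE, or DELETE.
    return f"dml:{msg.get('type')}", ext.get("commitTs")


watermark = '{"type": "TIDB_WATERMARK", "isDdl": false, "data": null, "_tidb": {"watermarkTs": 429918007904436226}}'
print(classify(watermark))  # prints: ('watermark', 429918007904436226)
```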

</div>
</SimpleTab>
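
The **Date Separator** option appears in both formats. The following Python sketch shows one way to compute the date part of an output directory. It assumes, based on the TiCDC storage sink documentation, that `year`, `month`, and `day` produce directory names such as `2023`, `2023-06`, and `2023-06-13`, and that `none` adds no date directory. The `date_dir` and `object_dir` helpers and the `s3://my-bucket/cdc/test/tbl1` base path are hypothetical; the real TiCDC path layout contains additional elements, such as a table version directory.

```python
from datetime import date

# Assumed directory formats for each Date Separator value.
DATE_FORMATS = {
    "none": None,
    "year": "%Y",
    "month": "%Y-%m",
    "day": "%Y-%m-%d",
}


def date_dir(d: date, separator: str) -> str:
    """Return the date directory for a separator, or '' when no date directory is used."""
    if separator not in DATE_FORMATS:
        raise ValueError(f"unknown date separator: {separator!r}")
    fmt = DATE_FORMATS[separator]
    return d.strftime(fmt) if fmt else ""


def object_dir(base: str, d: date, separator: str) -> str:
    """Append the date directory (if any) to a hypothetical base directory."""
    base = base.rstrip("/")
    sub = date_dir(d, separator)
    return f"{base}/{sub}" if sub else base


for sep in ("none", "year", "month", "day"):
    print(sep, object_dir("s3://my-bucket/cdc/test/tbl1", date(2023, 6, 13), sep))
```

A coarser separator produces fewer, larger directories, while `day` makes it easier to process or expire one day of changes at a time.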

8. Click **Next** to configure your changefeed specification.

    - In the **Changefeed Specification** area, specify the number of Replication Capacity Units (RCUs) to be used by the changefeed.
    - In the **Changefeed Name** area, specify a name for the changefeed.

9. Click **Next** to review the changefeed configuration.

    - If you have verified that all configurations are correct, click **Create** to create the changefeed.
    - If you need to modify any configurations, click **Previous** to go back and make the necessary changes.

10. The sink starts shortly, and its status changes from **Creating** to **Running**.

11. Click the name of the changefeed to go to its details page. On this page, you can view more information about the changefeed, including the checkpoint status, replication latency, and other relevant metrics.
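
Both the **Start Replication Position** in step 6 and the checkpoint on the details page are expressed as TSOs. A TiDB TSO stores a physical timestamp in milliseconds in its high bits and an 18-bit logical counter in its low bits, so converting between a wall-clock time and a TSO is a bit shift. The Python sketch below uses hypothetical helper names to show that conversion and one way to estimate replication lag from a checkpoint TSO; the latency that TiDB Cloud reports might be computed differently.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

LOGICAL_BITS = 18  # the low 18 bits of a TSO hold the logical counter
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def time_to_tso(t: datetime, logical: int = 0) -> int:
    """Encode a timezone-aware datetime (millisecond precision) as a TSO."""
    physical_ms = (t - EPOCH) // timedelta(milliseconds=1)
    return (physical_ms << LOGICAL_BITS) | logical


def tso_to_time(tso: int) -> datetime:
    """Decode the physical part of a TSO into a UTC datetime."""
    physical_ms = tso >> LOGICAL_BITS
    return EPOCH + timedelta(milliseconds=physical_ms)


def checkpoint_lag_seconds(checkpoint_tso: int, now: Optional[datetime] = None) -> float:
    """Estimate how far a checkpoint TSO trails the current time, in seconds."""
    now = now or datetime.now(timezone.utc)
    return (now - tso_to_time(checkpoint_tso)).total_seconds()


# Example: a TSO to use with "Start replication from a specific TSO".
start = datetime(2023, 6, 13, 8, 0, tzinfo=timezone.utc)
start_tso = time_to_tso(start)
print(start_tso, tso_to_time(start_tso))
```

Integer `timedelta` arithmetic is used instead of `datetime.timestamp()` so that millisecond values are not rounded by floating-point conversion.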