limit the maximum number of cached txns in mysql worker #10896

Closed
CharlesCheung96 opened this issue Apr 10, 2024 · 0 comments · Fixed by #10892
Labels
affects-6.5 affects-7.1 affects-7.5 affects-8.1 area/ticdc Issues or PRs related to TiCDC. component/sink Sink component. severity/moderate This is a moderate bug. type/enhancement This is a enhancement PR

Comments

CharlesCheung96 (Contributor) commented Apr 10, 2024

Is your feature request related to a problem?

Ref: sink to MySQL (cdc) workload skew issue

PR #10376 tries to fix the skew problem by sending a transaction to a random worker once the transactions it depends on have been executed. For conflicting transactions, this means that only one transaction can be executing across all workers at any time; in other words, they are executed serially, one by one. During real-time streaming replication, conflicting transactions are also executed serially in the upstream cluster, so it is a reasonable choice for TiCDC to execute them serially as well.

Txn N(row1)... ------> Txn C(row1)------> Txn B(row1)------> Txn A(row1) 
                                                                    |
                                                                    |-----> worker 1
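
To make the serial behaviour concrete, here is a minimal sketch of how conflicting transactions end up in a chain. It is an illustration only: the `txn` type and the `dependencies` function are hypothetical names and are not taken from the TiCDC conflict detector. It assumes that a transaction depends on the most recent earlier writer of each row it touches.

```go
package main

import "fmt"

// txn is a hypothetical transaction: a name plus the row keys it writes.
type txn struct {
	name string
	rows []string
}

// dependencies returns, for each transaction, the earlier transactions it has
// to wait for. For every row, that is the most recent earlier writer.
func dependencies(txns []txn) map[string][]string {
	lastWriter := map[string]string{} // row key -> name of the latest writer
	deps := map[string][]string{}
	for _, t := range txns {
		seen := map[string]bool{}
		for _, r := range t.rows {
			prev, ok := lastWriter[r]
			if ok && prev != t.name && !seen[prev] {
				deps[t.name] = append(deps[t.name], prev)
				seen[prev] = true
			}
			lastWriter[r] = t.name
		}
	}
	return deps
}

func main() {
	// Txn A, B and C all modify row1, as in the diagram above.
	txns := []txn{
		{"A", []string{"row1"}},
		{"B", []string{"row1"}},
		{"C", []string{"row1"}},
	}
	deps := dependencies(txns)
	for _, t := range txns {
		fmt.Println(t.name, "waits for", deps[t.name])
	}
}
```

Because every transaction waits for exactly one predecessor, at most one of them is runnable at any moment, no matter how many workers exist.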

However, for other common scenarios, this approach can be problematic:

  1. Incremental scan: while historical data is being replicated, if conflicting transactions keep being executed in the upstream cluster, the MySQL sink needs at least twice the serial throughput to catch up, since (as the diagram below shows) the same worker receives both the incremental-scan and the real-time transactions. Serial execution can never satisfy this condition.
Incremental Scan: Txn N(row1)... ------> Txn C(row1)------> Txn B(row1)------> Txn A(row1) 
                                                                                       |
                                                                                       |-----> worker 1
                                                                                       | 
Real-time streaming: Txn N(row1)... ------> Txn C(row1)------> Txn B(row1)------> Txn A(row1) 
  2. Cross-region replication: in this scenario, the throughput of a single MySQL worker is limited by network latency, so the throughput of executing transactions one by one may be much lower than that of the upstream cluster.

New Proposal

A better compromise is to replace one-by-one execution with batch-by-batch execution:

  1. The batch mechanism can effectively improve the throughput of a single worker, so it is necessary to preserve the fast dependency-resolution optimization in the conflict detector.

  2. At the same time, to avoid workload skew, we could limit the maximum number of cached txns in a single worker. When the limit is exceeded, the conflict detector should wait for all transactions cached in that worker to complete before sending a new event to it.
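
The second point could look roughly like the sketch below. It is not the implementation in the fixing PR: the `worker` type, its fields, and the `add`, `tryAdd` and `done` methods are hypothetical names chosen for illustration. It also assumes that "exceeded" means the number of cached transactions has reached the limit. Once that happens, the worker accepts nothing new until its cache is completely empty.

```go
package main

import (
	"fmt"
	"sync"
)

// worker models a sink worker with a bounded transaction cache (hypothetical).
type worker struct {
	mu       sync.Mutex
	drained  *sync.Cond
	cached   int  // transactions handed to the worker and not yet completed
	limit    int  // maximum number of cached transactions
	draining bool // set when the limit is hit, cleared when cached drops to 0
}

func newWorker(limit int) *worker {
	w := &worker{limit: limit}
	w.drained = sync.NewCond(&w.mu)
	return w
}

// admitLocked accepts one transaction unless the worker is draining.
// The caller must hold w.mu.
func (w *worker) admitLocked() bool {
	if w.draining {
		return false
	}
	w.cached++
	if w.cached >= w.limit {
		w.draining = true
	}
	return true
}

// tryAdd is the non-blocking variant: it reports whether the txn was accepted.
func (w *worker) tryAdd() bool {
	w.mu.Lock()
	defer w.mu.Unlock()
	return w.admitLocked()
}

// add blocks the caller (the conflict detector) until the txn is accepted.
func (w *worker) add() {
	w.mu.Lock()
	defer w.mu.Unlock()
	for !w.admitLocked() {
		w.drained.Wait()
	}
}

// done is called when the worker has finished executing one cached txn.
func (w *worker) done() {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.cached--
	if w.cached == 0 && w.draining {
		w.draining = false
		w.drained.Broadcast()
	}
}

func main() {
	w := newWorker(2)
	fmt.Println(w.tryAdd()) // true
	fmt.Println(w.tryAdd()) // true, the limit is now reached
	fmt.Println(w.tryAdd()) // false, the worker has to drain first
	w.done()
	fmt.Println(w.tryAdd()) // false, one txn is still cached
	w.done()
	fmt.Println(w.tryAdd()) // true, the cache is empty again
}
```

Waiting for the cache to drain completely, instead of admitting a new transaction as soon as one slot frees up, keeps the batch-by-batch shape: the worker finishes one bounded batch before it is given the next, and the conflict detector is free to route other transactions elsewhere in the meantime.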

@CharlesCheung96 CharlesCheung96 added type/feature Issues about a new feature component/sink Sink component. type/enhancement This is a enhancement PR area/ticdc Issues or PRs related to TiCDC. and removed type/feature Issues about a new feature labels Apr 10, 2024
@CharlesCheung96 CharlesCheung96 added type/bug This is a bug. and removed type/enhancement This is a enhancement PR labels Apr 18, 2024
@github-actions github-actions bot added this to Need Triage in Question and Bug Reports Apr 18, 2024
@CharlesCheung96 CharlesCheung96 added the severity/moderate This is a moderate bug. label Apr 18, 2024
Question and Bug Reports automation moved this from Need Triage to Done Apr 22, 2024
@flowbehappy flowbehappy added type/enhancement This is a enhancement PR and removed type/bug This is a bug. labels Apr 29, 2024