sink: use where in operator in delete or update statement (#3788) by ti-chi-bot · Pull Request #4470 · pingcap/ticdc

ti-chi-bot · 2026-03-13T11:15:31Z

This is an automated cherry-pick of #3788

What problem does this PR solve?

Issue Number: close #4121 ref #1645

What is changed and how it works?

Since pingcap/tiflow#8818 was introduced in v7.2.0, TiCDC has used WHERE ... OR instead of WHERE ... IN when a batch contains many rows. In practice this causes a performance regression, because the IN() predicate is only affected on its right side, and only when there is a single row constructor. For more details, see https://dev.mysql.com/doc/refman/5.7/en/range-optimization.html#row-constructor-range-optimization.

This PR uses where in as the default clause and decreases the downstream CPU usage.
To restore the previous WHERE clause, set the where-clause parameter in the sink URI to v1, for example: --sink-uri="mysql://root:123456@127.0.0.1:3306/?where-clause=v1"
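
As a rough illustration of how such a parameter could be read, the sketch below parses where-clause from a sink URI with Go's standard net/url package. Only the parameter name, the v1/v2 values, and the v2 default come from this PR; the helper name parseWhereClause, the constant names, and the error text are assumptions for illustration and are not the code in pkg/sink/mysql/config.go.

```go
package main

import (
	"fmt"
	"net/url"
)

// Strategy values named in this PR; the constant names are illustrative.
const (
	whereClauseV1      = "v1" // previous behavior: WHERE (...) OR (...)
	whereClauseV2      = "v2" // new default: WHERE (...) IN (...)
	defaultWhereClause = whereClauseV2
)

// parseWhereClause is a hypothetical helper. It returns the strategy
// requested in the sink URI, or the default when the parameter is absent.
func parseWhereClause(sinkURI string) (string, error) {
	u, err := url.Parse(sinkURI)
	if err != nil {
		return "", err
	}
	v := u.Query().Get("where-clause")
	switch v {
	case "":
		return defaultWhereClause, nil
	case whereClauseV1, whereClauseV2:
		return v, nil
	default:
		return "", fmt.Errorf("unsupported where-clause %q", v)
	}
}

func main() {
	s, err := parseWhereClause("mysql://root:123456@127.0.0.1:3306/?where-clause=v1")
	fmt.Println(s, err)
}
```

Rejecting unknown values, rather than silently falling back to the default, is a design choice of this sketch; the PR text does not say how the real parser treats them.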

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)

Questions

Will it cause performance regression or break compatibility?

Do you need to update user documentation, design documentation or monitoring documentation?

Release note

Reduce downstream TiDB resource consumption by reducing SQL parsing and compilation time

Summary by CodeRabbit

New Features
- Added a configurable WHERE-clause strategy for multi-row UPDATE/DELETE with a new default ("v2") for more stable SQL generation.
Behavior Changes
- Batch WHERE predicates now prefer IN-style clauses consistently.
- Tighter error handling: several invalid states now cause immediate panics and adjusted logging.
Tests
- Expanded unit tests and added benchmarks for multi-row DML generation, covering generated columns, composite keys, and varied batch sizes.

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>

ti-chi-bot · 2026-03-13T11:15:36Z

@wk989898 This PR has conflicts, so I have put it on hold.
Please resolve them or ask others to resolve them, then comment /unhold to remove the hold label.

ti-chi-bot · 2026-03-13T11:15:38Z

@ti-chi-bot: If you want to know how to resolve it, please read the guide in TiDB Dev Guide.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

gemini-code-assist · 2026-03-13T11:15:58Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the performance of multi-row DELETE and UPDATE operations by adopting a more efficient SQL generation strategy. By defaulting to the WHERE IN operator for batch DML, it addresses a known performance regression and provides users with the flexibility to configure this behavior. The changes also improve the robustness of the system by tightening error handling for critical internal inconsistencies.

Highlights

Optimized Multi-Row DML Generation: The default WHERE clause strategy for multi-row DELETE and UPDATE statements has been switched from using WHERE OR to the more performant WHERE IN operator (referred to as 'v2'). This change leverages MySQL's optimization for IN() predicates with row constructors, leading to reduced downstream CPU usage.
Configurable WHERE Clause Strategy: A new where-clause parameter has been introduced in the sink URI, allowing users to explicitly choose between the 'v1' (WHERE OR) and 'v2' (WHERE IN) strategies for multi-row DML. The default is now 'v2'.
Robust NULL Handling in WHERE Clauses: The 'v2' WHERE IN strategy automatically falls back to the 'v1' WHERE OR strategy if any key column in the WHERE clause contains a NULL value. This ensures correct matching semantics, as IN predicates do not handle NULLs identically to IS NULL predicates.
Enhanced Error Handling: Several internal log.L().DPanic calls have been replaced with log.Panic, indicating more critical and immediate failure conditions for unexpected states.
Expanded Test Coverage and Benchmarks: New unit tests and benchmarks have been added for multi-row DML generation, specifically covering scenarios with generated columns, composite keys, and varying batch sizes to validate the new WHERE clause strategies.
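
The two strategies and the NULL fallback described in the highlights can be sketched as follows. This is a minimal illustration, not the sqlmodel implementation: the function names (hasNil, whereIn, whereOr, buildWhere) are hypothetical, each row is assumed to hold exactly one value per key column, and the exact SQL text TiCDC emits may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// hasNil reports whether any key value in any row is nil (SQL NULL).
func hasNil(rows [][]any) bool {
	for _, r := range rows {
		for _, v := range r {
			if v == nil {
				return true
			}
		}
	}
	return false
}

// whereIn builds the v2 shape, "(`a`,`b`) IN ((?,?),(?,?))", plus flattened args.
func whereIn(cols []string, rows [][]any) (string, []any) {
	quoted := make([]string, len(cols))
	for i, c := range cols {
		quoted[i] = "`" + c + "`"
	}
	tuple := "(" + strings.TrimSuffix(strings.Repeat("?,", len(cols)), ",") + ")"
	tuples := make([]string, len(rows))
	args := make([]any, 0, len(rows)*len(cols))
	for i, r := range rows {
		tuples[i] = tuple
		args = append(args, r...)
	}
	return "(" + strings.Join(quoted, ",") + ") IN (" + strings.Join(tuples, ",") + ")", args
}

// whereOr builds the v1 shape, one AND group per row joined by OR.
// A nil key becomes "IS NULL" and contributes no argument.
func whereOr(cols []string, rows [][]any) (string, []any) {
	groups := make([]string, len(rows))
	var args []any
	for i, r := range rows {
		conds := make([]string, len(cols))
		for j, c := range cols {
			if r[j] == nil {
				conds[j] = "`" + c + "` IS NULL"
				continue
			}
			conds[j] = "`" + c + "` = ?"
			args = append(args, r[j])
		}
		groups[i] = "(" + strings.Join(conds, " AND ") + ")"
	}
	return strings.Join(groups, " OR "), args
}

// buildWhere prefers the IN form and falls back to OR when any key is NULL,
// because "x IN (NULL)" never evaluates to true.
func buildWhere(cols []string, rows [][]any) (string, []any) {
	if hasNil(rows) {
		return whereOr(cols, rows)
	}
	return whereIn(cols, rows)
}

func main() {
	cols := []string{"a", "b"}
	sql, args := buildWhere(cols, [][]any{{1, "x"}, {2, "y"}})
	fmt.Println(sql, args)
	sql, args = buildWhere(cols, [][]any{{1, nil}, {2, "y"}})
	fmt.Println(sql, args)
}
```

The fallback is decided for the whole batch rather than per row, which keeps each generated statement in a single shape; whether the real code makes the same choice is not stated here.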

Changelog

pkg/sink/mysql/config.go
- Added whereClause field to Config struct to store the WHERE clause strategy.
- Initialized whereClause with sqlmodel.DefaultWhereClause in New function.
- Implemented getWhereClause function to parse the where-clause parameter from the sink URI.
pkg/sink/mysql/mysql_writer_dml_batch.go
- Modified calls to sqlmodel.GenDeleteSQL and sqlmodel.GenUpdateSQL to pass the configured whereClause.
pkg/sink/mysql/mysql_writer_dml_test.go
- Updated expected SQL strings in various test cases to reflect the new WHERE IN syntax for DELETE statements.
pkg/sink/pulsar/config.go
- Replaced log.L().Debug calls with log.Debug for consistent logging.
pkg/sink/sqlmodel/multi_row.go
- Refactored GenDeleteSQL and GenUpdateSQL to use a strategy pattern based on whereClause ('v1' or 'v2').
- Introduced constants whereClauseV1, whereClauseV2, and DefaultWhereClause.
- Added canUseWhereClauseV2 and hasNilValue functions to determine when the 'v2' strategy is applicable, with fallback to 'v1' for NULL values.
- Implemented genDeleteSQLV2 and genUpdateSQLV2 for the new WHERE IN based SQL generation.
- Moved the GenInsertSQL logic to be a standalone function, removing it from the GenDeleteSQL and GenUpdateSQL context.
pkg/sink/sqlmodel/multi_row_bench_test.go
- Added new benchmark tests for GenUpdateSQL to compare 'v1' and 'v2' performance across different batch sizes and primary key configurations (single and multi-column).
pkg/sink/sqlmodel/multi_row_test.go
- Added new test cases for GenDeleteMultiRows and TestGenUpdateMultiRows to verify the 'v2' WHERE IN behavior.
- Included tests for GenDeleteMultiRowsWithNullFallbackToV1 and TestGenUpdateMultiRowsWithNullFallbackToV1 to confirm correct fallback when NULLs are present.
- Expanded update tests to cover virtual and stored generated columns.
pkg/sink/sqlmodel/multi_row_v1.go
- Added new file containing the genDeleteSQLV1 and genUpdateSQLV1 implementations, which represent the previous WHERE OR based logic.
pkg/sink/sqlmodel/row_change.go
- Replaced log.L().DPanic calls with log.Panic in NewRowChange, calculateType, genDeleteSQL, genUpdateSQL, and GenSQL for more direct error handling.
pkg/sink/sqlmodel/row_change_test.go
- Added new file with comprehensive unit tests for RowChange functionality, including TestNewRowChange, TestRowChangeType, TestGenDelete, TestGenUpdate, TestExpressionIndex, and TestGenInsert.
pkg/sink/sqlmodel/utils.go
- Removed the ColValAsStr utility function.
pkg/sink/sqlmodel/utils_test.go
- Added new file with unit tests for valuesHolder and generatedColumnsNameSet.
pkg/sink/sqlmodel/where_handle.go
- Replaced log.L().DPanic call with log.Panic in getWhereIdxByData.
pkg/sink/sqlmodel/where_handle_test.go
- Added new file with unit tests for GetWhereHandle, allColsNotNull, and getWhereIdxByData.
pkg/workerpool/pool_impl.go
- Replaced log.L().DPanic call with log.Panic in run method for cancellation tasks.

Activity

This pull request is an automated cherry-pick of sink: use where in operator in delete or update statement #3788, indicating that the changes originated from another branch or repository.
The pull request includes manual test results with screenshots demonstrating the performance improvement and reduced CPU usage after applying the changes.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

coderabbitai · 2026-03-13T11:16:38Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


gemini-code-assist

Code Review

The pull request introduces a new DML batching mechanism for the MySQL sink, refactoring existing logic into mysql_writer_dml_batch.go to support three levels of batching: normal, per-event, and cross-event. A key enhancement is the introduction of a configurable where-clause strategy for multi-row UPDATE/DELETE statements, allowing for a more performant (col1,col2) IN ((?,?),(?,?)) format (v2) which falls back to the (... ) OR (... ) form (v1) when key columns contain NULL values. This new strategy is integrated into the sink configuration and tested. A review comment highlights a potential inaccuracy in the code's complexity analysis, noting that the generateBatchSQLInSafeMode might be more efficient than stated, while buildRowChangesForUnSafeBatch could have a higher complexity than implied.

gemini-code-assist · 2026-03-13T11:20:03Z

+// Considering the batch algorithm in safe mode is O(n^3), which n is the number of rows.
+// So we need to limit the number of rows in one batch to avoid performance issues.


The comment states that the safe mode batch algorithm is O(n^3). However, the implementation of generateBatchSQLInSafeMode uses a map to group rows, which appears to be more efficient than O(n^3). Conversely, the buildRowChangesForUnSafeBatch function for unsafe mode has a nested loop structure that could potentially approach O(n^3) complexity in worst-case scenarios. Please verify and update the comment for clarity, as it might be outdated or swapped.
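
To make the reviewer's point about map-based grouping concrete, here is a minimal sketch of grouping row changes by key in a single pass. It is not the code of generateBatchSQLInSafeMode: the rowEvent type, its fields, and groupByKey are hypothetical names used only to show why one map lookup per row gives roughly linear expected time rather than O(n^3).

```go
package main

import "fmt"

// rowEvent is a hypothetical stand-in for one row change in a batch.
type rowEvent struct {
	Key string // encoded primary or unique key
	Op  string // "insert", "update" or "delete"
}

// groupByKey collects the events for each key in arrival order.
// Each event costs one map lookup, so the pass is O(n) in expectation.
func groupByKey(events []rowEvent) (map[string][]rowEvent, []string) {
	groups := make(map[string][]rowEvent, len(events))
	order := make([]string, 0, len(events)) // first-seen key order, for stable output
	for _, e := range events {
		if _, seen := groups[e.Key]; !seen {
			order = append(order, e.Key)
		}
		groups[e.Key] = append(groups[e.Key], e)
	}
	return groups, order
}

func main() {
	events := []rowEvent{
		{"k1", "insert"}, {"k2", "insert"}, {"k1", "update"}, {"k1", "delete"},
	}
	groups, order := groupByKey(events)
	for _, k := range order {
		fmt.Println(k, len(groups[k]))
	}
}
```

A nested scan that compares every row against every other row would do the same grouping in quadratic time or worse, which is the kind of structure the review comment asks the authors to check for in the unsafe-mode path.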

ti-chi-bot · 2026-03-13T11:22:34Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from wk989898. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Signed-off-by: wk989898 <nhsmwk@gmail.com>

wk989898 · 2026-03-17T07:59:13Z

/test all

Signed-off-by: wk989898 <nhsmwk@gmail.com>

wk989898 · 2026-03-19T10:39:21Z

/test all

ti-chi-bot · 2026-03-19T10:40:11Z

@ti-chi-bot: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-error-log-review | `f3aec0b` | link | true | `/test pull-error-log-review` |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

This is an automated cherry-pick of pingcap#3788

9849480

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>

ti-chi-bot bot added the do-not-merge/cherry-pick-not-approved label Mar 13, 2026

ti-chi-bot mentioned this pull request Mar 13, 2026

sink: use where in operator in delete or update statement #3788

Merged

ti-chi-bot assigned wk989898 Mar 13, 2026

gemini-code-assist bot reviewed Mar 13, 2026

Update config.go

fdb3c29

wk989898 added 3 commits March 13, 2026 19:31

Update config.go

2f4bf97

Fix syntax by adding a comma after SlowQuery

6a3d382

update

cc502ae

Signed-off-by: wk989898 <nhsmwk@gmail.com>

update

f3aec0b

Signed-off-by: wk989898 <nhsmwk@gmail.com>

wk989898 merged commit 0864fb7 into pingcap:release-8.5 Mar 19, 2026
16 of 20 checks passed

wk989898 merged 6 commits into pingcap:release-8.5 from ti-chi-bot:cherry-pick-3788-to-release-8.5
