
sink: use where in operator in delete or update statement (#3788) #4470

Merged
wk989898 merged 6 commits into pingcap:release-8.5 from ti-chi-bot:cherry-pick-3788-to-release-8.5
Mar 19, 2026

Conversation

@ti-chi-bot
Member

@ti-chi-bot ti-chi-bot commented Mar 13, 2026

This is an automated cherry-pick of #3788

What problem does this PR solve?

Issue Number: close #4121 ref #1645

What is changed and how it works?

Since pingcap/tiflow#8818 was introduced in v7.2.0, TiCDC has used WHERE ... OR instead of WHERE ... IN when a statement covers many rows. However, this causes a performance regression, because the IN() predicate is only affected when its right side contains a single row constructor. For more details, see https://dev.mysql.com/doc/refman/5.7/en/range-optimization.html#row-constructor-range-optimization.

This PR makes WHERE ... IN the default clause, which decreases downstream CPU usage.
To switch back to the previous WHERE clause, set the where-clause parameter in the sink URI: --sink-uri="mysql://root:123456@127.0.0.1:3306/?where-clause=v1"
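
As a rough illustration of how such a parameter could be read from the sink URI, here is a minimal Go sketch. The function name parseWhereClause, the constant names, and the decision to reject unknown values are assumptions made for this example; the PR's own getWhereClause in pkg/sink/mysql/config.go may behave differently.

```go
package main

import (
	"fmt"
	"net/url"
)

// Hypothetical constants for illustration only.
const (
	whereClauseV1      = "v1" // previous behavior: WHERE (...) OR (...)
	whereClauseV2      = "v2" // new behavior: WHERE (...) IN (...)
	defaultWhereClause = whereClauseV2
)

// parseWhereClause reads the where-clause query parameter from a sink URI.
// It returns the default when the parameter is absent and an error when the
// value is not recognized (rejecting unknown values is an assumption).
func parseWhereClause(sinkURI string) (string, error) {
	u, err := url.Parse(sinkURI)
	if err != nil {
		return "", err
	}
	v := u.Query().Get("where-clause")
	switch v {
	case "":
		return defaultWhereClause, nil
	case whereClauseV1, whereClauseV2:
		return v, nil
	default:
		return "", fmt.Errorf("unsupported where-clause %q", v)
	}
}

func main() {
	for _, uri := range []string{
		"mysql://root:123456@127.0.0.1:3306/?where-clause=v1",
		"mysql://root:123456@127.0.0.1:3306/",
	} {
		clause, err := parseWhereClause(uri)
		fmt.Println(clause, err)
	}
}
```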

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)

Questions

Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?

Release note

Reduce downstream TiDB resource consumption by reducing SQL parsing and compilation time

Summary by CodeRabbit

  • New Features

    • Added a configurable WHERE-clause strategy for multi-row UPDATE/DELETE with a new default ("v2") for more stable SQL generation.
  • Behavior Changes

    • Batch WHERE predicates now prefer IN-style clauses consistently.
    • Tighter error handling: several invalid states now cause immediate panics and adjusted logging.
  • Tests

    • Expanded unit tests and added benchmarks for multi-row DML generation, covering generated columns, composite keys, and varied batch sizes.

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot ti-chi-bot added the do-not-merge/hold, lgtm, release-note-none, size/XXL, and type/cherry-pick-for-release-8.5 labels Mar 13, 2026
@ti-chi-bot
Member Author

@wk989898 This PR has conflicts, so I have put it on hold.
Please resolve them or ask others to resolve them, then comment /unhold to remove the hold label.

@ti-chi-bot

ti-chi-bot bot commented Mar 13, 2026

@ti-chi-bot: If you want to know how to resolve it, please read the guide in TiDB Dev Guide.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the performance of multi-row DELETE and UPDATE operations by adopting a more efficient SQL generation strategy. By defaulting to the WHERE IN operator for batch DML, it addresses a known performance regression and provides users with the flexibility to configure this behavior. The changes also improve the robustness of the system by tightening error handling for critical internal inconsistencies.

Highlights

  • Optimized Multi-Row DML Generation: The default WHERE clause strategy for multi-row DELETE and UPDATE statements has been switched from using WHERE OR to the more performant WHERE IN operator (referred to as 'v2'). This change leverages MySQL's optimization for IN() predicates with row constructors, leading to reduced downstream CPU usage.
  • Configurable WHERE Clause Strategy: A new where-clause parameter has been introduced in the sink URI, allowing users to explicitly choose between the 'v1' (WHERE OR) and 'v2' (WHERE IN) strategies for multi-row DML. The default is now 'v2'.
  • Robust NULL Handling in WHERE Clauses: The 'v2' WHERE IN strategy automatically falls back to the 'v1' WHERE OR strategy if any key column in the WHERE clause contains a NULL value. This ensures correct matching semantics, as IN predicates do not handle NULLs identically to IS NULL predicates.
  • Enhanced Error Handling: Several internal log.L().DPanic calls have been replaced with log.Panic, indicating more critical and immediate failure conditions for unexpected states.
  • Expanded Test Coverage and Benchmarks: New unit tests and benchmarks have been added for multi-row DML generation, specifically covering scenarios with generated columns, composite keys, and varying batch sizes to validate the new WHERE clause strategies.
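
The NULL fallback described in the highlights can be sketched as follows. In SQL, a comparison against NULL never evaluates to TRUE, so a tuple such as (1, NULL) on the right side of IN cannot match a row whose key column is NULL, whereas the OR form can emit an IS NULL test instead. The names hasNil and chooseStrategy are hypothetical and only stand in for the PR's hasNilValue and canUseWhereClauseV2, whose real signatures are not shown here.

```go
package main

import "fmt"

// hasNil reports whether any key value in any row is nil (SQL NULL).
func hasNil(keyRows [][]any) bool {
	for _, row := range keyRows {
		for _, v := range row {
			if v == nil {
				return true
			}
		}
	}
	return false
}

// chooseStrategy returns the strategy that would be applied to a batch:
// "v2" (WHERE IN) is downgraded to "v1" (WHERE OR) when a key value is NULL.
func chooseStrategy(configured string, keyRows [][]any) string {
	if configured == "v2" && hasNil(keyRows) {
		return "v1"
	}
	return configured
}

func main() {
	fmt.Println(chooseStrategy("v2", [][]any{{1, "a"}, {2, "b"}})) // v2
	fmt.Println(chooseStrategy("v2", [][]any{{1, nil}, {2, "b"}})) // v1
}
```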


Changelog
  • pkg/sink/mysql/config.go
    • Added whereClause field to Config struct to store the WHERE clause strategy.
    • Initialized whereClause with sqlmodel.DefaultWhereClause in New function.
    • Implemented getWhereClause function to parse the where-clause parameter from the sink URI.
  • pkg/sink/mysql/mysql_writer_dml_batch.go
    • Modified calls to sqlmodel.GenDeleteSQL and sqlmodel.GenUpdateSQL to pass the configured whereClause.
  • pkg/sink/mysql/mysql_writer_dml_test.go
    • Updated expected SQL strings in various test cases to reflect the new WHERE IN syntax for DELETE statements.
  • pkg/sink/pulsar/config.go
    • Replaced log.L().Debug calls with log.Debug for consistent logging.
  • pkg/sink/sqlmodel/multi_row.go
    • Refactored GenDeleteSQL and GenUpdateSQL to use a strategy pattern based on whereClause ('v1' or 'v2').
    • Introduced constants whereClauseV1, whereClauseV2, and DefaultWhereClause.
    • Added canUseWhereClauseV2 and hasNilValue functions to determine when the 'v2' strategy is applicable, with fallback to 'v1' for NULL values.
    • Implemented genDeleteSQLV2 and genUpdateSQLV2 for the new WHERE IN based SQL generation.
    • Moved the GenInsertSQL logic to be a standalone function, removing it from the GenDeleteSQL and GenUpdateSQL context.
  • pkg/sink/sqlmodel/multi_row_bench_test.go
    • Added new benchmark tests for GenUpdateSQL to compare 'v1' and 'v2' performance across different batch sizes and primary key configurations (single and multi-column).
  • pkg/sink/sqlmodel/multi_row_test.go
    • Added new test cases for GenDeleteMultiRows and TestGenUpdateMultiRows to verify the 'v2' WHERE IN behavior.
    • Included tests for GenDeleteMultiRowsWithNullFallbackToV1 and TestGenUpdateMultiRowsWithNullFallbackToV1 to confirm correct fallback when NULLs are present.
    • Expanded update tests to cover virtual and stored generated columns.
  • pkg/sink/sqlmodel/multi_row_v1.go
    • Added new file containing the genDeleteSQLV1 and genUpdateSQLV1 implementations, which represent the previous WHERE OR based logic.
  • pkg/sink/sqlmodel/row_change.go
    • Replaced log.L().DPanic calls with log.Panic in NewRowChange, calculateType, genDeleteSQL, genUpdateSQL, and GenSQL for more direct error handling.
  • pkg/sink/sqlmodel/row_change_test.go
    • Added new file with comprehensive unit tests for RowChange functionality, including TestNewRowChange, TestRowChangeType, TestGenDelete, TestGenUpdate, TestExpressionIndex, and TestGenInsert.
  • pkg/sink/sqlmodel/utils.go
    • Removed the ColValAsStr utility function.
  • pkg/sink/sqlmodel/utils_test.go
    • Added new file with unit tests for valuesHolder and generatedColumnsNameSet.
  • pkg/sink/sqlmodel/where_handle.go
    • Replaced log.L().DPanic call with log.Panic in getWhereIdxByData.
  • pkg/sink/sqlmodel/where_handle_test.go
    • Added new file with unit tests for GetWhereHandle, allColsNotNull, and getWhereIdxByData.
  • pkg/workerpool/pool_impl.go
    • Replaced log.L().DPanic call with log.Panic in run method for cancellation tasks.
Activity
  • This pull request is an automated cherry-pick of sink: use where in operator in delete or update statement #3788, indicating that the changes originated from another branch or repository.
  • The pull request includes manual test results with screenshots demonstrating the performance improvement and reduced CPU usage after applying the changes.

@coderabbitai
Contributor

coderabbitai bot commented Mar 13, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.



@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request introduces a new DML batching mechanism for the MySQL sink, refactoring existing logic into mysql_writer_dml_batch.go to support three levels of batching: normal, per-event, and cross-event. A key enhancement is the introduction of a configurable where-clause strategy for multi-row UPDATE/DELETE statements, allowing for a more performant (col1,col2) IN ((?,?),(?,?)) format (v2) which falls back to the (... ) OR (... ) form (v1) when key columns contain NULL values. This new strategy is integrated into the sink configuration and tested. A review comment highlights a potential inaccuracy in the code's complexity analysis, noting that the generateBatchSQLInSafeMode might be more efficient than stated, while buildRowChangesForUnSafeBatch could have a higher complexity than implied.
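
To make the two predicate shapes concrete, the following Go sketch builds both forms for a composite key. It is illustrative only: whereIn, whereOr, and placeholders are hypothetical helpers, not the PR's genDeleteSQLV1 or genDeleteSQLV2, and the sketch does not escape backticks inside column names or handle NULL values.

```go
package main

import (
	"fmt"
	"strings"
)

// placeholders returns "(?,?,...)" containing n question marks.
func placeholders(n int) string {
	return "(" + strings.TrimSuffix(strings.Repeat("?,", n), ",") + ")"
}

// whereIn builds the v2 shape: (`a`,`b`) IN ((?,?),(?,?)).
func whereIn(cols []string, rows int) string {
	quoted := make([]string, len(cols))
	for i, c := range cols {
		quoted[i] = "`" + c + "`"
	}
	tuples := make([]string, rows)
	for i := range tuples {
		tuples[i] = placeholders(len(cols))
	}
	return "(" + strings.Join(quoted, ",") + ") IN (" + strings.Join(tuples, ",") + ")"
}

// whereOr builds the v1 shape: (`a` = ? AND `b` = ?) OR (`a` = ? AND `b` = ?).
func whereOr(cols []string, rows int) string {
	conds := make([]string, len(cols))
	for i, c := range cols {
		conds[i] = "`" + c + "` = ?"
	}
	group := "(" + strings.Join(conds, " AND ") + ")"
	groups := make([]string, rows)
	for i := range groups {
		groups[i] = group
	}
	return strings.Join(groups, " OR ")
}

func main() {
	cols := []string{"a", "b"}
	fmt.Println("DELETE FROM `db`.`t` WHERE " + whereIn(cols, 2))
	fmt.Println("DELETE FROM `db`.`t` WHERE " + whereOr(cols, 2))
}
```

The IN form repeats the column list once per statement rather than once per row, so the generated SQL text grows more slowly with batch size than the OR form does.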

Comment on lines +191 to +192
// Considering the batch algorithm in safe mode is O(n^3), which n is the number of rows.
// So we need to limit the number of rows in one batch to avoid performance issues.

medium

The comment states that the safe mode batch algorithm is O(n^3). However, the implementation of generateBatchSQLInSafeMode uses a map to group rows, which appears to be more efficient than O(n^3). Conversely, the buildRowChangesForUnSafeBatch function for unsafe mode has a nested loop structure that could potentially approach O(n^3) complexity in worst-case scenarios. Please verify and update the comment for clarity, as it might be outdated or swapped.
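
For reference, a map-based grouping pass of the kind the comment alludes to can be sketched as below. This is not the code of generateBatchSQLInSafeMode; the rowChange type and its fields are hypothetical. It only shows why grouping n rows by key with a map takes a single pass with expected constant-time lookups, rather than nested loops over the batch.

```go
package main

import "fmt"

// rowChange is a hypothetical, simplified row event.
type rowChange struct {
	key string // encoded primary or unique key (assumption)
	op  string // "insert", "update", or "delete"
}

// groupByKey groups rows by key in one pass and records first-seen key order,
// so the result can be iterated deterministically.
func groupByKey(rows []rowChange) ([]string, map[string][]rowChange) {
	order := make([]string, 0, len(rows))
	groups := make(map[string][]rowChange, len(rows))
	for _, r := range rows {
		if _, seen := groups[r.key]; !seen {
			order = append(order, r.key)
		}
		groups[r.key] = append(groups[r.key], r)
	}
	return order, groups
}

func main() {
	rows := []rowChange{
		{key: "k1", op: "insert"},
		{key: "k2", op: "insert"},
		{key: "k1", op: "delete"},
	}
	order, groups := groupByKey(rows)
	for _, k := range order {
		fmt.Println(k, len(groups[k]))
	}
}
```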

@ti-chi-bot

ti-chi-bot bot commented Mar 13, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from wk989898. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@wk989898
Collaborator

/test all

Signed-off-by: wk989898 <nhsmwk@gmail.com>
@wk989898
Collaborator

/test all

@ti-chi-bot

ti-chi-bot bot commented Mar 19, 2026

@ti-chi-bot: The following test failed. Say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
pull-error-log-review | f3aec0b | link | true | /test pull-error-log-review

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@wk989898 wk989898 merged commit 0864fb7 into pingcap:release-8.5 Mar 19, 2026
16 of 20 checks passed
@ti-chi-bot ti-chi-bot bot added the release-note and cherry-pick-approved labels and removed the release-note-none and do-not-merge/cherry-pick-not-approved labels Mar 23, 2026

Labels

cherry-pick-approved, do-not-merge/hold, lgtm, release-note, size/XXL, type/cherry-pick-for-release-8.5
