sink: support batch dml with virtual column (#3787) by ti-chi-bot · Pull Request #4299 · pingcap/ticdc

ti-chi-bot · 2026-02-27T10:36:29Z

This is an automated cherry-pick of #3787

What problem does this PR solve?

Issue Number: close #3294 close #4184

What is changed and how it works?

Batch DML for Virtual Columns: Batch Data Manipulation Language (DML) operations are now supported for tables that include virtual columns, removing a previous limitation.
Updated Batch SQL Generation Logic: The shouldGenBatchSQL function no longer treats the mere presence of virtual columns as a disqualifier for generating batch DML. Batching is still skipped when the handle key contains virtual generated columns.
Refined Generated Column Handling: Generated columns (including virtual ones) are now explicitly skipped when preparing SQL arguments and constructing WHERE clauses, ensuring correct DML execution without interference from computed values.
Simplified Batching Logic: The SameTypeTargetAndColumns function has been removed. This simplifies the logic that decides whether multiple row changes can be merged into a multi-value DML.
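The batching rule described above can be illustrated with a minimal sketch. This is not TiCDC's implementation. `Column`, `TableInfo`, `HandleKeyOffsets`, and the `rowCount > 1` condition are hypothetical stand-ins. Only the handle-key rule comes from this PR: virtual columns elsewhere in the table no longer block batching, but a virtual generated column inside the handle key still does.

```go
package main

import "fmt"

// Column is a hypothetical, simplified column descriptor; the real
// TiCDC table metadata carries far more information.
type Column struct {
	Name               string
	IsVirtualGenerated bool // GENERATED ALWAYS AS (...) VIRTUAL
}

// TableInfo is a hypothetical stand-in holding the columns and the
// offsets of the columns that form the handle key.
type TableInfo struct {
	Columns          []Column
	HandleKeyOffsets []int
}

// handleKeyHasVirtualColumn reports whether any handle-key column is a
// virtual generated column.
func handleKeyHasVirtualColumn(t *TableInfo) bool {
	for _, off := range t.HandleKeyOffsets {
		if t.Columns[off].IsVirtualGenerated {
			return true
		}
	}
	return false
}

// shouldGenBatchSQL models only the rule described in the PR. The
// rowCount > 1 check is an assumed placeholder for the other conditions
// the real function evaluates.
func shouldGenBatchSQL(t *TableInfo, rowCount int) bool {
	if rowCount <= 1 {
		return false
	}
	return !handleKeyHasVirtualColumn(t)
}

func main() {
	withVirtualValue := &TableInfo{
		Columns:          []Column{{Name: "id"}, {Name: "v", IsVirtualGenerated: true}},
		HandleKeyOffsets: []int{0},
	}
	virtualHandle := &TableInfo{
		Columns:          []Column{{Name: "a"}, {Name: "v", IsVirtualGenerated: true}},
		HandleKeyOffsets: []int{1},
	}
	fmt.Println(shouldGenBatchSQL(withVirtualValue, 3)) // true
	fmt.Println(shouldGenBatchSQL(virtualHandle, 3))    // false
}
```

The check looks only at handle-key offsets. A table that merely contains a virtual column therefore stays eligible for multi-value DML.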

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
No code

Questions

Will it cause performance regression or break compatibility?

Do you need to update user documentation, design documentation or monitoring documentation?

Release note

Please refer to [Release Notes Language Style Guide](https://pingcap.github.io/tidb-dev-guide/contribute-to-tidb/release-notes-style-guide.html) to write a quality release note.

If you don't think this PR needs a release note then fill it with `None`.

Summary by CodeRabbit

Bug Fixes
- Prevented writing nil-derived data during partition indexing.
- Excluded virtual/generated columns from WHERE and index-key construction to avoid incorrect DMLs.
- Batch DML generation now skips multi-row batches when handle-key columns are virtual.
- Stricter handling of unexpected nil column values (now fails loudly via panic).
New Features
- Improved handling for unique indexes involving stored and generated columns.
Refactor
- Removed obsolete multi-row mergability logic.
Tests
- Refactored and expanded tests to cover stored and virtual generated column scenarios.
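The unique-index item above, together with the `allColsNotNull` change listed in the changelog, can be illustrated with a small sketch of handle-index selection. `IndexColumn`, `usableAsHandle`, and `pickHandleIndex` are hypothetical names. The sketch also assumes that stored generated columns stay usable; this is inferred only from the changelog excluding virtual generated columns.

```go
package main

import "fmt"

// IndexColumn is a hypothetical descriptor for one column of a unique index.
type IndexColumn struct {
	Name               string
	NotNull            bool
	IsVirtualGenerated bool
}

// usableAsHandle sketches the described rule: every column must be NOT NULL,
// and a virtual generated column disqualifies the index. Stored generated
// columns are assumed to remain usable.
func usableAsHandle(index []IndexColumn) bool {
	for _, c := range index {
		if !c.NotNull || c.IsVirtualGenerated {
			return false
		}
	}
	return true
}

// pickHandleIndex returns the position of the first usable unique index,
// or -1 if none qualifies (a caller would then fall back to all columns).
func pickHandleIndex(uniqueIndexes [][]IndexColumn) int {
	for i, idx := range uniqueIndexes {
		if usableAsHandle(idx) {
			return i
		}
	}
	return -1
}

func main() {
	indexes := [][]IndexColumn{
		{{Name: "v", NotNull: true, IsVirtualGenerated: true}}, // UNIQUE(v), v is VIRTUAL
		{{Name: "s", NotNull: true}},                           // UNIQUE(s), s is STORED
	}
	fmt.Println(pickHandleIndex(indexes)) // 1
}
```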

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>

ti-chi-bot · 2026-02-27T10:36:34Z

This cherry pick PR is for a release branch and has not yet been approved by triage owners.
Adding the do-not-merge/cherry-pick-not-approved label.

To merge this cherry pick:

It must be LGTMed and approved by the reviewers first.
For pull requests to TiDB-x branches, it must have no failed tests.
AFTER it has lgtm and approved labels, please wait for the cherry-pick merging approval from triage owners.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

ti-chi-bot · 2026-02-27T10:36:34Z

@wk989898 This PR has conflicts, I have hold it.
Please resolve them or ask others to resolve them, then comment /unhold to remove the hold label.

ti-chi-bot · 2026-02-27T10:36:36Z

@ti-chi-bot: If you want to know how to resolve it, please read the guide in TiDB Dev Guide.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

gemini-code-assist · 2026-02-27T10:37:03Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the sink component by enabling batch DML operations for tables that include virtual columns. It refines the logic for generating batch SQL, ensuring that virtual generated columns are correctly handled and excluded from critical DML components like WHERE clauses and index key constructions. These changes streamline data manipulation for tables with generated columns, improving efficiency and correctness across the system.

Highlights

Batch DML Support for Virtual Columns: Enabled batch Data Manipulation Language (DML) operations for tables that include virtual columns, removing a previous limitation.
Refined Batch SQL Generation Logic: Modified the shouldGenBatchSQL function to prevent batching if the handle key contains virtual generated columns, ensuring correctness while still allowing batch DML for other virtual column scenarios.
Generated Column Handling in DML: Ensured generated columns (both virtual and stored) are explicitly skipped when preparing SQL arguments and constructing WHERE clauses, preventing interference from computed values during DML execution.
Simplified Batching Logic: Removed the SameTypeTargetAndColumns function, streamlining the logic used to determine if multiple row changes can be merged into a multi-value DML.
Stricter Nil Value Handling: Introduced stricter handling for unexpected nil column values, causing a panic if encountered during string conversion.
Partition Indexing Robustness: Prevented writing nil-derived data during partition indexing by explicitly checking for nil column values.
Enhanced Test Coverage: Expanded integration tests to cover various scenarios involving stored and virtual generated columns, ensuring the robustness of the new DML handling.
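The two nil-handling highlights can be sketched together: a strict converter that panics on nil, and a partition-key builder that skips nil handle-key values before converting. This is a hedged illustration, not the code in `pkg/common/column.go` or `index_value.go`. `columnValueString`, `partitionIndex`, and the use of FNV-1a hashing are assumptions made for the example.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strconv"
)

// columnValueString is a hypothetical, reduced converter in the spirit of the
// ColumnValueString change: nil is treated as a programming error and panics
// instead of being turned into a placeholder string.
func columnValueString(v interface{}) string {
	switch x := v.(type) {
	case nil:
		panic("unexpected nil column value")
	case string:
		return x
	case []byte:
		return string(x)
	case int64:
		return strconv.FormatInt(x, 10)
	case uint64:
		return strconv.FormatUint(x, 10)
	default:
		return fmt.Sprint(x)
	}
}

// partitionIndex hashes the non-nil handle-key values into one of n
// partitions. FNV-1a is an assumption, not necessarily TiCDC's hash.
func partitionIndex(handleValues []interface{}, n uint32) uint32 {
	h := fnv.New32a()
	for _, v := range handleValues {
		if v == nil {
			continue // skip nil so no nil-derived bytes enter the key
		}
		h.Write([]byte(columnValueString(v)))
		h.Write([]byte{0}) // separator so ("7","abc") and ("7a","bc") feed different bytes
	}
	return h.Sum32() % n
}

func main() {
	p := partitionIndex([]interface{}{int64(7), nil, "abc"}, 8)
	fmt.Println(p < 8) // true
	// Skipping nil means this row hashes exactly like one without the nil value.
	fmt.Println(p == partitionIndex([]interface{}{int64(7), "abc"}, 8)) // true
}
```

Checking for nil before calling the converter keeps the two changes compatible. The partitioner never triggers the stricter panic.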


Changelog

downstreamadapter/sink/eventrouter/partition/index_value.go
- Added a check for nil values when extracting column values for handle keys during partition index generation, skipping nil values to prevent nil-derived data.
pkg/common/column.go
- Imported the log package.
- Modified ColumnValueString to panic if a nil value is passed, enforcing stricter nil handling.
pkg/common/table_info_helper.go
- Removed the IsColCDCVisible check when initializing index column offsets, allowing all index columns to be considered.
pkg/sink/mysql/mysql_writer_dml.go
- Refactored shouldGenBatchSQL to accept tableInfo directly.
- Added logic to shouldGenBatchSQL to prevent batching if the handle key contains virtual generated columns.
- Introduced genActiveActiveSQL to handle active-active SQL generation based on batching conditions.
pkg/sink/mysql/mysql_writer_dml_test.go
- Updated TestShouldGenBatchSQL to use tableInfo objects in test cases instead of separate boolean flags.
- Added new test cases to specifically cover scenarios with virtual generated columns in handle keys, ensuring correct batching behavior.
pkg/sink/mysql/sql_builder.go
- Modified whereSlice to explicitly exclude virtual generated columns from the WHERE clause construction, both for handle keys and when using all columns.
pkg/sink/sqlmodel/multi_row.go
- Removed the SameTypeTargetAndColumns function, simplifying the logic for merging row changes.
pkg/sink/sqlmodel/row_change.go
- Updated whereColumnsAndValues to filter out virtual generated columns from the column names and values used in WHERE clauses.
- Added a panic if the number of column names and values do not match after filtering, indicating an internal inconsistency.
pkg/sink/sqlmodel/utils.go
- Modified getColsAndValuesOfIdx to exclude virtual generated columns when retrieving columns and values for an index.
pkg/sink/sqlmodel/where_handle.go
- Updated allColsNotNull to consider virtual generated columns as not suitable for NOT NULL checks in indexes.
tests/integration_tests/generate_column/data/stored.sql
- Expanded test data by renaming t2 to s1 and adding new tables s2 through s5 with various stored generated column configurations and DML operations.
tests/integration_tests/generate_column/data/virtual.sql
- Expanded test data by renaming t1 to v1 and adding new tables v2 through v6 with various virtual generated column configurations and DML operations.
tests/integration_tests/generate_column/run.sh
- Updated check_table_exists calls to reflect the new table names (v6 and s5) in the integration tests, ensuring proper validation of generated column replication.
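The WHERE-clause changes in `sql_builder.go` and `row_change.go` follow one pattern: drop virtual generated columns from names and values together, keep the two slices aligned, and panic on an internal length mismatch. The sketch below assumes a simplified `Column` type and hypothetical helpers `whereColumnsAndValues`, `quoteName`, and `buildWhere`. The `IS NULL` branch reflects standard SQL semantics and is not something the PR states.

```go
package main

import (
	"fmt"
	"strings"
)

// Column is a hypothetical, simplified column descriptor.
type Column struct {
	Name               string
	IsVirtualGenerated bool
}

// whereColumnsAndValues drops virtual generated columns from both names and
// values, keeping them aligned. A length mismatch is an internal inconsistency.
func whereColumnsAndValues(cols []Column, values []interface{}) ([]string, []interface{}) {
	if len(cols) != len(values) {
		panic(fmt.Sprintf("columns (%d) and values (%d) differ in length", len(cols), len(values)))
	}
	names := make([]string, 0, len(cols))
	args := make([]interface{}, 0, len(values))
	for i, c := range cols {
		if c.IsVirtualGenerated {
			continue // excluded from the predicate, as the changelog describes
		}
		names = append(names, c.Name)
		args = append(args, values[i])
	}
	return names, args
}

// quoteName wraps an identifier in backticks, doubling embedded backticks.
func quoteName(name string) string {
	return "`" + strings.ReplaceAll(name, "`", "``") + "`"
}

// buildWhere renders the predicate; NULL needs IS NULL because
// "col = NULL" never matches in SQL.
func buildWhere(names []string, args []interface{}) (string, []interface{}) {
	conds := make([]string, 0, len(names))
	bound := make([]interface{}, 0, len(args))
	for i, n := range names {
		if args[i] == nil {
			conds = append(conds, quoteName(n)+" IS NULL")
			continue
		}
		conds = append(conds, quoteName(n)+" = ?")
		bound = append(bound, args[i])
	}
	return "WHERE " + strings.Join(conds, " AND "), bound
}

func main() {
	cols := []Column{{Name: "a"}, {Name: "v", IsVirtualGenerated: true}, {Name: "b"}}
	names, args := whereColumnsAndValues(cols, []interface{}{1, 3, nil})
	clause, bound := buildWhere(names, args)
	fmt.Println(clause) // WHERE `a` = ? AND `b` IS NULL
	fmt.Println(bound)  // [1]
}
```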

Activity

The pull request was created as an automated cherry-pick of sink: support batch dml with virtual column #3787.
The CodeRabbit AI provided a summary categorizing changes into bug fixes, new features, refactoring, and tests, offering a structured overview of the modifications.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

coderabbitai · 2026-02-27T10:37:35Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


gemini-code-assist

Code Review

This pull request aims to enable batch DML operations for tables with virtual columns, refining the logic in shouldGenBatchSQL and updating SQL generation and data modeling functions to correctly skip virtual columns. While the core logic appears sound, a critical security concern was identified in pkg/sink/sqlmodel/row_change.go where raw database values are logged without redaction during a panic, potentially leaking sensitive information. Additionally, the PR contains unresolved merge conflict markers in pkg/sink/mysql/mysql_writer_dml.go, which will prevent compilation and must be resolved.

gemini-code-assist · 2026-02-27T10:39:40Z

+<<<<<<< HEAD
 				queryList, argsList = w.generateBatchSQL(eventsInGroup)
+=======
+				if !w.shouldGenBatchSQL(tableInfo, eventsInGroup) {
+					queryList, argsList = w.generateNormalSQLs(eventsInGroup)
+				} else {
+					queryList, argsList = w.generateBatchSQL(eventsInGroup)
+				}
+>>>>>>> f420f1aed (sink: support batch dml with virtual column (#3787))


Unresolved merge markers (<<<<<<< HEAD, =======, >>>>>>>) are detected in this block, which will cause compilation errors and indicates an incomplete merge process. The logic for deciding whether to generate batch SQL is duplicated and incorrectly structured due to this conflict. Please resolve this by replacing the entire conflicting block with the intended logic.

if !w.shouldGenBatchSQL(tableInfo, eventsInGroup) {
	queryList, argsList = w.generateNormalSQLs(eventsInGroup)
} else {
	queryList, argsList = w.generateBatchSQL(eventsInGroup)
}
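The resolved branch chooses between per-row statements and a single multi-value statement. The sketch below illustrates that difference. `Row`, `placeholders`, `genSQL`, and the plain-INSERT shape are illustrative assumptions; TiCDC's real generators cover more cases. The column list is assumed to already exclude generated columns, since MySQL-compatible databases do not accept explicit non-DEFAULT values for them.

```go
package main

import (
	"fmt"
	"strings"
)

// Row is a hypothetical row change: the column values of one INSERT.
type Row []interface{}

// placeholders returns "(?,?,...)" with n markers.
func placeholders(n int) string {
	return "(" + strings.TrimSuffix(strings.Repeat("?,", n), ",") + ")"
}

// generateNormalSQLs emits one single-row INSERT per row.
func generateNormalSQLs(table string, cols []string, rows []Row) ([]string, [][]interface{}) {
	head := fmt.Sprintf("INSERT INTO %s (%s) VALUES ", table, strings.Join(cols, ","))
	queries := make([]string, 0, len(rows))
	args := make([][]interface{}, 0, len(rows))
	for _, r := range rows {
		queries = append(queries, head+placeholders(len(cols)))
		args = append(args, []interface{}(r))
	}
	return queries, args
}

// generateBatchSQL merges all rows into one multi-value INSERT.
func generateBatchSQL(table string, cols []string, rows []Row) ([]string, [][]interface{}) {
	head := fmt.Sprintf("INSERT INTO %s (%s) VALUES ", table, strings.Join(cols, ","))
	groups := make([]string, 0, len(rows))
	var flat []interface{}
	for _, r := range rows {
		groups = append(groups, placeholders(len(cols)))
		flat = append(flat, r...)
	}
	return []string{head + strings.Join(groups, ",")}, [][]interface{}{flat}
}

// genSQL mirrors the shape of the resolved conflict: fall back to per-row
// statements whenever batching is not allowed.
func genSQL(batchAllowed bool, table string, cols []string, rows []Row) ([]string, [][]interface{}) {
	if !batchAllowed {
		return generateNormalSQLs(table, cols, rows)
	}
	return generateBatchSQL(table, cols, rows)
}

func main() {
	cols := []string{"a", "b"}
	rows := []Row{{1, "x"}, {2, "y"}}
	q, args := genSQL(true, "t", cols, rows)
	fmt.Println(q[0], len(args[0])) // INSERT INTO t (a,b) VALUES (?,?),(?,?) 4
	q, _ = genSQL(false, "t", cols, rows)
	fmt.Println(len(q), q[0]) // 2 INSERT INTO t (a,b) VALUES (?,?)
}
```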

gemini-code-assist · 2026-02-27T10:39:40Z

+<<<<<<< HEAD
+=======
+func (w *Writer) genActiveActiveSQL(tableInfo *common.TableInfo, eventsInGroup []*commonEvent.DMLEvent) ([]string, [][]interface{}) {
+	if !w.shouldGenBatchSQL(tableInfo, eventsInGroup) {
+		return w.generateActiveActiveNormalSQLs(eventsInGroup)
+	}
+	return w.generateActiveActiveBatchSQL(eventsInGroup)
+}
+
+>>>>>>> f420f1aed (sink: support batch dml with virtual column (#3787))


Unresolved merge markers (<<<<<<< HEAD, >>>>>>>) are detected in this block, which will cause compilation errors. This appears to be a merge conflict artifact, and the introduced genActiveActiveSQL function seems unrelated to the PR's scope. This block should be removed to resolve the conflict and maintain the focus of the PR.

func (w *Writer) genActiveActiveSQL(tableInfo *common.TableInfo, eventsInGroup []*commonEvent.DMLEvent) ([]string, [][]interface{}) {
	if !w.shouldGenBatchSQL(tableInfo, eventsInGroup) {
		return w.generateActiveActiveNormalSQLs(eventsInGroup)
	}
	return w.generateActiveActiveBatchSQL(eventsInGroup)
}

gemini-code-assist · 2026-02-27T10:39:40Z

+		log.Panic("columnNames are not equal columnValues", zap.Int("len(columnNames)", len(columnNames)), zap.Int("len(values)", len(columnValues)),
+			zap.Any("columnNames", columnNames), zap.Any("columnValues", columnValues), zap.Any("table", r.targetTable))


The log.Panic call logs raw column values (columnValues) which may contain sensitive information (PII). In production environments, logging raw database values can lead to data leakage in log files. Use util.RedactAny to mask sensitive data before logging.

Suggested change

-		log.Panic("columnNames are not equal columnValues", zap.Int("len(columnNames)", len(columnNames)), zap.Int("len(values)", len(columnValues)),
-			zap.Any("columnNames", columnNames), zap.Any("columnValues", columnValues), zap.Any("table", r.targetTable))
+		log.Panic("columnNames are not equal columnValues", zap.Int("len(columnNames)", len(columnNames)), zap.Int("len(columnValues)", len(columnValues)),
+			zap.Any("columnNames", columnNames), zap.Any("columnValues", util.RedactAny(columnValues)), zap.Any("table", r.targetTable))
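The redaction idea in this review can be sketched without the logging libraries. `redactValues` and `mustMatch` are hypothetical helpers. They are not the signature or behavior of `util.RedactAny`, which this sketch does not reproduce. The helper keeps only each value's Go type. The panic message reports counts, column names, and redacted values, never raw row data.

```go
package main

import "fmt"

// redactValues is a hypothetical stand-in for a redaction helper: it keeps
// only each value's Go type, never the value itself.
func redactValues(values []interface{}) []string {
	out := make([]string, len(values))
	for i, v := range values {
		out[i] = fmt.Sprintf("?(%T)", v)
	}
	return out
}

// mustMatch panics on a names/values length mismatch, reporting counts,
// column names and redacted values so raw row data never reaches the log.
func mustMatch(table string, names []string, values []interface{}) {
	if len(names) == len(values) {
		return
	}
	panic(fmt.Sprintf(
		"columnNames are not equal columnValues: table=%s len(columnNames)=%d len(columnValues)=%d columnNames=%v columnValues=%v",
		table, len(names), len(values), names, redactValues(values)))
}

func main() {
	fmt.Println(redactValues([]interface{}{"alice@example.com", 42, nil}))
	// [?(string) ?(int) ?(<nil>)]
	defer func() { fmt.Println(recover()) }()
	mustMatch("test.t", []string{"id", "email"}, []interface{}{"alice@example.com"})
}
```

Column names stay readable because they are schema metadata, not row data. Only the values are masked.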


ti-chi-bot · 2026-02-27T15:32:29Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wk989898

The full list of commands accepted by this bot can be found here.

The pull request process is described here


Needs approval from an approver in each of these files:

~~OWNERS~~ [wk989898]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ti-chi-bot · 2026-02-27T15:33:27Z

@ti-chi-bot: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
pull-error-log-review	`e0495f1`	link	true	`/test pull-error-log-review`


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

This is an automated cherry-pick of pingcap#3787

0ebefce

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>

ti-chi-bot mentioned this pull request Feb 27, 2026

sink: support batch dml with virtual column #3787

Merged

ti-chi-bot added the do-not-merge/cherry-pick-not-approved label Feb 27, 2026

ti-chi-bot assigned wk989898 Feb 27, 2026

gemini-code-assist reviewed Feb 27, 2026

update

e0495f1

Signed-off-by: wk989898 <nhsmwk@gmail.com>

wk989898 approved these changes Feb 27, 2026

ti-chi-bot added the approved label Feb 27, 2026

wk989898 removed the do-not-merge/hold label Feb 27, 2026

wk989898 merged commit af96ae3 into pingcap:release-8.5 Feb 28, 2026
7 of 11 checks passed


wk989898 merged 2 commits into pingcap:release-8.5 from ti-chi-bot:cherry-pick-3787-to-release-8.5
