Skip to content

ddl, table: skip pre-check SELECT for widening modify-column changes#67994

Open
joechenrh wants to merge 1 commit intopingcap:masterfrom
joechenrh:feature/modify-column-reorg-optimization
Open

ddl, table: skip pre-check SELECT for widening modify-column changes#67994
joechenrh wants to merge 1 commit intopingcap:masterfrom
joechenrh:feature/modify-column-reorg-optimization

Conversation

@joechenrh
Copy link
Copy Markdown
Contributor

@joechenrh joechenrh commented Apr 23, 2026

What problem does this PR solve?

Issue Number: ref #68013

Problem Summary:

ALTER TABLE ... MODIFY COLUMN on the ModifyTypeIndexReorg path runs a synchronous pre-check SELECT in StateDeleteOnly that has no timeout, pins the cluster GC safepoint for its whole duration, blocks subsequent DDL on the same table, and is invisible to SHOW PROCESSLIST. For widening changes (for example char(60) -> varchar(70)) this scan is pure overhead — no existing row can violate.

What changed and how does it work?

ModifyTypeIndexReorg is only reachable via isCharChange in getModifyColumnType. Gate the pre-check SELECT in doModifyColumnIndexReorg's StateDeleteOnly branch on isCharChange(oldCol, newCol) && newCol.GetFlen() >= oldCol.GetFlen(); skip it for widening, keep it otherwise.

Add a NULL guard in castIndexValuesToChangingTypes right after the strict cast: Datum.ConvertTo passes NULL through, so when the new type carries NotNullFlag the guard rejects existing NULL rows during reorg. This keeps NULL -> NOT NULL safe on the widening skip path.

Add a checkModifyColumnDataEntry failpoint at the top of checkModifyColumnData so tests can observe whether the pre-check was reached.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test

Three unit tests cover: widening skip (pre-check not entered), narrowing with an oversized row (pre-check catches), and reorg-phase rollback via errorMockPanic.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Speed up `ALTER TABLE ... MODIFY COLUMN` jobs that only widen a char/varchar column on a column with a secondary index, by skipping the previously-required pre-check SELECT. Eliminates a multi-minute GC safepoint pin on large tables.

@ti-chi-bot ti-chi-bot Bot added do-not-merge/needs-linked-issue release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Apr 23, 2026
@tiprow
Copy link
Copy Markdown

tiprow Bot commented Apr 23, 2026

Hi @joechenrh. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 23, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Conditionally skips data pre-validation during MODIFY COLUMN index reorganization for certain char/varchar flen widenings, adds a failpoint at check entry, tightens NULL handling during index casting, and adds five tests covering reorg, rollback, and widening behaviors.

Changes

Cohort / File(s) Summary
Modify Column Core
pkg/ddl/modify_column.go
Adds a failpoint at start of checkModifyColumnData. Updates doModifyColumnIndexReorg to bypass checkModifyColumnData SELECT/validation for char/varchar flen widening while preserving existing rollback/tracing behavior for other cases.
Modify Column Tests
pkg/ddl/modify_column_test.go
Adds five tests targeting ModifyTypeIndexReorg for indexed string columns: injected reorg panic -> full rollback, narrowing overflow failure, narrowing NULL→NOT NULL failure, widening NULL→NOT NULL (pre-check skipped but reorg casting fails), and a successful widening where pre-check is skipped and final state is validated.
Index Casting Validation
pkg/table/tables/index.go
castIndexValuesToChangingTypes now returns ErrColumnCantNull if a cast yields NULL while the changing field type is NOT NULL, preventing invalid NULLs during index casting.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

size/M, ok-to-test, approved, lgtm

Suggested reviewers

  • wjhuang2016
  • tiancaiamao
  • xhebox

Poem

🐰 I hopped through columns, wide and narrow,
Skipping checks where growth would sparrow,
But NULLs I guard with nose so keen,
Tests for rollback, and a happy green! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: skipping pre-check SELECT for widening modify-column changes, which is the core optimization in this PR.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The PR description includes all required sections: problem statement with issue reference, what changed, test coverage with checklist, side effects assessment, and release notes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@joechenrh joechenrh force-pushed the feature/modify-column-reorg-optimization branch from 713e6a2 to d74a639 Compare April 23, 2026 07:00
@ti-chi-bot ti-chi-bot Bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Apr 23, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 23, 2026

Codecov Report

❌ Patch coverage is 35.00000% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.6992%. Comparing base (52d01c6) to head (46da87f).
⚠️ Report is 20 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #67994        +/-   ##
================================================
+ Coverage   77.7904%   78.6992%   +0.9087%     
================================================
  Files          1982       2002        +20     
  Lines        549742     559286      +9544     
================================================
+ Hits         427647     440154     +12507     
+ Misses       121175     116877      -4298     
- Partials        920       2255      +1335     
Flag Coverage Δ
integration 46.9450% <35.0000%> (+7.1478%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 60.4888% <ø> (ø)
parser ∅ <ø> (∅)
br 66.0671% <ø> (+2.9722%) ⬆️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/ddl/modify_column_test.go`:
- Around line 785-815: The test inserts a value ('too_long_to_fit_varchar_10')
that's longer than the source CHAR(20) and can fail before the ALTER reaches the
index-reorg cast; replace that string with a value between 11 and 20 bytes
(e.g., 11–20 ASCII chars) so it fits CHAR(20) but exceeds VARCHAR(10), keeping
the rest of the setup (the insert, the ALTER in castIndexValuesToChangingTypes
path, and the testfailpoint that captures gotTp) unchanged; update the inserted
tuple in the test block that currently inserts into t and assert behavior as
before.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 1241af33-889b-4746-9cdd-84980727de51

📥 Commits

Reviewing files that changed from the base of the PR and between 52d01c6 and d74a639.

📒 Files selected for processing (3)
  • pkg/ddl/modify_column.go
  • pkg/ddl/modify_column_test.go
  • pkg/table/tables/index.go

Comment thread pkg/ddl/modify_column_test.go Outdated
@joechenrh joechenrh force-pushed the feature/modify-column-reorg-optimization branch from d74a639 to 7ece496 Compare April 23, 2026 07:58
@ti-chi-bot ti-chi-bot Bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 23, 2026
@joechenrh joechenrh changed the title ddl, table: skip pre-check SELECT for modify-column index-reorg path ddl, table: skip pre-check SELECT for widening modify-column changes Apr 23, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
pkg/ddl/modify_column_test.go (1)

776-806: ⚠️ Potential issue | 🔴 Critical

Use a value that fits CHAR(20) but not VARCHAR(10).

'too_long_to_fit_varchar_10' is 26 bytes, so the setup insert can fail before the ALTER reaches the intended modify-column path. Use an 11–20 byte value to isolate the target-cast failure.

🐛 Proposed test fixture fix
-	// 'too_long_to_fit_varchar_10' is 26 bytes, over the target varchar(10)
-	// limit. The cast in castIndexValuesToChangingTypes must reject it.
-	tk.MustExec("insert into t values ('ok'), ('too_long_to_fit_varchar_10')")
+	// 'too_long_11' fits the source char(20), but is over the target
+	// varchar(10) limit. The cast must reject it during modify-column
+	// validation.
+	tk.MustExec("insert into t values ('ok'), ('too_long_11')")
@@
 	tk.MustExec("admin check table t")
 	tk.MustQuery("select a from t order by a").
-		Check(testkit.Rows("ok", "too_long_to_fit_varchar_10"))
+		Check(testkit.Rows("ok", "too_long_11"))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/ddl/modify_column_test.go` around lines 776 - 806, The inserted test row
uses a 26-byte string which can fail earlier than the intended cast check;
change the insert values in the test setup (the tk.MustExec call that inserts
into table t) so the second value is between 11 and 20 bytes (fits CHAR(20) but
exceeds VARCHAR(10)) to force castIndexValuesToChangingTypes to reject it and
exercise the ModifyTypeIndexReorg path; keep the rest of the test (the
testfailpoint.EnableCall on github.com/pingcap/tidb/pkg/ddl/getModifyColumnType,
the ALTER statement, and subsequent assertions on tblInfo/col/indices)
unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@pkg/ddl/modify_column_test.go`:
- Around line 776-806: The inserted test row uses a 26-byte string which can
fail earlier than the intended cast check; change the insert values in the test
setup (the tk.MustExec call that inserts into table t) so the second value is
between 11 and 20 bytes (fits CHAR(20) but exceeds VARCHAR(10)) to force
castIndexValuesToChangingTypes to reject it and exercise the
ModifyTypeIndexReorg path; keep the rest of the test (the
testfailpoint.EnableCall on github.com/pingcap/tidb/pkg/ddl/getModifyColumnType,
the ALTER statement, and subsequent assertions on tblInfo/col/indices)
unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: d1640ef4-d785-4f85-a60a-89455f294c19

📥 Commits

Reviewing files that changed from the base of the PR and between d74a639 and 7ece496.

📒 Files selected for processing (7)
  • pkg/ddl/modify_column.go
  • pkg/ddl/modify_column_test.go
  • pkg/lightning/mydump/parquet_parser_test.go
  • pkg/table/column.go
  • pkg/table/column_test.go
  • pkg/table/tables/index.go
  • pkg/table/tables/tables.go
✅ Files skipped from review due to trivial changes (1)
  • pkg/table/tables/index.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/ddl/modify_column.go

@joechenrh joechenrh changed the title ddl, table: skip pre-check SELECT for widening modify-column changes ddl: skip pre-check for widening modify-column changes Apr 23, 2026
@joechenrh joechenrh force-pushed the feature/modify-column-reorg-optimization branch from 7ece496 to 58a9719 Compare April 24, 2026 01:13
@joechenrh joechenrh changed the title ddl: skip pre-check for widening modify-column changes ddl, table: skip pre-check SELECT for widening modify-column changes Apr 24, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/ddl/modify_column.go`:
- Around line 68-83: The integer branch in dataValuesFitNewType (the
isIntegerChange block) is effectively unreachable on the ModifyTypeIndexReorg
path because getModifyColumnType/needIndexReorg route unsigned-flag changes away
from that path; either delete the integer branch to keep the helper's contract
accurate, or keep it but add a targeted unit test exercising an integer widening
scenario (e.g., tinyint→int with an index) plus a clear comment in
dataValuesFitNewType explaining why this defensive check is kept; update/restore
tests that assert ModifyTypeIndexReorg behavior accordingly and reference
dataValuesFitNewType, isIntegerChange, getModifyColumnType, needIndexReorg, and
ModifyTypeIndexReorg when making the change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 60c81f69-2fe1-42df-8916-29169c6e954f

📥 Commits

Reviewing files that changed from the base of the PR and between 7ece496 and 58a9719.

📒 Files selected for processing (7)
  • pkg/ddl/modify_column.go
  • pkg/ddl/modify_column_test.go
  • pkg/lightning/mydump/parquet_parser_test.go
  • pkg/table/column.go
  • pkg/table/column_test.go
  • pkg/table/tables/index.go
  • pkg/table/tables/tables.go
✅ Files skipped from review due to trivial changes (1)
  • pkg/table/column_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • pkg/table/tables/tables.go
  • pkg/ddl/modify_column_test.go

Comment thread pkg/ddl/modify_column.go Outdated
@joechenrh joechenrh force-pushed the feature/modify-column-reorg-optimization branch from 58a9719 to 0af40d6 Compare April 24, 2026 01:33
@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented Apr 24, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign fzzf678 for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
pkg/ddl/modify_column.go (1)

68-83: Doc comment slightly mislocates the NotNullFlag guard.

The comment claims "the NotNullFlag guard in CastColumnValueWithStrictMode catches existing NULLs during reorg," but CastColumnValueWithStrictMode itself doesn't reject NULLs — the guard is implemented right after the call, in castIndexValuesToChangingTypes (pkg/table/tables/index.go lines 142–143). Consider tightening the wording so future readers can find the guard.

📝 Proposed wording tweak
-// dataValuesFitNewType reports whether every stored value of oldCol is
-// guaranteed to fit newCol so the pre-check SQL can be skipped on the
-// ModifyTypeIndexReorg path. Only integer and char/varchar flen widening
-// qualify; NULL -> NOT NULL is still safe because the NotNullFlag guard in
-// CastColumnValueWithStrictMode catches existing NULLs during reorg.
+// dataValuesFitNewType reports whether every stored value of oldCol is
+// guaranteed to fit newCol so the pre-check SQL can be skipped on the
+// ModifyTypeIndexReorg path. Only integer and char/varchar flen widening
+// qualify; NULL -> NOT NULL is still safe because castIndexValuesToChangingTypes
+// (pkg/table/tables/index.go) rejects NULLs against NotNullFlag when the
+// changing index is populated during reorg.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/ddl/modify_column.go` around lines 68 - 83, Update the doc comment on
dataValuesFitNewType to correct the location of the NotNullFlag guard: state
that NULL rejection is enforced after calling CastColumnValueWithStrictMode (not
inside it) and point to the guard implemented in castIndexValuesToChangingTypes
where NotNullFlag is checked; keep the rest of the explanation about safe type
changes intact and only adjust wording to reference
CastColumnValueWithStrictMode and castIndexValuesToChangingTypes specifically.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/table/tables/index.go`:
- Around line 140-144: castIndexValuesToChangingTypes currently rejects
pre-existing NULLs unconditionally (returning ErrColumnCantNull), which blocks
DELETE/UPDATE during StateWriteOnly; change the function signature to accept an
explicit boolean (e.g., isReorgPhase or enforceNotNullForReorg) and only perform
the IsNull + HasNotNullFlag check when that flag is true; update callers
(GenIndexKey, GenIndexValue) to pass false for normal DML paths and have the
reorg code/path pass true so the NULL rejection only applies to the reorg
enforcement path (retain UseChangingType behavior otherwise).

---

Nitpick comments:
In `@pkg/ddl/modify_column.go`:
- Around line 68-83: Update the doc comment on dataValuesFitNewType to correct
the location of the NotNullFlag guard: state that NULL rejection is enforced
after calling CastColumnValueWithStrictMode (not inside it) and point to the
guard implemented in castIndexValuesToChangingTypes where NotNullFlag is
checked; keep the rest of the explanation about safe type changes intact and
only adjust wording to reference CastColumnValueWithStrictMode and
castIndexValuesToChangingTypes specifically.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 98f9de79-03b9-4201-9a56-8c3cb171ff3f

📥 Commits

Reviewing files that changed from the base of the PR and between 58a9719 and 0af40d6.

📒 Files selected for processing (3)
  • pkg/ddl/modify_column.go
  • pkg/ddl/modify_column_test.go
  • pkg/table/tables/index.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/ddl/modify_column_test.go

Comment thread pkg/table/tables/index.go
@joechenrh joechenrh force-pushed the feature/modify-column-reorg-optimization branch 2 times, most recently from d43dcfd to 605bc57 Compare April 24, 2026 02:06
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
pkg/ddl/modify_column_test.go (1)

771-801: ⚠️ Potential issue | 🟠 Major

Test input still exceeds source CHAR(20) — INSERT will fail before the ALTER runs.

'too_long_to_fit_varchar_10' is 26 characters, which exceeds the source column type char(20). Under TiDB's default sql_mode (which includes STRICT_TRANS_TABLES), the INSERT at line 773 should error with "Data too long for column 'a'" before control ever reaches the ALTER under test. The subsequent assertion at lines 800-801 that expects too_long_to_fit_varchar_10 to be present in t is also inconsistent with that. Please use an 11–20 byte value so the setup succeeds and the failure isolates to the target-type cast during ModifyTypeIndexReorg.

🐛 Proposed fix
-	// 'too_long_to_fit_varchar_10' is 26 bytes, over the target varchar(10)
-	// limit. The cast in castIndexValuesToChangingTypes must reject it.
-	tk.MustExec("insert into t values ('ok'), ('too_long_to_fit_varchar_10')")
+	// 'too_long_11' fits the source char(20) but exceeds the target
+	// varchar(10); the cast in castIndexValuesToChangingTypes must reject it
+	// during index reorg.
+	tk.MustExec("insert into t values ('ok'), ('too_long_11')")
@@
 	tk.MustExec("admin check table t")
 	tk.MustQuery("select a from t order by a").
-		Check(testkit.Rows("ok", "too_long_to_fit_varchar_10"))
+		Check(testkit.Rows("ok", "too_long_11"))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/ddl/modify_column_test.go` around lines 771 - 801, The test inserts a
26-byte string that exceeds the source column CHAR(20) so the INSERT fails
before the ALTER; replace the literal 'too_long_to_fit_varchar_10' used in
tk.MustExec("insert into t values ('ok'), ('too_long_to_fit_varchar_10')") with
a shorter string of length 11–20 bytes (e.g., an 11-byte value) so the insert
succeeds and the failure is triggered by the ALTER/ModifyTypeIndexReorg path
(also update the preceding comment to reflect the new length constraint).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@pkg/ddl/modify_column_test.go`:
- Around line 771-801: The test inserts a 26-byte string that exceeds the source
column CHAR(20) so the INSERT fails before the ALTER; replace the literal
'too_long_to_fit_varchar_10' used in tk.MustExec("insert into t values ('ok'),
('too_long_to_fit_varchar_10')") with a shorter string of length 11–20 bytes
(e.g., an 11-byte value) so the insert succeeds and the failure is triggered by
the ALTER/ModifyTypeIndexReorg path (also update the preceding comment to
reflect the new length constraint).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: ccb5e8e8-92dc-4ad6-86aa-5c8bfd0967da

📥 Commits

Reviewing files that changed from the base of the PR and between 0af40d6 and 605bc57.

📒 Files selected for processing (3)
  • pkg/ddl/modify_column.go
  • pkg/ddl/modify_column_test.go
  • pkg/table/tables/index.go

@joechenrh
Copy link
Copy Markdown
Contributor Author

/retest

@tiprow
Copy link
Copy Markdown

tiprow Bot commented Apr 24, 2026

@joechenrh: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@joechenrh joechenrh force-pushed the feature/modify-column-reorg-optimization branch from 605bc57 to 97b125f Compare April 24, 2026 05:16
@ti-chi-bot ti-chi-bot Bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Apr 24, 2026
@joechenrh joechenrh force-pushed the feature/modify-column-reorg-optimization branch from 97b125f to 71a21e6 Compare April 24, 2026 06:12
@ingress-bot
Copy link
Copy Markdown

🔍 Starting code review for this PR...

When ALTER TABLE ... MODIFY COLUMN is classified as ModifyTypeIndexReorg
(for example char(N) -> varchar(M) with utf8mb4_bin + a secondary index),
the StateDeleteOnly handler runs a synchronous
  SELECT ... FROM t WHERE LENGTH(col) > N OR col IS NULL LIMIT 1
through the internal session pool. On large tables this SELECT has no
timeout, pins the cluster GC safepoint for its entire duration, blocks
subsequent DDL on the same table, and is invisible to SHOW PROCESSLIST.

getModifyColumnType only routes isCharChange cases into
ModifyTypeIndexReorg (integer signedness flips go to ModifyTypeReorg,
same-signedness integer changes resolve to noReorgDataStrict or
NoReorgWithCheck; decimal, temporal, enum/set are similarly handled
before this path). So the only widening case that reaches this handler
is char/varchar flen widening, and for that case no existing row can
possibly violate the new type: the pre-check is pure overhead.

Gate the pre-check in doModifyColumnIndexReorg's StateDeleteOnly branch
on `isCharChange(oldCol, newCol) && newCol.GetFlen() >= oldCol.GetFlen()`;
otherwise keep the existing pre-check.

In pkg/table/tables/index.go, extend castIndexValuesToChangingTypes
with a NULL guard right after the strict cast. The underlying
Datum.ConvertTo converts NULL to NULL without error, so when the new
column type carries NotNullFlag this guard is what protects the race
window between the statement-build NULL check (GetModifiableColumnJob
calls checkForNullValue) and the DDL job's StateNone handler setting
PreventNullInsertFlag. A NULL row that lands in that window would
otherwise slip through the reorg silently.

Add failpoint hook checkModifyColumnDataEntry at the top of
checkModifyColumnData so tests can observe whether the pre-check was
reached.

Tests:

- TestModifyColumnIndexReorgWideningSkipsPrecheck covers the widening
  happy path; checkModifyColumnData is never entered and the ALTER
  succeeds.
- TestModifyColumnIndexReorgRangeViolation preserves the narrowing
  failure path (pre-check still runs and catches the oversized row).
- TestModifyColumnIndexReorgRollbackOnReorgPanic pins the reorg-phase
  rollback path end-to-end via errorMockPanic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@joechenrh joechenrh force-pushed the feature/modify-column-reorg-optimization branch from 71a21e6 to 46da87f Compare April 24, 2026 06:45
@ti-chi-bot ti-chi-bot Bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 24, 2026
Copy link
Copy Markdown

@ingress-bot ingress-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This review was generated by AI and should be verified by a human reviewer.
Manual follow-up is recommended before merge.

Summary

  • Total findings: 7
  • Inline comments: 7
  • Summary-only findings (no inline anchor): 0
Findings (highest risk first)

🚨 [Blocker] (2)

  1. Delete-only precheck skip is not version-gated for NULL-to-NOT-NULL widening (pkg/ddl/modify_column.go:1208, pkg/table/tables/index.go:142)
  2. Flen-widening fast path skips VARCHAR-to-CHAR trailing-space validation (pkg/ddl/modify_column.go:1208, pkg/ddl/modify_column.go:867, pkg/table/column.go:385)

⚠️ [Major] (2)

  1. NULL-to-NOT-NULL failures are deferred into reorg for flen-widening index alters (pkg/ddl/modify_column.go:1208, pkg/ddl/modify_column.go:873, pkg/table/tables/index.go:142)
  2. Widening fast-path changes NULL->NOT NULL failure contract for indexed char/varchar (pkg/ddl/modify_column.go:1208, pkg/table/tables/index.go:142)

🟡 [Minor] (3)

  1. Index-reorg precheck policy is split across classification, execution, and cast enforcement (pkg/ddl/modify_column.go:1208, pkg/ddl/modify_column.go:129, pkg/table/tables/index.go:142)
  2. widensFlen name and branch comment understate the actual skip-precheck gate (pkg/ddl/modify_column.go:1208)
  3. castIndexValuesToChangingTypes no longer signals its validation side effect (pkg/table/tables/index.go:142)

Comment thread pkg/ddl/modify_column.go
// getModifyColumnType, so this is the one case where no existing
// row can violate. NULL -> NOT NULL is caught by the NotNullFlag
// guard in castIndexValuesToChangingTypes during reorg.
widensFlen := isCharChange(oldCol, args.Column) && args.Column.GetFlen() >= oldCol.GetFlen()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚨 [Blocker] Delete-only precheck skip is not version-gated for NULL-to-NOT-NULL widening

Impact
During rolling-upgrade owner handoff, a char/varchar widening NULL -> NOT NULL modify-column job reaches StateWriteOnly without running the NULL precheck and then resumes on an older owner that lacks the new reorg NULL guard. The job then finalizes a NOT NULL schema over preexisting NULL rows, which breaks mixed-version upgrade and rollback safety.

Scope

  • pkg/ddl/modify_column.go:1208worker.doModifyColumnIndexReorg
  • pkg/table/tables/index.go:142index.castIndexValuesToChangingTypes

Evidence
widensFlen skips checkModifyColumnData for all char/varchar widening cases in StateDeleteOnly, and there is no job.Version or capability gate around that branch. The skip now relies on the new NotNullFlag rejection in castIndexValuesToChangingTypes, but a resumed job in StateWriteOnly does not rerun the delete-only precheck, so cross-version owner takeover depends on this new guard being present on every owner binary.

Change request
Do not skip delete-only precheck for NULL -> NOT NULL transitions, even when widensFlen is true, or add an explicit version/capability gate before enabling this optimization. Ensure unsupported mixed-version resume paths fail fast with an actionable error instead of relying on implicit owner-version behavior. Add a compatibility-focused test that forces owner handoff between StateDeleteOnly and StateWriteOnly with preexisting NULL rows.

Comment thread pkg/ddl/modify_column.go
// getModifyColumnType, so this is the one case where no existing
// row can violate. NULL -> NOT NULL is caught by the NotNullFlag
// guard in castIndexValuesToChangingTypes during reorg.
widensFlen := isCharChange(oldCol, args.Column) && args.Column.GetFlen() >= oldCol.GetFlen()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚨 [Blocker] Flen-widening fast path skips VARCHAR-to-CHAR trailing-space validation

Impact
For indexed VARCHAR -> CHAR conversions on _bin collations with new_flen >= old_flen, rows ending with spaces are no longer rejected in strict mode and the DDL can succeed after trimming those values.
This changes the canonical accept/reject behavior and can silently alter user-visible stored values.

Scope

  • pkg/ddl/modify_column.go:1208(*worker).doModifyColumnIndexReorg
  • pkg/ddl/modify_column.go:867buildCheckSQLFromModifyColumn
  • pkg/table/column.go:385castColumnValue

Evidence
widensFlen now bypasses checkModifyColumnData for all char/varchar widen-or-equal transitions in StateDeleteOnly.
The skipped checker is the only path that enforces trailing-space rejection for VARCHAR -> CHAR via LIKE '% ', while the reorg path uses CastColumnValueWithStrictMode, which trims trailing spaces for TypeString through truncateTrailingSpaces without returning an error.

Change request
Keep the widening optimization only for pure length-range checks, but do not skip the VARCHAR -> CHAR trailing-space validation path.
A minimal fix is to exclude oldTp == mysql.TypeVarchar && newTp == mysql.TypeString from widensFlen skip handling (or split precheck so this semantic check always runs).

Comment thread pkg/ddl/modify_column.go
// getModifyColumnType, so this is the one case where no existing
// row can violate. NULL -> NOT NULL is caught by the NotNullFlag
// guard in castIndexValuesToChangingTypes during reorg.
widensFlen := isCharChange(oldCol, args.Column) && args.Column.GetFlen() >= oldCol.GetFlen()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [Major] NULL-to-NOT-NULL failures are deferred into reorg for flen-widening index alters

Impact
Indexed CHAR/VARCHAR alters that widen flen and add NOT NULL now enter WriteReorganization even when the table already contains NULL rows that must fail.
The job spends backfill scan/write resources before returning the same deterministic invalid-data failure, which creates avoidable long-running DDL work and load spikes on larger tables.

Scope

  • pkg/ddl/modify_column.go:1208(*worker).doModifyColumnIndexReorg
  • pkg/ddl/modify_column.go:873buildCheckSQLFromModifyColumn
  • pkg/table/tables/index.go:142(*index).castIndexValuesToChangingTypes

Evidence
doModifyColumnIndexReorg skips checkModifyColumnData(...) entirely when widensFlen is true, so the delete-only precheck no longer runs in that branch. buildCheckSQLFromModifyColumn includes the IS NULL predicate for isNullToNotNullChange, but that early check is bypassed by the widening guard. NULL rejection is instead raised from castIndexValuesToChangingTypes during reorg backfill via table.ErrColumnCantNull.

Change request
Keep the widening optimization for length/range checks, but retain a null-only precheck when isNullToNotNullChange(oldCol, args.Column) is true. This preserves early failure before reorg for already-invalid datasets while still avoiding the unnecessary widening-length pre-scan.

Comment thread pkg/ddl/modify_column.go
// getModifyColumnType, so this is the one case where no existing
// row can violate. NULL -> NOT NULL is caught by the NotNullFlag
// guard in castIndexValuesToChangingTypes during reorg.
widensFlen := isCharChange(oldCol, args.Column) && args.Column.GetFlen() >= oldCol.GetFlen()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [Major] Widening fast-path changes NULL->NOT NULL failure contract for indexed char/varchar

Impact
For indexed char/varchar widening with NULL -> NOT NULL, existing NULL rows now fail in reorg with [table:1048] instead of the prior pre-check error [ddl:1138].
During rolling upgrade, the same DDL can return different error codes depending on which binary version executes the relevant job stage, which breaks repeatable upgrade-time failure handling.

Scope

  • pkg/ddl/modify_column.go:1208doModifyColumnIndexReorg
  • pkg/table/tables/index.go:142castIndexValuesToChangingTypes

Evidence
doModifyColumnIndexReorg now skips checkModifyColumnData when widensFlen is true, so the old pre-check path is bypassed for widening (pkg/ddl/modify_column.go:1208-1210).
checkModifyColumnData is the branch that returns dbterror.ErrInvalidUseOfNull (pkg/ddl/modify_column.go:835), while the new fallback emits table.ErrColumnCantNull from castIndexValuesToChangingTypes (pkg/table/tables/index.go:142-143).

Change request
Keep the widening optimization for length checks, but preserve error-contract parity for NULL -> NOT NULL by still running the NULL pre-check (or by translating the reorg null failure to ErrInvalidUseOfNull).
Add a focused regression case in TestModifyColumnIndexReorgPrecheck that inserts NULL data, performs the widening+NOT NULL alter, and asserts stable error code plus rollback result across repeated runs.

Comment thread pkg/ddl/modify_column.go
// getModifyColumnType, so this is the one case where no existing
// row can violate. NULL -> NOT NULL is caught by the NotNullFlag
// guard in castIndexValuesToChangingTypes during reorg.
widensFlen := isCharChange(oldCol, args.Column) && args.Column.GetFlen() >= oldCol.GetFlen()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 [Minor] Index-reorg precheck policy is split across classification, execution, and cast enforcement

Impact
The rule for skipping checkModifyColumnData during ModifyTypeIndexReorg now lives in three separate locations across two packages. This parallel contract raises maintenance cost and makes policy updates harder to keep consistent because no single function owns the full decision boundary.

Scope

  • pkg/ddl/modify_column.go:1208doModifyColumnIndexReorg
  • pkg/ddl/modify_column.go:129getModifyColumnType
  • pkg/table/tables/index.go:142castIndexValuesToChangingTypes

Evidence
doModifyColumnIndexReorg introduces widensFlen to gate checkModifyColumnData, while the same behavior depends on getModifyColumnType selecting ModifyTypeIndexReorg and on castIndexValuesToChangingTypes enforcing NotNullFlag at reorg time. The new comment in this branch explicitly ties these cross-location assumptions together, which shows the policy is coordinated by convention instead of a shared helper contract.

Change request
Extract a dedicated DDL-level helper for index-reorg precheck eligibility and use it from the state-machine branch instead of embedding the policy inline. Keep castIndexValuesToChangingTypes as runtime enforcement, but make the dependency explicit through a named contract and add a focused test that asserts the precheck decision matrix in one place.

Comment thread pkg/ddl/modify_column.go
// getModifyColumnType, so this is the one case where no existing
// row can violate. NULL -> NOT NULL is caught by the NotNullFlag
// guard in castIndexValuesToChangingTypes during reorg.
widensFlen := isCharChange(oldCol, args.Column) && args.Column.GetFlen() >= oldCol.GetFlen()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 [Minor] widensFlen name and branch comment understate the actual skip-precheck gate

Impact
The StateDeleteOnly path skips checkModifyColumnData for both widening and equal-length CHAR/VARCHAR changes because the gate is args.Column.GetFlen() >= oldCol.GetFlen(). The identifier widensFlen and the surrounding wording describe widening only, so readers get an inaccurate contract for when pre-check is bypassed.

Scope

  • pkg/ddl/modify_column.go:1208doModifyColumnIndexReorg

Evidence
widensFlen is computed as isCharChange(oldCol, args.Column) && args.Column.GetFlen() >= oldCol.GetFlen(), which includes the equality case. The adjacent comment labels the rule as flen widening and does not state that equal length also skips the pre-check.

Change request
Rename the gate to reflect non-narrowing semantics, such as nonNarrowingFlen, or tighten the condition to strict widening if that is the intended policy. Update the nearby comment to explicitly document the exact skip condition (>= or >) so branch behavior and text stay aligned.

Comment thread pkg/table/tables/index.go
}
// Strict cast passes NULL through; reject it here when the new type
// carries NotNullFlag so reorg surfaces existing NULL rows.
if indexedValues[i].IsNull() && mysql.HasNotNullFlag(tblCol.ChangingFieldType.GetFlag()) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 [Minor] castIndexValuesToChangingTypes no longer signals its validation side effect

Impact
castIndexValuesToChangingTypes now enforces a NOT NULL constraint and returns ErrColumnCantNull in addition to performing type conversion. A cast-only name hides that semantic check and obscures the error contract that GenIndexKey and GenIndexValue now inherit from this helper.

Scope

  • pkg/table/tables/index.go:142castIndexValuesToChangingTypes

Evidence
Inside castIndexValuesToChangingTypes, the new branch indexedValues[i].IsNull() && mysql.HasNotNullFlag(tblCol.ChangingFieldType.GetFlag()) returns table.ErrColumnCantNull. That branch introduces explicit constraint validation beyond casting, but the function name still communicates conversion only.

Change request
Either rename this helper to include both behaviors, for example castAndValidateIndexValuesForChangingTypes, or split validation into a separate helper with an intent-focused name. Keep a short function-level comment that states both conversion and NOT NULL enforcement so callers can read the contract correctly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants