ttl: honor scan task cancellation across statement boundaries by zanmato1984 · Pull Request #67285 · pingcap/tidb

zanmato1984 · 2026-03-25T07:14:59Z

What problem does this PR solve?

Issue Number: ref #66982

Problem Summary

TestCancelWhileScan is flaky because TTL scan cancellation can fall into a statement-boundary gap.

The scan task currently relies on KillStmt to interrupt the running internal SQL. If cancellation happens between statements, the next internal statement resets the statement context before execution, which clears the statement-bound kill state. As a result, the scan SELECT can still start and continue running even though the TTL task has already been canceled.
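The gap described above can be illustrated with a minimal sketch. The `stmtKiller` type and `runStatement` function are hypothetical stand-ins, not TiDB's actual types; they only model a kill flag that is cleared whenever a new statement begins.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// stmtKiller is a hypothetical stand-in for a statement-bound kill flag.
type stmtKiller struct{ killed atomic.Bool }

func (k *stmtKiller) Kill()        { k.killed.Store(true) }
func (k *stmtKiller) Reset()       { k.killed.Store(false) }
func (k *stmtKiller) Killed() bool { return k.killed.Load() }

// runStatement mimics one internal statement: per-statement state is reset
// first, and only afterwards does execution consult the kill flag.
func runStatement(k *stmtKiller) (ran bool) {
	k.Reset() // the statement-context reset clears any earlier kill signal
	return !k.Killed()
}

func main() {
	k := &stmtKiller{}
	k.Kill() // cancellation lands between two statements
	fmt.Println("next statement ran:", runStatement(k))
}
```

In this model the kill issued between statements is lost, so the next statement still runs.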

What is changed and how it works?

This PR fixes the issue in two parts:

1. Pass the TTL scan task cancellation context into the actual internal SQL execution path, instead of relying only on KillStmt.
2. After ResetContextOfStmt, immediately honor a canceled caller context before executing the next statement.

Together, these changes close the statement-boundary cancellation gap and make TTL scan cancellation respond to task cancellation directly.

This PR also adds targeted regression coverage for the statement-boundary cancellation case.

Check List

Tests

Unit test
Integration test
Lint

Side effects

This changes internal SQL behavior only when the caller context is already canceled. That behavior should be more correct and should have negligible performance impact beyond an additional ctx.Err() check and context propagation in TTL scan.

Release note

Fix flaky TTL scan cancellation caused by a statement-boundary gap between task cancellation and internal SQL execution.

Summary by CodeRabbit

Bug Fixes
- TTL scans now honor cancellation promptly at SQL statement boundaries and propagate cancellation through scan operations, preventing stalled aborts.
- Execution path now checks for cancellation before starting subsequent statements, stopping further work when a caller cancels.
- Transaction finalization now more reliably commits or rolls back around canceled contexts.
Tests
- Added integration and unit tests validating TTL scan cancellation timing and transaction behavior after cancellation.
Chores
- Test shard configuration adjusted.

pantheon-ai · 2026-03-25T07:15:05Z

Review failed due to infrastructure/execution failure after retries. Please re-trigger review.

tiprow · 2026-03-25T07:15:18Z

Hi @zanmato1984. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo, meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

coderabbitai · 2026-03-25T07:15:22Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Walkthrough

Threads a cancellable per-scan context into TTL scan execution and sessions, injects a failpoint immediately before SQL-killer reset at statement cleanup, and enforces an explicit cancellation check after statement-context reset at statement boundaries.

Changes

Cohort / File(s)	Summary
Executor statement reset `pkg/executor/select.go`	Injects `beforeResetSQLKillerForTTLScan` failpoint immediately before `vars.SQLKiller.Reset()` when `vars.InRestrictedSQL` and `vars.InternalSQLScanUserTable` are true.
Session execution boundary `pkg/session/session.go`	After `ResetContextOfStmt(...)` in `executeStmtImpl`, check `ctx.Err()` and return early if canceled to honor caller cancellation at the statement boundary.
TTL scan flow `pkg/ttl/ttlworker/scan.go`	Create per-task `scanCtx` via `context.WithCancel(ctx)`, cancel it from the kill goroutine (call `cancelScanCtx()` before `rawSess.KillStmt()`), use `scanCtx` for session creation and SQL execution, and make retry/wait/delete logic observe `scanCtx` (return `scanCtx.Err()`).
TTL session signature `pkg/ttl/ttlworker/session.go`	Change `NewScanSession` to accept `ctx context.Context` and use that context for internal `ExecuteSQL` calls (replacing `context.Background()`).
Tests — TTL and executor `pkg/ttl/ttlworker/scan_integration_test.go`, `pkg/ttl/ttlworker/session_integration_test.go`, `pkg/executor/test/executor/executor_test.go`	Add `TestCancelWhileScanAtStatementBoundary` integration test using failpoints to trigger cancellation at statement boundary; update test call sites to new `NewScanSession` signature; relax cancellation assertion to use `errors.Is` semantics.
Transaction handling tests & manager `pkg/dxf/framework/storage/task_state_test.go`, `pkg/dxf/framework/storage/task_table.go`	Add test for txn rollback on canceled context; change `TaskManager.WithNewTxn` defer to call `se.CommitTxn(ctx)` on success and `se.RollbackTxn(...)` (with internal source context) on failure instead of raw SQL exec.
Build config `pkg/dxf/framework/storage/BUILD.bazel`	Increase `go_test` `shard_count` from 28 to 29 for `storage_test`.

Sequence Diagram

sequenceDiagram
    participant KG as Kill Goroutine
    participant Task as TTL Scan Task
    participant Sess as Scan Session / Executor
    participant Killer as SQL Killer

    Task->>Task: scanCtx = context.WithCancel(ctx)
    Task->>Sess: NewScanSession(scanCtx, ...)
    Task->>Sess: Execute SQL using scanCtx

    alt normal execution
        Sess->>Sess: run statement
        Note over Sess: failpoint beforeResetSQLKillerForTTLScan
        Sess->>Killer: vars.SQLKiller.Reset()
        Sess->>Sess: ResetContextOfStmt returns
        Sess->>Sess: check ctx.Err() -> continue if nil
    else cancellation path
        KG->>KG: receive kill trigger
        KG->>Task: cancelScanCtx()
        KG->>Killer: rawSess.KillStmt()
        Task->>Sess: scanCtx canceled -> operations return scanCtx.Err()
    end

    Sess-->>Task: propagate cancellation error

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

[release-nextgen-20251011] *: adapt TopSQL naming for TopProfiling (prepare for TopRU) (#65820) #67238: Modifies ResetContextOfStmt in pkg/executor/select.go (related to SQL-killer/reset logic and failpoint placement).
*: align RU v2 bypass statement filtering with v1 #67410: Modifies executeStmtImpl in pkg/session/session.go (related to statement execution boundary handling and cancellation checks).

Suggested reviewers

lcwangchao
YangKeao
wjhuang2016
hawkingrei

Poem

🐰 I threaded a quiet scan-bound thread,
A cancel whisper nudged the work to bed,
A failpoint paused the killer's reset beat,
One hop, one cancel — scan and killer meet,
I munch the logs and hop away, content.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically summarizes the main change: fixing TTL scan cancellation to work across statement boundaries.
Description check	✅ Passed	The PR description is comprehensive and follows the template structure with all required sections properly filled.

hawkingrei · 2026-03-25T07:16:17Z

/ok-to-test

codecov · 2026-03-25T07:50:57Z

Codecov Report

❌ Patch coverage is 93.10345% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.6777%. Comparing base (8412422) to head (7659f52).
⚠️ Report is 14 commits behind head on master.

Additional details and impacted files

@@               Coverage Diff                @@
##             master     #67285        +/-   ##
================================================
- Coverage   77.7173%   77.6777%   -0.0396%     
================================================
  Files          1959       1945        -14     
  Lines        543377     545796      +2419     
================================================
+ Hits         422298     423962      +1664     
- Misses       120238     121832      +1594     
+ Partials        841          2       -839

Flag	Coverage Δ
integration	`41.2423% <10.3448%> (+5.0675%)`	⬆️
unit	`76.8749% <93.1034%> (+0.5318%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components	Coverage Δ
dumpling	`61.5065% <ø> (ø)`
parser	`∅ <ø> (∅)`
br	`48.9152% <ø> (-12.0649%)`	⬇️

zanmato1984 · 2026-03-25T08:15:23Z

/retest

bb7133

LGTM

zanmato1984 · 2026-03-25T19:20:22Z

/retest

zanmato1984 · 2026-03-25T19:20:50Z

@pantheon-ai please review

pantheon-ai · 2026-03-25T19:20:58Z

Review Complete

Findings: 0 issues
Posted: 0
Duplicates/Skipped: 0

pantheon-ai

✅ Code looks good. No issues found.

zanmato1984 · 2026-03-26T02:21:02Z

/retest

zanmato1984 · 2026-03-27T20:43:23Z

/retest

zanmato1984 · 2026-03-27T22:13:52Z

/check-issue-triage-complete

zanmato1984 · 2026-03-31T18:55:38Z

/retest

zanmato1984 · 2026-04-01T08:00:33Z

/hold

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/dxf/framework/storage/task_state_test.go`:
- Around line 151-153: The test currently calls sqlexec.ExecSQL(ctx,
se.GetSQLExecutor(), "select sleep(10)"), asserts require.NoError(t, err) and
then returns ctx.Err(); instead capture and return the ExecSQL error directly so
the test observes statement-level cancellation: replace the require.NoError
check with returning the err from the ExecSQL call (i.e., keep the call to
sqlexec.ExecSQL using ctx and se.GetSQLExecutor(), assign its error and return
that error) so cancellation surfaced by ExecSQL is asserted instead of
fabricating context.Canceled.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: e60c7db3-0ab7-4a56-b235-434f4553a6d6

📥 Commits

Reviewing files that changed from the base of the PR and between 1cf552f and af1968c.

📒 Files selected for processing (2)

pkg/dxf/framework/storage/task_state_test.go
pkg/dxf/framework/storage/task_table.go

YangKeao

LGTM

Though I doubt it will be very helpful, since the original implementation leaks at most one statement, this PR does make things better 👍.

ti-chi-bot · 2026-04-02T07:17:24Z

[LGTM Timeline notifier]

Timeline:

2026-03-25 18:31:02.987070613 +0000 UTC m=+379459.023140873: ☑️ agreed by bb7133.
2026-04-02 07:17:23.336430063 +0000 UTC m=+422248.541790120: ☑️ agreed by YangKeao.

D3Hunter

rest lgtm

D3Hunter · 2026-04-02T07:19:29Z

-			_, commitErr := sqlexec.ExecSQL(ctx, se.GetSQLExecutor(), sql)
-			if err == nil && commitErr != nil {
-				err = commitErr
+				commitErr := se.CommitTxn(ctx)


Can you add a note to the comment explaining why we issue begin explicitly but use the named methods for commit/rollback?

zanmato1984 · 2026-04-02T08:58:23Z

/retest

ti-chi-bot · 2026-04-03T02:13:46Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bb7133, D3Hunter, YangKeao

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [D3Hunter,YangKeao,bb7133]
~~pkg/dxf/OWNERS~~ [D3Hunter]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

zanmato1984 · 2026-04-03T08:26:59Z

/retest

hawkingrei · 2026-04-03T09:00:45Z

/retest

hawkingrei · 2026-04-03T15:35:39Z

/retest

hawkingrei · 2026-04-03T17:27:33Z

/retest

zanmato1984 · 2026-04-03T19:57:17Z

/retest

zanmato1984 · 2026-04-03T20:06:01Z

/retest

zanmato1984 · 2026-04-03T20:36:59Z

/retest

zanmato1984 · 2026-04-03T21:38:57Z

/retest

zanmato1984 · 2026-04-03T21:43:16Z

/hold

hawkingrei · 2026-04-03T23:47:33Z

/retest

zanmato1984 · 2026-04-03T23:52:06Z

/unhold

zanmato1984 · 2026-04-04T02:15:04Z

/retest

ti-chi-bot Bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/needs-tests-checked do-not-merge/needs-triage-completed labels Mar 25, 2026

ti-chi-bot Bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 25, 2026

ti-chi-bot Bot added the ok-to-test Indicates a PR is ready to be tested. label Mar 25, 2026

zanmato1984 requested a review from bb7133 March 25, 2026 07:22

zanmato1984 force-pushed the issue-66982-flaky branch from 703660b to 6ed9f08 Compare March 25, 2026 07:26

YangKeao self-requested a review March 25, 2026 07:47

bb7133 approved these changes Mar 25, 2026

ti-chi-bot Bot added approved needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Mar 25, 2026

pantheon-ai Bot reviewed Mar 25, 2026

ti-chi-bot Bot removed the do-not-merge/needs-triage-completed label Mar 27, 2026

ti-chi-bot Bot removed the approved label Apr 1, 2026

ti-chi-bot Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 1, 2026

coderabbitai Bot reviewed Apr 1, 2026

Comment thread pkg/dxf/framework/storage/task_state_test.go

ti-chi-bot Bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 2, 2026

YangKeao approved these changes Apr 2, 2026

ti-chi-bot Bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Apr 2, 2026

D3Hunter reviewed Apr 2, 2026

dxf: clarify WithNewTxn transaction cleanup

7659f52

D3Hunter approved these changes Apr 3, 2026

ti-chi-bot Bot added approved and removed do-not-merge/needs-tests-checked labels Apr 3, 2026

ti-chi-bot Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 3, 2026

ti-chi-bot Bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 3, 2026

ti-chi-bot Bot merged commit 7d3162c into pingcap:master Apr 4, 2026
35 checks passed

This was referenced Apr 4, 2026

Flaky test: TestCancelWhileScan in pkg/ttl/ttlworker #66982

Open

ttl: stabilize TestCancelWhileScan runtime #67657

Open

coderabbitai Bot mentioned this pull request Apr 18, 2026

pkg/ttl/ttlworker: stabilize flaky TestCancelWhileScan #67885

Closed

13 tasks
