Skip to content

ttl: stabilize TestCancelWhileScan runtime#67657

Open
zanmato1984 wants to merge 2 commits intopingcap:masterfrom
zanmato1984:issue-66982-flaky-timeout
Open

ttl: stabilize TestCancelWhileScan runtime#67657
zanmato1984 wants to merge 2 commits intopingcap:masterfrom
zanmato1984:issue-66982-flaky-timeout

Conversation

@zanmato1984
Copy link
Copy Markdown
Contributor

@zanmato1984 zanmato1984 commented Apr 9, 2026

What problem does this PR solve?

Issue Number: ref #66982

Problem Summary:

TestCancelWhileScan still times out in CI under resource pressure. The previous fix in #67285 addressed statement-boundary cancellation correctness, but this test still spent too much time in long-mode stress setup/execution and could exceed shard timeout.

What changed and how does it work?

This PR keeps the same cancellation assertions and makes the stress path cheaper and more deterministic:

  • Batch test data inserts in TestCancelWhileScan instead of issuing 10k single-row inserts.
  • Use bounded rounds (10 default, 30 in -long) instead of time-based loops.
  • Add a small scan delay failpoint (sleepCoprRequest=200ms) so each round still exercises cancellation while avoiding heavy table size/time requirements.

This keeps the regression coverage focused on cancellation responsiveness while removing the long-tail runtime behavior that caused timeouts.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

Summary by CodeRabbit

Release Notes

This release contains internal test improvements with no user-facing changes.

  • Tests
    • Improved test efficiency by batching data setup and reducing redundant operations.
    • Made scan/cancel test runs deterministic by switching to fixed iterations.
    • Enhanced fault-injection coverage to better exercise cancellation behavior.

@ti-chi-bot ti-chi-bot Bot added the release-note-none Denotes a PR that doesn't merit a release note. label Apr 9, 2026
@pantheon-ai
Copy link
Copy Markdown

pantheon-ai Bot commented Apr 9, 2026

@zanmato1984 I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details.

⏳ This process typically takes 10-30 minutes depending on the complexity of the changes.

ℹ️ Learn more details on Pantheon AI.

@ti-chi-bot ti-chi-bot Bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Apr 9, 2026
@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented Apr 9, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign tangenta for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tiprow
Copy link
Copy Markdown

tiprow Bot commented Apr 9, 2026

Hi @zanmato1984. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 9, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 10fa171a-1c8f-4937-b992-998a2d60b2cf

📥 Commits

Reviewing files that changed from the base of the PR and between fca6b36 and 10aeb41.

📒 Files selected for processing (1)
  • pkg/ttl/ttlworker/scan_integration_test.go
✅ Files skipped from review due to trivial changes (1)
  • pkg/ttl/ttlworker/scan_integration_test.go

📝 Walkthrough

Walkthrough

Test updated to use batched multi-row inserts (1,000 rows in 10 batches of 100), enable the sleepCoprRequest failpoint with return(200), and replace time-based scan/cancel looping with a fixed rounds := 10 iteration count; removed testflag import and related timing variables.

Changes

Cohort / File(s) Summary
TTL Scan Integration Test
pkg/ttl/ttlworker/scan_integration_test.go
Replaced 10,000 single-row INSERTs with batched multi-row inserts (1,000 rows total, 100 per batch via strings.Join); enabled github.com/pingcap/tidb/pkg/store/copr/sleepCoprRequest failpoint (return(200)); changed scan/cancel loop from time-based to fixed rounds := 10; removed testStart, testDuration, and testflag import.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Suggested labels

ok-to-test, approved, lgtm

Suggested reviewers

  • wjhuang2016
  • YangKeao
  • bb7133

Poem

🐰 Batches hop in, neat and spry,
Ten rounds now dance beneath the sky,
A sleepy copr asks for a pause,
Tests run steady, no wild cause,
Hooray — the rabbit stamps its paws!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: stabilizing the runtime of the TestCancelWhileScan test through more efficient data insertion and deterministic looping.
Description check ✅ Passed The description includes all required sections: problem summary with issue reference, explanation of changes, completed checklist, and release note. It clearly explains the optimization strategy.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.11.4)

Command failed


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor Author

@zanmato1984 zanmato1984 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Role: Reviewer-R1

LGTM (round 1).

Checks:

  1. Original test intent is preserved: TestCancelWhileScan still asserts cancellation completes within 1s after cancel().
  2. Change scope is test-only and minimal: only pkg/ttl/ttlworker/scan_integration_test.go is updated.
  3. Recurrence analysis is consistent with evidence from the reopened flaky issue/build logs: current recurrence is timeout/runtime pressure rather than statement-boundary correctness.
  4. No unnecessary timing-only precise repro is retained: this update simplifies the stress profile and keeps deterministic cancellation coverage without extra reproducer-only paths.

Validation:

  • ./tools/check/failpoint-go-test.sh pkg/ttl/ttlworker -run '^TestCancelWhileScan$' -count=20 (pass)

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.3567%. Comparing base (997e75c) to head (10aeb41).
⚠️ Report is 29 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #67657        +/-   ##
================================================
- Coverage   77.5871%   77.3567%   -0.2305%     
================================================
  Files          1981       1965        -16     
  Lines        547950     551723      +3773     
================================================
+ Hits         425139     426795      +1656     
- Misses       122001     124914      +2913     
+ Partials        810         14       -796     
Flag Coverage Δ
integration 40.8968% <ø> (+6.5571%) ⬆️
unit 76.6452% <ø> (+0.3119%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 61.5065% <ø> (+0.0901%) ⬆️
parser ∅ <ø> (∅)
br 49.9148% <ø> (-10.5164%) ⬇️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

testStart := time.Now()
testDuration := time.Second
rounds := 10
if testflag.Long() {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Long mode doesn't affect the CI, so I don't think this change will be helpful.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I removed the testflag.Long() branch so CI and local runs now execute the same loop count (rounds := 10), while keeping the original cancellation assertion and test intent unchanged.

Validation:

  • ./tools/check/failpoint-go-test.sh pkg/ttl/ttlworker -run "^TestCancelWhileScan$" -count=1 (pass)
  • make lint (pass)

Included in commit 10aeb41daf.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-note-none Denotes a PR that doesn't merit a release note. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants