tests: add temporary next gen bootstrap delay#4439

Merged
ti-chi-bot[bot] merged 1 commit into pingcap:master from wlwilliamx:tests/nextgen-bootstrap-delay
Mar 11, 2026
@wlwilliamx wlwilliamx commented Mar 11, 2026

What problem does this PR solve?

Issue Number: close #4438

What is changed and how it works?

This PR adds a temporary 30-second delay to tests/integration_tests/_utils/start_tidb_cluster_nextgen, after TiKV and TiKV-worker are ready and before the first SYSTEM TiDB starts.

PD health only guarantees that the service is up. In the flaky next-gen bootstrap path, giving PD a short quiet window here reduces cases where keyspace pre-allocation is still incomplete when SYSTEM TiDB starts.
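For comparison with the fixed delay, an explicit readiness wait could look roughly like the sketch below. It is only an illustration: wait_until, keyspace_ready, and the KEYSPACE_READY_MARKER variable are hypothetical names that do not exist in the script, and the marker-file check is a stand-in for whatever signal actually proves that keyspace pre-allocation has finished.

```shell
# Hypothetical helper: retry a command until it succeeds or a timeout expires.
# Usage: wait_until <timeout_seconds> <interval_seconds> <command> [args...]
wait_until() {
    wait_timeout=$1
    wait_interval=$2
    shift 2
    wait_elapsed=0
    until "$@"; do
        if [ "$wait_elapsed" -ge "$wait_timeout" ]; then
            echo "Timed out after ${wait_timeout}s waiting for: $*" >&2
            return 1
        fi
        sleep "$wait_interval"
        wait_elapsed=$((wait_elapsed + wait_interval))
    done
}

# Placeholder check (an assumption, not a real PD query): succeeds once a
# marker file exists. Replace it with a real keyspace readiness probe.
keyspace_ready() {
    [ -e "${KEYSPACE_READY_MARKER:-/tmp/keyspace_ready}" ]
}

# Example call, left commented out so the sketch has no side effects:
# wait_until 60 1 keyspace_ready || exit 1
```

Unlike a fixed sleep, a loop of this shape returns as soon as the check passes and fails loudly when the timeout is reached.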

Check List

Tests

  • Manual test

bash -n tests/integration_tests/_utils/start_tidb_cluster_nextgen

Questions

Will it cause performance regression or break compatibility?

No. This only changes integration test bootstrap timing.

Do you need to update user documentation, design documentation or monitoring documentation?

No.

Release note

None

Summary by CodeRabbit

  • Tests
    • Improved stability of test cluster bootstrap sequence by adding a pre-start delay, reducing CI startup flakiness.

@ti-chi-bot ti-chi-bot Bot added the release-note-none and size/XS labels Mar 11, 2026
coderabbitai Bot commented Mar 11, 2026

Walkthrough

The change introduces a pre-start delay in the TiDB bootstrap sequence to await keyspace pre-allocation. A 30-second sleep with an echo statement is added before creating upstream TiDB system config and starting TiDB instances, with comments noting that PD health alone is insufficient and that this quiet window reduces CI startup flakiness.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| TiDB Cluster Bootstrap: tests/integration_tests/_utils/start_tidb_cluster_nextgen | Added pre-start delay with 30-second sleep and explanatory comments to reduce CI flakiness during keyspace pre-allocation. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Suggested labels

lgtm, approved, size/XS

Suggested reviewers

  • wk989898
  • hongyunyan
  • 3AceShowHand

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely describes the main change: adding a temporary delay to the next-gen bootstrap sequence in tests. |
| Description check | ✅ Passed | The description covers required sections including problem statement (Issue Number), changes explanation, test verification, and answers to compatibility and documentation questions. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
tests/integration_tests/_utils/start_tidb_cluster_nextgen (1)

346-351: Consider making the delay configurable.

The fixed 30-second delay unconditionally adds time to every CI run. In slower CI environments, 30 seconds may still be insufficient, while in faster environments it wastes time.

Consider making this configurable via an environment variable with a sensible default:

♻️ Suggested refactor for configurable delay
 # TODO: replace this fixed delay with an explicit keyspace readiness check.
 # PD health only guarantees the service is up. In next-gen bootstrap we also
 # rely on PD finishing keyspace pre-allocation before the SYSTEM TiDB starts.
 # Giving PD a short quiet window here reduces flaky startup failures in CI.
+KEYSPACE_PREALLOC_DELAY=${KEYSPACE_PREALLOC_DELAY:-30}
 echo "Waiting for next-gen keyspace pre-allocation to settle..."
-sleep 30
+sleep "$KEYSPACE_PREALLOC_DELAY"

This allows overriding the delay per environment (e.g., longer for slow CI runners) without modifying the script.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration_tests/_utils/start_tidb_cluster_nextgen` around lines 346 -
351, Replace the hardcoded sleep with a configurable environment variable: read
a variable like NEXTGEN_KEYSPACE_SETTLE_SECONDS (defaulting to 30) and use it in
place of the literal "sleep 30" so CI can override the delay; keep the existing
echo "Waiting for next-gen keyspace pre-allocation to settle..." message and
validate/coerce the env value to an integer/fallback to 30 before calling sleep
to avoid invalid input.
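
The validation step described in this prompt could be sketched as follows. This is an assumption about one possible shape, not code from the PR: settle_seconds is a hypothetical helper name, and NEXTGEN_KEYSPACE_SETTLE_SECONDS is the variable name proposed in the prompt rather than one the script is known to read.

```shell
# Hypothetical helper: print the supplied value if it is a non-negative
# integer, otherwise print the fallback.
# Usage: settle_seconds <value> <fallback>
settle_seconds() {
    value=$1
    fallback=$2
    case "$value" in
        '' | *[!0-9]*) echo "$fallback" ;;  # empty or contains a non-digit
        *) echo "$value" ;;
    esac
}

delay=$(settle_seconds "${NEXTGEN_KEYSPACE_SETTLE_SECONDS:-}" 30)
echo "Waiting ${delay}s for next-gen keyspace pre-allocation to settle..."
# sleep "$delay"   # left commented out so the sketch runs instantly
```

The case pattern rejects anything with a non-digit character, so values such as "abc", "-5", or "1.5" fall back to 30 instead of reaching sleep.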

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 94fb399f-3b4b-4a35-b23c-6e3c0cff6dba

📥 Commits

Reviewing files that changed from the base of the PR and between 27c7bbe and ff04ec0.

📒 Files selected for processing (1)
  • tests/integration_tests/_utils/start_tidb_cluster_nextgen

@wlwilliamx
Collaborator Author

CC @wk989898 @tenfyzhong

@ti-chi-bot ti-chi-bot Bot added the needs-1-more-lgtm and approved labels Mar 11, 2026
ti-chi-bot Bot commented Mar 11, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tenfyzhong, wk989898

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [tenfyzhong,wk989898]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added the lgtm label and removed the needs-1-more-lgtm label Mar 11, 2026
ti-chi-bot Bot commented Mar 11, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-03-11 10:58:30.255121809 +0000 UTC m=+434141.767179470: ☑️ agreed by tenfyzhong.
  • 2026-03-11 11:02:38.049561247 +0000 UTC m=+434389.561618928: ☑️ agreed by wk989898.

@gemini-code-assist
Summary of Changes

Hello, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the reliability of integration tests by introducing a temporary delay during the bootstrap process of next-generation TiDB clusters. This delay ensures that the Placement Driver (PD) has sufficient time to complete keyspace pre-allocation before the SYSTEM TiDB instance starts, thereby mitigating test flakiness caused by timing issues in the CI environment.

Highlights

  • Integration Test Stability: Introduced a temporary 30-second delay in the start_tidb_cluster_nextgen script to allow for keyspace pre-allocation to complete before the SYSTEM TiDB starts, addressing CI flakiness in next-gen bootstrap paths.

Changelog

  • tests/integration_tests/_utils/start_tidb_cluster_nextgen
    • Added a 30-second sleep command to introduce a delay before starting TiDB.
    • Included comments explaining the rationale for the delay, noting it's a temporary measure until an explicit keyspace readiness check can be implemented.

Activity

  • No human activity (comments, reviews) has been recorded on this pull request yet.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a temporary 30-second delay into an integration test script to address CI flakiness, which is a reasonable short-term solution. My review includes a suggestion to make this delay configurable via an environment variable. This would improve the flexibility of the script, allowing for easier adjustment of the delay in different environments without requiring code changes.

# rely on PD finishing keyspace pre-allocation before the SYSTEM TiDB starts.
# Giving PD a short quiet window here reduces flaky startup failures in CI.
echo "Waiting for next-gen keyspace pre-allocation to settle..."
sleep 30
medium

Using a hardcoded sleep duration can be inflexible, even for a temporary solution. It's better to make the delay configurable via an environment variable. This allows for easier tuning in different environments (e.g., CI vs. local) without changing the code.

sleep "${NEXT_GEN_KEYSPACE_SETTLE_DELAY:-30}"
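
As a quick illustration of how the `${VAR:-default}` expansion in this suggestion behaves, the sketch below walks through the three cases. It only demonstrates standard shell parameter expansion; NEXT_GEN_KEYSPACE_SETTLE_DELAY is the name proposed in the suggestion above, and nothing here implies the script actually reads it.

```shell
# Unset: the default applies.
unset NEXT_GEN_KEYSPACE_SETTLE_DELAY
delay_unset="${NEXT_GEN_KEYSPACE_SETTLE_DELAY:-30}"

# Set but empty: ":-" still applies the default (plain "-" would not).
NEXT_GEN_KEYSPACE_SETTLE_DELAY=""
delay_empty="${NEXT_GEN_KEYSPACE_SETTLE_DELAY:-30}"

# Set to a value: the value wins.
NEXT_GEN_KEYSPACE_SETTLE_DELAY=5
delay_set="${NEXT_GEN_KEYSPACE_SETTLE_DELAY:-30}"

echo "$delay_unset $delay_empty $delay_set"   # prints: 30 30 5
```

Note that this form does not validate the value, so a non-numeric override would be passed to sleep unchanged.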

@ti-chi-bot ti-chi-bot Bot merged commit d2a95cf into pingcap:master Mar 11, 2026
26 checks passed
tenfyzhong pushed a commit that referenced this pull request Mar 18, 2026
ref #4438

(cherry picked from commit d2a95cf)
Signed-off-by: tenfyzhong <tenfy@tenfy.cn>

Labels

approved, lgtm, release-note-none, size/XS

Development

Successfully merging this pull request may close these issues.

tests: flaky next gen startup can fail before SYSTEM TiDB is ready

3 participants