tests: flaky next gen startup can fail before SYSTEM TiDB is ready #4438

@wlwilliamx

What did you do?

Investigated flaky presubmit failures on PR #4426, where these two jobs failed with "Failed to start TiDB":

  • pull-cdc-mysql-integration-light-next-gen build 608
  • pull-cdc-pulsar-integration-light-next-gen build 84

Both archived artifacts show the same upstream bootstrap sequence:

  • upstream PD logs "[PD:keyspace:ErrRegionSplitTimeout] region split timeout"
  • upstream PD logs "failed to create pre-alloc keyspace" for keyspace1
  • upstream tidb-system then keeps retrying and logging "Load keyspace SYSTEM failed" with ErrKeyspaceNotFound
  • the integration bootstrap script eventually times out in check_tidb_health and prints "Failed to start TiDB"
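
To check whether another failed run matches this sequence, the three log signatures above can be searched for in the archived artifacts. The following is a minimal sketch, not part of the repository: the function name and the idea of passing a single artifact directory are assumptions, and only the three search strings come from the logs quoted in this issue.

```shell
#!/bin/sh
# has_prealloc_timeout_signature LOG_DIR
# Hypothetical helper: exits 0 only if every signature from this issue
# appears somewhere under LOG_DIR, and exits 1 otherwise.
has_prealloc_timeout_signature() {
	local log_dir="$1"
	local pattern
	for pattern in \
		'ErrRegionSplitTimeout' \
		'failed to create pre-alloc keyspace' \
		'ErrKeyspaceNotFound'; do
		# -r: search recursively, -q: no output, -F: match a fixed string
		grep -rqF -- "$pattern" "$log_dir" || return 1
	done
	return 0
}
```

For example, `has_prealloc_timeout_signature ./artifacts && echo "same failure mode"` would flag a run whose logs contain all three messages. It does not verify the order in which they were logged.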

The shared bootstrap path is tests/integration_tests/_utils/start_tidb_cluster_nextgen. It waits for PD health, but it does not wait for next-gen keyspace pre-allocation to finish before starting the first SYSTEM TiDB.
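
One possible shape for the missing wait is a generic poll-with-deadline helper placed between the PD health check and the first SYSTEM TiDB start. The sketch below is illustrative only: `wait_until` and `keyspace_ready` are hypothetical names, not functions from the existing script, and the PD endpoint used in the probe (`/pd/api/v2/keyspaces/<name>`) is an assumption that should be verified against the PD version in use, as is the premise that a successful lookup means pre-allocation has finished.

```shell
#!/bin/sh
# wait_until TIMEOUT_SECS INTERVAL_SECS CMD [ARGS...]
# Re-runs CMD until it exits 0, or gives up once TIMEOUT_SECS have elapsed.
wait_until() {
	local timeout="$1"
	local interval="$2"
	shift 2
	local deadline=$(($(date +%s) + timeout))
	until "$@"; do
		if [ "$(date +%s)" -ge "$deadline" ]; then
			echo "timed out after ${timeout}s waiting for: $*" >&2
			return 1
		fi
		sleep "$interval"
	done
	return 0
}

# keyspace_ready PD_ADDR KEYSPACE
# Hypothetical probe. ASSUMPTION: PD serves keyspace metadata at this path
# and returns a non-2xx status until the keyspace exists.
keyspace_ready() {
	local pd_addr="$1"
	local keyspace="$2"
	# -s: silent, -f: treat HTTP errors as a non-zero exit status
	curl -sf "http://${pd_addr}/pd/api/v2/keyspaces/${keyspace}" >/dev/null
}

# Intended call site (address and keyspace name are placeholders):
#   wait_until 120 2 keyspace_ready "127.0.0.1:2379" "keyspace1" || exit 1
```

Keeping the probe separate from the retry loop means the readiness condition can be swapped, for instance to a log-based check, without touching the timeout logic. Note that a wait alone would not help in runs where PD has already given up on pre-allocation after ErrRegionSplitTimeout; in that case the helper only turns the failure into an earlier, clearer error.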

What did you expect to see?

The next-gen integration bootstrap should not start SYSTEM TiDB until upstream keyspace pre-allocation is ready, and these presubmit jobs should not fail intermittently with "Failed to start TiDB".

What did you see instead?

The jobs failed before TiDB could accept MySQL connections because upstream keyspace pre-allocation had already timed out.

Relevant failure links:

Relevant log evidence:

  • upstream PD: ErrRegionSplitTimeout then failed to create pre-alloc keyspace
  • upstream system TiDB: Load keyspace SYSTEM failed: ... ErrKeyspaceNotFound

Versions of the cluster

Upstream TiDB cluster version (from the failed job artifact):

Release Version: v9.0.0-beta.2.pre-1334-g5766c79
Git Commit Hash: 5766c79bbff7d2ac273d7cc7cfe71d29fbfc5488
Kernel Type: Next Generation

Upstream TiKV version (from the failed job artifact):

Release Version: 8.5.4+branch-HEAD
Git Commit Hash: 2cfd099039e3bd207aea7efbe8725a413beb4313

TiCDC version (from the failed job artifact):

release-version=v8.5.4-nextgen.202510.5-115-g9d0f1f4d
git-hash=9d0f1f4dda05e49cc449515750bc1ec36dfb295e
