Describe the bug
A random one of the CI tests often fails a PR check because one of the mock system containers didn't become healthy in time. Simply rerunning the failed tests without changing anything virtually always resolves the problem, so the failures appear spurious. #1353 attempted to address this, but that approach seems not to have been effective or correct, since I'm seeing failures before the bringup attempt has even run for 90 seconds. For instance, this run attempted to bring up the mock system for only 63 seconds (much of that spent downloading and extracting the CRDB image) before failing.
To reproduce
Observe checks on PRs. Many PRs have a single check that failed in this way.
Difference from expected behavior
Retrying the CI after a failure should almost never succeed unless something completely beyond its control, like a network outage, caused the failure; the CI should be a reliable indicator of actual code faults.
Possible solution
It seems that the time spent downloading and extracting supporting images (CRDB) may be counted against the container startup timeout. Ideally, that time would be excluded from the bringup timeout, with a separate timeout for image acquisition, so that network failures and image bringup failures are both still detected, but independently of one another. Regardless, the timeouts should clearly and correctly relate to what they are measuring, and the 90-second timeout from #1353 evidently isn't the controlling one, since the failure above occurred after only 63 seconds.
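The separation proposed above could look roughly like the sketch below: image acquisition and container health-checking each get their own clock, and the startup clock only starts once the image is local. Everything here is illustrative (the function and enum names, the closure-based probes, and the polling loop are assumptions for the sketch, not the project's actual API).

```rust
use std::time::{Duration, Instant};

/// Hypothetical outcome of a bringup attempt (illustrative, not omicron's API).
#[derive(Debug, PartialEq)]
enum BringupResult {
    ImagePullTimedOut,
    StartupTimedOut,
    Healthy,
}

/// Sketch: run image acquisition and container startup under independent
/// timeouts, so a slow image pull cannot eat into the startup budget.
/// `pull_done` and `healthy` stand in for whatever probes the real CI uses.
fn bring_up(
    mut pull_done: impl FnMut() -> bool,
    mut healthy: impl FnMut() -> bool,
    pull_timeout: Duration,
    startup_timeout: Duration,
) -> BringupResult {
    // Phase 1: acquire the image, timed on its own.
    let pull_start = Instant::now();
    while !pull_done() {
        if pull_start.elapsed() > pull_timeout {
            return BringupResult::ImagePullTimedOut;
        }
    }
    // Phase 2: only now start the startup clock, so the full
    // `startup_timeout` is available to the container itself.
    let startup_start = Instant::now();
    while !healthy() {
        if startup_start.elapsed() > startup_timeout {
            return BringupResult::StartupTimedOut;
        }
    }
    BringupResult::Healthy
}
```

With this split, a timeout failure reported by the CI would unambiguously name which phase exceeded its budget, which is exactly the property the current single timeout lacks.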