Contributor

@leftwo commented Sep 4, 2025

Various tuning to reduce test flakes during CI.

Decrease the number of tests that run concurrently on buildomat runs.
The buildomat AWS instance has 8 cores, so we configure the nextest run to run
two fewer tests than that at a time (6). This reduces overall system load.
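
As a rough illustration of the arithmetic above, the following sketch derives a concurrency limit by holding back a fixed number of cores. The function names and the `reserved` parameter are hypothetical and are not taken from this PR, which may simply hard-code the value in its CI configuration.

```rust
use std::thread::available_parallelism;

/// Number of tests to run at once: the core count minus the cores held
/// back for the rest of the system, but never fewer than one.
/// (Hypothetical helper, not part of the PR.)
fn test_threads(cores: usize, reserved: usize) -> usize {
    cores.saturating_sub(reserved).max(1)
}

/// Same calculation using the core count reported for the current host.
/// Falls back to a single core if the count cannot be determined.
fn test_threads_for_this_host(reserved: usize) -> usize {
    let cores = available_parallelism().map(|n| n.get()).unwrap_or(1);
    test_threads(cores, reserved)
}
```

For the 8-core instance described above, `test_threads(8, 2)` gives 6. A value like this could then be passed to nextest, for example through its `--test-threads` option.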

Increase some internal timeouts, and add retry loops to some tests so that they can
finish even if the system is under heavy load.
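
The retry-loop idea can be sketched roughly as follows. This is a minimal, synchronous, standard-library-only illustration: `poll_until` and its parameters are hypothetical names, and the actual tests in this PR are async and may structure their loops differently.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Calls `check` repeatedly until it returns `Ok`, or until `timeout`
/// has elapsed. The closure receives the attempt number (starting at 0).
/// On timeout, the most recent error is returned so the test failure
/// reports the last state that was observed.
/// (Hypothetical helper, not part of the PR.)
fn poll_until<T, E>(
    timeout: Duration,
    interval: Duration,
    mut check: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    let deadline = Instant::now() + timeout;
    let mut attempt = 0;
    loop {
        match check(attempt) {
            // The condition holds: stop polling.
            Ok(value) => return Ok(value),
            // Out of time: give up and surface the last error.
            Err(err) if Instant::now() >= deadline => return Err(err),
            // Not ready yet: wait briefly and try again.
            Err(_) => {
                attempt += 1;
                sleep(interval);
            }
        }
    }
}
```

A test would wrap its state check in the closure and assert on the final result, so a slow system only delays the test instead of failing it. The trade-off is that such a loop can also hide a state that should never have been observable, which is the concern raised in the review thread.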

Added a few debug messages, though they may be of limited value.

@leftwo changed the title from "WIP test flake debugging" to "Reduce test flakes" Sep 8, 2025
@leftwo requested a review from jmpesp September 8, 2025 17:56
@leftwo marked this pull request as ready for review September 8, 2025 17:56
    RegionSnapshotReplacementState::Requested
) {
    eprintln!(
        "loop {i} Failed {:?} != Requested",
Contributor

What state was this in when you saw this? This seems like a failure to properly unwind the saga as it should never have reached this check without going through rsrss_set_saga_id_undo. Likewise, the assert_eq above checking operating_saga_id should have fired if the request was in an intermediate state.

Contributor Author

It does eventually get to the requested state.
When I saw the panic, it would be in Allocating state. At least that's what I had written down in an earlier comment.

Contributor

As discussed in DMs, this was concerning enough to address in a separate PR.

pub async fn remove_disk_from_snapshot_rop(&self) {
    let disk_url = get_disk_url("disk-from-snapshot");

    eprintln!("NOW Remove disk from snapshot for disk {:?}", disk_url);
Contributor

These messages are not strictly needed, since request logging already covers them.

Contributor Author

Removed

@leftwo requested a review from jmpesp September 15, 2025 23:23
@leftwo merged commit 4b62dfc into main Sep 22, 2025
16 checks passed
@leftwo deleted the alan/why-do-tests-flake branch September 22, 2025 23:42