Skip to content

Conversation

@iximeow
Copy link
Member

@iximeow iximeow commented Oct 15, 2025

This should fix flake opportunities in both:

  • crucible_replacements::test_delete_volume_region_replacement_state_replacement_done
  • crucible_replacements::test_Delete_volume_region_replacement_state_requested

This is my proposed change in #9216. From the logs of a flake and careful reading it looks like we were almost-but-not-quite handling the condition described in wait_for_request_state if the start and end states for a replacement request are the same.

Consulting get_region_replacement_by_id and the returned replacement request state tells us when the step has completed, in addition to checking that the step had started.

This should fix flake opportunities in both:

* `crucible_replacements::test_delete_volume_region_replacement_state_replacement_done`
* `crucible_replacements::test_Delete_volume_region_replacement_state_requested`

This is my proposed change in #9216. From the logs of a flake and
careful reading it looks like we were almost-but-not-quite handling the
condition described in `wait_for_request_state` if the start and end
states for a replacement request are the same.

Consulting `get_region_replacement_by_id` and the returned replacement
request state tells us when the step has completed, in addition to
checking that the step had started.
@iximeow iximeow requested a review from jmpesp October 15, 2025 02:26
Comment on lines +589 to +592
// The replacement step is in progress, but
// not done yet. We're still waiting, but
// probably not for long.
Err(CondCheckError::<()>::NotYet)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

having said a whole lot of words, the real critical point would to be able to see this somehow. I tried to coax evidence of that:

  • added a panic!("race hit!") here locally
  • ran the test as while true; do pbind -e 1 -- cargo nextest run test_delete_volume_region_replacement_state_replacement_done; done
  • in another shell, ran pbind -e 1 bash -c 'while true; do :; done

while this extremely slowed things down (.. more than the 50% I'd expected), I have yet to see the panic get hit.

the possibility that simply calling get_region_replacement_request_by_id adds enough time to consistently resolve the race is not lost on me...

@iximeow iximeow merged commit a180243 into main Oct 15, 2025
16 checks passed
@iximeow iximeow deleted the ixi/9216 branch October 15, 2025 16:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants