Testing: Expunge Sled #5480
@jmpesp and I did some preliminary testing in the Canada region earlier. We expunged 'gravytrain' and saw the policy for the sled and its disks change to 'expunged'. We also regenerated a blueprint and saw the external DNS generation number change. I'm going to do a run on a4x2 later tonight with @sunshowers' code to see if zones get expunged. I'm going to start by adding a sled and then removing one. I'll take more detailed notes on that run.
Overview
After adding
omicron_physical_disks::deploy_disks(
    &opctx,
    &sleds_by_id,
    &blueprint.blueprint_disks,
)
.await?;
omicron_zones::deploy_zones(
    &opctx,
    &sleds_by_id,
    &blueprint.blueprint_zones,
)
.await?;
Thanks for collecting all of this! Just a couple of quick drive-by thoughts from a first read:
Maybe not - do all the later collections report errors? The collection pruner always keeps around the most recent collection with 0 errors, so if all the new collections have errors, you'll see one old one stick around until a new collection with no errors arrives.
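To make the pruning rule described above concrete, here is a minimal sketch of it. The `Collection` struct, the `nkeep` parameter, and the `collections_to_keep` function are hypothetical names for illustration only; this is not the actual Nexus inventory pruner, just the "keep the newest N plus the most recent error-free one" idea.

```rust
/// Hypothetical, simplified stand-in for an inventory collection.
#[derive(Debug, Clone, PartialEq)]
struct Collection {
    id: u32,
    /// Number of errors recorded while gathering this collection.
    nerrors: usize,
}

/// Given collections ordered oldest to newest, return the ids to keep:
/// the newest `nkeep` collections, plus the most recent collection that
/// has zero errors (if it is not already among them).
fn collections_to_keep(collections: &[Collection], nkeep: usize) -> Vec<u32> {
    let start = collections.len().saturating_sub(nkeep);
    let mut keep: Vec<u32> =
        collections[start..].iter().map(|c| c.id).collect();

    // Walk from newest to oldest looking for an error-free collection.
    if let Some(clean) = collections.iter().rev().find(|c| c.nerrors == 0) {
        if !keep.contains(&clean.id) {
            // Older than everything else we keep, so it goes first.
            keep.insert(0, clean.id);
        }
    }
    keep
}
```

Under this sketch, if every recent collection reports errors, one old error-free collection stays in the result until a newer clean one arrives, which would match the behavior observed in the test.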
"What should the executor do on failure" is a fair question and not something we're handling robustly today at all, but I think I'd prioritize "why is the executor trying to talk to an expunged sled at all"; by definition, that is going to fail, right?
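One way to picture the "don't talk to expunged sleds" idea is to filter the sled map before it reaches the deploy steps. The sketch below is an assumption-laden illustration: `SledPolicy`, `Sled`, and `in_service_sleds` are simplified hypothetical types and names, not Omicron's real definitions, and the map only loosely mirrors the `sleds_by_id` argument in the excerpt above.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified sled policy (not Omicron's actual type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SledPolicy {
    InService,
    Expunged,
}

/// Hypothetical, simplified sled record.
#[derive(Debug, Clone)]
struct Sled {
    address: String,
    policy: SledPolicy,
}

/// Return only the sleds the executor should try to contact,
/// dropping any sled whose policy is `Expunged`.
fn in_service_sleds(
    sleds_by_id: &BTreeMap<u32, Sled>,
) -> BTreeMap<u32, Sled> {
    sleds_by_id
        .iter()
        .filter(|(_, sled)| sled.policy == SledPolicy::InService)
        .map(|(id, sled)| (*id, sled.clone()))
        .collect()
}
```

With a filter like this applied up front, requests to a sled that is known to be gone would never be issued, so the remaining failures would be the ones worth handling.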
notes from test 2024-04-23
Nice!
Presumably #5203?
Some discussion in #5296.
I believe so yeah. Just wanted to make a note of it.
Running
It looks like these errors are coming from For omdb, this is easy to work around. I'm not clear on whether there's some risk that we hit this inside Nexus during the expungement process. If so, that'll be harder to work around. It should generally work to retry, provided the APIs are idempotent.
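As a rough illustration of the retry suggestion, a bounded retry wrapper could look like the sketch below. The `retry` helper is hypothetical (it is not part of omdb or Nexus), and it is only safe when the wrapped operation is idempotent, as noted above.

```rust
/// Call `op` until it succeeds or `max_attempts` attempts have been made,
/// returning the last error on failure. `op` is always called at least
/// once. Only safe for idempotent operations.
fn retry<T, E>(
    max_attempts: usize,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the most recent error.
            Err(err) if attempt >= max_attempts => return Err(err),
            // Otherwise try again.
            Err(_) => attempt += 1,
        }
    }
}
```

A real caller would likely add a delay or backoff between attempts; that is left out here to keep the sketch dependency-free. A retry like this only helps if a later attempt can reach a different or recovered endpoint.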
Retry didn't work, but I gave it an explicit Nexus URL and voila!
Initial Notes about testing on testbed and madrid live here: