blueprint-execution: Break dependency between `PUT /omicron-zones` success and subsequent cleanup steps

blueprint-execution has three cleanup steps that depend on the success of the earlier `PUT /omicron-zones` step:

* zone cleanup
* saga reassignment
* failed support bundle cleanup

All of these steps assume that the zones they are cleaning up after are no longer running, but that's only true today for two reasons: `PUT /omicron-zones` is synchronous (i.e., sled-agent only returns success if it has already stopped any zones that shouldn't be running), and execution is stopped if the `PUT /omicron-zones` step fails. We definitely want to change the second of those (this is #6999), and longer term we probably want to change the first one too (converting sled-agent into more of an "accept and return the new config, then make it real via a reconciler loop in the background" model).
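
To make the implicit gating concrete, here is a minimal sketch of how the cleanup steps hang off the zone-deployment result today. All names (`DeployZonesResult`, `CleanupStep`, `cleanup_steps_to_run`) are hypothetical and chosen for illustration; they are not the actual omicron types.

```rust
/// Hypothetical outcome of the `PUT /omicron-zones` step.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DeployZonesResult {
    /// sled-agent returned success, so any zones that shouldn't be
    /// running have already been stopped (the call is synchronous).
    Success,
    /// The request failed; zones that should be gone may still be running.
    Failed,
}

/// Hypothetical labels for the cleanup steps listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CleanupStep {
    ZoneCleanup,
    SagaReassignment,
    FailedSupportBundleCleanup,
}

/// Sketch of today's behavior: cleanup only runs if the deploy step
/// succeeded, because execution stops on failure.
fn cleanup_steps_to_run(deploy: DeployZonesResult) -> Vec<CleanupStep> {
    match deploy {
        DeployZonesResult::Success => vec![
            CleanupStep::ZoneCleanup,
            CleanupStep::SagaReassignment,
            CleanupStep::FailedSupportBundleCleanup,
        ],
        DeployZonesResult::Failed => Vec::new(),
    }
}
```

If execution continued past a failed deploy step, or if sled-agent returned before actually stopping zones, `Success` would no longer imply "zones are gone" and this gating would be unsound.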

#7524 is a small PR that makes these dependencies explicit in code. We should break this dependency somehow:

* The planner could confirm that a zone is gone and indicate it's ready for cleanup via some property in the blueprint (similar to the treatment disks got in #7286)
* Is it possible for the executor to know this on its own? (I don't think so but maybe?)
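
A rough sketch of the first option, assuming the planner can tell from some observation (for example, inventory) which zones are still running. Everything here is hypothetical: `ZoneId`, `ZoneDisposition`, the `ready_for_cleanup` flag, and both functions are illustrative stand-ins, not the existing blueprint types. How the planner would actually confirm a zone is gone is left open.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Stand-in for a real zone identifier (hypothetical).
type ZoneId = u32;

/// Hypothetical per-zone disposition stored in the blueprint.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ZoneDisposition {
    InService,
    /// The zone should no longer run. `ready_for_cleanup` is set by the
    /// planner only once it has confirmed the zone is gone.
    Expunged { ready_for_cleanup: bool },
}

/// Planner side (sketch): mark expunged zones that were not observed
/// running as ready for cleanup. `observed_running` is an assumed input.
fn mark_ready_for_cleanup(
    blueprint: &mut BTreeMap<ZoneId, ZoneDisposition>,
    observed_running: &BTreeSet<ZoneId>,
) {
    for (zone_id, disposition) in blueprint.iter_mut() {
        if let ZoneDisposition::Expunged { ready_for_cleanup } = disposition {
            if !observed_running.contains(zone_id) {
                *ready_for_cleanup = true;
            }
        }
    }
}

/// Executor side (sketch): clean up only after zones the blueprint says
/// are ready, regardless of whether `PUT /omicron-zones` succeeded.
fn zones_ready_for_cleanup(
    blueprint: &BTreeMap<ZoneId, ZoneDisposition>,
) -> Vec<ZoneId> {
    blueprint
        .iter()
        .filter_map(|(id, disposition)| match disposition {
            ZoneDisposition::Expunged { ready_for_cleanup: true } => Some(*id),
            _ => None,
        })
        .collect()
}
```

With this shape, the cleanup steps would key off blueprint contents rather than off the result of the deploy step, so a zone that is expunged but still observed running is simply skipped until a later blueprint marks it ready.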

This is a blocker for fixing #6999.

blueprint-execution: Break dependency between `PUT /omicron-zones` success and subsequent cleanup steps #7527
