
blueprint-execution: Break dependency between PUT /omicron-zones success and subsequent cleanup steps #7527

@jgallagher

Description


blueprint-execution has three cleanup steps that depend on the success of the earlier PUT /omicron-zones step:

  • zone cleanup
  • saga reassignment
  • failed support bundle cleanup
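To make the implicit coupling concrete, here is a heavily simplified model of how these cleanup steps end up gated on the PUT. It is a sketch, not the real executor: `PutZonesOutcome`, `CleanupStep`, and `cleanup_steps_to_run` are hypothetical names. It also assumes that a failed PUT on any sled stops execution before any cleanup step runs.

```rust
/// Hypothetical per-sled result of the `PUT /omicron-zones` step.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PutZonesOutcome {
    Succeeded,
    Failed,
}

/// The three cleanup steps listed above (names are illustrative only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CleanupStep {
    ZoneCleanup,
    SagaReassignment,
    FailedSupportBundleCleanup,
}

/// Models today's implicit gating (an assumption, not the real executor):
/// cleanup runs only if every sled accepted the new zone config, because
/// execution stops on any PUT failure.
pub fn cleanup_steps_to_run(put_results: &[PutZonesOutcome]) -> Vec<CleanupStep> {
    let all_succeeded = put_results
        .iter()
        .all(|outcome| *outcome == PutZonesOutcome::Succeeded);
    if all_succeeded {
        vec![
            CleanupStep::ZoneCleanup,
            CleanupStep::SagaReassignment,
            CleanupStep::FailedSupportBundleCleanup,
        ]
    } else {
        // Execution bails out before reaching any cleanup step.
        Vec::new()
    }
}
```

The cleanup steps never check zone state themselves. They are only "safe" because of where they sit relative to the PUT.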

All of these steps assume that the zones they are cleaning up after are no longer running, but that's only true today because PUT /omicron-zones is synchronous (i.e., sled-agent only returns success after it has stopped any zones that shouldn't be running) and because execution stops if the PUT /omicron-zones step fails. We definitely want to change the second of those (this is #6999), and longer term we probably want to change the first one too (converting sled-agent into more of an "accept and return the new config, then make it real via a reconciler loop in the background" model).
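The sketch below shows why the first change alone already breaks the assumption. It is a toy model, not sled-agent's actual code: the `SledAgent` struct and its methods are hypothetical. With a synchronous PUT, unwanted zones are gone by the time the call returns. With a reconciler, a successful PUT only means the config was recorded, and the zone keeps running until a later reconcile pass.

```rust
use std::collections::BTreeSet;

/// Hypothetical, heavily simplified sled-agent that tracks the zone set it
/// has been asked to run (`desired`) and the zones actually running.
pub struct SledAgent {
    desired: BTreeSet<String>,
    running: BTreeSet<String>,
    /// `true` models today's synchronous PUT; `false` models a reconciler.
    synchronous: bool,
}

impl SledAgent {
    pub fn new(running: &[&str], synchronous: bool) -> Self {
        let running: BTreeSet<String> = running.iter().map(|z| z.to_string()).collect();
        SledAgent {
            desired: running.clone(),
            running,
            synchronous,
        }
    }

    /// Handles `PUT /omicron-zones`. In synchronous mode, unwanted zones are
    /// stopped before this returns; in reconciler mode the config is only recorded.
    pub fn put_omicron_zones(&mut self, zones: &[&str]) {
        self.desired = zones.iter().map(|z| z.to_string()).collect();
        if self.synchronous {
            self.reconcile();
        }
    }

    /// One pass of a (hypothetical) background reconciler: make the running
    /// set match the most recently accepted config.
    pub fn reconcile(&mut self) {
        self.running = self.desired.clone();
    }

    pub fn is_running(&self, zone: &str) -> bool {
        self.running.contains(zone)
    }
}
```

In the reconciler case, an executor that treats "PUT succeeded" as "zone is stopped" would start cleanup while `zone-b` is still running.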

#7524 is a small PR that makes these dependencies explicit in code. We should break these dependencies somehow. Some options:

  • The planner could confirm that a zone is gone and indicate that it's ready for cleanup via some property in the blueprint (similar to the treatment disks got in #7286, "Expunge and Decommission disks in planner")
  • Is it possible for the executor to know this on its own? (I don't think so, but maybe?)
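The first option could look roughly like the following sketch. All names here are hypothetical and are not taken from omicron or #7286: `ZoneDisposition`, `BlueprintZone`, `mark_ready_for_cleanup`, and `zones_to_clean_up`. The sketch also assumes the planner has some observed view of which zones each sled is actually running, passed in as `running_in_inventory`. The planner flips a `ready_for_cleanup` flag only after confirming an expunged zone is gone. The executor then cleans up based solely on that flag, so the PUT result in the current execution no longer matters.

```rust
use std::collections::BTreeSet;

/// Hypothetical blueprint zone disposition (not omicron's actual type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ZoneDisposition {
    InService,
    /// Expunged; `ready_for_cleanup` is set by the planner once it has
    /// confirmed the zone is no longer running.
    Expunged { ready_for_cleanup: bool },
}

#[derive(Debug, Clone)]
pub struct BlueprintZone {
    pub id: u32,
    pub disposition: ZoneDisposition,
}

/// Planner step: for each expunged zone that the observed state no longer
/// reports as running, record in the blueprint that it is ready for cleanup.
pub fn mark_ready_for_cleanup(zones: &mut [BlueprintZone], running_in_inventory: &BTreeSet<u32>) {
    for zone in zones.iter_mut() {
        let confirmed_gone = !running_in_inventory.contains(&zone.id);
        if let ZoneDisposition::Expunged { ready_for_cleanup } = &mut zone.disposition {
            if confirmed_gone {
                *ready_for_cleanup = true;
            }
        }
    }
}

/// Executor step: clean up only zones the blueprint marks as ready. This does
/// not consult whether `PUT /omicron-zones` succeeded in this execution.
pub fn zones_to_clean_up(zones: &[BlueprintZone]) -> Vec<u32> {
    zones
        .iter()
        .filter(|zone| matches!(zone.disposition, ZoneDisposition::Expunged { ready_for_cleanup: true }))
        .map(|zone| zone.id)
        .collect()
}
```

An expunged zone that a sled still reports as running (because its PUT failed or its reconciler hasn't caught up) stays unmarked until a later planning pass sees it gone.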

This is a blocker for fixing #6999.
