Failed instances should be allowed to stop and restart #2825

@gjcolombo

Currently, Nexus rejects every request to change the state of a Failed instance:

    fn check_runtime_change_allowed(
        &self,
        runtime: &nexus::InstanceRuntimeState,
    ) -> Result<(), Error> {
        // Users are allowed to request a start or stop even if the instance is
        // already in the desired state (or moving to it), and we will issue a
        // request to the SA to make the state change in these cases in case the
        // runtime state we saw here was stale. However, users are not allowed
        // to change the state of an instance that's migrating, failed or
        // destroyed.
        let allowed = match runtime.run_state {
            InstanceState::Creating => true,
            InstanceState::Starting => true,
            InstanceState::Running => true,
            InstanceState::Stopping => true,
            InstanceState::Stopped => true,
            InstanceState::Rebooting => true,
            InstanceState::Migrating => false,
            InstanceState::Repairing => false,
            InstanceState::Failed => false,
            InstanceState::Destroyed => false,
        };

There are plenty of reasons an instance could move to the Failed state (e.g. a failure to start the VM in Propolis, a heartbeat failure like those discussed in #2727, etc.). A VM user needs to be able to stop and attempt to restart a failed instance.

(Note that, on the Propolis end, once an instance has failed, it can't be restarted--the Propolis zone needs to be destroyed and recreated.)
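One possible shape for a relaxed check is sketched below. This is only an illustration, not the actual Nexus change: the `InstanceState` enum here is a simplified stand-in, `RequestedState` is a hypothetical type, and the function takes the requested state as an extra parameter that the real `check_runtime_change_allowed` does not have. The sketch assumes that a Failed instance may only be asked to stop, and that a restart happens as a later start request once the instance is Stopped, which fits the note that the Propolis zone has to be destroyed and recreated first.

```rust
// Simplified stand-in for the Nexus instance states (assumption: the real
// type lives elsewhere and carries more context than this enum).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InstanceState {
    Creating,
    Starting,
    Running,
    Stopping,
    Stopped,
    Rebooting,
    Migrating,
    Repairing,
    Failed,
    Destroyed,
}

// Hypothetical description of what the user asked for.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RequestedState {
    Running,
    Stopped,
    Reboot,
}

// Decide whether a requested state change may proceed, given the last
// observed state of the instance.
pub fn check_runtime_change_allowed(
    current: InstanceState,
    requested: RequestedState,
) -> Result<(), String> {
    let allowed = match (current, requested) {
        // Assumption: a failed instance can be stopped so that its resources
        // can be cleaned up, but it cannot be started or rebooted directly.
        (InstanceState::Failed, RequestedState::Stopped) => true,
        (InstanceState::Failed, _) => false,

        // Unchanged from the existing check: no user-driven changes while
        // migrating, repairing, or after destruction.
        (
            InstanceState::Migrating
            | InstanceState::Repairing
            | InstanceState::Destroyed,
            _,
        ) => false,

        // Unchanged from the existing check: every other state accepts any
        // request, even if the instance is already in the desired state.
        (
            InstanceState::Creating
            | InstanceState::Starting
            | InstanceState::Running
            | InstanceState::Stopping
            | InstanceState::Stopped
            | InstanceState::Rebooting,
            _,
        ) => true,
    };

    if allowed {
        Ok(())
    } else {
        Err(format!(
            "cannot change instance in state {:?} to {:?}",
            current, requested
        ))
    }
}
```

Under these assumptions, a stop request against a Failed instance returns `Ok(())`, a start request against the same instance returns `Err(..)`, and a start request succeeds again once the instance has reached Stopped.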

Labels

known issue (To include in customer documentation and training), nexus (Related to nexus)
