
After failing to start a Propolis VM, sled agent should collect a zone bundle before removing the zone #6459

@gjcolombo

Description


Seen in #6453. The error path from `InstanceRunner::propolis_ensure` doesn't go through the same termination path as the other ways of disposing of a Propolis zone (i.e., the `InstanceRunner::terminate` function). Instead, when VMM setup fails, it does this:

```rust
// If this instance started from scratch, and startup failed, move
// the instance to the Failed state instead of leaking the Starting
// state.
//
// Once again, migration targets don't do this, because a failure to
// start a migration target simply leaves the VM running untouched
// on the source.
if migration_params.is_none() && setup_result.is_err() {
    error!(&self.log, "vmm setup failed: {:?}", setup_result);
    // This case is morally equivalent to starting Propolis and then
    // rudely terminating it before asking it to do anything. Update
    // the VMM and instance states accordingly.
    let mark_failed = false;
    self.state.terminate_rudely(mark_failed);
}
```

As a result, sled agent doesn't collect a zone bundle from a Propolis zone that is torn down because the instance failed to start, which makes these failures hard to diagnose. This cleanup path should be reconciled with the other Propolis zone cleanup paths so that it also collects a bundle before the zone is removed.
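A minimal sketch of the kind of reconciliation being asked for, assuming a single shared teardown routine. Every name below (`TeardownReason`, `ZoneOp`, `FakeZoneManager`, `teardown_propolis_zone`) is hypothetical and not part of sled agent; the fake manager only records the operations it would perform so their order is visible. The idea shown is that the start-failure path and the `terminate` path both call one function, which collects the zone bundle while the zone still exists and only then removes it.

```rust
/// Why a Propolis zone is being torn down. Hypothetical: the variants
/// mirror the two paths described in this issue.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TeardownReason {
    /// The explicit `InstanceRunner::terminate` path.
    Terminated,
    /// The error path taken when `propolis_ensure` fails during startup.
    StartFailed,
}

/// Side effects recorded by the fake zone manager, in the order they happen.
#[derive(Debug, Clone, PartialEq)]
enum ZoneOp {
    CollectBundle { zone: String, reason: TeardownReason },
    RemoveZone { zone: String },
}

/// Hypothetical stand-in for sled agent's zone operations. It does not
/// touch real zones; it only records what would have been done.
#[derive(Default)]
struct FakeZoneManager {
    ops: Vec<ZoneOp>,
}

impl FakeZoneManager {
    fn collect_bundle(&mut self, zone: &str, reason: TeardownReason) {
        self.ops.push(ZoneOp::CollectBundle { zone: zone.to_string(), reason });
    }

    fn remove_zone(&mut self, zone: &str) {
        self.ops.push(ZoneOp::RemoveZone { zone: zone.to_string() });
    }
}

/// Single teardown routine shared by every path: collect the bundle
/// first, while the zone still exists, then remove the zone.
fn teardown_propolis_zone(mgr: &mut FakeZoneManager, zone: &str, reason: TeardownReason) {
    mgr.collect_bundle(zone, reason);
    mgr.remove_zone(zone);
}

fn main() {
    let mut mgr = FakeZoneManager::default();

    // A startup failure takes the same route as an explicit termination.
    teardown_propolis_zone(&mut mgr, "propolis-a", TeardownReason::StartFailed);
    teardown_propolis_zone(&mut mgr, "propolis-b", TeardownReason::Terminated);

    for op in &mgr.ops {
        println!("{op:?}");
    }
}
```

Routing both paths through one function means a later change to the teardown sequence, such as adding bundle collection, can't be applied to one path and missed on the other.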

Metadata


Assignees

No one assigned

Labels

Sled Agent: Related to the Per-Sled Configuration and Management

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
