over-eager to destroy VM when device start() fails

device start is a fallible operation! we try to start devices in `propolis-server`:

https://github.com/oxidecomputer/propolis/blob/fcf37aefdc1d3bcce3cbf7e2b984e05ccfe6b553/bin/propolis-server/src/lib/vm/state_driver.rs#L596-L609

if a device fails to start (or a block backend, below, fails to start), we'll then return `VmStartOutcome::Failed` and eventually get to `HandleEventOutcome::Exit`. then we [`set_rundown()`](https://github.com/oxidecomputer/propolis/blob/fcf37aefdc1d3bcce3cbf7e2b984e05ccfe6b553/bin/propolis-server/src/lib/vm/state_driver.rs#L435-L438) and drop the `StateDriver`, eventually getting through

https://github.com/oxidecomputer/propolis/blob/fcf37aefdc1d3bcce3cbf7e2b984e05ccfe6b553/bin/propolis-server/src/lib/vm/objects.rs#L420-L433

at this point we'll have dropped the Machine, uninstalled the guest's memory (and MSI-X handle!) from devices, and dropped everything. some devices will have been started, some will not yet have started. devices that have started will have some parts dropped by Propolis, but spawned threads and spawned tasks may not be cancelled, stopped, joined on, etc.
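
to make the "spawned threads may not be stopped or joined" concern concrete, here is a minimal sketch of a worker that is told to exit and joined before its owner goes away. `Poller`, `spawn`, `halt`, and `has_exited` are hypothetical names for illustration only; this is not the vsock poller or any Propolis API, just the general pattern using `std`.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical stand-in for a device-owned poller thread.
pub struct Poller {
    /// Set by the owner to ask the thread to exit.
    stop: Arc<AtomicBool>,
    /// Set by the thread right before it returns.
    exited: Arc<AtomicBool>,
    /// Present until the thread has been joined.
    handle: Option<JoinHandle<()>>,
}

impl Poller {
    /// Spawn the worker thread. It loops until asked to stop.
    pub fn spawn() -> Self {
        let stop = Arc::new(AtomicBool::new(false));
        let exited = Arc::new(AtomicBool::new(false));
        let (t_stop, t_exited) = (Arc::clone(&stop), Arc::clone(&exited));
        let handle = thread::spawn(move || {
            while !t_stop.load(Ordering::Acquire) {
                // Placeholder for real polling work (which would touch
                // guest memory in an actual device).
                thread::sleep(Duration::from_millis(1));
            }
            t_exited.store(true, Ordering::Release);
        });
        Self { stop, exited, handle: Some(handle) }
    }

    /// Ask the thread to exit and wait for it. Safe to call more than once.
    pub fn halt(&mut self) {
        self.stop.store(true, Ordering::Release);
        if let Some(handle) = self.handle.take() {
            handle.join().expect("poller thread panicked");
        }
    }

    /// True once the worker has observed the stop request and returned.
    pub fn has_exited(&self) -> bool {
        self.exited.load(Ordering::Acquire)
    }
}

impl Drop for Poller {
    /// Dropping the owner also stops and joins the worker, so the thread
    /// cannot outlive whatever the owner was keeping valid for it.
    fn drop(&mut self) {
        self.halt();
    }
}
```

the point of the `Drop` impl is that resources the thread depends on are only released after `join()` returns, which is the ordering the failed-start path above does not guarantee.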

#1110 is, in part, because the vsock poller expects that once it has memory, memory only goes away _after_ the device is paused and the poller thread is told to exit. we probably should expect that Propolis embedders follow the [valid state transitions described by Indicator](https://github.com/oxidecomputer/propolis/blob/fcf37aefdc1d3bcce3cbf7e2b984e05ccfe6b553/lib/propolis/src/lifecycle.rs#L172-L181), and that if a device is started it must be paused before dropping the Machine, that a device is halted when dropped, etc.
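
as a rough illustration of the rule described above (a started device must be paused before the Machine is dropped, and is halted when dropped), here is a small sketch of a transition check. the `DevState` enum, the transition table, and `Tracker` are assumptions written for this example; they are not copied from `lifecycle.rs` and the real `Indicator` may differ.

```rust
/// Hypothetical device lifecycle states (illustrative names only).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DevState {
    Init,
    Running,
    Paused,
    Halted,
}

/// Assumed transition rules: a running device must be paused before it can
/// be halted, and a device that never started may be halted directly.
pub fn transition_allowed(from: DevState, to: DevState) -> bool {
    use DevState::*;
    matches!(
        (from, to),
        (Init, Running)         // start
            | (Init, Halted)    // halt without ever starting
            | (Running, Paused) // pause
            | (Paused, Running) // resume
            | (Paused, Halted)  // halt after pause
    )
}

/// Tracks one device's state and rejects transitions outside the table.
pub struct Tracker {
    state: DevState,
}

impl Tracker {
    pub fn new() -> Self {
        Self { state: DevState::Init }
    }

    pub fn state(&self) -> DevState {
        self.state
    }

    /// Move to `to` if allowed; otherwise leave the state unchanged.
    pub fn advance(&mut self, to: DevState) -> Result<(), String> {
        if transition_allowed(self.state, to) {
            self.state = to;
            Ok(())
        } else {
            Err(format!("invalid transition {:?} -> {:?}", self.state, to))
        }
    }

    /// Under the assumed rules, guest memory may only be taken away from a
    /// device that is not currently running.
    pub fn safe_to_drop_machine(&self) -> bool {
        self.state != DevState::Running
    }
}
```

with a check like this, the failed-start path would trip on any device that is still `Running` when the Machine is torn down, rather than silently pulling memory out from under it.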

	// Send synchronous start commands to all devices.
	for (name, dev) in objects.device_map() {
	    info!(self.log, "sending start request to {}", name);
	    let res = dev.start();
	    if let Err(e) = res {
	        error!(
	            self.log, "device start() returned an error";
	            "device" => %name,
	            "error" => %e
	        );

	        return VmStartOutcome::Failed;
	    }
	}
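
one possible shape for a less eager failure path, sketched under assumptions: when a device fails to start, pause and then halt the devices that already started (in reverse order) before returning the failure. the `Device` trait, `start_all`, and `ToyDevice` below are hypothetical and only loosely mirror the loop above; this is not the actual `propolis-server` code or a proposed patch.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical device interface with the lifecycle hooks this sketch needs.
pub trait Device {
    fn name(&self) -> &str;
    fn start(&self) -> Result<(), String>;
    fn pause(&self);
    fn halt(&self);
}

/// Start devices in order. On failure, unwind the devices that were already
/// started (pause, then halt, newest first) and report the error. The device
/// that failed and any devices after it are left untouched.
pub fn start_all(devices: &[Box<dyn Device>]) -> Result<(), String> {
    for (idx, dev) in devices.iter().enumerate() {
        let res = dev.start();
        if let Err(e) = res {
            for started in devices[..idx].iter().rev() {
                started.pause();
                started.halt();
            }
            return Err(format!("device {} failed to start: {}", dev.name(), e));
        }
    }
    Ok(())
}

/// Shared event log used by the toy device below.
pub type EventLog = Rc<RefCell<Vec<String>>>;

/// Toy device that records lifecycle calls and can be told to fail start().
pub struct ToyDevice {
    name: String,
    fail_start: bool,
    log: EventLog,
}

impl ToyDevice {
    pub fn new(name: &str, fail_start: bool, log: EventLog) -> Self {
        Self { name: name.to_string(), fail_start, log }
    }

    fn record(&self, what: &str) {
        self.log.borrow_mut().push(format!("{} {}", what, self.name));
    }
}

impl Device for ToyDevice {
    fn name(&self) -> &str {
        &self.name
    }

    fn start(&self) -> Result<(), String> {
        if self.fail_start {
            return Err("injected failure".to_string());
        }
        self.record("start");
        Ok(())
    }

    fn pause(&self) {
        self.record("pause");
    }

    fn halt(&self) {
        self.record("halt");
    }
}
```

with three toy devices where the third fails, the first two are started, then paused and halted in reverse order before the error is returned, so nothing is left running when the VM objects are dropped.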

	impl Drop for VmObjects {
	    fn drop(&mut self) {
	        // Signal to these objects' owning VM that rundown has completed and a
	        // new VM can be created.
	        //
	        // It is always safe to complete rundown at this point because the state
	        // driver ensures that if it creates VM objects, then it will not drop
	        // them without first moving the VM to the Rundown state.
	        let parent = self.parent.clone();
	        tokio::spawn(async move {
	            parent.complete_rundown().await;
	        });
	    }
	}

over-eager to destroy VM when device start() fails #1115
