@gjcolombo
Contributor

Fixes #750.

Tests: new cargo tests; dropped a package image into an Omicron dev cluster and verified that instances start correctly with it.

StorageDeviceV0::NvmeDisk(d) => d.backend_name.clone(),
};

assert_eq!(device_to_backend, parsed.backend_name);
Member

it seems like, instead of having ParsedStorageDevice.backend_name and requiring that it always match what happens to also be in the device spec, we'd be better off with a StorageDeviceV0::backend_name() helper that we use in places like add_storage_device(). something like .insert(device_spec.backend_name(), backend_spec)?

do we expect to have storage devices that don't know their backend name? (that seems unlikely, but it's an obvious reason my thought wouldn't stick!)

(i also understand this change and why it fixes things, so +1 for landing to make things fixed and adjusting after if you're so inclined)
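
For illustration, the helper suggested above might look roughly like the sketch below. Only the `StorageDeviceV0::NvmeDisk` variant and its `backend_name` field appear in the diff excerpt; the `VirtioDisk` variant, the `BackendSpec` struct, and the signature of `add_storage_device` are simplified assumptions, not the real Propolis definitions.

```rust
use std::collections::BTreeMap;

// Simplified stand-ins for the real spec types (assumed shapes).
#[derive(Clone, Debug)]
pub struct VirtioDisk {
    pub backend_name: String,
}

#[derive(Clone, Debug)]
pub struct NvmeDisk {
    pub backend_name: String,
}

#[derive(Clone, Debug)]
pub enum StorageDeviceV0 {
    VirtioDisk(VirtioDisk), // assumed variant
    NvmeDisk(NvmeDisk),
}

impl StorageDeviceV0 {
    /// Returns the backend name recorded in the device spec itself, so
    /// callers never need a second, separately tracked copy of it.
    pub fn backend_name(&self) -> &str {
        match self {
            StorageDeviceV0::VirtioDisk(d) => &d.backend_name,
            StorageDeviceV0::NvmeDisk(d) => &d.backend_name,
        }
    }
}

// Hypothetical backend description, used only for this sketch.
#[derive(Clone, Debug, PartialEq)]
pub struct BackendSpec {
    pub path: String,
}

/// Hypothetical signature: registers a backend under the name the device
/// spec refers to, returning any backend previously stored under that name.
pub fn add_storage_device(
    backends: &mut BTreeMap<String, BackendSpec>,
    device_spec: &StorageDeviceV0,
    backend_spec: BackendSpec,
) -> Option<BackendSpec> {
    backends.insert(device_spec.backend_name().to_owned(), backend_spec)
}
```

Because the map key is derived from the device spec at the point of insertion, there is no second `backend_name` field that could drift out of sync with it.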

Contributor Author

Ultimately my plan is to do away with ParsedStorageDevice entirely and replace it with a Disk type that contains both the device and backend halves of each of a VM's storage components. In this bug I got caught looking the wrong way halfway through the series of refactoring PRs that's supposed to make that happen ;)

I have the result of this wave of refactoring in a branch in my fork; here's what the routine this PR fixes will look like when that's done. This will (hopefully) foreclose on the sort of bug we hit here, since machine init won't be looking up named backends in a map anymore. (I think the code can be improved even more beyond what I have in the fork at this point, but am trying to stick to one renovation project at a time--insert the beaver-big-plans meme here...)

There's even more context in #735 if you're looking for some light bedtime reading.
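
As a rough sketch of the direction described above (not the code in the fork), a combined `Disk` type could carry both halves together, so that machine init walks a list of disks instead of resolving backend names through a map. Every type, field, and function name below is hypothetical.

```rust
// Hypothetical device half: how the disk is presented to the guest.
#[derive(Clone, Debug, PartialEq)]
pub enum DeviceHalf {
    Virtio { pci_slot: u8 },
    Nvme { pci_slot: u8 },
}

// Hypothetical backend half: where the disk's data actually lives.
#[derive(Clone, Debug, PartialEq)]
pub enum BackendHalf {
    File { path: String, read_only: bool },
    InMemory { size_bytes: u64 },
}

/// A storage component with both halves bundled together. Since the
/// backend is stored next to its device, there is no name to look up
/// and therefore no way for a device to refer to a missing backend.
#[derive(Clone, Debug, PartialEq)]
pub struct Disk {
    pub device: DeviceHalf,
    pub backend: BackendHalf,
}

/// Stand-in for machine init: describes what would be attached for each
/// disk. Real initialization would construct device and backend objects.
pub fn describe_disks(disks: &[Disk]) -> Vec<String> {
    disks
        .iter()
        .map(|disk| {
            let device = match &disk.device {
                DeviceHalf::Virtio { pci_slot } => format!("virtio@{pci_slot}"),
                DeviceHalf::Nvme { pci_slot } => format!("nvme@{pci_slot}"),
            };
            let backend = match &disk.backend {
                BackendHalf::File { path, read_only } => {
                    format!("file:{path} (ro={read_only})")
                }
                BackendHalf::InMemory { size_bytes } => {
                    format!("memory:{size_bytes}")
                }
            };
            format!("{device} -> {backend}")
        })
        .collect()
}
```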

@gjcolombo gjcolombo merged commit 31feeca into master Aug 28, 2024
@gjcolombo gjcolombo deleted the gjcolombo/disk-backend-names branch August 28, 2024 02:10
gjcolombo added a commit that referenced this pull request Aug 28, 2024
PHD uses the propolis-server instance_spec_ensure endpoint when creating
VMs. This maximizes the framework's control over what virtual devices and
backends get created and its knowledge of how devices should manifest to
guests.

When sled agent creates a VM, it uses the instance_ensure endpoint, which
takes an InstanceEnsureRequest that specifies components at a slightly
higher level of abstraction than instance_spec_ensure. The server handles
calls to the former endpoint by shimming the InstanceEnsureRequest into an
instance spec and then pretending the caller called instance_spec_ensure.
The shim is relatively straightforward, but because PHD doesn't use it (and
as #750 shows) it can be a great place for bugs to hide...

Although the medium-term plan is to try to switch sled agent over to using
the spec endpoint (see RFD 505), for now add an affordance to PHD to allow
it to use the instance_ensure endpoint if a test requests it.

Tested by running the new test against master before and after #751 merged
and verifying that the test fails without that change and passes with it.
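
To make the class of bug concrete, here is a minimal sketch of a request-to-spec shim along with the kind of consistency check a test on that path can make. It is not the propolis-server shim or the PHD test: `DiskRequest`, `InstanceSpec`, `shim_disks`, `dangling_backend_refs`, and the `-backend` naming scheme are all invented for illustration.

```rust
use std::collections::BTreeMap;

// Hypothetical higher-level request, loosely analogous to one disk
// entry in an ensure request.
pub struct DiskRequest {
    pub name: String,
    pub read_only: bool,
}

// Hypothetical spec-level components.
pub struct DeviceSpec {
    pub backend_name: String,
}

pub struct BackendSpec {
    pub read_only: bool,
}

#[derive(Default)]
pub struct InstanceSpec {
    pub devices: BTreeMap<String, DeviceSpec>,
    pub backends: BTreeMap<String, BackendSpec>,
}

/// Shims disk requests into a spec. The backend name is computed once and
/// used both as the map key and as the device's reference to its backend.
pub fn shim_disks(requests: &[DiskRequest]) -> InstanceSpec {
    let mut spec = InstanceSpec::default();
    for req in requests {
        // Assumed naming scheme, for this sketch only.
        let backend_name = format!("{}-backend", req.name);
        spec.backends.insert(
            backend_name.clone(),
            BackendSpec { read_only: req.read_only },
        );
        spec.devices
            .insert(req.name.clone(), DeviceSpec { backend_name });
    }
    spec
}

/// Returns the names of devices whose `backend_name` has no matching
/// entry in the spec's backend map.
pub fn dangling_backend_refs(spec: &InstanceSpec) -> Vec<&str> {
    spec.devices
        .iter()
        .filter(|(_, dev)| !spec.backends.contains_key(&dev.backend_name))
        .map(|(name, _)| name.as_str())
        .collect()
}
```

A check like `dangling_backend_refs` only helps if something exercises the shim, which is why routing a test through the higher-level endpoint matters.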

Development

Successfully merging this pull request may close these issues.

parse_disk_from_request mishandles backend names

4 participants