Skip to content

propolis-server: migration needs to learn about spec versions #1136

@iximeow

Description

@iximeow

I thought I had gotten these notes down, but I can't find them and it's not on the migration board! there are some relevant bits of lib/migrate to look at in the context of the HTTP API versioning work that has since landed:

pub(crate) struct Preamble {
pub instance_spec: v1::instance_spec::VersionedInstanceSpec,
pub blobs: Vec<Vec<u8>>,
}
impl Preamble {
pub fn new(
instance_spec: v1::instance_spec::VersionedInstanceSpec,
) -> Preamble {
Preamble { instance_spec, blobs: Vec::new() }
}
/// Consume the spec in this Preamble and produce an instance spec suitable
/// for initializing the target VM.
///
/// This routine enumerates the disks and NICs in the `target_spec` and
/// looks for disks with a Crucible backend and NICs with a viona backend.
/// Any such backends will replace the corresponding backend entries in the
/// source spec. If the target spec contains a replacement backend that is
/// not present in the source spec, this routine fails.
pub fn amend_spec(
self,
replacements: &BTreeMap<
v1::instance_spec::SpecKey,
ReplacementComponent,
>,
) -> Result<Spec, MigrateError> {
fn wrong_type_error(
id: &v1::instance_spec::SpecKey,
kind: &str,
) -> MigrateError {
let msg =
format!("component {id} is not a {kind} in the source spec");
MigrateError::InstanceSpecsIncompatible(msg)
}
let v1::instance_spec::VersionedInstanceSpec::V0(mut source_spec) =
self.instance_spec;

and

async fn sync(&mut self) -> Result<(), MigrateError> {
self.update_state(MigrationState::Sync);
let preamble =
Preamble::new(v1::instance_spec::VersionedInstanceSpec::V0(
self.vm.lock_shared().await.instance_spec().clone().into(),
));

these together mean that on the recipient side, propolis-server only knows how to accept a v1::VersionedInstanceSpec::V0 and that spec must be whatever you can get from converting the active VM's Spec down to an v1::InstanceSpec. in turn, this means that instance specs must be convertible to a v1::InstanceSpec. this all should be alarming, and maybe make your skin crawl.

ideally we can handle migration with an overall strategy like:

  • try converting the current VM's Spec to each API type, with conversions as TryFrom, starting from the latest version.
  • one of these should work; the spec came from the API at some point[1], so we should be able to convert back to one of those types.
  • send a migration Prelude with that converted spec and a version indicator, something like VersionedInstanceSpec was intended to handle.
  • ... and then handle device payloads and versioning thereof.

converting from a current Spec down to v1::InstanceSpec is lossy! we lose information in migrating out!

it is, and this is where the lossy conversion happens.

  • we should eliminate impl From<Spec> for v1::instance_spec::InstanceSpec.

in general we can't impl From<Spec> for [HTTP API Spec], because the API type is not guaranteed to be backwards compatible - it's possible we decide to remove a field from a device's configuration, for example.

migration does need an alternative to propolis_api_types::*::instance_spec::VersionedInstanceSpec

the big wrinkle with this being in propolis_api_types is that we want this type to be a sum of all versions' InstanceSpec. but that's problematic when a new API version would change the enum, and make different API versions' VersionedInstanceSpec incompatible with each other. as the comment there mentions, there is an endpoint to get the current instance's spec, which speaks in terms of this somewhat-doomed type:

https://github.com/oxidecomputer/propolis/blob/master/crates/propolis-api-types-versions/src/initial/instance_spec.rs#L151-L177

I believe this is only used for debugging at this point, but being able to get the current instance's spec is definitely valuable. we may want to keep something like VersionedInstanceSpec around for this API, and if so we need to:

  • note more explicitly that we'd expect a new VersionedInstanceSpec that is the sum of previous InstanceSpecs, any time we add a new latest InstanceSpec.

in either case we need a non-HTTP-API-shaped means to negotiate instance specs:

  • propolis-server migration needs an InstanceSpec container with version information that is forwards-compatible; the destination accepting new variants must not preclude accepting old shared variants.

I don't have great ideas for how we test this, other than that we should round-trip API specs through Spec and back and verify they are lossless. we should also be able to convert up and down (where lossless) and not lose instance spec information. we might want a selection of "old" instance specs that describe instance we expect to migrate in, and verify those do migrate in; I'm thinking a selection of externally-serialized data because this is all getting outside the OpenAPI versioning schema. so producing specs from the latest propolis-server opens the risk of a backwards-incompatible change happening which causes us to diverge from a real older instance spec.

Metadata

Metadata

Assignees

No one assigned

    Labels

    migrationIssues related to live migration.

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions