Add garbage collection for expunged sleds and zones #5552

Open
sunshowers opened this issue Apr 17, 2024 · 1 comment
sunshowers commented Apr 17, 2024

Specifically, the blueprint planner needs to:

  • if a zone is expunged and, according to the planning input, has no resources still attached to it, remove it from the blueprint
  • if a sled is expunged and has no zones attached to it, mark it decommissioned

This should likely be done both at construction time (against the parent blueprint) and incrementally.

Resources include:

  • external IPs
  • vNICs
  • for Nexus instances, sagas
  • ???

We've deferred this for release 8 but would like to tackle it in release 9.
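For illustration, a minimal sketch of the two pruning rules above; the types here are stand-ins, not the planner's actual data structures:

```rust
// Illustrative stand-ins only; none of these types exist in the planner.
struct ZoneEntry {
    /// True if this zone has been expunged.
    expunged: bool,
    /// True if the planning input still shows resources (external IPs,
    /// vNICs, sagas, ...) attached to this zone.
    has_resources: bool,
}

struct SledEntry {
    expunged: bool,
    decommissioned: bool,
    zones: Vec<ZoneEntry>,
}

fn garbage_collect(sleds: &mut [SledEntry]) {
    for sled in sleds.iter_mut() {
        // Rule 1: an expunged zone with no remaining resources can be
        // removed from the blueprint.
        sled.zones.retain(|z| !z.expunged || z.has_resources);
        // Rule 2: an expunged sled with no zones left can be marked
        // decommissioned.
        if sled.expunged && sled.zones.is_empty() {
            sled.decommissioned = true;
        }
    }
}
```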

sunshowers added the Update System (Replacing old bits with newer, cooler bits) label on Apr 17, 2024
sunshowers added this to the 9 milestone on Apr 17, 2024
sunshowers commented Apr 17, 2024

Some half-finished code that needs to be updated to take #5541 etc. into account:

/// Builds a map of resources that either the planning input or the parent
/// blueprint says are in-use.
fn build_external_in_use_map<T: Clone + Hash + Eq>(
    parent_blueprint: &Blueprint,
    input_map: &BTreeMap<OmicronZoneUuid, T>,
    accessor: impl Fn(&OmicronZoneConfig) -> Option<T>,
) -> Result<HashMap<T, ResourceInUseDetails>, Error> {
    // The basic table is, for each resource:
    //
    // ```
    //    in blueprint?     in input_map?    result
    //  yes, reachable          yes          in use
    //  yes, non-reachable      yes          in use      [1]
    //         no               yes        in use, warn  [2]
    //  yes, reachable           no          in use      [3]
    //  yes, non-reachable       no         not in use
    //         no                no         not in use   [4]
    // ```
    //
    // "in blueprint?" means whether that resource is allocated to a zone
    // that's described in the blueprint (desired state). "reachable" is
    // defined by the `ShouldBeExternallyReachable` filter.
    //
    // "in input_map?" means whether that resource is allocated to a zone
    // that's described in the planning input (actual state).
    //
    // [1]: This is a zone that the executor will aim to remove external
    //      resources for. However, it is *currently* in use, and we can't just
    //      pave over and reallocate its resources.
    // [2]: This should never happen. The blueprint is meant to be a complete
    //      description of the state. This is a bug -- warn about it (and maybe
    //      consider erroring out?)
    // [3]: This is a zone that the executor will aim to add external resources
    //      for. We consider it to be in use.
    // [4]: This resource will never be considered in the below code.
    //
    // There are also some error cases:
    //
    // 1. Two zones in the input map are using the same resource.
    // 2. Two zones in the blueprint are using the same resource.
    // 3. For the same zone, the blueprint and the input map disagree on what
    //    resource is allocated to them.
    //
    // Case 3 is potentially migration of a zone to a different resource --
    // this is not supported at the moment.
    //
    // So, how do we go about actually implementing this table? We'll perform
    // two loops: one over the input map and one over the parent blueprint.

    let mut in_use: HashMap<T, ResourceInUseDetails> = HashMap::new();

    for (zone_id, resource) in input_map {
        let details = in_use.entry(resource.clone()).or_default();
        if details.in_input_map {
            // Error case 1: two zones in the input map are using the same
            // resource. (To name both zones here, `ResourceInUseDetails`
            // would also need to record which input-map zone claimed the
            // resource first.)
            return Err(Error::Planner(anyhow!(
                "in input map, zone {} is using a resource already claimed \
                 by another zone: {:?}",
                zone_id,
                resource,
            )));
        }
        // The entry was created (or fetched) by `or_default()` above, so
        // just mark it as present in the input map rather than inserting a
        // fresh `ResourceInUseDetails`.
        details.in_input_map = true;
    }
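
    // Sketch of the missing half (not part of the original snippet): a second
    // loop over the parent blueprint's zones would apply `accessor` to each
    // zone config (skipping zones without this kind of resource), record
    // which blueprint zone holds the resource in `allocated_to_zone`, and
    // return errors for cases 2 and 3 above (two blueprint zones claiming the
    // same resource, or the blueprint and input map disagreeing for the same
    // zone). A resource then counts as in use if its blueprint zone passes
    // the `ShouldBeExternallyReachable` filter or it is marked `in_input_map`
    // (table rows [1] and [3]); an entry with `in_input_map == true` but no
    // blueprint zone is row [2] and should produce a warning.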

    Ok(in_use)
}

/// Describes why a resource is currently marked in use.
///
/// The `Default` is only for construction -- the in-use map should always have
/// non-default values.
#[derive(Debug, Default)]
struct ResourceInUseDetails {
    /// Whether the planning input says this resource is attached to a zone.
    in_input_map: bool,
    /// The blueprint zone this resource is allocated to, if any.
    allocated_to_zone: Option<OmicronZoneUuid>,
}
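
The decision table in the comment above can also be written as a small standalone function. A self-contained sketch (these names are illustrative, not anything that exists in omicron):

```rust
/// Outcome of the in-use decision for one resource. `YesButWarn` is row [2]
/// above: present in the planning input but absent from the blueprint.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InUseDecision {
    Yes,
    YesButWarn,
    No,
}

/// `in_blueprint` is `None` if no blueprint zone holds the resource, and
/// `Some(reachable)` otherwise, where `reachable` is the result of the
/// `ShouldBeExternallyReachable` filter.
fn decide_in_use(in_blueprint: Option<bool>, in_input_map: bool) -> InUseDecision {
    match (in_blueprint, in_input_map) {
        // Rows 1-2: in the blueprint (reachable or not) and in the input map.
        (Some(_), true) => InUseDecision::Yes,
        // Row 3 ([2]): in the input map but not in the blueprint.
        (None, true) => InUseDecision::YesButWarn,
        // Row 4 ([3]): reachable in the blueprint, not (yet) in the input map.
        (Some(true), false) => InUseDecision::Yes,
        // Rows 5-6: neither reachable nor in the input map.
        (Some(false), false) | (None, false) => InUseDecision::No,
    }
}
```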

And some test code:

        // Create a new blueprint that starts from bp2. Ensure that the new
        // builder's idea of which resources are used has been updated to
        // reflect that the corresponding zones are now expunged.

        let planner3 = Planner::new_based_on(
            logctx.log.clone(),
            &blueprint2,
            &input,
            "test_blueprint3",
            &collection,
        )
        .expect("planner created")
        .with_rng_seed((TEST_NAME, "bp3"));

        // Ensure that the expunged and decommissioned sleds' allocated
        // resources are no longer in use.
        let nexus_v4_in_use = planner3.blueprint.nexus_v4_ips.in_use();
        let nexus_v6_in_use = planner3.blueprint.nexus_v6_ips.in_use();
        let external_ips_in_use =
            planner3.blueprint.available_external_ips.in_use();
        let system_macs_in_use =
            planner3.blueprint.available_system_macs.in_use();

        for (sled_id, zone) in
            blueprint2.all_omicron_zones_typed(BlueprintZoneFilter::All)
        {
            // TODO: also verify quiesced zones here.

            let desc = if sled_id == expunged_sled_id {
                "expunged sled"
            } else if sled_id == decommissioned_sled_id {
                "decommissioned sled"
            } else {
                continue;
            };

            // For the decommissioned sled, ensure that IPs are no longer in
            // use.
            if let OmicronZoneType::Nexus { nic, .. } = &zone.zone_type {
                match nic.ip {
                    IpAddr::V4(ip) => {
                        assert!(
                            !nexus_v4_in_use.contains(&ip),
                            "for {desc}, Nexus V4 IP {ip} should not be in use"
                        );
                    }
                    IpAddr::V6(ip) => {
                        assert!(
                            !nexus_v6_in_use.contains(&ip),
                            "for {desc}, Nexus V6 IP {ip} should not be in use"
                        );
                    }
                }
            }

            if let Some(external_ip) = zone.zone_type.external_ip() {
                assert!(
                    !external_ips_in_use.contains(&external_ip),
                    "for {desc}, external IP {external_ip} should not be in use"
                );
            }
            if let Some(nic) = zone.zone_type.opte_vnic() {
                assert!(
                    !system_macs_in_use.contains(&nic.mac),
                    "for {desc}, system MAC {} should not be in use",
                    nic.mac
                );
            }
        }

        logctx.cleanup_successful();

jgallagher added a commit that referenced this issue May 13, 2024
This builds on (and is currently pointed at) #5663 and is on the path
to fixing #5625, although there is still more work to do there. (The
remaining work should be small, and I'll start on it while this is in
review to make sure it really fixes that issue.) I don't plan on landing
this before R8 is out the door, but I wanted to go ahead and open it up
for review.

Much of this is fallout from our discussions about what it means to
decommission a sled. Major changes are:

* `SledFilter::All` has been renamed to `SledFilter::Commissioned`, and
no longer returns sleds that are in `SledState::Decommissioned`.
* The blueprint planner will now update the desired sled state to
`Decommissioned` for sleds that satisfy our conditions. (See
`do_plan_decommission()`; a rough sketch follows this list.)
* The blueprint planner will carry forward the Omicron zones of
decommissioned sleds to child blueprints. Pruning these is #5552.
* The blueprint planner will _not_ carry forward a desired sled state of
`Decommissioned` once the inputs report that the sled has already been
decommissioned.
* The blueprint executor will decommission sleds that the planner said
to.
* Decommissioning a sled implicitly decommissions all its disks. (This
matches what happens with sled expungement, and should not interfere
with region replacement, which keys off of policy, not state.)
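
A rough sketch of the planner-side check behind `do_plan_decommission()`, with illustrative stand-in types; the exact conditions in the real code may differ:

```rust
// Illustrative stand-ins; the real policy/state types live elsewhere in
// omicron and carry more detail.
#[derive(Clone, Copy, PartialEq, Eq)]
enum SledPolicy { InService, Expunged }

#[derive(Clone, Copy, PartialEq, Eq)]
enum SledState { Active, Decommissioned }

#[derive(Clone, Copy, PartialEq, Eq)]
enum ZoneDisposition { InService, Expunged }

/// Should the planner set this sled's desired state to `Decommissioned`?
/// The sled must be expunged, the planning input must not already report it
/// as decommissioned (so the desired state is not carried forward forever),
/// and all of its zones must have been expunged.
fn should_mark_decommissioned(
    policy: SledPolicy,
    reported_state: SledState,
    zone_dispositions: &[ZoneDisposition],
) -> bool {
    policy == SledPolicy::Expunged
        && reported_state == SledState::Active
        && zone_dispositions.iter().all(|d| *d == ZoneDisposition::Expunged)
}
```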