
Instance Deletion saga looks destructive, even if instance should not be deleted #2842

@smklein


Let's suppose we have an instance in a state where it should not be deleted. Specifically, a case where sid_delete_instance_record, calling project_delete_instance, observes that the instance is not in a state that permits deletion.

This should be reproducible when the instance is "running", which is not one of the ok_to_delete_instance_states:

let ok_to_delete_instance_states = vec![stopped, failed];

If we execute the instance delete saga on this instance, we'll execute the following actions:

fn make_saga_dag(
    _params: &Self::Params,
    mut builder: steno::DagBuilder,
) -> Result<steno::Dag, super::SagaInitError> {
    builder.append(v2p_ensure_undo_action());
    builder.append(v2p_ensure_action());
    builder.append(delete_asic_configuration_action());
    builder.append(instance_delete_record_action());
    builder.append(delete_network_interfaces_action());
    builder.append(deallocate_external_ip_action());
    builder.append(virtual_resources_account_action());
    builder.append(sled_resources_account_action());
    Ok(builder.build()?)
}

Here's what will happen:

  • The V2P mappings will be deleted
  • The ASIC configuration will be deleted
  • The instance record cannot be deleted, because we aren't in a valid state. The saga will start unwinding from instance_delete_record_action.
  • On the unwind path, the saga will try to re-create the V2P mappings via sid_v2p_ensure_undo

This is problematic for a couple reasons:

  1. The (temporary) destruction of the V2P mappings is an observable side effect of the failed instance deletion
  2. As far as I can tell, the sid_delete_network_config function will delete all NAT mappings, and this destructive action will not be "undone"

This seems like it'll degrade the network functionality of the instance, even though it remains running.

It seems like we should delete the instance record first, before proceeding with the de-allocation of resources. This will validate the state of the instance before we actually perform destructive operations.
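To make the ordering argument concrete, here is a minimal sketch using a toy saga model. It is not the steno API, and every name in it (World, Node, run_saga, current_order, record_first_order) is hypothetical. It assumes, per item 2 above, that the network config deletion has no undo action, and it collapses the instance state check into a single `running` flag.

```rust
// Toy saga model for illustration only. This is NOT the steno API;
// every type and function name below is hypothetical.

/// The pieces of state this issue is concerned with.
#[derive(Debug, Clone, PartialEq)]
pub struct World {
    pub running: bool,
    pub record_exists: bool,
    pub v2p_mappings: bool,
    pub nat_entries: bool,
}

/// One saga node: a forward action plus an optional undo action.
pub struct Node {
    pub name: &'static str,
    pub action: fn(&mut World) -> Result<(), String>,
    /// `None` models a destructive step with nothing to undo it.
    pub undo: Option<fn(&mut World)>,
}

/// Runs nodes in order. On the first failure, runs the undo actions of the
/// nodes that already completed, in reverse order, and returns the error.
pub fn run_saga(nodes: &[Node], world: &mut World) -> Result<(), String> {
    for (i, node) in nodes.iter().enumerate() {
        if let Err(e) = (node.action)(world) {
            for done in nodes[..i].iter().rev() {
                if let Some(undo) = done.undo {
                    undo(world);
                }
            }
            return Err(format!("{} failed: {}", node.name, e));
        }
    }
    Ok(())
}

fn delete_v2p(w: &mut World) -> Result<(), String> {
    w.v2p_mappings = false;
    Ok(())
}

fn recreate_v2p(w: &mut World) {
    w.v2p_mappings = true;
}

// Assumption: no undo exists for this step (item 2 in the list above).
fn delete_network_config(w: &mut World) -> Result<(), String> {
    w.nat_entries = false;
    Ok(())
}

// Stand-in for the state check: a running instance cannot be deleted.
fn delete_record(w: &mut World) -> Result<(), String> {
    if w.running {
        return Err("instance is running".to_string());
    }
    w.record_exists = false;
    Ok(())
}

/// Destructive network steps first, record deletion after (as in the issue).
pub fn current_order() -> Vec<Node> {
    vec![
        Node { name: "v2p", action: delete_v2p, undo: Some(recreate_v2p) },
        Node { name: "network_config", action: delete_network_config, undo: None },
        Node { name: "record", action: delete_record, undo: None },
    ]
}

/// Record deletion (and its state check) first. What the undo for the record
/// deletion should be, if a later step fails, is out of scope for this sketch.
pub fn record_first_order() -> Vec<Node> {
    vec![
        Node { name: "record", action: delete_record, undo: None },
        Node { name: "v2p", action: delete_v2p, undo: Some(recreate_v2p) },
        Node { name: "network_config", action: delete_network_config, undo: None },
    ]
}
```

Running `run_saga(&current_order(), &mut world)` on a running instance returns an error, restores the V2P mappings on unwind, and leaves `nat_entries` cleared. With `record_first_order()`, the same call fails at the first node and `world` is left untouched, because no destructive action ran before the state check.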

FYI @jmpesp, @internet-diglett.


Labels: bug, networking, nexus
