Instance Deletion saga looks destructive, even if instance should not be deleted #2842
Comments
Yes, this is the case. I'll try reordering the saga nodes and see if it's a trivial fix.
Looks like there are some record lookups that get broken if we perform It looks like deleting the v2p mappings depends on the instance record being present? Lines 388 to 407 in e4a5dd0
Admittedly I could be misunderstanding this, as I'm still not 100% sure how some of our queries work.
Yeah, this is definitely true - for the reason you posted, and for calling For this to work, we'd have to cache the existing V2P entries (probably as the output of a saga node) before deleting the existing record. The problem with that is stale information (and this problem probably exists today) - nothing's locking the instance record, so it would be possible to add or remove a network interface and render that cached information stale.
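To make the staleness concern concrete, here is a minimal sketch of the "cache the V2P entries as a saga node output" idea. Everything in it is hypothetical and exists only for illustration: the Instance struct, the V2pEntry alias, and the snapshot_v2p and delete_v2p functions are not Omicron APIs, and a plain in-memory set stands in for the real mapping table.

```rust
use std::collections::BTreeSet;

/// Hypothetical V2P entry; a plain string stands in for the real mapping.
pub type V2pEntry = String;

/// Hypothetical instance record: only the data the sketch needs.
pub struct Instance {
    pub nics: BTreeSet<V2pEntry>,
}

/// Hypothetical early saga node: snapshot the mappings while the
/// instance record still exists, and return them as the node's output.
pub fn snapshot_v2p(instance: &Instance) -> Vec<V2pEntry> {
    instance.nics.iter().cloned().collect()
}

/// Hypothetical later saga node: remove only the cached mappings.
/// Anything added after the snapshot was taken is left behind.
pub fn delete_v2p(table: &mut BTreeSet<V2pEntry>, cached: &[V2pEntry]) {
    for entry in cached {
        table.remove(entry);
    }
}
```

If a network interface is added between snapshot_v2p and delete_v2p, its mapping stays in the table after the saga finishes, which is the stale-information case described above. Without some form of locking or re-validation, the cached output alone cannot detect this.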
Ah, I was potentially going to suggest the generation number here, but that sounds like a bad fit. Network interfaces can be added to stopped instances, right?
Let's suppose we have an instance in a state where it should not be deleted. Specifically, a case where sid_delete_instance_record, calling project_delete_instance, observes that the instance cannot be running.
This should be reproducible in a scenario where the instance is "running" - it's not an ok_to_delete_instance_state:
omicron/nexus/db-queries/src/db/datastore/instance.rs, line 228 in e4a5dd0
If we execute the instance delete saga on this instance, we'll execute the following actions:
omicron/nexus/src/app/sagas/instance_delete.rs, lines 69 to 82 in e4a5dd0
Here's what will happen:
instance_delete_record_action.
sid_v2p_ensure_undo
This is problematic for a couple reasons:
The sid_delete_network_config function will delete all NAT mappings, and this destructive action will not be "undone".
This seems like it'll degrade the network functionality of the instance, even though it remains running.
It seems like we should delete the instance record first, before proceeding with the de-allocation of resources. This will validate the state of the instance before we actually perform destructive operations.
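The ordering argument can be illustrated with a small, self-contained sketch. It is not the saga code from instance_delete.rs: the Step enum, the World struct, and run_delete are hypothetical names, and treating Stopped as the only deletable state is an assumption made for the example. The destructive step has no undo here, mirroring the concern above.

```rust
/// Hypothetical instance states; only what the sketch needs.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum InstanceState {
    Running,
    Stopped,
}

/// Hypothetical record of the side effects the steps have performed.
#[derive(Debug, Default, PartialEq)]
pub struct World {
    pub record_deleted: bool,
    pub nat_deleted: bool,
}

/// Hypothetical saga steps, reduced to the two that matter here.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Step {
    DeleteRecord,
    DeleteNetworkConfig,
}

/// Runs the steps in the given order and stops at the first failure.
/// DeleteNetworkConfig is never undone (assumption matching the issue).
pub fn run_delete(state: InstanceState, order: &[Step]) -> (Result<(), String>, World) {
    let mut world = World::default();
    for step in order {
        match step {
            Step::DeleteRecord => {
                // The state check lives in the record-deletion step.
                if state != InstanceState::Stopped {
                    let msg = format!("instance is {:?}, cannot delete", state);
                    return (Err(msg), world);
                }
                world.record_deleted = true;
            }
            Step::DeleteNetworkConfig => world.nat_deleted = true,
        }
    }
    (Ok(()), world)
}
```

Calling run_delete with InstanceState::Running and the network step first fails but leaves nat_deleted set. With DeleteRecord first, the same call fails before any side effect is performed.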
FYI @jmpesp, @internet-diglett.