gjcolombo
Contributor

When sled agent receives a request to stop a VMM that's not in the agent's VMM table, return NoSuchVmm instead of succeeding. This allows users to manually recover an instance that was Running prior to a sled reboot but hasn't yet been moved to Failed by the instance watcher.
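
A minimal sketch of the behavior described above, not the actual sled agent code: `VmmTable`, `VmmId`, `VmmState`, and `StopError` are hypothetical stand-ins, and only the `NoSuchVmm` name comes from this PR. The point it illustrates is that a stop request for an ID missing from the table yields an error rather than a success.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a VMM identifier.
type VmmId = u32;

/// Hypothetical, simplified VMM states.
#[derive(Debug, Clone, Copy, PartialEq)]
enum VmmState {
    Running,
    Stopping,
}

/// Hypothetical error type; only the `NoSuchVmm` name is taken from the PR.
#[derive(Debug, PartialEq)]
enum StopError {
    NoSuchVmm(VmmId),
}

/// Hypothetical model of the agent's table of known VMMs.
struct VmmTable {
    vmms: HashMap<VmmId, VmmState>,
}

impl VmmTable {
    fn new() -> Self {
        VmmTable { vmms: HashMap::new() }
    }

    fn register(&mut self, id: VmmId, state: VmmState) {
        self.vmms.insert(id, state);
    }

    /// Request a stop. If the VMM is not in the table, report `NoSuchVmm`
    /// instead of pretending the request succeeded.
    fn stop(&mut self, id: VmmId) -> Result<VmmState, StopError> {
        match self.vmms.get_mut(&id) {
            Some(state) => {
                *state = VmmState::Stopping;
                Ok(*state)
            }
            None => Err(StopError::NoSuchVmm(id)),
        }
    }
}
```

With this shape, a caller can tell "the VMM is stopping" apart from "the agent has never heard of this VMM" and react to the latter.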

Tested manually as follows:

  1. Modify sled agent's VMM worker loop so that it doesn't publish VMM state before exiting. This is needed so that manually unregistering an instance from a sled doesn't cause it to go to Stopped.
  2. Launch a dev cluster with both (1) and the change in this PR.
  3. Start an instance, then send an HTTP DELETE to sled agent's internal API to forcibly unregister the VMM.
  4. Observe that the instance remains Running in the console.
  5. Stop the instance; observe that the "not found, going to Failed" message is displayed and that the instance then goes to Failed.
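
The outcome observed in step 5 can be modeled roughly as follows. This is a sketch under assumptions, not the control plane's real logic: `InstanceState`, `SledAgentError`, and `state_after_stop` are hypothetical names, and the handling of other errors (leaving the state unchanged) is an assumption made only to complete the example.

```rust
/// Hypothetical, simplified instance states.
#[derive(Debug, Clone, Copy, PartialEq)]
enum InstanceState {
    Running,
    Stopping,
    Failed,
}

/// Hypothetical error type returned by a stop request to sled agent.
#[derive(Debug)]
enum SledAgentError {
    NoSuchVmm,
    Other(String),
}

/// Decide the instance's next state from the result of a stop request.
fn state_after_stop(
    current: InstanceState,
    result: Result<(), SledAgentError>,
) -> InstanceState {
    match result {
        // The agent accepted the request, so the instance is stopping.
        Ok(()) => InstanceState::Stopping,
        // The agent doesn't know the VMM: "not found, going to Failed".
        Err(SledAgentError::NoSuchVmm) => InstanceState::Failed,
        // Assumption: any other error leaves the state as it was.
        Err(SledAgentError::Other(_)) => current,
    }
}
```

Before this change, the abandoned-VMM case fell into the `Ok` path, which is why the instance stayed Running and never stopped.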

Fixes #4511.

@gjcolombo gjcolombo requested a review from hawkw September 27, 2024 00:18
Member

@hawkw hawkw left a comment

i said i was going to review this tomorrow, but i didn't realize it was just a two-line change 😅

@gjcolombo gjcolombo merged commit 69da5d6 into main Sep 27, 2024
15 of 16 checks passed
@gjcolombo gjcolombo deleted the gjcolombo/4511 branch September 27, 2024 16:24
Successfully merging this pull request may close these issues.

Stop request for an instance with an abandoned VMM succeeds, returns state of "Running," and never stops the instance