Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sure old VMs are deleted if UIDs don't match #218

Closed
wants to merge 4 commits into from

Conversation

rmohr
Copy link
Member

@rmohr rmohr commented May 17, 2017

WIP because: Not yet sure how to write a functional test which covers that.

If virt-handler is under load, it can miss a delete and instead see a
replaced VM with the same name. Check for the UIDs to detect such
mismatches and delete the old VM before starting the new one.

Fixes #193

@stu-gott
Copy link
Member

stu-gott commented May 26, 2017

EDIT: this has been addressed.

@rmohr There appears to be a problem with the code in manager.go. It appears that due to the changes that were incorporated for getActualOrDeleteOutdatedDomain, that dom.GetState() crashes with a nil pointer reference on line 380.

I'm posting the test in case you get a chance to look at it before I do.

@stu-gott stu-gott force-pushed the fix-delete branch 3 times, most recently from a305fc9 to 40e86e1 Compare June 7, 2017 21:38
@admiyo admiyo changed the title [DO NOT MERGE] Make sure old VMs are deleted if UIDs don't match Make sure old VMs are deleted if UIDs don't match Jun 12, 2017
if string(vm.GetObjectMeta().GetUID()) != uid {
vm := v1.NewVMReferenceFromName(vm.ObjectMeta.Name)
vm.GetObjectMeta().SetUID(types.UID(uid))
if err = l.killDomain(vm, dom); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If nothing is being logged here, this if statement isn't really necessary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe such a defer:

defer func (dom) {
 if dom != nil {
    dom.Free()
  }
}(dom)

would be a solution to make the if unnecessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there was an issue here where dom was technically an interface, so comparing against nil didn't do something expected.

logging.DefaultLogger().Object(vm).Error().Reason(err).Msg("Getting the domain failed.")
return nil, false, err
}
if exists {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could reduce indentation by returning early here if you wanted.

if !exists {
    return nil, false, nil
}


uuid := obj.(*v1.VM).ObjectMeta.UID

deleteVirtHandler(coreClient, namespace)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expect(deleteVirtHandler(...)).To(Succeed())


// Ensure the event log shows that a VM restart took place
select {
case <-time.After(120 * time.Second):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather try to give the whole test an overall timeout ...

}, 60*time.Second).Should(Equal(false), "Timed out waiting for virt-handler to stop")

// Double check that virt-handler daemonset is not present
obj, err := betaRestClient.Get().Resource("daemonsets").Namespace(namespace).Name("virt-handler").Do().Get()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't you move that check into the isVirtHandlerRunning function?

if string(vm.GetObjectMeta().GetUID()) != uid {
vm := v1.NewVMReferenceFromName(vm.ObjectMeta.Name)
vm.GetObjectMeta().SetUID(types.UID(uid))
if err = l.killDomain(vm, dom); err != nil {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe such a defer:

defer func (dom) {
 if dom != nil {
    dom.Free()
  }
}(dom)

would be a solution to make the if unnecessary.

rmohr and others added 4 commits June 27, 2017 12:58
If virt-handler is under load, it can miss a delete and instead see a
replaced VM with the same name. Check for the UIDs to detect such
mismatches and delete the old VM before starting the new one.
Signed-off-by: Stu Gott <sgott@redhat.com>
Signed-off-by: Stu Gott <sgott@redhat.com>
Signed-off-by: Stu Gott <sgott@redhat.com>
@davidvossel
Copy link
Member

retest this please

@davidvossel
Copy link
Member

ci is failing

Copy link
Member

@davidvossel davidvossel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to detect this situation is weird. We shouldn't have to care about this.

We should put something that uniquely defines the VM domain in the actual domain name passed to libvirt to prevent us from having to think about this condition. I don't know what the vm "GetUID()" function's results look like, but if that's something we can include in the domain name, that seems valuable.

if string(vm.GetObjectMeta().GetUID()) != uid {
vm := v1.NewVMReferenceFromName(vm.ObjectMeta.Name)
vm.GetObjectMeta().SetUID(types.UID(uid))
if err = l.killDomain(vm, dom); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're about to have the ability for vms with identical names to live in different namespaces. This function just looks up domains by name. We'll want to make sure to update this when the namespace change occurs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to just completely ignore the k8s object name when setting the guest name. This avoids problems with clashes in set of valid characters for names between libvirt & kubevirt, or length limits, or other similar things which could trip us up. Use some kubevirt controlled naming convention, so we can reliably identify kubevirt controlled guests. A simple option is just 'kubevirt-" prefix + the result of GetUID() again. Ideally, we would use UID for all object lookups too, since UUID is the primary unique identifier libvirt recommends apps work with, as it has the strongest uniqueness guarantees.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

We didn't do it so far, because by default the controllers use namespaces and object names to create keys for lookups. So if we don't do it this way, we could miss the deletion of a VM and only see the updated object.

I wanted to avoid copying client-go code to use another identify function, but by going over it again, it might not be that much copying. I will replace the DeletionHandlingMetaNamespaceKeyFunc in the Shared informer for virt-handler in the two controllers there with a DeletionHandlingUUIDKeyFunc if possible. Then we can track deletions based on uuids.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I would have to do a complete copy of the cache.sharedIndexerInformer. I can't replace the DeletionHandlingMetaNamespaceKeyFunc otherwise ...

Copy link
Member Author

@rmohr rmohr Jun 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will see if we can do something upstream in client-go, to not have to copy all that ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@berrange, like mentioned further down below in the PR, I don't think we have a much better option than what we do here right now. Regarding valid character clashes, I looked up the allowed chars in Kubernetes:

By convention, the names of Kubernetes resources should be up to maximum length of 253 characters and consist of lower case alphanumeric characters, -, and ., but certain resources have more specific restrictions. [1]

I think there should be no issues, since this is a subset of what libvirt supports. What do you think?

@fabiand
Copy link
Member

fabiand commented Jul 10, 2017

Anything that is missing to move this PR forward?

@davidvossel
Copy link
Member

@fabiand the patch as it is now means we're inheriting some technical debt. The "correct" fix is for us to be able to make use of informers in a way where we don't miss deletions. As long as we've investigated that there's not a low friction way for us to resolve the issue using informer logic right now, my opinion is that we should accept the debt and retroactively detect the UUID mismatch like the proposed patch is doing.

@rmohr
Copy link
Member Author

rmohr commented Jul 11, 2017

Follow up on kubernetes/client-go#229: It turns out, that it would not really help us, to set the KeyFunc.
While we would be able to "optimize" the extra uid checks away, and not miss the delete, we would still have to take care of the situation, where we would be informed about the new VM with the new UID while the delete might still not be processed.
We could handle that, but I have the suspicion, that it makes things even worse.

@davidvossel
Copy link
Member

This issue cannot occur anymore. Maybe it was possible in the past, but there are safeguards in place now that prevent it.

  • If an active VM is deleted and then immediately created again, the newly created version of the VM will move to the "Pending" phase.

  • If a previous instance of this VM is still active somewhere in the cluster, then that instances's virt-launcher pod will still be active while the new VM is in "Pending".

  • Virt-controller will not attempt to schedule the new VM while an active virt-launcher pod already exists for the VM. Look for this log message logger.Error("VM Pods do already exist, will clean up before continuing.") All existing virt-launcher pods must terminate before the VM can move from "pending" to "scheduling"

By virt-controller blocking the scheduling of the VM until the previous virt-launcher pod termintates, it's not possible for virt-handler to know about the newly created VM's existence until after cleanup of the previous VM instance occurs.

I'm closing this for now. Feel free to reopen it if someone wants to discuss it further.

kubevirt-bot pushed a commit to kubevirt-bot/kubevirt that referenced this pull request Nov 6, 2020
[WIP] add unit test for pkg/controller util_test.go and import_controller_test.go
mzzgaopeng pushed a commit to mzzgaopeng/kubevirt that referenced this pull request Mar 8, 2021
mzzgaopeng pushed a commit to mzzgaopeng/kubevirt that referenced this pull request Mar 8, 2021
fix the typo of macvlan and also modify documents to meet the current plugins.
kubevirt-bot pushed a commit to kubevirt-bot/kubevirt that referenced this pull request Dec 7, 2021
Signed-off-by: Quique Llorente <ellorent@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants