-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make sure old VMs are deleted if UIDs don't match #218
Conversation
EDIT: this has been addressed. @rmohr There appears to be a problem with the code in manager.go. It appears that due to the changes that were incorporated for getActualOrDeleteOutdatedDomain, that dom.GetState() crashes with a nil pointer reference on line 380. I'm posting the test in case you get a chance to look at it before I do. |
a305fc9
to
40e86e1
Compare
pkg/virt-handler/virtwrap/manager.go
Outdated
if string(vm.GetObjectMeta().GetUID()) != uid { | ||
vm := v1.NewVMReferenceFromName(vm.ObjectMeta.Name) | ||
vm.GetObjectMeta().SetUID(types.UID(uid)) | ||
if err = l.killDomain(vm, dom); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If nothing is being logged here, this if statement isn't really necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe such a defer:
defer func (dom) {
if dom != nil {
dom.Free()
}
}(dom)
would be a solution to make the if unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there was an issue here where dom was technically an interface, so comparing against nil didn't do something expected.
pkg/virt-handler/virtwrap/manager.go
Outdated
logging.DefaultLogger().Object(vm).Error().Reason(err).Msg("Getting the domain failed.") | ||
return nil, false, err | ||
} | ||
if exists { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could reduce indentation by returning early here if you wanted.
if !exists {
return nil, false, nil
}
tests/virt_handler_test.go
Outdated
|
||
uuid := obj.(*v1.VM).ObjectMeta.UID | ||
|
||
deleteVirtHandler(coreClient, namespace) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expect(deleteVirtHandler(...)).To(Succeed())
tests/virt_handler_test.go
Outdated
|
||
// Ensure the event log shows that a VM restart took place | ||
select { | ||
case <-time.After(120 * time.Second): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather try to give the whole test an overall timeout ...
tests/virt_handler_test.go
Outdated
}, 60*time.Second).Should(Equal(false), "Timed out waiting for virt-handler to stop") | ||
|
||
// Double check that virt-handler daemonset is not present | ||
obj, err := betaRestClient.Get().Resource("daemonsets").Namespace(namespace).Name("virt-handler").Do().Get() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't you move that check into the isVirtHandlerRunning function?
pkg/virt-handler/virtwrap/manager.go
Outdated
if string(vm.GetObjectMeta().GetUID()) != uid { | ||
vm := v1.NewVMReferenceFromName(vm.ObjectMeta.Name) | ||
vm.GetObjectMeta().SetUID(types.UID(uid)) | ||
if err = l.killDomain(vm, dom); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe such a defer:
defer func (dom) {
if dom != nil {
dom.Free()
}
}(dom)
would be a solution to make the if unnecessary.
If virt-handler is under load, it can miss a delete and instead see a replaced VM with the same name. Check for the UIDs to detect such mismatches and delete the old VM before starting the new one.
Signed-off-by: Stu Gott <sgott@redhat.com>
Signed-off-by: Stu Gott <sgott@redhat.com>
Signed-off-by: Stu Gott <sgott@redhat.com>
retest this please |
ci is failing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying to detect this situation is weird. We shouldn't have to care about this.
We should put something that uniquely defines the VM domain in the actual domain name passed to libvirt to prevent us from having to think about this condition. I don't know what the vm "GetUID()" function's results look like, but if that's something we can include in the domain name, that seems valuable.
if string(vm.GetObjectMeta().GetUID()) != uid { | ||
vm := v1.NewVMReferenceFromName(vm.ObjectMeta.Name) | ||
vm.GetObjectMeta().SetUID(types.UID(uid)) | ||
if err = l.killDomain(vm, dom); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're about to have the ability for vms with identical names to live in different namespaces. This function just looks up domains by name. We'll want to make sure to update this when the namespace change occurs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is better to just completely ignore the k8s object name when setting the guest name. This avoids problems with clashes in set of valid characters for names between libvirt & kubevirt, or length limits, or other similar things which could trip us up. Use some kubevirt controlled naming convention, so we can reliably identify kubevirt controlled guests. A simple option is just 'kubevirt-" prefix + the result of GetUID() again. Ideally, we would use UID for all object lookups too, since UUID is the primary unique identifier libvirt recommends apps work with, as it has the strongest uniqueness guarantees.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
We didn't do it so far, because by default the controllers use namespaces and object names to create keys for lookups. So if we don't do it this way, we could miss the deletion of a VM and only see the updated object.
I wanted to avoid copying client-go code to use another identify function, but by going over it again, it might not be that much copying. I will replace the DeletionHandlingMetaNamespaceKeyFunc
in the Shared informer for virt-handler in the two controllers there with a DeletionHandlingUUIDKeyFunc
if possible. Then we can track deletions based on uuids.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I would have to do a complete copy of the cache.sharedIndexerInformer. I can't replace the DeletionHandlingMetaNamespaceKeyFunc
otherwise ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will see if we can do something upstream in client-go, to not have to copy all that ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@berrange, like mentioned further down below in the PR, I don't think we have a much better option than what we do here right now. Regarding valid character clashes, I looked up the allowed chars in Kubernetes:
By convention, the names of Kubernetes resources should be up to maximum length of 253 characters and consist of lower case alphanumeric characters, -, and ., but certain resources have more specific restrictions. [1]
I think there should be no issues, since this is a subset of what libvirt supports. What do you think?
Anything that is missing to move this PR forward? |
@fabiand the patch as it is now means we're inheriting some technical debt. The "correct" fix is for us to be able to make use of informers in a way where we don't miss deletions. As long as we've investigated that there's not a low friction way for us to resolve the issue using informer logic right now, my opinion is that we should accept the debt and retroactively detect the UUID mismatch like the proposed patch is doing. |
Follow up on kubernetes/client-go#229: It turns out, that it would not really help us, to set the KeyFunc. |
This issue cannot occur anymore. Maybe it was possible in the past, but there are safeguards in place now that prevent it.
By virt-controller blocking the scheduling of the VM until the previous virt-launcher pod termintates, it's not possible for virt-handler to know about the newly created VM's existence until after cleanup of the previous VM instance occurs. I'm closing this for now. Feel free to reopen it if someone wants to discuss it further. |
[WIP] add unit test for pkg/controller util_test.go and import_controller_test.go
Add ROADMAP document
fix the typo of macvlan and also modify documents to meet the current plugins.
Signed-off-by: Quique Llorente <ellorent@redhat.com>
WIP because: Not yet sure how to write a functional test which covers that.
If virt-handler is under load, it can miss a delete and instead see a
replaced VM with the same name. Check for the UIDs to detect such
mismatches and delete the old VM before starting the new one.
Fixes #193