Decentralized Libvirt #663
This patch moves KubeVirt's design from one that depends on a centralized libvirtd per node to one that uses a decentralized libvirtd per VM pod.
With this new decentralized approach, each VM's qemu process now lives directly within the VM Pod's cgroups and namespaces, which means any storage/network devices in the Pod are available to the VM.
Cloud-init and Registry Disks
Generation and lifecycle management of ephemeral disks have moved from virt-handler to virt-launcher.
This data is now completely self-contained within the VM Pod (no shared host mounts), which means cleanup happens automatically as part of the kubelet tearing down the Pod's environment.
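As a rough illustration, here is a minimal Go sketch of generating a cloud-init NoCloud ISO entirely under a pod-local path. The directory, function names, and the use of genisoimage are assumptions for the example, not KubeVirt's actual API; the point is that the data never touches a shared host mount, so the kubelet's pod teardown removes it along with the pod.

```go
// Sketch: building an ephemeral cloud-init disk inside the VM pod.
// Paths and helper names are hypothetical.
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
)

// ephemeralDiskDir stands in for a pod-local directory (e.g. an emptyDir
// volume) that exists only for the lifetime of the VM pod.
const ephemeralDiskDir = "/var/run/kubevirt-ephemeral-disks"

func generateCloudInitISO(vmName, userData, metaData string) (string, error) {
	dir := filepath.Join(ephemeralDiskDir, "cloud-init", vmName)
	if err := os.MkdirAll(dir, 0755); err != nil {
		return "", err
	}
	if err := os.WriteFile(filepath.Join(dir, "user-data"), []byte(userData), 0644); err != nil {
		return "", err
	}
	if err := os.WriteFile(filepath.Join(dir, "meta-data"), []byte(metaData), 0644); err != nil {
		return "", err
	}
	iso := filepath.Join(dir, "noCloud.iso")
	// genisoimage builds the NoCloud seed ISO; assumed to be available in the
	// virt-launcher container image.
	cmd := exec.Command("genisoimage", "-output", iso, "-volid", "cidata",
		"-joliet", "-rock",
		filepath.Join(dir, "user-data"), filepath.Join(dir, "meta-data"))
	if err := cmd.Run(); err != nil {
		return "", fmt.Errorf("iso generation failed: %v", err)
	}
	return iso, nil
}

func main() {
	iso, err := generateCloudInitISO("testvm", "#cloud-config\n", "instance-id: testvm\n")
	if err != nil {
		panic(err)
	}
	fmt.Println("ephemeral cloud-init disk at", iso)
}
```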
Notifications Server and Domain Informer
Previously, virt-handler received events about lifecycle changes to a domain through a libvirt event callback.
Now virt-handler receives domain lifecycle events through a notification server that it starts, which listens on a unix socket. Each virt-launcher acts as a client to this notification server and forwards domain lifecycle events to it.
The virt-handler domain informer uses this notification server for its Watch function. The informer's List function iterates over every known virt-launcher present on the local host and requests the latest information about all defined domains.
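A minimal sketch of how this flow could look, assuming a simple JSON-over-unix-socket protocol (the real wire format, socket paths, and type names will differ): virt-handler listens on a node-local socket, and each virt-launcher dials in and forwards the domain lifecycle events it received from libvirt.

```go
// Sketch: virt-handler's notification server and a virt-launcher client.
// Socket path and event shape are illustrative only.
package main

import (
	"encoding/json"
	"fmt"
	"net"
	"os"
	"time"
)

// DomainEvent is a hypothetical event payload.
type DomainEvent struct {
	DomainName string `json:"domainName"`
	Type       string `json:"type"` // e.g. "Started", "Stopped"
}

// runNotificationServer is the virt-handler side: it accepts connections on a
// node-local unix socket and feeds decoded events to the domain informer's
// Watch channel.
func runNotificationServer(socketPath string, events chan<- DomainEvent) error {
	_ = os.Remove(socketPath) // clear any stale socket from a previous run
	ln, err := net.Listen("unix", socketPath)
	if err != nil {
		return err
	}
	for {
		conn, err := ln.Accept()
		if err != nil {
			return err
		}
		go func(c net.Conn) {
			defer c.Close()
			dec := json.NewDecoder(c)
			for {
				var ev DomainEvent
				if err := dec.Decode(&ev); err != nil {
					return
				}
				events <- ev
			}
		}(conn)
	}
}

// forwardEvent is the virt-launcher side: it connects to virt-handler's
// socket and forwards a libvirt lifecycle event.
func forwardEvent(socketPath string, ev DomainEvent) error {
	conn, err := net.Dial("unix", socketPath)
	if err != nil {
		return err
	}
	defer conn.Close()
	return json.NewEncoder(conn).Encode(ev)
}

func main() {
	const sock = "/tmp/kubevirt-notify.sock" // a kubevirt-owned path on a real node
	events := make(chan DomainEvent)
	go runNotificationServer(sock, events)
	time.Sleep(100 * time.Millisecond) // crude startup synchronization for the demo
	if err := forwardEvent(sock, DomainEvent{DomainName: "testvm", Type: "Started"}); err != nil {
		panic(err)
	}
	fmt.Printf("received: %+v\n", <-events)
}
```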
Virt-launcher Command Server
Each virt-launcher starts a command server that listens on a unix socket as part of the virt-launcher process's initialization.
By design, Virt-launcher has no connection to the k8s api server. The command server allows virt-handler to manage the VM's lifecycle by posting VM specs to virt-launcher to start/stop.
This command server is also how virt-handler's domain informer performs its List function. There is a directory containing a unix socket for each virt-launcher on the node. The domain informer's List function iterates over each of these sockets and builds a cache of all the active domains on the local node.
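A minimal sketch of that List path, assuming each virt-launcher's command server socket lives under a shared directory and answers a hypothetical "list domains" request (the socket directory, request format, and types are illustrative only):

```go
// Sketch: the informer's List walking the per-virt-launcher socket directory.
package main

import (
	"encoding/json"
	"fmt"
	"net"
	"os"
	"path/filepath"
)

// Domain is a hypothetical, trimmed-down view of a libvirt domain.
type Domain struct {
	Name   string `json:"name"`
	Status string `json:"status"`
}

const socketDir = "/var/run/kubevirt/sockets"

// listDomains dials every virt-launcher command server on the node and
// aggregates the domains they report into the informer's local cache.
func listDomains() ([]Domain, error) {
	entries, err := os.ReadDir(socketDir)
	if err != nil {
		return nil, err
	}
	var domains []Domain
	for _, entry := range entries {
		conn, err := net.Dial("unix", filepath.Join(socketDir, entry.Name()))
		if err != nil {
			// A stale socket left behind by a dead virt-launcher is skipped.
			continue
		}
		// Hypothetical request/response: send a command, decode the reply.
		fmt.Fprintln(conn, `{"command":"listDomains"}`)
		var d []Domain
		if err := json.NewDecoder(conn).Decode(&d); err == nil {
			domains = append(domains, d...)
		}
		conn.Close()
	}
	return domains, nil
}

func main() {
	domains, err := listDomains()
	if err != nil {
		panic(err)
	}
	fmt.Printf("active domains on this node: %+v\n", domains)
}
```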
Migrations have been disabled
The reasoning for this is that migrations depend on network access to the libvirtd process managing the VM. Our networking plan has the IP assigned to each VM pod being taken over by the VM itself, which means processes running in the pod (other than the qemu process) will not have network access in the near future.
This doesn't mean we are abandoning live migrations. It just means we are accepting that migrations are a feature we're willing to sacrifice in the short term in order to simplify the move to a more desirable overall KubeVirt design.
Re-enabling migrations is being tracked in issue #676
Libvirtd and Virtlogd
The libvirtd and virtlogd processes are now launched as part of virt-launcher's initialization sequence.
Originally I had libvirtd and virtlogd in their own respective containers in the VM pod; however, this caused issues with startup and shutdown ordering.
Virt-launcher intercepts POSIX shutdown signals and uses them as the cue to begin gracefully shutting down the VM. We need to ensure that the libvirtd process does not shut down until after the VM has exited. This was hard to guarantee when libvirtd was in a separate container and not controlled by virt-launcher.
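A minimal sketch of that ordering constraint, with libvirtd and virtlogd running as child processes of virt-launcher and the actual libvirt shutdown calls replaced by a placeholder:

```go
// Sketch: virt-launcher owns libvirtd/virtlogd and enforces shutdown order.
package main

import (
	"fmt"
	"os"
	"os/exec"
	"os/signal"
	"syscall"
)

// startDaemon launches a child process that virt-launcher fully controls.
func startDaemon(name string, args ...string) *exec.Cmd {
	cmd := exec.Command(name, args...)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Start(); err != nil {
		panic(err)
	}
	return cmd
}

func main() {
	// libvirtd and virtlogd run as children of virt-launcher, so their exit
	// is sequenced by virt-launcher rather than by container teardown.
	virtlogd := startDaemon("virtlogd")
	libvirtd := startDaemon("libvirtd")

	// Intercept the termination signal delivered when the kubelet tears the
	// pod down.
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGTERM, syscall.SIGINT)
	<-sigs

	// 1. Gracefully shut the VM down and wait for the domain to exit
	//    (placeholder for the real libvirt calls).
	fmt.Println("shutting down VM and waiting for the domain to exit...")

	// 2. Only after the VM has exited, stop libvirtd and virtlogd.
	libvirtd.Process.Signal(syscall.SIGTERM)
	virtlogd.Process.Signal(syscall.SIGTERM)
	libvirtd.Wait()
	virtlogd.Wait()
}
```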
Code Removal and Relocation
Nothing involved with networking was impacted by this patch series. The VM pods remain in the host network namespace for now, simply because the Pod networking work hasn't been completed yet.
Issues resolved by these changes.
fabiand left a comment
Thanks for the extensive description.
Overall this looks good!
There are a few things (i.e. talking to launcher) where I am unsure how we want to do this in the long run, but for now all of this works for me.
I think we should merge this to really expose this work to broader testing.