Proposal on how to bring a VM into a container #2
Conversation
Looks good to me in general.
I think a binary - or this search mechanism - is the best solution we have today.
Few things:
- Please describe the complete flow from the controller to the launched VM, and how each component passes and retrieves the secret
- Let's keep numa out and focus on setting ns and cgroup
process with the uuid and the vm name in the commandline like this:

```bash
virt-launcher -kubevirt.vm.uuid 1234-5678-1234-1234 -kubevirt.vm.name testvm
```
Instead of using these flags, the ENV vars suggested below can be used.
Here we would find everything already in `/proc/<pid>/cmdline`. Then we can just look in `/environ` (like you suggested) or a specific file location, if the right secret is there.
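A minimal sketch of that idea, assuming a Linux `/proc` and using a sleeping shell as a stand-in for virt-launcher (the flag name is taken from the proposal; everything else here is illustrative):

```shell
#!/bin/sh
# Sketch: find a launcher pid by reading /proc/<pid>/cmdline directly
# instead of parsing `ps` output.
uuid="1234-5678-1234-1234"

# Stand-in for virt-launcher: a sleeping shell carrying the uuid flag
# in its argument list.
sh -c 'sleep 5' sh -kubevirt.vm.uuid "$uuid" &
target=$!
sleep 1  # let the child exec so its cmdline is populated

found=""
for f in /proc/[0-9]*/cmdline; do
  # cmdline entries are NUL-separated; translate to spaces for matching.
  if tr '\0' ' ' < "$f" 2>/dev/null | grep -q -- "-kubevirt.vm.uuid $uuid"; then
    p="${f#/proc/}"
    found="${p%/cmdline}"
  fi
done

echo "found pid: $found"
kill "$target" 2>/dev/null
```

The matching pid can then be verified further (e.g. against the secret) before its namespaces and cgroups are used.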
```xml
</domain>
```

Note that we have to specify the qemu namespace to use these features.
You mean the target namespace the qemu instance should live in?
I just mean the `xmlns` namespace; when it is not present, libvirt does not accept these qemu tags. Will clarify it.
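For illustration, a minimal sketch of where that declaration goes (the URI is libvirt's documented qemu schema namespace; the rest of the domain definition is elided):

```xml
<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <!-- ... rest of the domain definition ... -->
  <qemu:commandline>
    <qemu:env name='kubevirt.io.secret' value='3098tFJoswfwkjp4'/>
  </qemu:commandline>
</domain>
```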
```xml
<qemu:commandline>
  <qemu:env name='kubevirt.io.secret' value='3098tFJoswfwkjp4'/>
</qemu:commandline>
```
What about aligning the namespace: `kubevirt.io.vm.secret`?
Hm, in contrast to the VM name and the VM uuid, the secret is not a property of the VM; that is why I did not add `vm` there, but I don't really care.
2. Introduce a shared secret between the VM target container and kubevirt
3. This secret can be unique per pod creation and only valid for a specific
   amount of time
4. The binary checks that the secret is present in the kubernetes metadata
"the kubernetes metadata file" what file are you referring to?
namespace where libvirt is running. They should be delivered via an init
container to the host via `hostDir` mount (if libvirt is running on the host),
or via an `emptyDir` mount to a container (if libvirt is running in a
container).
What about shipping it directly in the libvirtd container?
Would be ok too for the start; the disadvantage is that we would need to rebuild the libvirt container then. So far I only had to change the emulator when kubevirt introduced new features, not when we changed the libvirt configuration. It seems to be influenced more by how we integrate with kubernetes than by how we integrate with libvirt. If we ship it, for example, with the virt-handler daemon set as an init container, we would not have to roll out or restart libvirt. But both solutions would work.
Further, virt-handler controls the additional qemu commandline and environment flags, and therefore how the emulator is called. So for me it is more like a part of virt-handler, and it is more important that it is kept in sync with virt-handler than with libvirt. What do you think?
when we implement the secret properly.

### NUMA
Let's keep numa out of this for now. It deserves its own proposal, I'd say.
Sounds good to me. I will make one for numa and cpu pinning.
```bash
# Getting the pid, container-runtime independent
pid=$(ps aux | grep virt-launcher | grep -- "-kubevirt.vm.uuid 1234-5678-1234-1234" | tr -s ' ' | cut -d ' ' -f2)
```
We can also set the secret on the launcher pod using an environment variable. Then we can scan all `/proc/*/environ` files, each of which is already a key-value map, and find the correct process.
Being picky: It would probably not work with rkt stage2-lkvm runtime, because there the launcher would be run in a VM.
> Being picky: It would probably not work with rkt stage2-lkvm runtime, because there the launcher would be run in a VM.

Right.

> We can also set the secret on the launcher pod using an environment variable. Then we can scan all /proc/*/environ files which is already a map, and find the correct process.

I guess you mean just scanning the environ file of the target virt-launcher, to make sure it is the right one.
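That check could look roughly like the following sketch, assuming a Linux `/proc` and a hypothetical `KUBEVIRT_SECRET` variable name (a sleeping process stands in for the launcher):

```shell
#!/bin/sh
# Sketch: verify a candidate pid really is the right launcher by
# checking its environ for the shared secret.
secret="3098tFJoswfwkjp4"

# Stand-in launcher process with the secret in its environment.
env KUBEVIRT_SECRET="$secret" sleep 5 &
pid=$!
sleep 1  # let the child exec so /proc/<pid>/environ is populated

verified=no
# environ entries are NUL-separated key=value pairs.
if tr '\0' '\n' < "/proc/$pid/environ" | grep -qx "KUBEVIRT_SECRET=$secret"; then
  verified=yes
fi

echo "secret verified: $verified"
kill "$pid" 2>/dev/null
```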
hook](https://libvirt.org/hooks.html#qemu) will check if the Domain XML has an
emulator tag which points to the right binary.

### Flow
Would be nice if the flow would cover
- controller part (setting the secret?)
- launcher part
- handler part
- binary part
### Running it inside a container

To run the binary from inside a container, it needs access to the host `/proc`
directory. The recommended implementation is to have a config file for the
I think it only needs access to the host `/proc` if the container it is launched in has its own pid namespace. If the surrounding container is using the host's pid namespace, then `/proc` should be the host's `/proc`.
You are right
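That is the `hostPID` pod setting; a sketch of what that could look like (the image name is illustrative, not from the proposal):

```yaml
# With hostPID, the container shares the host's pid namespace, so its
# /proc already is the host's /proc and no extra hostPath mount is needed.
apiVersion: v1
kind: Pod
metadata:
  name: virt-handler
spec:
  hostPID: true
  containers:
  - name: virt-handler
    image: kubevirt/virt-handler:latest
```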
From a security standpoint, this approach seems risky. It means that the virt-handler code can assume control of any other process on the system via a VM launch.
This is an overkill response to the desire to limit the power of virt-launcher, which we've decided should not have direct access to libvirt. We don't want a random VM controlling other VMs, either.
It seems to me that the right solution involves cooperation between the virt-launcher and virt-handler processes. virt-launcher should open a domain socket that gets passed to virt-handler, and finally to libvirt, for launching the vm. This socket should be the only means of communication between the processes.
The virt-launcher process already has the proper cgroups and namespaces set up. It should be the sole source of truth for this information.
When launching a vm, libvirt uses a fork/exec approach. It opens the socket used to manage the process, and execs either qemu, or the command as passed from the caller, such as what virt-handler passes in the wrapper script. Since domain sockets can be passed via domain sockets (really!), the launched process could pass the management socket to virt-launcher, as well as any other parameters required for the start of the virtual machine.
@fabiand forgot to hit the "submit review" button. They were pending for a few weeks now. Sorry.
Closing this according to offline discussions.