
Proposal on how to bring a VM into a container #2

Closed · wants to merge 3 commits

Conversation

@rmohr (Member) commented Jan 15, 2017

No description provided.

@rmohr (Member, Author) commented Feb 2, 2017

@fabiand @stu-gott @admiyo, thoughts on this? I think we can come up with something better later if we find something. This is really a blocker for bringing KubeVirt out of the Vagrant and demo VM setup.

@fabiand (Member) left a comment

Looks good to me in general.
I think a binary - or this search mechanism - is the best solution we have today.

A few things:

  1. Please describe the complete flow from the controller to the launched VM, and how each component passes and retrieves the secret.
  2. Let's keep NUMA out and focus on setting namespaces and cgroups.

process with the uuid and the vm name in the commandline like this:

```bash
virt-launcher -kubevirt.vm.uuid 1234-5678-1234-1234 -kubevirt.vm.name testvm
```
Member

Instead of using these flags, the ENV vars suggested below could be used.
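
A minimal sketch of that variant; the variable names are hypothetical, since the thread does not settle on them:

```bash
# Hypothetical env-var equivalent of the flags above (names are placeholders)
KUBEVIRT_VM_UUID=1234-5678-1234-1234 KUBEVIRT_VM_NAME=testvm virt-launcher
```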

Member Author

Here we would find everything already in /proc/<pid>/cmdline. Then we can just look in /proc/<pid>/environ (as you suggested), or at a specific file location, to check whether the right secret is there.
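
A minimal sketch of that lookup, using the flag and secret names from the examples in this thread:

```bash
# Find the virt-launcher process for a given VM uuid, then verify its secret.
# Both /proc/<pid>/cmdline and /proc/<pid>/environ are NUL-separated lists.
for pid in $(pgrep -f virt-launcher); do
    if tr '\0' ' ' < "/proc/$pid/cmdline" | grep -q -- '-kubevirt.vm.uuid 1234-5678-1234-1234'; then
        tr '\0' '\n' < "/proc/$pid/environ" | grep -q '^kubevirt.io.secret=' && echo "$pid"
    fi
done
```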

</domain>
```

Note that we have to specify the qemu namespace to use these features.
Member

You mean the target namespace the qemu instance should live in?

Member Author

I just mean the xmlns namespace; when it is not present, libvirt does not accept these qemu tags. Will clarify it.
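
For reference, the declaration in question goes on the root element; the URI below is libvirt's schema namespace for the qemu element extensions:

```xml
<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <!-- ... -->
  <qemu:commandline>
    <qemu:env name='kubevirt.io.secret' value='3098tFJoswfwkjp4'/>
  </qemu:commandline>
</domain>
```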


```xml
<qemu:commandline>
<qemu:env name='kubevirt.io.secret' value='3098tFJoswfwkjp4'/>
Member

What about aligning the namespace: kubevirt.io.vm.secret?

Member Author

Hm, in contrast to the VM name and the VM UUID, the secret is not a property of the VM; that is why I did not add vm there. But I don't feel strongly about it.

2. Introduce a shared secret between the VM target container and kubevirt
3. This secret can be unique per pod creation and only valid for a specific
amount of time
4. The binary checks that the secret is present in the kubernetes metadata
Member

"the kubernetes metadata file" what file are you referring to?

namespace where libvirt is running. They should be delivered via an init
container to the host via `hostDir` mount (if libvirt is running on the host),
or via an `emptyDir` mount to a container (if libvirt is running in a
container).
Member

What about shipping it directly in the libvirtd container?

Member Author

That would be OK too for a start; the disadvantage is that we would then need to rebuild the libvirt container. So far I have only had to change the emulator when KubeVirt introduced new features, not when we changed the libvirt configuration. It seems to depend more on how we integrate with Kubernetes than on how we integrate with libvirt. If we ship it, for example, with the virt-handler daemon set as an init container, we would not have to roll out or restart libvirt. But both solutions would work.

Member Author

Further, virt-handler controls the additional qemu command-line and environment flags, and therefore how the emulator is called. So for me it is more a part of virt-handler, and it is more important that it is kept in sync with virt-handler than with libvirt. What do you think?


when we implement the secret properly.

### NUMA
Member

Let's keep NUMA out of this for now.
It deserves its own proposal, I'd say.

Member Author

Sounds good to me. I will make one for NUMA and CPU pinning.


```
# Getting the pid in a container-runtime-independent way
pid=$(ps aux | grep virt-launcher | grep -- '-kubevirt.vm.uuid 1234-5678-1234-1234' | tr -s ' ' | cut -d ' ' -f2)
```
Member

We can also set the secret on the launcher pod using an environment variable.
Then we can scan all /proc/*/environ files, which are effectively key-value maps, and find the correct process.
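
A sketch of that scan, assuming sufficient privileges (reading another process's environ requires the same UID or root) and the secret value from the example above:

```bash
# Find the process whose environment contains the expected secret
for e in /proc/[0-9]*/environ; do
    tr '\0' '\n' < "$e" 2>/dev/null \
        | grep -q '^kubevirt.io.secret=3098tFJoswfwkjp4$' && echo "${e%/environ}"
done
```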

Member

Being picky: it would probably not work with the rkt stage2-lkvm runtime, because there the launcher would run inside a VM.

Member Author

> Being picky: it would probably not work with the rkt stage2-lkvm runtime, because there the launcher would run inside a VM.

Right.

> We can also set the secret on the launcher pod using an environment variable. Then we can scan all /proc/*/environ files, which are effectively key-value maps, and find the correct process.

I guess you mean just scanning the environ file of the target virt-launcher, to make sure it is the right one.

hook](https://libvirt.org/hooks.html#qemu) will check if the Domain XML has an
emulator tag which points to the right binary.

### Flow
Member

It would be nice if the flow covered:

  1. the controller part (setting the secret?)
  2. the launcher part
  3. the handler part
  4. the binary part
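
Regarding the libvirt qemu hook quoted in the excerpt above, a rough sketch of such a check; the expected emulator path is hypothetical, and libvirt invokes /etc/libvirt/hooks/qemu with the guest name and operation as arguments and the domain XML on stdin:

```bash
#!/bin/sh
# /etc/libvirt/hooks/qemu: $1 = guest name, $2 = operation, $3 = sub-operation
EXPECTED="/usr/share/kubevirt/virt-launcher"   # hypothetical path

if [ "$2" = "prepare" ]; then
    # Extract the <emulator> element from the domain XML on stdin
    emulator=$(sed -n 's|.*<emulator>\(.*\)</emulator>.*|\1|p')
    if [ "$emulator" != "$EXPECTED" ]; then
        echo "refusing unexpected emulator: $emulator" >&2
        exit 1   # a non-zero exit from the prepare hook should abort the start
    fi
fi
exit 0
```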

### Running it inside a container

To run the binary from inside a container, it needs access to the host `/proc`
directory. The recommended implementation is to have a config file for the
Member

I think it only needs access to the host /proc if the container it is launched in has its own PID namespace.

If the surrounding container is using the host's PID namespace, then /proc already is the host's /proc.
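
For illustration, the two situations with Docker (the image name is a placeholder):

```bash
# Shares the host PID namespace: the container's /proc is the host's /proc
docker run --pid=host kubevirt/pid-helper

# Own PID namespace: bind-mount the host /proc to a separate path instead
docker run -v /proc:/host/proc:ro kubevirt/pid-helper
```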

Member Author

You are right

@admiyo left a comment

From a security standpoint, this approach seems risky. It means that the virt-handler code can assume control of any other process on the system via a VM launch.

This is an overkill response to the desire to limit the power of virt-launcher, which we've decided should not have direct access to libvirt. We don't want a random VM controlling other VMs, either.

It seems to me that the right solution involves cooperation between the virt-launcher and virt-handler processes. virt-launcher should open a domain socket that gets passed to virt-handler, and finally to libvirt, for launching the VM. This socket should be the only means of communication between the processes.

The virt-launcher process already has the proper cgroups and namespaces set up. It should be the sole source of truth for this information.

When launching a VM, libvirt uses a fork/exec approach. It opens the socket used to manage the process, and execs either qemu or the command passed by the caller, such as what virt-handler passes in the wrapper script. Since domain sockets can be passed via domain sockets (really!), the launched process could pass the management socket to virt-launcher, along with any other parameters required to start the virtual machine.

@rmohr (Member, Author) left a comment

@fabiand I forgot to hit the "submit review" button; these comments were pending for a few weeks. Sorry.


@fabiand (Member) commented Jul 5, 2017

Closing this according to offline discussions.

@fabiand closed this Jul 5, 2017