Complete migration of network configuration from launcher into handler pod #3085

Closed
booxter opened this issue Feb 18, 2020 · 12 comments · Fixed by #3290 or #4506

Comments

@booxter
Contributor

booxter commented Feb 18, 2020

As of the time of writing, all network configuration (wiring) for VMI pod interfaces is handled inside the podinterface.go entry points executed by virt-launcher. While this works, it is not ideal, because of the security implications: we have to leave capabilities set on all launcher pods that are not entirely safe to leave on resources managed by users (launcher pods are owned by users, not admins).

The goal of this request is to remove these advanced capabilities from the launcher pod. The plan is to move the network configuration steps that require them into virt-handler, which already runs as a superuser and is not accessible to users.

The first step of the migration is being developed in the context of #2837. That PR should move all network configuration into the handler, except the step that starts a DHCP server (needed for some types of interfaces).

To complete the goal set here, the following additional pieces of work should be merged:

  1. find a way to migrate DHCP server management into the handler too. While it should be possible to start a goroutine in the launcher's network namespace from the handler, special care should be taken to make sure the handler does not leave orphaned servers running after VMI shutdown. It should also be considered how such a design would affect the availability of the DHCP service during a handler upgrade or misbehavior. (Right now, launcher pod operations do not depend on handler availability once the VMI is set up.)

  2. make sure that any resources created by libvirt on VMI startup, like the tap device used to plug the VMI into the container namespace, can be pre-created by the handler and consumed by libvirt; see the sketch after this list. This step may require additional work in libvirt to allow for this mode of operation. For the tap device, this should already be solved in the latest master of libvirt, see: https://bugzilla.redhat.com/show_bug.cgi?id=1723367 . There may be other similar cases.

  3. finally, remove the NET_ADMIN capability from the launcher pod.
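
To make item 2 concrete, here is a minimal Go sketch of what the handler side could look like. This is not KubeVirt code; the helper name, parameters, and the use of the vishvananda/netlink and vishvananda/netns libraries are assumptions for illustration. The idea is that a privileged process (such as virt-handler) enters the launcher pod's network namespace and pre-creates a persistent tap device owned by the UID qemu runs as, so a later attach from the unprivileged side should not need extra capabilities:

```go
// Illustrative sketch only -- not the KubeVirt implementation.
// Assumes github.com/vishvananda/netlink and github.com/vishvananda/netns.
package main

import (
	"fmt"
	"runtime"

	"github.com/vishvananda/netlink"
	"github.com/vishvananda/netns"
)

// createTapInPodNetns (hypothetical helper) enters the network namespace of the
// launcher pod identified by launcherPid, creates a persistent tap device there,
// and assigns it to the UID that qemu runs as inside the launcher.
func createTapInPodNetns(launcherPid int, tapName string, ownerUID uint32) error {
	// Namespace switching is per OS thread, so pin the goroutine to its thread.
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	origNS, err := netns.Get()
	if err != nil {
		return fmt.Errorf("get current netns: %w", err)
	}
	defer origNS.Close()
	defer netns.Set(origNS) // switch back when done

	podNS, err := netns.GetFromPid(launcherPid)
	if err != nil {
		return fmt.Errorf("get netns of pid %d: %w", launcherPid, err)
	}
	defer podNS.Close()

	if err := netns.Set(podNS); err != nil {
		return fmt.Errorf("enter pod netns: %w", err)
	}

	// A persistent tap owned by the launcher UID: attaching to it later from a
	// process running as that UID should not require CAP_NET_ADMIN.
	tap := &netlink.Tuntap{
		LinkAttrs: netlink.LinkAttrs{Name: tapName},
		Mode:      netlink.TUNTAP_MODE_TAP,
		Flags:     netlink.TUNTAP_DEFAULTS | netlink.TUNTAP_VNET_HDR,
		Owner:     ownerUID,
	}
	if err := netlink.LinkAdd(tap); err != nil {
		return fmt.Errorf("create tap %q: %w", tapName, err)
	}

	link, err := netlink.LinkByName(tapName)
	if err != nil {
		return err
	}
	return netlink.LinkSetUp(link)
}

func main() {
	// Hypothetical values: launcher pid, tap name, and the qemu UID in the pod.
	if err := createTapInPodNetns(12345, "tap0", 107); err != nil {
		fmt.Println(err)
	}
}
```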

@booxter
Contributor Author

booxter commented Feb 18, 2020

@davidvossel let me know if this is clear enough.

@kubevirt-bot
Contributor

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@kubevirt-bot added the lifecycle/stale label on May 18, 2020
@kubevirt-bot
Contributor

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle rotten

@kubevirt-bot added the lifecycle/rotten label and removed the lifecycle/stale label on Jun 17, 2020
@maiqueb
Contributor

maiqueb commented Jun 18, 2020

/remove-lifecycle rotten

@kubevirt-bot removed the lifecycle/rotten label on Jun 18, 2020
@maiqueb
Contributor

maiqueb commented Aug 24, 2020

/reopen

of the 3 items referenced in the issue's description, only the second item (pre-create tap devices on virt-handler) was addressed.

@kubevirt-bot reopened this on Aug 24, 2020
@kubevirt-bot
Contributor

@maiqueb: Reopened this issue.

In response to this:

> /reopen
>
> of the 3 items referenced in the issue's description, only item #2 was addressed.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mlsorensen
Contributor

mlsorensen commented Aug 30, 2020

I’d be interested in knowing how this is going to be handled. From what I’ve been able to dig up, just using an existing tap device requires NET_ADMIN at the point where Qemu opens /dev/net/tun and issues a TUNSETIFF.

The kernel code in net/tun.c checks for NET_ADMIN when fetching a tap. Qemu calls this here.
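
For reference, this is roughly the attach path being described, sketched in Go rather than qemu's C; the function name and values are illustrative, not qemu or KubeVirt code. The process opens /dev/net/tun and issues TUNSETIFF with the name of the already-existing tap. Whether that ioctl succeeds without CAP_NET_ADMIN depends on the owner/group check the tun driver applies to the existing device, and SELinux can apply its own checks on top (see the follow-up comments).

```go
// Illustrative sketch only -- not qemu or KubeVirt code.
// Shows a TUNSETIFF attach to an already-existing tap device.
package main

import (
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

// attachToExistingTap (hypothetical helper) opens the tun clone device and
// binds the returned fd to an existing tap interface by name, roughly what
// qemu does when handed a pre-created tap.
func attachToExistingTap(name string) (*os.File, error) {
	f, err := os.OpenFile("/dev/net/tun", os.O_RDWR, 0)
	if err != nil {
		return nil, fmt.Errorf("open /dev/net/tun: %w", err)
	}

	ifr, err := unix.NewIfreq(name)
	if err != nil {
		f.Close()
		return nil, err
	}
	// IFF_TAP: layer-2 device; IFF_NO_PI: no extra packet-info header.
	ifr.SetUint16(uint16(unix.IFF_TAP | unix.IFF_NO_PI))

	// This ioctl is where the tun driver applies its owner/group and
	// capability checks (and where the SELinux tun hooks run).
	if err := unix.IoctlIfreq(int(f.Fd()), unix.TUNSETIFF, ifr); err != nil {
		f.Close()
		return nil, fmt.Errorf("TUNSETIFF %q: %w", name, err)
	}
	return f, nil
}

func main() {
	// "tap0" is a placeholder device name for illustration.
	f, err := attachToExistingTap("tap0")
	if err != nil {
		fmt.Println(err)
		return
	}
	defer f.Close()
	fmt.Println("attached to tap, fd:", f.Fd())
}
```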

@mlsorensen
Contributor

Aha! That’s where this is going...

https://www.spinics.net/linux/fedora/libvirt-users/msg12261.html

@maiqueb
Contributor

maiqueb commented Sep 7, 2020

> Aha! That’s where this is going...
>
> https://www.spinics.net/linux/fedora/libvirt-users/msg12261.html

This features the reply: https://www.spinics.net/linux/fedora/libvirt-users/msg12265.html . It is indeed possible to open a tap device without NET_ADMIN.

@mlsorensen
Contributor

mlsorensen commented Sep 9, 2020

> Aha! That’s where this is going...
> https://www.spinics.net/linux/fedora/libvirt-users/msg12261.html
>
> This features the reply: https://www.spinics.net/linux/fedora/libvirt-users/msg12265.html . It is indeed possible to open a tap device without NET_ADMIN.

It's actually NOT possible, if you have SELinux enabled. Even with NET_ADMIN the pod can't use an existing tap due to SELinux.

The kernel checks to see if you have access to relabelto/relabelfrom on tun_socket class here:
https://github.com/torvalds/linux/blob/34d4ddd359dbcdf6c5fb3f85a179243d7a1cb7f8/security/selinux/hooks.c#L5493

It's an easy fix if we can upstream container-selinux to give container_t access to relabel tun_socket.

@maiqueb
Contributor

maiqueb commented Sep 10, 2020

> Aha! That’s where this is going...
> https://www.spinics.net/linux/fedora/libvirt-users/msg12261.html
>
> This features the reply: https://www.spinics.net/linux/fedora/libvirt-users/msg12265.html . It is indeed possible to open a tap device without NET_ADMIN.
>
> It's actually NOT possible, if you have SELinux enabled. Even with NET_ADMIN the pod can't use an existing tap due to SELinux.

AFAIU, it is, since we have this neat little hole punched in the selinux policy:

(allow process self (tun_socket (relabelfrom relabelto attach_queue)))

> The kernel checks to see if you have access to relabelto/relabelfrom on tun_socket class here:
> https://github.com/torvalds/linux/blob/34d4ddd359dbcdf6c5fb3f85a179243d7a1cb7f8/security/selinux/hooks.c#L5493

Thanks for this pointer to the code; I was actually under the impression that it was libvirt doing the relabeling.

> It's an easy fix if we can upstream container-selinux to give container_t access to relabel tun_socket.

@jean-edouard : do you think we should / could pursue this course of action ?

@mlsorensen
Contributor

mlsorensen commented Sep 10, 2020

> Aha! That’s where this is going...
> https://www.spinics.net/linux/fedora/libvirt-users/msg12261.html
>
> This features the reply: https://www.spinics.net/linux/fedora/libvirt-users/msg12265.html . It is indeed possible to open a tap device without NET_ADMIN.
>
> It's actually NOT possible, if you have SELinux enabled. Even with NET_ADMIN the pod can't use an existing tap due to SELinux.
>
> AFAIU, it is, since we have this neat little hole punched in the selinux policy:
>
> (allow process self (tun_socket (relabelfrom relabelto attach_queue)))
>
> The kernel checks to see if you have access to relabelto/relabelfrom on tun_socket class here:
> https://github.com/torvalds/linux/blob/34d4ddd359dbcdf6c5fb3f85a179243d7a1cb7f8/security/selinux/hooks.c#L5493
>
> Thanks for this pointer to the code; I was actually under the impression that it was libvirt doing the relabeling.

Just talking through this (probably for my own sake): in the libvirt case, where libvirt opens an fd and passes it by number to qemu, I think the same process occurs, just when libvirt first attaches. From what I can see, I believe the check will be skipped since the fd is already open and configured. So as long as the labeling matches between /dev/net/tun wherever libvirt is running and wherever qemu runs, it will be fine.

However, if libvirt has NET_ADMIN it's probably moot, since there don't seem to be any SELinux checks on tap device creation, only on opening existing named devices. If some helper script or other process is actually creating the tap and libvirt is simply opening the existing tap, then it would run into the SELinux check. It's a matter of when the fd responsible for creating the tap is closed.

> It's an easy fix if we can upstream container-selinux to give container_t access to relabel tun_socket.
>
> @jean-edouard : do you think we should / could pursue this course of action ?

I've created a PR here to get the ball rolling: containers/container-selinux#104
