Complete migration of network configuration from launcher into handler pod #3085
Comments
@davidvossel let me know if this is clear enough.
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /lifecycle stale
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /lifecycle rotten
/remove-lifecycle rotten
/reopen
Of the 3 items referenced in the issue's description, only the second item (pre-create tap devices on virt-handler) was addressed.
@maiqueb: Reopened this issue.
I'd be interested in knowing how this is going to be handled. From what I've been able to dig up, just reusing an existing tap device requires NET_ADMIN at the point where QEMU opens /dev/net/tun and issues a TUNSETIFF ioctl: the kernel code in drivers/net/tun.c checks for NET_ADMIN when fetching an existing tap, and QEMU performs that call here.
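For reference, the attach path in question boils down to opening /dev/net/tun and issuing TUNSETIFF with the device name and flags. A minimal sketch of that request (the constants come from the Linux UAPI headers; the privileged open/ioctl itself is left commented out, since it needs /dev/net/tun access and, for an existing named tap, NET_ADMIN):

```python
import struct

# From linux/if_tun.h
IFF_TAP = 0x0002
IFF_NO_PI = 0x1000
TUNSETIFF = 0x400454CA  # _IOW('T', 202, int)

def ifreq_for(name: str) -> bytes:
    # struct ifreq: a 16-byte interface name followed by a union
    # whose relevant member here is the short flags field.
    return struct.pack("16sH14s", name.encode(), IFF_TAP | IFF_NO_PI, b"")

# The privileged part QEMU performs (not run here):
# fd = os.open("/dev/net/tun", os.O_RDWR)
# fcntl.ioctl(fd, TUNSETIFF, ifreq_for("tap0"))
```

If the named device does not yet exist, this ioctl creates it; if it does exist, tun.c applies the permission checks discussed in this thread.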
Aha! That's where this is going... https://www.spinics.net/linux/fedora/libvirt-users/msg12261.html
That thread features the reply https://www.spinics.net/linux/fedora/libvirt-users/msg12265.html . It is indeed possible to open a tap device without NET_ADMIN.
It's actually NOT possible if you have SELinux enabled. Even with NET_ADMIN, the pod can't use an existing tap device, because SELinux blocks it: the kernel checks whether the caller has relabelto/relabelfrom access on the tun_socket class here. It would be an easy fix if we can get an upstream container-selinux change that gives container_t permission to relabel tun_socket.
AFAIU, it is possible, since we have this neat little hole punched in the SELinux policy:
Thanks for this pointer to the code; I was actually under the impression that it was libvirt doing the relabeling.
@jean-edouard: do you think we should / could pursue this course of action?
Just talking through this (probably for my own sake) - In the libvirt case where it is opening an fd and passing it by number to qemu, I think the same process occurs, just when libvirt first attaches. From what I can see I believe the check will be skipped since the fd is already open and configured. So as long as the labeling matches between /dev/net/tun wherever libvirt is running and wherever qemu runs, it will be fine. However, if libvirt has NET_ADMIN it's probably moot since there don't seem to be any SELinux checks on tap device creation, only opening existing named devices. If some helper script or other process is actually creating the tap and then libvirt is simply opening the existing tap, then it would run into the SELinux check. It's a matter of when the fd responsible for creating the tap is closed.
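To illustrate why the check is skipped in the fd-passing case: once a tap fd has been opened and configured by one process, it can be handed to another process over a Unix socket using SCM_RIGHTS (or by plain fd inheritance), and the receiver can use it without ever re-running TUNSETIFF. A small sketch, using a pipe fd as a stand-in for a tap fd (function names are mine, not libvirt's):

```python
import array
import os
import socket

def send_fd(sock, fd):
    """Send an already-open file descriptor over a Unix socket (SCM_RIGHTS)."""
    sock.sendmsg([b"F"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS,
                           array.array("i", [fd]).tobytes())])

def recv_fd(sock):
    """Receive a file descriptor sent with send_fd()."""
    msg, ancdata, flags, addr = sock.recvmsg(1, socket.CMSG_LEN(4))
    level, ctype, data = ancdata[0]
    assert level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS
    return array.array("i", data)[0]
```

The received descriptor is a duplicate of the original, so whatever configuration and security checks happened at open/TUNSETIFF time travel with it.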
I've created a PR here to get the ball rolling: containers/container-selinux#104
As of the time of writing, all network configuration (wiring) for VMI pod interfaces is handled inside the podinterface.go entry points executed by virt-launcher. While this works, it is not ideal, because of the security implications: we have to leave capabilities set on all launcher pods that are not safe to grant on resources managed by users (launcher pods are owned by users, not admins).
The goal of this request is to remove these advanced capabilities from the launcher pod. The plan is to move the network configuration steps that require them into virt-handler, which already runs as a superuser and is not accessible to users.
The first step of the migration is being developed in the context of #2837. That PR should move all network configuration into the handler, except the step that starts a DHCP server (needed for some interface types).
To complete the goal set here, the following additional pieces of work need to be merged:
- Find a way to migrate DHCP server management into the handler too. While it should be possible to start a goroutine in the launcher's network namespace from the handler, special care must be taken so the handler doesn't leave orphan servers running after VMI shutdown. It should also be considered how such a design would affect the availability of the DHCP service during a handler upgrade or misbehavior. (Right now, launcher pod operations do not depend on handler availability once the VMI is set up.)
- Make sure that any resources created by libvirt on VMI startup, like the tap device used to plug the VMI into the container network namespace, can be pre-created by the handler and then used by libvirt. This may require additional work in libvirt to allow that mode of operation. For the tap device, this should already be solved in the latest libvirt master; see https://bugzilla.redhat.com/show_bug.cgi?id=1723367 . There may be other similar cases.
- Finally, remove the NET_ADMIN capability from the launcher pod.
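On the first item above: the mechanism for running handler-side code inside the launcher's network namespace would be setns(2) on the launcher's /proc/&lt;pid&gt;/ns/net. A minimal sketch, under the assumptions that the handler knows the target pid and has the privileges setns requires (CAP_SYS_ADMIN); this is an illustration of the mechanism, not KubeVirt's actual code:

```python
import ctypes
import os

CLONE_NEWNET = 0x40000000  # from linux/sched.h
libc = ctypes.CDLL(None, use_errno=True)

def enter_net_namespace(pid):
    """Join the network namespace of the given pid (needs CAP_SYS_ADMIN)."""
    fd = os.open(f"/proc/{pid}/ns/net", os.O_RDONLY)
    try:
        if libc.setns(fd, CLONE_NEWNET) != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
    finally:
        os.close(fd)

def current_netns():
    """Identify the current network namespace, e.g. 'net:[4026531992]'."""
    return os.readlink("/proc/self/ns/net")
```

The orphan-server concern then becomes concrete: any DHCP thread started this way keeps the namespace alive via its open fds, so the handler must track and stop these servers on VMI shutdown.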