New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create tap device on virt handler #3290
Create tap device on virt handler #3290
Conversation
77bd8ca
to
e7bd1e3
Compare
|
Can't wait to try this out. Does the change set referenced address #3085 and I assume there will be more commits to the DHCP portion and actually remove NET_ADMIN from the launcher pod? |
It does address #3085 . I'd rather have this PR focus exclusively on having non-privileged libvirt consume a pre-provisioned tap device; once that's done, I can focus on the next step - handling DHCP - which would finally enable the removal of the NET_ADMIN capability from the launcher pod. |
|
/retest |
|
After analyzing the failed tests, I noticed that libvirt fails to attach to the pre-configured tap device being created by virt-handler. Upon looking at the audit logs, I've found suspicious lines, quite similar to the ones listed in this bug: I noticed that you've CC'ed yourself on that bug @fabiand a very short time ago. Are you hitting this as well ? I confirm that allowing libvirtd to 'relabelfrom' and 'relabelto' fixes this issue, enabling libvirt to consume the pre-configured tap devices - i.e. Given my lack of selinux knowledge I'm not sure this is a selinux-policy bug in centos8.1 or if I need to provision those policies in virt-handler ... What do you think @fabiand , @rmohr ? EDIT: the policy file that enables libvirt to consume the pre-proviosioned tap device can be found below: |
|
@maiqueb can we try to label the tab device in a way from virt-handler, so that libvirt will not have to change the labels? |
|
We have a similar issue also related to a tun_socket relabelling. I think virt-handler could relabel the /dev/tapX that you are creating to https://github.com/stefanberger/libvirt-tpm/blob/master/src/security/security_selinux.c#L3271 |
|
btw @maiqueb where is this log coming from? From the compute container or the node? |
The node. EDIT: Wait, I think I misunderstood you; this log comes from the 'node' - as in |
e7bd1e3
to
85b52eb
Compare
85b52eb
to
9683a50
Compare
@vladikr I don't think it would do, for 2 reasons:
./cluster-up/ssh.sh node02 "sudo semanage fcontext -l" | grep "/dev/tap"
/dev/tap.* character device system_u:object_r:tun_tap_device_t:s0
/dev/tape.* character device system_u:object_r:tape_device_t:s0 Ultimately, while I think we all agree the real solution is to properly tag the tap device, I still don't know what is the object here.... I think it's the interface itself, but that's half 'feeling' half educated guess. Also, I'm not sure what's the tag to use. For the time being, I've added the rules that 'make this work' that I pasted on #3290 (comment). I'll keep looking into this matter. |
| log.Log.Reason(err).Criticalf("Failed to create tap device %s. Reason: %s", tapName, out) | ||
| return "", err | ||
| } | ||
| log.Log.Infof("Created tap device: %s", tapName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the device that's being generated here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, sorry but I don't understand the question. You mean the type of device ?... The name ?...
cmd/virt-handler/base_container.cil
Outdated
| (allow container_t self (tun_socket (relabelto))) | ||
| (allow container_t spc_t (tun_socket (relabelfrom))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going forward, we are planning to remove this custom policy, hence we need to find a solution that allows virt-launcher to use this tap device while it is labeled as container_t
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, this is just here because it felt a little bit less wrong than in the virt_launcher.cil file .
I'm looking for a better way of getting selinux to co-exist w/ the tap device; this is just temporary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, this doesn't make it better. This file is a base for virt_launcher.cil, if anything it should be there. However, this will soon be removed and virt-launcher will not have a special policy.
Afaik, when you create a tap device you will have a corresponding char device. Could it be that this device has been created on node01 instead of node02? |
It can be easily seen in the snippet below (my local setup, no kubevirt, fc31):
It wasn't . |
9683a50
to
5f89948
Compare
I didn't go into the specific for a while now. Perhaps the device is being created when you open() /dev/net/tun... I'm not sure. I'll need to look into this. Do you see the file descriptor of the tap device you've created? [1] https://github.com/stefanberger/libvirt-tpm/blob/master/src/security/security_selinux.c#L3271 |
Unfortunately, I didn't have much time today to look deeper into this. In general, the reason why there is no /dev/tapX device at this stage is that The above is still happening in the virt-launcher (in libvirt[1] and then [2]) and it's the association of the device to the interface what's probably failing ( I didn't find if/how it's possible to relabel the registered tap interface... However, I do wonder if we can use macvtap (bridge mode) here. It will create a [1] https://github.com/libvirt/libvirt/blob/master/src/qemu/qemu_interface.c#L443 |
aad2ab7
to
0c49faa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks very good to me already.
@phoracek comments make sense. Besides these, I would avoid introducing additional logic or refactoring into this PR.
cmd/virt-handler/virt_launcher.cil
Outdated
| @@ -7,4 +7,5 @@ | |||
| (allow process hugetlbfs_t (dir (add_name create write remove_name rmdir setattr))) | |||
| (allow process hugetlbfs_t (file (create unlink))) | |||
| (allow process sysfs_t (file (write))) | |||
| (allow process tun_tap_device_t (chr_file (read write open ioctl))) | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was under the impression that we are aiming to get a correctly labeled interface in the virt-launcher which will not require any additional privileges.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After having removed the selinux categories from the virt-handler pod, this is no longer needed; I think you've reviewed an outdated version of the code.
| createVirtualNetworkingDevice(deviceName string, isMultiqueue bool, launcherPID int) error | ||
| configureVirtualNetworkingDevice() error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
maybe just call it createAndBindTapToBridge ? :)
In order to make virt-handler create the tap device on behalf of
virt-launcher, a set of changes were required:
- updated the VIF struct to feature the name of the created tap
device. This way, pod networking configuration phase kubevirt#2 can
instruct libvirt that it should use a pre-existent tap device
(via 'target' parameter). This will happen in a follow-up
commit.
- added a method to the `NetworkHandler` interface that
creates a tap device. This method receives the tap interface
name as parameter (to handle multus based scenarios, where one
VM is plugged multiple taps).
- create a tap device from the `Bridge` and `Masquerade`
`BindMechanism`s, using the aforementioned `NetworkHandler`
method. The name of the created devices will be cached in the
VIF structure. This tap device will be configured a follow-up
commit.
Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
With this change, libvirt will use a non-managed, pre-configured tap device (which does not require NET_ADMIN capabilities) instead of creating a new one (which does require NET_ADMIN capabilities). This commit adds a method to configure a tap device to the `NetworkHandler` and `BindMechanism` interfaces. This configuration is common for all binding mechanism that require a bridge: it configures the tap device, and binds it to the in-pod bridge. This configuration step takes place in the pod networking configuration phase#1. Lastly, updates were made enabling this pre-configured tap device to be consumed by libvirt; the generated domxml specifies interface type 'ethernet' instead of 'bridge'. It also must feature the tap device name as the interface 'target', and use the 'managed' attribute to indicate that the tap device already exists, and that no further configuration is to take place. The unit tests had to be updated accordingly, enabling the travis lane to pass. This commit is part of the change-set to remove the NET_ADMIN capability from the virt-launcher pods. Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
When requested in the vmi spec, create the tap devices w/ multi queue support. Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
Create the tap device via a golang library instead of relying on the iproute tool. This library only performs the minimal required set of syscalls on the tap's FD. It is planned to eventually replace this library (currently songgao water - [0]) for netlink's tuntap implementation [1], which currently does not support creating the tap device with the desired owner / group. Those attributes will be used by libvirt on the consuming end, and the kernel requires the NET_ADMIN capability whenever the owner or group doesn't match (libvirt will open them with the `qemu` user, which translated into uid / gid 107. [0] - https://github.com/songgao/water [1] - https://github.com/vishvananda/netlink Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
This patch reads the selinux labels of both virt-handler and virt-launcher. It then proceeds to create the tap-device using the label of virt-launcher, so it can be consumed by libvirt without requiring any special selinux policies. Once that is done, the original context is reset, so all other binaries can be executed in the virt-handler selinux context. Calls to SetExecLabel should be wrapped in Lock /Unlock OS thread until the program execution finishes, to guarantee another goroutine does not migrate to the current thread before execution is complete, as per https://godoc.org/github.com/opencontainers/selinux/go-selinux#SetExecLabel Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
There are some tests in the `security_features` suite that check
that KubeVirt runs successfully when the virt-launcher's
`selinuxLauncherType` is defined as `container_t`.
When that happens, we fall victim to libvirt blindly re-labeling
the tun socket, and we hit the following AVC:
```
type=AVC msg=audit(1596645906.167:1471): avc: denied \
{ relabelfrom } for pid=20383 comm="libvirtd" \
scontext=system_u:system_r:container_t:s0:c803,c1014 \
tcontext=system_u:system_r:container_t:s0:c803,c1014 \
tclass=tun_socket permissive=0
```
Since updating the selinux policy to allow for that is not
desired, we're left with no choice but to require
virt_launcher.process selinux contexts for networking.
A follow up patch will update the default value, setting it to the
escalated privilege version.
Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
Not having categories in the virt-handler allows us to greatly simplify the tap making solution, since it allows the process that creates the tap device (which runs with virt-launcher's categories) to access the clone device available on virt-handler. Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
Whenever a file descriptor is opened and a new process is forked or exec'ed, it will inherit all the FDs of the parent. In this case, the leaked FDs have the `spc_t` type, and the child process is running with lesser privileges, which will result in plenty of AVCs in the selinux audit log. By setting the close on exec flag of the file descriptors, they will close themselves on the child, as per this article: https://danwalsh.livejournal.com/53603.html Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
Use `virt_launcher.process` as the default selinux launcher type. Update the functional tests accordingly. Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
Add a struct that allows creating the tap device in 2 steps: - get the selinux labels and build the `exec.Command(...)` - selinux label handling and executing the tap making command Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
b3e43ef
to
fbec96d
Compare
|
/lgtm |
|
/hold cancel |
|
/retest |
1 similar comment
|
/retest |
|
/lgtm |
|
/retest |
3 similar comments
|
/retest |
|
/retest |
|
/retest |
|
/retest |
|
/retest |
|
@maiqueb Congratulation! Thank you for bearing with us! It wasn't an easy PR. Kudos! |
What this PR does / why we need it:
Currently, libvirt is responsible for creating & configuring the tap devices that will be used by the kubevirt VMs. As such, virt-launcher pods require the NET_ADMIN capability.
This PR moves the tap device creation / configuration stage to the virt-handler pod, which will create / configure the tap devices on behalf of the virt-launchers (and in their netns).
To allow a non-privileged libvirt to consume a pre-configured tap device, the kubevirt's VM domxml specification has to be updated.
Which issue(s) this PR fixes:
Partially fixes #3085
Special notes for your reviewer:
This PR is part of the change-set to remove the NET_ADMIN capability from the virt-launcher pod.
Release note: