
node-labeller.sh: Consider AppArmor restrictions #8692

Closed

Conversation

@vasiliy-ul (Contributor) commented Oct 28, 2022

What this PR does / why we need it:

Even though the virt-handler pod is privileged, on systems with AppArmor there might be a host profile that is automatically applied to the /usr/sbin/libvirtd binary. That may block the execution of /usr/libexec/qemu-kvm. In such a case, try moving the qemu executable to a location that is more common for AppArmor-enabled Linux distros.
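
A minimal sketch of the idea, not the exact patch (the AppArmor check is illustrative; the target path matches the <path> shown in the domcapabilities XML further below):

# Illustrative sketch: relocate QEMU when AppArmor is active on the host,
# so that a host libvirtd profile does not block /usr/libexec/qemu-kvm.
QEMU=/usr/libexec/qemu-kvm
if [ -d /sys/kernel/security/apparmor ]; then
    cp "$QEMU" /usr/bin/qemu-system-x86_64
    QEMU=/usr/bin/qemu-system-x86_64
fi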

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #7771, fixes #7638

Special notes for your reviewer:

The fix is needed only when the script is executed in a privileged container. In that case, the container runtime will not apply any profile (i.e. the container will run unconfined). However, if an AppArmor profile for libvirt is installed on the host, it is automatically applied to the /usr/sbin/libvirtd binary.

For VM workloads the problem is not relevant, since the virt-launcher pod is not privileged and the runtime therefore applies the default profile (e.g. cri-containerd.apparmor.d in the output below):

 # aa-status 
apparmor module is loaded.
58 profiles are loaded.
58 profiles are in enforce mode.
   ...
   cri-containerd.apparmor.d
   ...
   libvirtd
   libvirtd//qemu_bridge_helper
   ...
0 profiles are in complain mode.
0 profiles are in kill mode.
0 profiles are in unconfined mode.
11 processes have profiles defined.
11 processes are in enforce mode.
   /bin/busybox (1778) cri-containerd.apparmor.d
   /bin/busybox (1921) cri-containerd.apparmor.d
   /usr/bin/virt-controller (2111) cri-containerd.apparmor.d
   /usr/bin/virt-operator (2354) cri-containerd.apparmor.d
   /usr/bin/virt-api (2515) cri-containerd.apparmor.d
   /usr/bin/virt-launcher-monitor (3889) cri-containerd.apparmor.d
   /usr/bin/virt-launcher (3906) cri-containerd.apparmor.d
   /usr/sbin/libvirtd (3916) cri-containerd.apparmor.d
   /usr/sbin/virtlogd (3917) cri-containerd.apparmor.d
   /usr/bin/container-disk (3955) cri-containerd.apparmor.d
   /usr/libexec/qemu-kvm (4068) cri-containerd.apparmor.d
0 processes are in complain mode.
0 processes are unconfined but have a profile defined.
0 processes are in mixed mode.
0 processes are in kill mode.

# k exec -ti virt-launcher-vmi-fedora-p5xq5 -- bash
bash-4.4# ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.5 1238196 23476 ?       Ssl  07:46   0:00 /usr/bin/virt-launcher-monitor --qemu-timeout 297s --name vmi-fedora --uid 05979244-f909-45f2-875f-c8455b9b9
root        12  0.1  1.4 2148452 59176 ?       Sl   07:46   0:01 /usr/bin/virt-launcher --qemu-timeout 297s --name vmi-fedora --uid 05979244-f909-45f2-875f-c8455b9b9dad --na
root        22  0.1  0.6 1473748 27236 ?       Sl   07:46   0:00 /usr/sbin/libvirtd -f /var/run/libvirt/libvirtd.conf
root        23  0.0  0.3 137876 14092 ?        S    07:46   0:00 /usr/sbin/virtlogd -f /etc/libvirt/virtlogd.conf
qemu        77  4.7 11.6 3412228 467488 ?      Sl   07:47   0:29 /usr/libexec/qemu-kvm -name guest=default_vmi-fedora,debug-threads=on -S -object {"qom-type":"secret","id":"
root       169  0.0  0.0  12060  3180 pts/0    Ss   07:57   0:00 bash
root       175  0.0  0.0  44704  3268 pts/0    R+   07:57   0:00 ps aux
bash-4.4# cat /proc/22/attr/current 
cri-containerd.apparmor.d (enforce)
bash-4.4# cat /proc/77/attr/current 
cri-containerd.apparmor.d (enforce)

Also note: after moving the binary, the new path is reflected in /var/lib/kubevirt-node-labeller/virsh_domcapabilities.xml:

<domainCapabilities>
  <path>/usr/bin/qemu-system-x86_64</path>
  ...

But I do not think it will cause issues, as AFAIK this path element is not used by the node-labeller.
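
For reference, that XML comes from querying libvirt, so the reported emulator path can be checked directly; the exact virsh invocation used by node-labeller.sh may differ from this illustrative one:

# Illustrative check of the emulator path reported by libvirt after the move
virsh domcapabilities | grep '<path>'
#   <path>/usr/bin/qemu-system-x86_64</path>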

Release note:

Fix virt-handler Init:CrashLoopBackOff on systems with AppArmor

Use set -x to print commands and their arguments as they are executed.

Signed-off-by: Vasiliy Ulyanov <vulyanov@suse.de>
@kubevirt-bot added the release-note, dco-signoff: yes, and size/XS labels on Oct 28, 2022
@kubevirt-bot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from vasiliy-ul by writing /assign @vasiliy-ul in a comment. For more information see: The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@vasiliy-ul (Contributor, Author)

BTW, this commit
https://gitlab.com/libvirt/libvirt/-/commit/5c8bd31c881e99261ac098e867a79b300440731a#64dfe5dabba9628d05af55486c20994715f5b505_104_101
seems to adapt the libvirt AppArmor profile to allow /usr/libexec/*. However, the resulting value of @libexecdir@ depends on the actual environment configuration.

@kubevirt-bot (Contributor)

@vasiliy-ul: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kubevirt-client-python 940c9b2 link true /test pull-kubevirt-client-python
pull-kubevirt-e2e-k8s-1.24-sig-compute-migrations-nonroot 940c9b2 link true /test pull-kubevirt-e2e-k8s-1.24-sig-compute-migrations-nonroot
pull-kubevirt-e2e-k8s-1.23-sig-compute 940c9b2 link true /test pull-kubevirt-e2e-k8s-1.23-sig-compute
pull-kubevirt-e2e-k8s-1.22-operator 940c9b2 link true /test pull-kubevirt-e2e-k8s-1.22-operator

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@vasiliy-ul (Contributor, Author)

/hold

This needs some more discussion.

@kubevirt-bot added the do-not-merge/hold label on Oct 28, 2022
@vasiliy-ul changed the title from "node-labeller.sh: Consider AppArmor restrictions" to "DNM node-labeller.sh: Consider AppArmor restrictions" on Oct 28, 2022
@vasiliy-ul changed the title from "DNM node-labeller.sh: Consider AppArmor restrictions" to "node-labeller.sh: Consider AppArmor restrictions" on Oct 31, 2022
@vasiliy-ul (Contributor, Author)

/unhold

@kubevirt-bot removed the do-not-merge/hold label on Oct 31, 2022
@dhiller (Contributor) left a comment

According to the mailing list conversation this needs some more discussion IIUC.

@vasiliy-ul (Contributor, Author)

According to the mailing list conversation this needs some more discussion IIUC.

I think we clarified some concerns on the mailing list, though in general, yes, it probably makes sense to have some more discussion.

@andreabolognani (Contributor)

@vasiliy-ul this looks very hacky and makes me uncomfortable, mainly because the binary path would then not match between the capabilities XML generated by virt-handler and virt-launcher.

Are the capabilities generated by virt-handler only used for the purpose of node labeling? And is the libvirtd instance running inside each virt-launcher container then generating its own capabilities from scratch at startup?

IIUC the denial is caused by the fact that libvirtd running inside the virt-handler container gets the host AppArmor policy applied to it. If that's the case, since the AppArmor policy is maintained as part of libvirt itself, could we perhaps just patch it so that it allows QEMU binaries installed under /usr/libexec to run? We already have similar allowances in place for other binaries, for example virtiofsd, so applying the approach more broadly doesn't sound too far-fetched. I could submit a patch against libvirt. Once the updated libvirt package makes its way to the various distros, the issue would then go away on its own.

OTOH, doesn't the AppArmor profile affect the virt-handler container only because the libvirt package is installed on the host? Does it even make sense to have that installed, if you're going to be using the host as a node in a Kubernetes cluster? Perhaps we should document that the libvirt package should not be installed on the host when KubeVirt is going to be deployed?

Apologies if the questions are silly, I'm just not very familiar with how SELinux / AppArmor interact with containers.

@vasiliy-ul (Contributor, Author)

@andreabolognani ,

@vasiliy-ul this looks very hacky and makes me uncomfortable, mainly because the binary path would then not match between the capabilities XML generated by virt-handler and virt-launcher.

Yes, this is what I do not like about the approach either.

Are the capabilities generated by virt-handler only used for the purpose of node labeling? And is the libvirtd instance running inside each virt-launcher container then generating its own capabilities from scratch at startup?

AFAIK, yes, the XML generated by node-labeller.sh is used only for labeling the node, and it is not used by virt-launcher.

IIUC the denial is caused by the fact that libvirtd running inside the virt-handler container gets the host AppArmor policy applied to it. If that's the case, since the AppArmor policy is maintained as part of libvirt itself, could we perhaps just patch it so that it allows QEMU binaries installed under /usr/libexec to run? We already have similar allowances in place for other binaries, for example virtiofsd, so applying the approach more broadly doesn't sound too far-fetched. I could submit a patch against libvirt. Once the updated libvirt package makes its way to the various distros, the issue would then go away on its own.

I think it is already fixed to a certain extent in the latest libvirt profile. Mentioned it here: #8692 (comment)

The addition of

@libexecdir@/* PUxr,

should cover the KubeVirt case (though only if @libexecdir@ points to /usr/libexec)

OTOH, doesn't the AppArmor profile affect the virt-handler container only because the libvirt package is installed on the host? Does it even make sense to have that installed, if you're going to be using the host as a node in a Kubernetes cluster? Perhaps we should document that the libvirt package should not be installed on the host when KubeVirt is going to be deployed?

We already have some documentation on how to adapt the host AppArmor profile to the KubeVirt case. IMHO, I don't think we should restrict users in what they might have installed on the host.

Apologies if the questions are silly, I'm just not very familiar with how SELinux / AppArmor interact with containers.

No, in fact good questions. Thanks for those. I am not insisting on moving the PR in, especially since it raises doubts and concerns. This is more like a quick workaround to address the problem. As already mentioned in the mailing list, perhaps having the use-case documented is just good enough.

@vasiliy-ul (Contributor, Author)

/hold

@kubevirt-bot added the do-not-merge/hold label on Oct 31, 2022
@andreabolognani (Contributor)

I think it is already fixed to a certain extent in the latest libvirt profile. Mentioned it here: #8692 (comment)

The addition of

@libexecdir@/* PUxr,

should cover the KubeVirt case (though only if @libexecdir@ points to /usr/libexec)

That's not the case for the Debian / Ubuntu package, which passes

--libexecdir=/usr/lib/libvirt

to meson.

But that's for internal libvirt components such as libvirt_parthelper: QEMU is an external program, and libvirt can't assume that it will have been configured with a matching libexecdir value. This is why for virtiofsd we have

/usr/{lib,lib64,lib/qemu,libexec}/virtiofsd PUx,

I don't see why we couldn't add a similar rule for QEMU.
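
(For anyone hitting this before a patched libvirt reaches their distro, a host-side local override along these lines should have the same effect. This is only a sketch of a workaround, not the libvirt patch itself, and it assumes the Debian/Ubuntu-style local-override layout:)

# Sketch: allow libvirtd to execute QEMU from /usr/libexec via the local
# profile override, then reload the profile.
echo '/usr/libexec/qemu-kvm PUx,' >> /etc/apparmor.d/local/usr.sbin.libvirtd
apparmor_parser -r /etc/apparmor.d/usr.sbin.libvirtd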

OTOH, doesn't the AppArmor profile affect the virt-handler container only because the libvirt package is installed on the host? Does it even make sense to have that installed, if you're going to be using the host as a node in a Kubernetes cluster? Perhaps we should document that the libvirt package should not be installed on the host when KubeVirt is going to be deployed?

We already have some documentation on how to adapt the host AppArmor profile to the KubeVirt case.

Can you please point me to it?

IMHO, I don't think we should restrict users in what they might have installed on the host.

Is it common practice to run workloads unrelated to Kubernetes on a Kubernetes node? I would have thought that you'd dedicate the machine fully to that role, but perhaps that doesn't match reality.

@vasiliy-ul (Contributor, Author) commented Oct 31, 2022

I don't see why we couldn't add a similar rule for QEMU.

Yeah, I think that would be the right thing to do 👍

Can you please point me to it?

It's been added recently: https://kubevirt.io/user-guide/operations/installation/#integration-with-apparmor

Is it common practice to run workloads unrelated to Kubernetes on a Kubernetes node? I would have thought that you'd dedicate the machine fully to that role, but perhaps that doesn't match reality.

Well, it depends, I guess... I just think that restrictions on the host side make KubeVirt less user-friendly, especially when this particular problem can be solved by adapting the profile on the host.

@andreabolognani (Contributor)

I don't see why we couldn't add a similar rule for QEMU.

Yeah, I think that would be the right thing to do +1

I'll prepare a libvirt patch later this week.

Can you please point me to it?

It's been added recently: https://kubevirt.io/user-guide/operations/installation/#integration-with-apparmor

Perhaps we could amend that to suggest uninstalling libvirt as a first possible workaround.

Users might not actually need libvirt on the host, and overall it's much simpler to uninstall a package than it is to edit configuration files. The latter might also result in prompts on package upgrades.
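
(As a concrete example, on a Debian/Ubuntu-style host that boils down to something like the following; the package and profile file names are distro-specific assumptions:)

# Remove the host libvirt packages (names vary per distro), or, if libvirt has
# to stay installed, unload just its AppArmor profile from the kernel:
apt-get remove --purge libvirt-daemon-system libvirt-daemon
apparmor_parser -R /etc/apparmor.d/usr.sbin.libvirtd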

@andreabolognani (Contributor)

I'll prepare a libvirt patch later this week.

Patch posted to the list: https://listman.redhat.com/archives/libvir-list/2022-November/235395.html

@rmohr (Member) commented Nov 7, 2022

@vasiliy-ul this looks very hacky and makes me uncomfortable, mainly because the binary path would then not match between the capabilities XML generated by virt-handler and virt-launcher.

Yes, this is what I do not like about the approach either.

This may actually be a concern for me. I carry a patch which I have not been able to contribute yet, where I am copying capability files between virt-handler and virt-launcher to significantly speed up VM starts in virt-launcher. If the capabilities look different for libvirt it may attempt to retry the discovery ...

So I would appreciate any approach which would not move the binary. I am, however, also interested in AppArmor support :)

@andreabolognani (Contributor)

If the capabilities look different for libvirt it may attempt to retry the discovery ...

It definitely would.

From the libvirt side (full discussion here), there's confusion as to whether the interactions that we see today between the host's AppArmor configuration and the libvirtd process running inside the virt-handler container are working as intended.

IIUC the whole issue stems from the fact that, when spawned by virt-handler, libvirtd is running in a privileged context whereas that's not the case when spawned by virt-launcher.

I understand that virt-handler needs to be privileged, but is there any way that the libvirtd part could be moved to a separate, unprivileged container?

It seems to me like that would be a good idea in general, because it would mean that libvirtd would always have the same privileges, regardless of context. I can't point to any specific instance right now, but I wouldn't be surprised if the privileged libvirtd spawned by virt-handler would advertise more capabilities than the unprivileged one spawned by virt-launcher, potentially leading to a mismatch.

Would this be feasible at all? Or are there reasons why it can't be done?

@rmohr (Member) commented Nov 7, 2022

I understand that virt-handler needs to be privileged, but is there any way that the libvirtd part could be moved to a separate, unprivileged container?

I think right now the only reason why we still have it privileged is a chicken-and-egg problem. virt-handler brings up the kvm device plugin, which the discovery in the init-container would need in order to run unprivileged. So if we can somehow make /dev/kvm available to the node-labeller container in an unprivileged fashion, we can run it unprivileged.

@andreabolognani (Contributor)

I think right now the only reason why we still have it privileged is a chicken-and-egg problem. virt-handler brings up the kvm device plugin, which the discovery in the init-container would need in order to run unprivileged. So if we can somehow make /dev/kvm available to the node-labeller container in an unprivileged fashion, we can run it unprivileged.

@rmohr this is all pretty in-depth when it comes to the inner workings of KubeVirt, but I think I more or less get the gist.

Would it be possible to turn the node-labeller job from an init container into a regular one-off pod that virt-handler creates once /dev/kvm has been made available, and whose completion it waits for before proceeding with the rest of the node setup?

I'm sure it's more complicated than that :) but do you think it would be doable?

Honestly the pushback I've been getting about changing the AppArmor policy to allow running /usr/libexec/qemu-kvm is understandable. It'd be less of a hack than what's being proposed here, but not necessarily by a large margin. Figuring out a way to make the node-labeller job run in an unprivileged context is probably the only proper solution.

I think in the short term we should just document uninstalling libvirt in the host as the preferred workaround for the AppArmor issues, in addition to the ones that are already documented.

@rmohr (Member) commented Nov 8, 2022

Would it be possible to turn the node-labeller job from an init container into a regular one-off pod that virt-handler creates once /dev/kvm has been made available, and whose completion it waits for before proceeding with the rest of the node setup?

I'm sure it's more complicated than that :) but do you think it would be doable?

Yes something like that could work. Just complicates things a bit. Another option would be to run the node-labeller as a sidecar instead of an init-container. It could run unprivileged. virt-handler could boot up to a certain point, run mknod in the side-car container, then the discovery happens in unprivileged mode, and once the files appear in the shared location, virt-handler continues.
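
(For reference, the mknod step would boil down to something like this, assuming the standard character device numbers for /dev/kvm:)

# Create the /dev/kvm node (character device, major 10, minor 232)
mknod -m 0666 /dev/kvm c 10 232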

I think in the short term we should just document uninstalling libvirt in the host as the preferred workaround for the AppArmor issues, in addition to the ones that are already documented.

Yes, that sounds like a reasonable short-term solution to me. @vasiliy-ul?

@andreabolognani (Contributor)

Would it be possible to turn the node-labeller job from an init container into a regular one-off pod that virt-handler creates once /dev/kvm has been made available, and whose completion it waits for before proceeding with the rest of the node setup?

I'm sure it's more complicated than that :) but do you think it would be doable?

Yes something like that could work. Just complicates things a bit. Another option would be to run the node-labeller as a sidecar instead of an init-container. It could run unprivileged. virt-handler could boot up to a certain point, run mknod in the side-car container, then the discovery happens in unprivileged mode, and once the files appear in the shared location, virt-handler continues.

Tracking issue opened: #8744

I think in the short term we should just document uninstalling libvirt in the host as the preferred workaround for the AppArmor issues, in addition to the ones that are already documented.

Yes, that sounds like a reasonable short-term solution to me. @vasiliy-ul?

PR opened: kubevirt/user-guide#618

@andreabolognani (Contributor)

@vasiliy-ul now that we have an issue tracking the implementation of a proper solution and the documentation has been enhanced, can we close this PR?

@vasiliy-ul (Contributor, Author)

Yep, closing it.

@vasiliy-ul closed this on Nov 30, 2022