Host Memory Exhaustion Attack from Inside Kata Containers #3373
Comments
@WatchDoug, thanks for raising the issue. I see this as a possible CVE, with the issue not being in Kata Containers itself. Also, I'd like to take a look at the PoC, and that could be done via https://launchpad.net/katacontainers.io, as everything there can be done privately. Last but not least, let's discuss the mitigation plans and whatnot in the issue opened on launchpad. /cc @kata-containers/architecture-committee! And a huge thanks to @gkurz for promptly bringing this to our attention!
Please Cc me (gkurz) on the launchpad issue as well.
Discussion has moved to launchpad.
Nice find. I think using "-o no_posix_lock" is the short-term mitigation of the issue till we figure out a proper way to handle it.

BTW, remote posix locks are disabled by default in virtiofsd, so to run into this issue one has to explicitly enable them. The following is the commit which disabled remote posix locks by default:

commit 88fc107956a5812649e5918e0c092d3f78bb28ad

Remote posix locks are useful only if a filesystem is shared across multiple VMs using virtiofs. I believe Kata is using virtiofs only for the rootfs, which is prepared separately for each VM using overlayfs. Hence there is no sharing, and hence Kata should not need to enable remote posix locks.
Is Kata enabling posix locks by default? Is there a reason they need remote posix locks enabled? Anyway, the functionality is not complete: it does not support waiting (blocking) posix locks. So I believe the first thing we should probably do is not enable remote posix locks in Kata by default.
Is this specific to virtiofsd only? Can a simple privileged (or unprivileged) process on the host drive the system out of memory without being OOM-killed?
No, Kata explicitly passes "-o no_posix_lock" for other reasons, but it is possible for the end user to provide extra options that get appended to the virtiofsd command line.
Kata might be able to filter out options that the user should really not pass.
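For illustration, here is a minimal sketch in Go of the kind of filtering suggested above. The function name, the deny list, and the argument layout are hypothetical and not Kata's actual implementation; a real filter would also need to handle comma-separated option strings such as "-o posix_lock,cache=auto".

```go
package main

import "fmt"

// filterVirtiofsdArgs drops "-o <option>" pairs whose option is on a deny
// list before the extra args are appended to the virtiofsd command line.
// Hypothetical helper for illustration only.
func filterVirtiofsdArgs(extra []string) []string {
	denied := map[string]bool{
		"posix_lock": true, // remote posix locks: see this issue
	}
	var out []string
	for i := 0; i < len(extra); i++ {
		if extra[i] == "-o" && i+1 < len(extra) && denied[extra[i+1]] {
			i++ // skip the option value as well
			continue
		}
		out = append(out, extra[i])
	}
	return out
}

func main() {
	user := []string{"-o", "posix_lock", "--thread-pool-size=4"}
	fmt.Println(filterVirtiofsdArgs(user)) // prints: [--thread-pool-size=4]
}
```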
I haven't tried yet, but looking at the kernel fix it looks like it can happen with any process.
I guess the simplest short-term fix is for Kata to disallow the "-o posix_lock" option, till a proper long-term fix gets committed to the Linux kernel.
Also for filesystem PVCs (Kubernetes terminology), right?
Yes.
For filesystem PVCs I'd argue it's important to enable remote locks. (Blocking remote locks aren't supported yet, but that's another story. Better not supported than data corruption.)
Remote locks currently have several issues that justify not enabling them by default, IMHO. Users can still ask their admin to enable the 'virtio_fs_extra_args' annotation so that they can pass '-o posix_lock' themselves if they're doing locking in a shared directory.
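For context, a sketch of where this would live in Kata's configuration. The 'virtio_fs_extra_args' and 'enable_annotations' keys exist in configuration.toml, but the exact argument tokenization and the annotation allow-list entry shown here are assumptions to verify against the Kata version in use.

```toml
# configuration.toml (excerpt, illustrative only)
[hypervisor.qemu]
# Extra arguments appended to the virtiofsd command line.
# Splitting "-o posix_lock" into two list items is an assumption about
# how Kata tokenizes these args; check your Kata version.
virtio_fs_extra_args = ["-o", "posix_lock"]

# Annotations must be allow-listed before pods can override hypervisor
# settings; the entry name here is assumed, not verified.
enable_annotations = ["virtio_fs_extra_args"]
```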
@gkurz The problem is that sysadmins are not always aware that an app performs locking (Typically an app developer doesn't document this requirement, because on runc it just works) and |
@haslersn I understand your concern but this is out of scope. With or without Kata, POSIX locks can currently be used by a container to hog the host memory: this issue is just about finding a mitigation. Please contact the virtiofs people for the final availability of POSIX locks in C virtiofsd (part of QEMU, work in progress) and Rust virtiofsd (not started yet).
Hi, I cannot access the discussion here. I'd like to know if/when the bug will be opened to the public. Thanks! |
This is Yutian Yang from Zhejiang University. Our team has discovered a new attack from inside Kata Containers, leading to host memory exhaustion.

The root cause lies in the Linux kernel, even in the latest version. Briefly speaking, the kernel memcg does not charge posix locks allocated by user processes. The bug report and patches can be found at https://lore.kernel.org/linux-mm/20210902215519.AWcuVc3li%25akpm@linux-foundation.org/
Unfortunately, we find that even virtualized containers like Kata are also affected by this bug. With the "-o posix_lock" option enabled, the Kata runtime forwards posix lock allocation requests to virtiofsd on the host. virtiofsd then allocates posix locks on the guest's behalf. Although the virtiofsd processes are limited by memcg, the memory consumed by posix locks in the kernel is not properly charged. Attackers inside containers can thus allocate a huge number of posix locks to exhaust all memory on the node. Note that the number of posix locks is not limited by rlimit/sysctl by default.
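For readers who want to see the mechanism (this is not the private PoC, just a bounded sketch of why lock count turns into unaccounted kernel memory): each F_SETLK on a distinct, non-adjacent byte range of the same file makes the kernel keep a separate lock record, and with "-o posix_lock" that record ends up on the host side via virtiofsd. The file path and loop bound below are arbitrary.

```go
package main

import (
	"log"
	"os"

	"golang.org/x/sys/unix"
)

func main() {
	f, err := os.OpenFile("/tmp/lockfile", os.O_RDWR|os.O_CREATE, 0o600)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Lock every other byte so adjacent locks of the same type are not
	// merged into a single record; each successful F_SETLK therefore adds
	// one more lock structure that the kernel keeps until the file is
	// closed or the process exits.
	for i := int64(0); i < 100000; i++ { // bounded here; an attacker would not stop
		fl := unix.Flock_t{
			Type:   unix.F_WRLCK,
			Whence: 0, // SEEK_SET: offsets relative to the start of the file
			Start:  2 * i,
			Len:    1,
		}
		if err := unix.FcntlFlock(f.Fd(), unix.F_SETLK, &fl); err != nil {
			log.Fatalf("lock %d: %v", i, err)
		}
	}
}
```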
We have developed a PoC that causes host memory exhaustion. We would be glad to share it via email if you are interested in reproducing the attack.
We also want to discuss whether there is a way to mitigate such problems. A quick mitigation could be disabling the "-o posix_lock" option. However, how can we enable the functionality without triggering the kernel bug before it is patched in the kernel?