Host Memory Exhaustion Attack from Inside Kata Containers #3373

Open
WatchDoug opened this issue Jan 3, 2022 · 16 comments
Labels
bug Incorrect behaviour security Potential or actual security issue

Comments

@WatchDoug

WatchDoug commented Jan 3, 2022

This is Yutian Yang from Zhejiang University. Our team have discovered
a new attack from inside Kata containers, leading to host memory
exhaustion.

The root cause lies in the Linux kernel, even in the latest version.
Briefly speaking, the kernel memcg does not charge posix locks
allocated by user processes. The bug report and patches can be found
at
https://lore.kernel.org/linux-mm/20210902215519.AWcuVc3li%25akpm@linux-foundation.org/

Unfortunately, we find that even virtualized containers like Kata are
affected by this bug. With the "-o posix_lock" option enabled, the Kata
runtime forwards posix lock allocation requests to virtiofsd on the
host. virtiofsd then allocates posix locks on the container's behalf.
Although virtiofsd processes are limited by memcg, the kernel memory
consumed by posix locks is not properly charged. Attackers inside
containers can thus allocate a huge number of posix locks and exhaust
all memory on the node. Note that the number of posix locks is not
limited by rlimit/sysctl by default.
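
Since neither memcg nor rlimit/sysctl bounds these locks, one way an operator could at least observe growth is to count the records the kernel lists in /proc/locks. The following is a minimal sketch under assumptions: the function name `count_locks` and the sample text are hypothetical, and the lock type under which virtiofsd's locks appear is not stated in this thread, so the sketch groups by type instead of assuming one.

```python
from collections import Counter


def count_locks(proc_locks_text):
    """Count lock records per (lock type, owner PID) in /proc/locks-style text."""
    counts = Counter()
    for line in proc_locks_text.splitlines():
        fields = line.split()
        # Requests blocked behind another lock carry a "->" marker after the index.
        if len(fields) > 1 and fields[1] == "->":
            del fields[1]
        if len(fields) < 5:
            continue  # not a lock record
        try:
            pid = int(fields[4])
        except ValueError:
            continue
        counts[(fields[1], pid)] += 1
    return counts


# Hypothetical sample in the /proc/locks layout:
# index, type, ADVISORY/MANDATORY, READ/WRITE, pid, major:minor:inode, start, end
SAMPLE = """\
1: POSIX  ADVISORY  WRITE 4242 00:2d:1001 0 0
2: POSIX  ADVISORY  WRITE 4242 00:2d:1001 2 2
3: FLOCK  ADVISORY  WRITE 977 08:01:5312 0 EOF
4: POSIX  ADVISORY  READ  1310 08:01:7788 0 EOF
"""

if __name__ == "__main__":
    for (lock_type, pid), n in count_locks(SAMPLE).most_common():
        print(lock_type, pid, n)
```

On a Linux host the same function could be fed the contents of /proc/locks. A count that keeps growing for one owner is only a hint that something is accumulating locks, not a fix for the missing accounting.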

We have developed a PoC that causes host memory exhaustion. We
are glad to share it via email if you are interested in reproducing
the attack.

We also want to discuss whether there is a way to mitigate such problems.
A quick mitigation could be disabling the "-o posix_lock" option. However,
how can we enable the functionality without triggering kernel bugs before
they are patched in the kernel code?

@WatchDoug WatchDoug added bug Incorrect behaviour needs-review Needs to be assessed by the team. labels Jan 3, 2022
@fidencio
Member

fidencio commented Jan 3, 2022

@WatchDoug, thanks for raising the issue. I see this as a possible CVE, with the issue not being in Kata Containers itself.
For now, I'd like to ask you to treat this as a CVE, and thus follow our CVE process (please see: https://github.com/kata-containers/community/blob/main/VMT/VMT.md).

Also, I'd like to take a look at the PoC, and that could be done via https://launchpad.net/katacontainers.io, as everything there can be done privately.

Last but not least, let's discuss the mitigation plans and whatnot in the issue opened on launchpad.

/cc @kata-containers/architecture-committee!

And a huge thanks to @gkurz for promptly bringing this to our attention!

@gkurz
Member

gkurz commented Jan 3, 2022

> @WatchDoug, thanks for raising the issue. I see this as a possible CVE, with the issue not being in Kata Containers itself. For now, I'd like to ask you to treat this as a CVE, and thus follow our CVE process (please see: https://github.com/kata-containers/community/blob/main/VMT/VMT.md).
>
> Also, I'd like to take a look at the PoC, and that could be done via https://launchpad.net/katacontainers.io, as everything there can be done privately.
>
> Last but not least, let's discuss the mitigation plans and whatnot in the issue opened on launchpad.
>
> /cc @kata-containers/architecture-committee!
>
> And a huge thanks to @gkurz for promptly bringing this to our attention!

Please Cc me (gkurz) on the launchpad issue as well.

@gkurz
Member

gkurz commented Jan 4, 2022

Discussion has moved to launchpad.

@gkurz gkurz removed the needs-review Needs to be assessed by the team. label Jan 4, 2022
@rhvgoyal

rhvgoyal commented Jan 4, 2022

Nice find. I think using "-o no_posix_lock" is the short term mitigation of the issue till we figure out a proper way to handle it.

BTW, remote posix locks are disabled by default in virtiofsd. So to run into this issue, one will have to explicitly enable it. Following is the commit which disabled remote posix locks by default.

commit 88fc107956a5812649e5918e0c092d3f78bb28ad
Author: Vivek Goyal <vgoyal@redhat.com>
Date:   Mon Jul 27 12:18:41 2020 -0400

    virtiofsd: Disable remote posix locks by default

Remote posix locks are useful only if a filesystem is shared across multiple VMs using virtiofs. I believe kata is using virtiofs only for rootfs, which is prepared separately for each VM using overlayfs, hence no sharing. So kata should not need to enable remote posix locks.

@rhvgoyal

rhvgoyal commented Jan 4, 2022

Is kata enabling posix locks by default? Is there a reason they need remote posix locks enabled? Anyway, the functionality is not complete: it does not support waiting posix locks.

So I believe that the first thing we should probably do is not enable remote posix locks in kata by default.

@rhvgoyal

rhvgoyal commented Jan 4, 2022

Is this specific to virtiofsd only? Can a simple privileged (or unprivileged) process on the host drive the system out of memory without being OOM killed?

@gkurz
Member

gkurz commented Jan 4, 2022

> Is kata enabling posix locks by default? Is there a reason they need remote posix locks enabled? Anyway, the functionality is not complete: it does not support waiting posix locks.

No, kata explicitly passes "-o no_posix_lock" for other reasons, but it is possible for the end user to provide extra options that get appended to the virtiofsd command line.

> So I believe that the first thing we should probably do is not enable remote posix locks in kata by default.

kata might be able to filter out options that the user should really not pass.
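
To make the filtering idea concrete, here is a minimal sketch of stripping a denied sub-option from user-supplied extra arguments before they are appended to the virtiofsd command line. It is not Kata's actual code: the function name `filter_virtiofsd_args`, the deny list, and the example arguments are hypothetical, and it assumes options may be given as `-o a,b` or joined as `-oa,b`.

```python
# Hypothetical deny list: sub-options a user should not be able to turn on.
DENIED_OPTIONS = frozenset({"posix_lock"})


def _strip_denied(value, denied):
    """Remove denied entries from a comma-separated '-o' value."""
    return ",".join(opt for opt in value.split(",") if opt and opt not in denied)


def filter_virtiofsd_args(extra_args, denied=DENIED_OPTIONS):
    """Return a copy of extra_args with denied '-o' sub-options removed."""
    filtered = []
    args = iter(extra_args)
    for arg in args:
        if arg == "-o":
            value = next(args, None)
            if value is None:
                filtered.append(arg)  # dangling "-o": pass through unchanged
                continue
            kept = _strip_denied(value, denied)
            if kept:
                filtered += ["-o", kept]
        elif arg.startswith("-o"):
            # Joined form, e.g. "-oposix_lock,xattr"
            kept = _strip_denied(arg[2:], denied)
            if kept:
                filtered.append("-o" + kept)
        else:
            filtered.append(arg)
    return filtered


if __name__ == "__main__":
    print(filter_virtiofsd_args(["--thread-pool-size=1", "-o", "cache=auto,posix_lock"]))
```

Silently dropping the option is only one possible policy. Rejecting the request with an error would make the restriction visible to the user, which may be preferable.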

@gkurz
Member

gkurz commented Jan 4, 2022

> Is this specific to virtiofsd only? Can a simple privileged (or unprivileged) process on the host drive the system out of memory without being OOM killed?

I haven't tried yet but looking at the kernel fix it looks like it can happen with any process.

@rhvgoyal

rhvgoyal commented Jan 4, 2022

I guess the simplest short-term fix is for kata to disallow the "-o posix_lock" option until a proper long-term fix gets committed to the Linux kernel.

@haslersn
Contributor

> I believe kata is using virtiofs only for rootfs

Also for filesystem PVCs (Kubernetes terminology), right?

@gkurz
Member

gkurz commented Jan 11, 2022

> > I believe kata is using virtiofs only for rootfs
>
> Also for filesystem PVCs (Kubernetes terminology), right?

yes

@haslersn
Contributor

For filesystem PVCs I'd argue it's important to enable remote locks. (Blocking remote locks aren't supported, yet, but that's another story. Better not supported than data corruption.)

@gkurz
Member

gkurz commented Jan 11, 2022

> For filesystem PVCs I'd argue it's important to enable remote locks. (Blocking remote locks aren't supported, yet, but that's another story. Better not supported than data corruption.)

Remote locks currently have several issues that, IMHO, justify not enabling them by default. Users can still ask their admin to enable the 'virtio_fs_extra_args' annotation so that they can pass '-o posix_lock' themselves if they're doing locking in a shared directory.

@haslersn
Contributor

haslersn commented Jan 11, 2022

@gkurz The problem is that sysadmins are not always aware that an app performs locking (typically the app developer doesn't document this requirement, because on runc it just works), and "-o no_posix_lock" doesn't lead to an error but rather to local locking semantics. So sysadmins will not consider enabling remote locks and will one day wonder why their app corrupted its data.
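
To illustrate why the application cannot tell the difference, here is a minimal sketch of the kind of non-blocking lock attempt an app might make. The helper name `try_exclusive_lock` is hypothetical. The call succeeds or fails based only on the locks visible to the kernel it runs on, so with local locking semantics it would report success even if another VM sharing the directory held a conflicting lock.

```python
import fcntl
import os
import tempfile


def try_exclusive_lock(path):
    """Return an open fd holding a whole-file POSIX lock, or None if it is busy."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Non-blocking exclusive lock over the whole file.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        return None
    return fd


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as tmp:
        fd = try_exclusive_lock(tmp.name)
        print("lock acquired" if fd is not None else "lock busy")
        if fd is not None:
            os.close(fd)  # closing the descriptor releases the lock
```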

@gkurz
Member

gkurz commented Jan 11, 2022

@haslersn I understand your concern but this is out of scope here. With or without kata, POSIX locks can currently be used by a container to hog the host memory: this issue is just about finding a mitigation.

Please contact virtiofs people for the final availability of POSIX locks in C virtiofsd (part of QEMU, work in progress) and rust virtiofsd (not started yet).

@gkurz gkurz added the security Potential or actual security issue label Jan 25, 2022
@00xc

00xc commented Feb 3, 2022

> Discussion has moved to launchpad.

Hi, I cannot access the discussion here. I'd like to know if/when the bug will be opened to the public. Thanks!
