Server silently fails to set memory cgroup, does not report to user or fail to run the container #24559

Closed
jgarcia-mesosphere opened this issue Jul 12, 2016 · 15 comments
Labels
area/kernel kind/bug Bugs are bugs. The cause may or may not be known at triage time so debugging may be needed.

Comments

@jgarcia-mesosphere

Environment details (AWS, VirtualBox, physical, etc.):
Using Docker 1.8.3 on CoreOS 835.6, we noticed the following under conditions that make it impossible to create a new memory cgroup:

/sys/fs/cgroup/memory/mesos $ df .
Filesystem     1K-blocks  Used Available Use% Mounted on
cgroup                 0     0         0    - /sys/fs/cgroup/memory
/sys/fs/cgroup/memory/mesos $ sudo mkdir foo
mkdir: cannot create directory 'foo': No space left on device

Subsequently, Docker permits containers to be created without memory isolation (instead of failing), resulting in a container whose memory usage is unbounded. Note the cgroup structure created:

/sys/fs/cgroup/memory/mesos $ docker run -m 32m -d example/busybox sleep 10000

...

/sys/fs/cgroup/memory/mesos $ docker ps | grep busybox
849c66081229        example/busybox                                                         "sleep 10000"            6 seconds ago       Up 4 seconds                                                                                    suspicious_mahavira

/sys/fs/cgroup/memory/mesos $ find /sys/fs/cgroup -name "*849c66081229*"
/sys/fs/cgroup/blkio/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/freezer/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/devices/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/cpu,cpuacct/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/cpuset/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/net_cls,net_prio/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/systemd/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/memory/mesos $ 
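
A quick way to confirm the missing limit on such a host, sketched here under the assumption that a correctly confined container would get a systemd scope under /sys/fs/cgroup/memory/system.slice/ like the other controllers above:

# 849c66081229 is the short container ID from the `docker ps` output above
CID=849c66081229

# What Docker recorded for -m, in bytes (33554432 == 32 MiB):
docker inspect --format '{{.HostConfig.Memory}}' "$CID"

# Whether a memory cgroup actually exists for the container; on an affected
# host this prints nothing, i.e. the kernel enforces no limit at all:
find /sys/fs/cgroup/memory -type d -name "docker-${CID}*"

# On a healthy host the same container would instead show the 32 MiB limit:
cat /sys/fs/cgroup/memory/system.slice/docker-${CID}*.scope/memory.limit_in_bytes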

Steps to reproduce the issue:

  1. Create at least 65536 memory cgroups. An easy way to do this is to start 65k containers with memory isolation enabled (-m or --memory option); a simplified shell sketch follows this list.
  2. Stop containers/remove cgroup folders
  3. Check /proc/cgroups to verify there are 65535 memory cgroups
  4. Start a memory-isolated container using -m or --memory option
  5. Check cgroup hierarchy for the container's hash, note that the memory cgroup for the container is not in the /sys/fs/cgroup/memory/system.slice/docker-* folder as expected
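
A minimal shell sketch of steps 1 and 3, run as root against a cgroup v1 memory hierarchy mounted at /sys/fs/cgroup/memory (the dummy_ directory names are illustrative, and the loop mirrors the one used later in this thread):

# Step 1, simplified: exhaust the memory cgroup ID space directly by creating
# 64k cgroups instead of 64k memory-isolated containers.
cd /sys/fs/cgroup/memory
for i in $(seq 1 65536); do mkdir "dummy_$i"; done

# Step 3: the num_cgroups column for the memory controller shows the count.
grep memory /proc/cgroups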

Describe the results you received:
We saw what appeared to be the Docker daemon silently failing to apply memory isolation.

Describe the results you expected:
We expected the daemon to refuse to run the container because isolation failed.

Additional information you deem important (e.g. issue happens only occasionally):
This is also tracked in MESOS-5836, and a patch for the kernel has been suggested (patch 9184539).

@justincormack justincormack added the kind/bug Bugs are bugs. The cause may or may not be known at triage time so debugging may be needed. label Jul 13, 2016
@justincormack
Contributor

So the kernel issue means that if you have ever created 64k containers with memory bounds, even if they are no longer running, the system will no longer be able to create memory-confined containers (or, because of the bug, they are instead created unconfined)?

@jgarcia-mesosphere
Author

jgarcia-mesosphere commented Jul 13, 2016

We found that using Docker 1.8.3 with kernels 4.2, 4.4, and 4.5, the first ~64k containers are created normally with memory isolation; containers beyond 64k are created, but with no isolation. We did not test newer versions. Our tests used CoreOS 835, CoreOS 1010, and Ubuntu 16.04. We've reported this to CoreOS and kernel.org for upstream attention.

@jgarcia-mesosphere
Author

I tested kernel 4.6 (Arch), which includes the patch noted above, and it fixes this. This is still a valuable advisory for folks with unpatched kernels. I leave it to the project to decide if this should be closed.

@jgarcia-mesosphere
Author

Also tracked here: https://bugzilla.kernel.org/show_bug.cgi?id=124641

@jgarcia-mesosphere
Author

The fix is also going to be backported to 4.4: https://lkml.org/lkml/2016/7/13/864. I think this can be closed.

@thaJeztah
Member

Thanks so much for the detailed report and links, @jgarcia-mesosphere, really appreciated.

I think it's okay to keep it open for a short time to make it easier to find, but we can close after that (and when we know the "mainstream" distros carry this patch)

/cc @sforshee @runcom @tianon @cyphar ^^

@brauner
Contributor

brauner commented Jul 14, 2016

On Tue, Jul 12, 2016 at 03:16:40PM -0700, John Garcia wrote:

Environment details (AWS, VirtualBox, physical, etc.):
Using Docker 1.8.3 on CoreOS 835.6, we noticed the following under conditions that make it impossible to create a new memory cgroup:

/sys/fs/cgroup/memory/mesos $ df .
Filesystem     1K-blocks  Used Available Use% Mounted on
cgroup                 0     0         0    - /sys/fs/cgroup/memory
/sys/fs/cgroup/memory/mesos $ sudo mkdir foo
mkdir: cannot create directory 'foo': No space left on device

Subsequently, Docker permits containers to be created without memory isolation (instead of failing), resulting in a container whose memory usage is unbounded. Note the cgroup structure created:

I'm interested in this part of the issue. Can you reproduce this using the latest Docker version, keeping all other parameters fixed (i.e. same kernel etc.)? Even if the kernel bug is fixed, this sounds like a possible bug in Docker. I'd expect it to fail when it can't fulfill the requested memory isolation. But maybe I'm off with this.

@brauner
Contributor

brauner commented Jul 14, 2016

So I can't reproduce this issue on a 4.6 kernel with:

  1. Create 65536 memory cgroups to trigger the 64k limit.
for i in `seq 1 65536`; do mkdir dummy_$i; done

Docker version:

Client:
 Version:      1.11.2
 API version:  1.23
 Go version:   go1.6.2
 Git commit:   9e83765
 Built:
 OS/Arch:      linux/amd64

Server:
 Version:      1.11.2
 API version:  1.23
 Go version:   go1.6.2
 Git commit:   9e83765
 Built:
 OS/Arch:      linux/amd64
  2. Try to start a container with memory confinement:
sudo docker run -it --rm debian:jessie -m 32M
docker: Error response from daemon: rpc error: code = 2 desc = "oci runtime error: mkdir /sys/fs/cgroup/memory/docker/e2a1250ad119c65c787e04bed75425626b10a80843d2670e9fcc2231d62afb6c: no space left on device".

FWIW, the result is identical when no memory confinement is specified. Will try to reproduce on 4.4/4.5 with a newer Docker version.

@jgarcia-mesosphere
Author

Thanks for the repro, @brauner. Based on that, it looks to me like everything is good in the new version.

@thaJeztah
Member

@justincormack wdyt, should we have a look at that error (for unpatched kernels)?

@brauner
Contributor

brauner commented Jul 15, 2016

I also can't reproduce this with the latest Docker version on an unpatched 4.4 kernel:

  1. Create 65536 memory cgroups to trigger the 64k limit.
for i in `seq 1 65536`; do mkdir dummy_$i; done

Docker version:

Client:
 Version:      1.11.2
 API version:  1.23
 Go version:   go1.6.2
 Git commit:   9e83765
 Built:
 OS/Arch:      linux/amd64

Server:
 Version:      1.11.2
 API version:  1.23
 Go version:   go1.6.2
 Git commit:   9e83765
 Built:
 OS/Arch:      linux/amd64
  2. Try to start a container with memory confinement:
sudo docker run -it --rm  -m 32M debian:jessie
docker: Error response from daemon: rpc error: code = 2 desc = "oci runtime error: mkdir /sys/fs/cgroup/memory/docker/e2a1250ad119c65c787e04bed75425626b10a80843d2670e9fcc2231d62afb6c: no space left on device".

@cyphar
Contributor

cyphar commented Jul 15, 2016

Yeah, all of the versions of runC that I can recall will hard-fail if you ask it to set up a cgroup and it can't set it up. It might have been some weirdness with libcontainer a year ago that we've since fixed (@brauner, can you check that 1.10.3 doesn't have this problem as well, since we're still shipping that at the moment).
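
A simple way to double-check that hard-fail behaviour on a given host (a sketch, assuming the memory cgroup ID space has already been exhausted as in the reproduction above):

# With the ID space exhausted, a fixed Docker/runC stack refuses to start the
# container and `docker run` exits non-zero instead of running it unconfined.
sudo docker run --rm -m 32M debian:jessie true
echo "exit status: $?"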

@brauner
Contributor

brauner commented Jul 18, 2016

@cyphar using the same procedure and kernel 4.4, this is also not reproducible with Docker 1.10.3.

@thaJeztah
Member

Thanks all! Based on the feedback above, I'm closing this; the underlying issue will be fixed in the kernel, and recent versions of Docker won't silently ignore the problem, so it looks like we're good to go 👍

@justincormack
Contributor

I believe the 4.4.19 stable kernel has the fix, so this is no longer an issue (finally).
