Running many containers requires bumping up inotify limits #1044

Closed
jpetazzo opened this Issue Jun 27, 2013 · 10 comments

Contributor

jpetazzo commented Jun 27, 2013

It has been reported that it wasn't possible to run more than 118 containers without hitting an inotify limit [1]. We had already hit that limit on the dotCloud platform. All you need is to bump up fs.inotify.max_user_instances, e.g. with:

sysctl -w fs.inotify.max_user_instances=8192

One idea is to display a warning when getting close to the limit; but that is difficult, because it is not possible to cheaply detect how many inotify instances are being used (see [2]).

Another (IMHO better) idea is to parse the output of lxc-start, and when we see Too many open files - failed to inotify_init, we could either bump up the limit automatically, or tell the user.

I think it is safe to bump up the limit automatically up to a fairly generous number (on the PAAS we use 8192), but not above, because if the Docker host is running other software that leaks inotify instances, we would end up DOS-ing the host by effectively lifting the limit... Not good.

So, to wrap up:

  • bump up the limit to 8192 when Docker starts (a rough sketch follows below)
  • when lxc-start fails, parse its output, and if we detect inotify_init in the error, display a warning prompting the user to raise the limit themselves if they want to.

[1] https://twitter.com/losnggeneration/status/349973683672059904
[2] http://unix.stackexchange.com/questions/15509/whos-consuming-my-inotify-resources
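
A minimal sketch of what the first bullet could look like inside the daemon (illustrative Go only, not existing Docker code; it assumes the daemon runs as root and uses the procfs file behind the sysctl knob):

// Illustrative only: raise fs.inotify.max_user_instances at daemon start,
// but never above a generous ceiling, so a genuine inotify leak elsewhere
// on the host is not masked.
package main

import (
    "fmt"
    "os"
    "strconv"
    "strings"
)

const (
    inotifyProc    = "/proc/sys/fs/inotify/max_user_instances" // same knob as sysctl
    inotifyCeiling = 8192                                      // value used on the PAAS
)

func bumpInotifyLimit() error {
    raw, err := os.ReadFile(inotifyProc)
    if err != nil {
        return err
    }
    current, err := strconv.Atoi(strings.TrimSpace(string(raw)))
    if err != nil {
        return err
    }
    if current >= inotifyCeiling {
        return nil // already generous enough; leave it alone
    }
    // Equivalent to `sysctl -w fs.inotify.max_user_instances=8192`; needs root.
    return os.WriteFile(inotifyProc, []byte(strconv.Itoa(inotifyCeiling)), 0644)
}

func main() {
    if err := bumpInotifyLimit(); err != nil {
        fmt.Fprintf(os.Stderr, "warning: could not raise %s: %v\n", inotifyProc, err)
    }
}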

Collaborator

vieux commented Jun 28, 2013

On my system:

$> sysctl fs.inotify.max_user_instances
fs.inotify.max_user_instances = 128

And yet I can start more than 128 containers (I don't know exactly how many; I hit No unallocated IP available first, but it works fine with 215, for example).

Do you know why?

Contributor

jpetazzo commented Jun 28, 2013

Interesting. Which version of the lxc userland tools are you using?

Collaborator

vieux commented Jun 28, 2013

lxc version: 0.7.5

Contributor

dsissitka commented Aug 16, 2013

Is this still an issue?

$ sudo docker version
Client version: 0.5.3
Server version: 0.5.3
Go version: go1.1
$ lxc-version
lxc version: 0.9.0
$ sysctl fs.inotify.max_user_instances
fs.inotify.max_user_instances = 128
$ cat test.sh
#!/bin/bash

set -e

for i in {1..1000}
do
    echo $i
    docker run -d ubuntu sleep 60
done
$ sudo ./test.sh > log 2>&1
$ tail -n 2 log
254
2013/08/15 20:12:26 Error: Error starting container 07a77c313e45: pipe2: too many open files
$
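
The pipe2: too many open files error is EMFILE, which most likely means the daemon ran into its own per-process open file limit (RLIMIT_NOFILE, soft limit typically 1024) rather than the inotify knob; each running container keeps several descriptors open in the daemon, which would line up with failure around the 250-container mark. A tiny Go sketch (illustrative, not Docker code) to see the limit the daemon inherits:

// Illustrative only: print this process's RLIMIT_NOFILE, the limit that
// pipe2() hits when it fails with EMFILE ("too many open files").
package main

import (
    "fmt"
    "syscall"
)

func main() {
    var lim syscall.Rlimit
    if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
        panic(err)
    }
    fmt.Printf("nofile: soft=%d hard=%d\n", lim.Cur, lim.Max)
}
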
Contributor

EvanKrall commented Feb 28, 2014

A similar issue can be seen when AUFS finally leaks ~1 million anonymous block devices: the mount syscall returns EMFILE, which docker interprets as "too many open files". This is a bug in AUFS: sfjro/aufs3-linux#1

It looks like:

krall@docker1-devc:~$ docker pull busybox
Pulling repository busybox
769b9341d937: Error pulling image (latest) from busybox, Driver aufs failed to get image rootfs bf747efa0e2fa9f7c691588ce3938944c75607a7bb5e757f7369f86904d97c78: too many open files
bf747efa0e2f: Error downloading dependent layers
2014/02/28 00:41:01 pull: Could not find repository on any of the indexed registries.
krall@docker1-devc:~$
krall@docker1-devc:~$ docker run -t -i -rm 539c0211cd76 /bin/echo hi
2014/02/28 00:40:40 Error: create: too many open files

Contributor

crosbymichael commented Feb 28, 2014

@EvanKrall are you able to try out a different driver like devicemapper or btrfs to see if you still get the errors?

Contributor

EvanKrall commented Feb 28, 2014

Once you run the kernel out of anonymous block devices with the AUFS bug, lxc-start has a hard time calling clone:

[pid  5552] 11:32:43.527336 clone(child_stack=0x7fff6eea8bf0, flags=CLONE_NEWNS|0x6c000000|SIGCHLD) = -1 EMFILE (Too many open files) <0.000488>
krall@docker1-devc:~$ docker run -t -i -rm busybox /bin/echo hi
lxc-start: Cannot allocate memory - failed to clone

lxc-start: failed to clone(0x6c020000): Too many open files
lxc-start: Too many open files - failed to fork into a new namespace
lxc-start: failed to spawn 'ff225359394251326a905ac6603715487bc19837ad6e2dffe259ac8bbd56e555'
lxc-start: No such file or directory - failed to remove cgroup '/sys/fs/cgroup/cpu/sysdefault/lxc/ff225359394251326a905ac6603715487bc19837ad6e2dffe259ac8bbd56e555'
[error] commands.go:2530 Error getting size: bad file descriptor
krall@docker1-devc:~$

A reboot will clear this up.

I suspect devicemapper doesn't leak anonymous block devices the way AUFS does, but I haven't tested that.

I've tried both AUFS and devicemapper, and both are unable to run more than 252 containers (each docker run -d busybox /bin/ash -c "while true; do echo alive; sleep 60000; done"). The error I get is pipe2: too many open files. I'm happy to play around with DOCKER_OPTS more to help debug, but I'll need some guidance on how to proceed ;-)

I've already upped fs.inotify.max_user_instances as recommended above. As both drivers fail right at 252, I'm guessing there is some other limit that I'm not aware of.

For Fedora, we sent this:
291d5e6

For Ubuntu, I believe you add a ulimit override to /etc/init/docker.conf:
limit nofile
service docker restart
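
As a hedged alternative to init-script overrides (assuming the daemon runs as root or with CAP_SYS_RESOURCE), the daemon could also raise its own nofile limit at startup; the target value below is an arbitrary example, not a Docker default:

// Illustrative only: raise the daemon's own RLIMIT_NOFILE at startup.
package main

import (
    "log"
    "syscall"
)

func raiseNofile(target uint64) error {
    var lim syscall.Rlimit
    if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
        return err
    }
    if lim.Cur >= target {
        return nil // nothing to do
    }
    lim.Cur = target
    if lim.Max < target {
        lim.Max = target // raising the hard limit needs root / CAP_SYS_RESOURCE
    }
    return syscall.Setrlimit(syscall.RLIMIT_NOFILE, &lim)
}

func main() {
    if err := raiseNofile(1048576); err != nil {
        log.Printf("could not raise RLIMIT_NOFILE: %v", err)
    }
}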

ashahba commented Jan 17, 2017

I ran into the same error, but only when attempting to restart the Docker service, and I was able to resolve it by setting:
sysctl -w fs.inotify.max_user_instances=8192
