
Failed to allocate directory watch: Too many open files #2252

Closed
naisanza opened this issue Aug 1, 2016 · 10 comments · 4 participants

naisanza (Contributor) commented Aug 1, 2016

Required information

  • Distribution: Ubuntu
  • Distribution version: 16.04
  • The output of "lxc info" or if that fails:
    • Kernel version: 4.4.0-31-generic
    • LXC version: 2.0.3
    • LXD version: 2.0.3
    • Storage backend in use: ZFS

Issue description

I have 15 LXD containers running, and some of them are failing to function correctly because too many files are open.

The server is running vanilla Ubuntu 16.04 Server (up to date), with all configuration modifications listed below:

root@bigma:~# cat server_changes.txt
#/etc/security/limits.conf
*       hard    nofile  1048576
*       soft    nofile  1048576
*       soft    memlock unlimited
*       hard    memlock unlimited

#vm.max_map_count = 65530
sysctl -w vm.max_map_count=262144

# /etc/apparmor.d/lxc/lxc-default
# /etc/init.d/apparmor reload
mount options=(rw, bind, ro),
mount fstype=(ecryptfs),
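
A quick sanity check that these changes took effect (a minimal sketch; the expected values match the config above, and ulimit should be run in a fresh login session, since limits.conf is applied at login):

ulimit -Hn                 # hard nofile limit; expect 1048576
ulimit -Sn                 # soft nofile limit; expect 1048576
sysctl vm.max_map_count    # expect 262144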

Number of files open by lsof:

root@bigma:~# lsof 2>/dev/null | wc -l
75675
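
Note that lsof counts open file descriptors rather than inotify watches; as a rough check (an assumption about lsof's output format, where inotify descriptors show up with "inotify" in the NAME column), the instances in use can be counted with:

lsof 2>/dev/null | grep -c inotify    # approximate count of inotify instances held open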

Steps to reproduce

  1. Create 15 LXD containers
  2. Start 15 LXD containers

Information to attach

  • any relevant kernel output (dmesg) none relevant
  • container log (lxc info NAME --show-log) available if needed
  • main daemon log (/var/log/lxd.log) available if needed
  • output of the client with --debug available if needed
  • output of the daemon with --debug available if needed
pcdummy (Contributor) commented Aug 1, 2016

Did you reboot after applying limits.conf?

naisanza (Contributor, Author) commented Aug 1, 2016

@pcdummy I reloaded the sysctl variables, closed all sessions, stopped all containers, stopped all LXD services, and even closed the ZFS pools

http://www.commandlinefu.com/commands/view/11891/reload-all-sysctl-variables-without-reboot

pcdummy (Contributor) commented Aug 1, 2016

But you didn't reboot, right? Ulimit doesn't apply without a reboot, at least not for me.
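
A quick way to check whether the new limit is actually active (a minimal check; limits.conf is applied per login session, so it should be run in a brand-new login, not a reused shell):

ulimit -n    # soft nofile limit for the current session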

stgraber (Member) commented Aug 1, 2016

You're not bumping the right limit. That error is almost certainly an inotify limit. Try bumping the ones in /proc/sys/fs/inotify.

Those aren't namespaced yet, so you need to bump them on the host to affect the containers. There's a plan in the upstream kernel to tie those to a user namespace, which means that in most cases you won't run out anymore.
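
A minimal sketch of checking and bumping those host-side limits (the paths are the ones named above; the values are illustrative, not recommendations):

cat /proc/sys/fs/inotify/max_queued_events
cat /proc/sys/fs/inotify/max_user_instances
cat /proc/sys/fs/inotify/max_user_watches

sysctl -w fs.inotify.max_user_instances=1024    # takes effect immediately on the host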

naisanza (Contributor, Author) commented Aug 1, 2016

@stgraber I've updated with the following key/value pairs:

# /etc/sysctl.conf
# fs.inotify.max_queued_events = 16384
# fs.inotify.max_user_instances = 128
# fs.inotify.max_user_watches = 8192
fs.inotify.max_queued_events=1048576
fs.inotify.max_user_instances=1048576
fs.inotify.max_user_watches=1048576
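
To load these without a reboot (the same idea as the commandlinefu link above), something like:

sysctl -p /etc/sysctl.conf             # re-read the file
sysctl fs.inotify.max_user_watches     # confirm the new value took effect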

I'll test it over the next couple of days.

In the meantime, does LXD have an official set of production-server configuration best practices for things like this?

stgraber (Member) commented Aug 1, 2016

We don't, but we'd certainly welcome the contribution. Best would probably be a doc/production-setup.md or similar, which we could then integrate with our website.

stgraber (Member) commented Aug 14, 2016

That markdown file is now part of our documentation; closing the issue.

stgraber closed this Aug 14, 2016

andersruneson commented Oct 6, 2016

I also had to change fs.inotify.max_user_instances to 1024, according to this mail thread.
Could that be added to the documentation, maybe?

stgraber (Member) commented Oct 6, 2016

As mentioned before, this is all listed in doc/production-setup.md.

andersruneson commented Oct 6, 2016

Right! max_user_instances is there, but maybe 1048576 doesn't work as a value?
