Failed to watch directory ... no space left on device #1581

Open
liftedkilt opened this issue Jan 26, 2017 · 11 comments

@liftedkilt commented Jan 26, 2017

When trying to start cadvisor as either a container or as the standalone binary, it fails with the error:

Failed to watch directory "/sys/fs/cgroup/memory/system.slice": inotify_add_watch /sys/fs/cgroup/memory/system.slice/run-docker-netns-c6d57b04b0f8.mount: no space left on device

There is over 1.5 TB of free space, so it obviously isn't a disk space problem on the host. Any thoughts?

@timstclair (Contributor) commented Jan 27, 2017

This error (ENOSPC) comes from the inotify_add_watch syscall and actually has multiple meanings (the generic "no space left on device" text is just Go's rendering of the errno). Most likely the problem is exceeding the maximum number of inotify watches, not filling the disk. The limit can be raised with the fs.inotify.max_user_watches sysctl, but I would first investigate what else is creating so many watches. How many containers are you running?
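
As a concrete illustration of where that message comes from, here is a minimal Go sketch (not cAdvisor's actual code) that adds a watch on a cgroup directory the way a container watcher would; when the per-user watch limit is exceeded, the syscall returns ENOSPC, which Go renders as the generic "no space left on device" string:

package main

import (
    "fmt"
    "syscall"
)

func main() {
    // Create an inotify instance, as any watcher (cAdvisor, kubelet, tail -f) does.
    fd, err := syscall.InotifyInit()
    if err != nil {
        panic(err)
    }
    defer syscall.Close(fd)

    // Watch a cgroup directory for container creation/removal events,
    // roughly what a raw container driver does for discovery.
    _, err = syscall.InotifyAddWatch(fd, "/sys/fs/cgroup/memory/system.slice",
        syscall.IN_CREATE|syscall.IN_DELETE)
    if err == syscall.ENOSPC {
        // ENOSPC here means fs.inotify.max_user_watches was exceeded,
        // not that the disk is full; Go prints it as "no space left on device".
        fmt.Println("inotify watch limit exceeded:", err)
    } else if err != nil {
        fmt.Println("inotify_add_watch failed:", err)
    }
}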

@liftedkilt (Author) commented Jan 27, 2017

737 containers, 'fs.inotify.max_user_watches' is set to 8192

@timstclair (Contributor) commented Jan 27, 2017

737 containers is a lot. Take a look at this SO answer: http://unix.stackexchange.com/a/13757/68061

I'm curious what the result of the suggested command is:

sudo find /proc/*/fd -lname anon_inode:inotify |
   cut -d/ -f3 |
   xargs -I '{}' -- ps --no-headers -o '%p %U %c' -p '{}' |
   uniq -c |
   sort -nr

@liftedkilt (Author) commented Jan 27, 2017

  5  875708 root     java
  5  681845 root     java
  5   63880 root     java
  5   61450 root     java
  5   61448 root     java
  5   57620 root     java
  5   52067 root     java
  5   49636 root     java
  5   47877 root     java
  5   45639 root     java
  5   43724 root     java
  5   41675 root     java
  5   40566 root     java
  5   40553 root     java
  5 3951445 root     java
  5 3938788 root     java
  5 3626989 root     java
  5 3582307 root     java
  5   34833 root     java
  5   34818 root     java
  5 3362735 root     java
  5  254863 root     java
  5  250666 root     java
  5  244775 root     java
  5  234905 root     java
  5  224418 root     java
  5       1 root     systemd
  5  185994 root     java
  5 1827734 root     java
  5 1326212 root     java
  4   63878 root     java
  2 2722168 root     cadvisor
  2    1791 root     polkitd
  1    9828 root     agetty
  1     906 root     systemd-udevd
  1 3010109 wgf      systemd
  1  182943 root     java
  1    1794 dnsmasq  dnsmasq
  1    1737 message+ dbus-daemon
  1    1720 root     acpid
  1    1718 root     accounts-daemon
  1    1564 systemd+ systemd-timesyn

@cirocosta commented Sep 11, 2017

Hey @tallclair, I just ran into the same problem @liftedkilt reported on many of the machines we're running (all very densely packed for their size). By default they're all running with the 8K watch limit set in sysctl.conf.

Naturally one can fix that by raising the limit, but I was wondering whether we could reduce the watch count by removing unnecessary watches placed in raw.go. Do we really need to track every cgroup subsystem in order to discover when a container has been created or removed?

Do you think there's room for making this more lightweight? I'd be willing to take a shot at implementing it.

Thx!

@tallclair (Member) commented Sep 11, 2017

Usually it should be OK to watch a single common cgroup subsystem (e.g. cpu), but that would risk missing containers that don't use that subsystem. That is probably a rare case that could be addressed with a configuration option though. I'm not super familiar with this part of the code, so perhaps someone more familiar can chime in. @vishh ?
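
To see why the watches add up so fast, here is a rough sketch (an illustration only, not cAdvisor's actual raw.go logic) that counts how many directories each cgroup v1 hierarchy would contribute if every directory were watched; with hundreds of containers times a dozen or so subsystems, the default limit of 8192 is quickly reached:

package main

import (
    "fmt"
    "os"
    "path/filepath"
)

func main() {
    total := 0
    subsystems, err := os.ReadDir("/sys/fs/cgroup") // assumes a cgroup v1 layout
    if err != nil {
        panic(err)
    }
    for _, sub := range subsystems {
        if !sub.IsDir() { // skip symlinked aliases like cpu -> cpu,cpuacct
            continue
        }
        count := 0
        filepath.WalkDir(filepath.Join("/sys/fs/cgroup", sub.Name()),
            func(path string, d os.DirEntry, err error) error {
                if err == nil && d.IsDir() {
                    count++ // one inotify watch per watched directory
                }
                return nil
            })
        fmt.Printf("%-12s %6d directories\n", sub.Name(), count)
        total += count
    }
    fmt.Println("total:", total, "(compare with fs.inotify.max_user_watches)")
}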

@wangzi19870227 commented Feb 22, 2018

Maybe your inotify resources are exhausted; increasing max_user_watches works for me.

$ cat /proc/sys/fs/inotify/max_user_watches        # default is 8192
$ sudo sysctl fs.inotify.max_user_watches=1048576  # increase to 1048576

@zuozuo commented Feb 24, 2018

@wangzi19870227 thanks, works for me

@bamb00 commented Aug 13, 2018

@timstclair Is there any way I can get a listing of the actual culprit? I cannot tell whether the kubelet process is using the majority of the watches, even though the error comes from kubelet.

# cat /proc/sys/fs/inotify/max_user_watches
8192

# tail -f /var/log/messages
tail: inotify resources exhausted
tail: inotify cannot be used, reverting to polling

# journalctl -u kubelet | grep device
Aug 13 12:48:05 sys-multinode-minion-1 kubelet[81153]: W0813 12:48:05.682241   81153 raw.go:87] Error while processing event ("/sys/fs/cgroup/cpuset/kubepods/burstable/podd405e4ff-9aab-11e8-aaea-70695a988249/485ae7dc7e34b76666815053f75a705d03ca2c0ee3ba2d2edbb6fa4128ec7810": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/cpuset/kubepods/burstable/podd405e4ff-9aab-11e8-aaea-70695a988249/485ae7dc7e34b76666815053f75a705d03ca2c0ee3ba2d2edbb6fa4128ec7810: no space left on device


#  find /proc/*/fd -lname anon_inode:inotify |    cut -d/ -f3 |    xargs -I '{}' -- ps --no-headers -o '%p %U %c' -p '{}' |    uniq -c | sort -nr
find: ‘/proc/81153/fd/40’: No such file or directory
find: ‘/proc/81153/fd/88’: No such file or directory
	      6      1 root     systemd
	      4  81153 root     kubelet
	      2   4605 gdm      gnome-shell
	      2   1882 root     NetworkManager
	      2   1805 polkitd  polkitd
	      1  83112 root     prometheus-conf
	      1   6340 gdm      ibus-engine-sim
	      1   6065 colord   colord
	      1   6044 gdm      gsd-sound
	      1   5983 gdm      gsd-color
	      1   5964 gdm      gsd-xsettings
	      1   5952 root     packagekitd
	      1   5876 gdm      ibus-portal
	      1   5871 gdm      ibus-x11
	      1   5862 gdm      ibus-dconf
	      1   5825 gdm      ibus-daemon
	      1   5057 gdm      pulseaudio
	      1   4480 gdm      dbus-daemon
	      1   4409 gdm      dbus-daemon
	      1   4398 gdm      gnome-session-b
	      1   4047 nobody   dnsmasq
	      1   2459 root     crond
	      1   2450 root     rsyslogd
	      1   1846 avahi    avahi-daemon
	      1   1841 root     abrt-watch-log
	      1   1838 root     abrt-watch-log
	      1   1837 root     abrtd
	      1   1813 dbus     dbus-daemon
	      1   1802 root     accounts-daemon
	      1   1312 root     systemd-udevd

@machinekoder commented Nov 6, 2018

I think it's worth mentioning that the inotify limit is a property of the host system, not of the Docker image itself. So if you get this error, increase the inotify limit on your host system, not inside the Docker image.

@yousong commented Nov 14, 2018

In case it may help, I just wrote a few lines of shell script to count the inotify watches used by each inotify instance. We have a kubelet installation using up almost all of the max_user_watches quota:

https://github.com/yousong/gists/blob/master/shell/inotify_watchers.sh
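
For reference, a rough Go sketch of the idea behind such a script: each watch held by an inotify instance appears as an "inotify wd:" line in /proc/<pid>/fdinfo/<fd>, so counting those lines per process shows which process actually owns the watches (run as root to see all processes; this is an illustration, not the linked script):

package main

import (
    "fmt"
    "os"
    "path/filepath"
    "strings"
)

func main() {
    counts := map[string]int{} // pid -> number of inotify watches
    fdinfoFiles, err := filepath.Glob("/proc/[0-9]*/fdinfo/*")
    if err != nil {
        panic(err)
    }
    for _, f := range fdinfoFiles {
        data, err := os.ReadFile(f)
        if err != nil {
            continue // process or fd disappeared, or permission denied
        }
        // Each "inotify wd:" line corresponds to one watch on this instance.
        if n := strings.Count(string(data), "inotify wd:"); n > 0 {
            pid := strings.Split(f, "/")[2]
            counts[pid] += n
        }
    }
    for pid, n := range counts {
        comm, _ := os.ReadFile("/proc/" + pid + "/comm")
        fmt.Printf("%8s %-20s %d watches\n", pid, strings.TrimSpace(string(comm)), n)
    }
}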
