
Failed to watch directory ... no space left on device #1581

Open · liftedkilt opened this issue Jan 26, 2017 · 13 comments

Comments

@liftedkilt

When trying to start cadvisor as either a container or as the standalone binary, it fails with the error:

Failed to watch directory "/sys/fs/cgroup/memory/system.slice": inotify_add_watch /sys/fs/cgroup/memory/system.slice/run-docker-netns-c6d57b04b0f8.mount: no space left on device

There is over 1.5 TB of free space, so it clearly isn't a disk space problem on the host. Any thoughts?

@timstclair
Contributor

This error (ENOSPC) comes from the inotify_add_watch syscall and actually has multiple meanings (the message text comes from Go). Most likely the problem is that the maximum number of inotify watches has been exceeded, not that the disk is full. The limit can be raised with the fs.inotify.max_user_watches sysctl, but I would first investigate what else is creating so many watches. How many containers are you running?
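
For reference, checking and temporarily raising the limit looks roughly like this (524288 is just an example value, not a recommendation from this thread):

cat /proc/sys/fs/inotify/max_user_watches        # current per-user watch limit
sudo sysctl fs.inotify.max_user_watches=524288   # takes effect immediately, lost on reboot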

@liftedkilt
Author

737 containers, 'fs.inotify.max_user_watches' is set to 8192

@timstclair
Contributor

737 containers is a lot. Take a look at this SO answer: http://unix.stackexchange.com/a/13757/68061

I'm curious what the result of the suggested command is:

sudo find /proc/*/fd -lname anon_inode:inotify |
   cut -d/ -f3 |
   xargs -I '{}' -- ps --no-headers -o '%p %U %c' -p '{}' |
   uniq -c |
   sort -nr
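
Note that this counts inotify instances (open inotify file descriptors) per process, not individual watches; a single instance can hold many watches. On reasonably recent kernels you can count the watches held by one instance via its fdinfo (the PID and FD below are placeholders):

grep -c '^inotify' /proc/<PID>/fdinfo/<FD>   # each 'inotify wd:' line is one active watch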

@liftedkilt
Author

  5  875708 root     java
  5  681845 root     java
  5   63880 root     java
  5   61450 root     java
  5   61448 root     java
  5   57620 root     java
  5   52067 root     java
  5   49636 root     java
  5   47877 root     java
  5   45639 root     java
  5   43724 root     java
  5   41675 root     java
  5   40566 root     java
  5   40553 root     java
  5 3951445 root     java
  5 3938788 root     java
  5 3626989 root     java
  5 3582307 root     java
  5   34833 root     java
  5   34818 root     java
  5 3362735 root     java
  5  254863 root     java
  5  250666 root     java
  5  244775 root     java
  5  234905 root     java
  5  224418 root     java
  5       1 root     systemd
  5  185994 root     java
  5 1827734 root     java
  5 1326212 root     java
  4   63878 root     java
  2 2722168 root     cadvisor
  2    1791 root     polkitd
  1    9828 root     agetty
  1     906 root     systemd-udevd
  1 3010109 wgf      systemd
  1  182943 root     java
  1    1794 dnsmasq  dnsmasq
  1    1737 message+ dbus-daemon
  1    1720 root     acpid
  1    1718 root     accounts-daemon
  1    1564 systemd+ systemd-timesyn

@cirocosta

Hey @tallclair, I just ran into the same issue that @liftedkilt reported, on many of the machines we're running (all very densely packed for their size). By default they're all running with the 8K watch limit set in sysctl.conf.

Naturally one can fix that by raising the limit, but I was wondering whether we could reduce the watch count by removing unnecessary watches placed in raw.go - do we really need to keep track of each cgroup subsystem in order to discover whether a container has been created/removed/...?

Do you think there's room for making this more lightweight? I'd be willing to take a shot at implementing it.

Thx!

@tallclair
Contributor

Usually it should be OK to watch a single common cgroup subsystem (e.g. cpu), but that would risk missing containers that don't use that subsystem. That is probably a rare case that could be addressed with a configuration option though. I'm not super familiar with this part of the code, so perhaps someone more familiar can chime in. @vishh ?
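
(For intuition: the same container shows up as a directory under every mounted hierarchy, so watching all of them multiplies the watch count by the number of hierarchies. The path below is illustrative and assumes the cgroupfs driver; the container ID is a placeholder.)

ls -d /sys/fs/cgroup/*/docker/<container-id>   # one entry per subsystem hierarchy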

@wangzi19870227

Maybe your inotify resources are exhausted; increasing max_user_watches works for me.

$ cat /proc/sys/fs/inotify/max_user_watches         # default is 8192
$ sudo sysctl fs.inotify.max_user_watches=1048576   # increase to 1048576
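
To make the change survive a reboot, the usual approach is to persist it in sysctl configuration (the value mirrors the one above):

echo fs.inotify.max_user_watches=1048576 | sudo tee -a /etc/sysctl.conf
sudo sysctl -p   # apply without rebooting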

@zuozuo

zuozuo commented Feb 24, 2018

@wangzi19870227 thanks, works for me

@bamb00

bamb00 commented Aug 13, 2018

@timstclair Is there any way I can get a listing of the actual culprit? I cannot tell whether the kubelet process is taking up the majority of the watches, even though the error comes from kubelet.

# cat /proc/sys/fs/inotify/max_user_watches
8192

# tail -f /var/log/messages
tail: inotify resources exhausted
tail: inotify cannot be used, reverting to polling

# journalctl -u kubelet | grep device
Aug 13 12:48:05 sys-multinode-minion-1 kubelet[81153]: W0813 12:48:05.682241   81153 raw.go:87] Error while processing event ("/sys/fs/cgroup/cpuset/kubepods/burstable/podd405e4ff-9aab-11e8-aaea-70695a988249/485ae7dc7e34b76666815053f75a705d03ca2c0ee3ba2d2edbb6fa4128ec7810": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/cpuset/kubepods/burstable/podd405e4ff-9aab-11e8-aaea-70695a988249/485ae7dc7e34b76666815053f75a705d03ca2c0ee3ba2d2edbb6fa4128ec7810: no space left on device


# find /proc/*/fd -lname anon_inode:inotify |
   cut -d/ -f3 |
   xargs -I '{}' -- ps --no-headers -o '%p %U %c' -p '{}' |
   uniq -c |
   sort -nr
find: ‘/proc/81153/fd/40’: No such file or directory
find: ‘/proc/81153/fd/88’: No such file or directory
	      6      1 root     systemd
	      4  81153 root     kubelet
	      2   4605 gdm      gnome-shell
	      2   1882 root     NetworkManager
	      2   1805 polkitd  polkitd
	      1  83112 root     prometheus-conf
	      1   6340 gdm      ibus-engine-sim
	      1   6065 colord   colord
	      1   6044 gdm      gsd-sound
	      1   5983 gdm      gsd-color
	      1   5964 gdm      gsd-xsettings
	      1   5952 root     packagekitd
	      1   5876 gdm      ibus-portal
	      1   5871 gdm      ibus-x11
	      1   5862 gdm      ibus-dconf
	      1   5825 gdm      ibus-daemon
	      1   5057 gdm      pulseaudio
	      1   4480 gdm      dbus-daemon
	      1   4409 gdm      dbus-daemon
	      1   4398 gdm      gnome-session-b
	      1   4047 nobody   dnsmasq
	      1   2459 root     crond
	      1   2450 root     rsyslogd
	      1   1846 avahi    avahi-daemon
	      1   1841 root     abrt-watch-log
	      1   1838 root     abrt-watch-log
	      1   1837 root     abrtd
	      1   1813 dbus     dbus-daemon
	      1   1802 root     accounts-daemon
	      1   1312 root     systemd-udevd

@machinekoder

machinekoder commented Nov 6, 2018

I think it's worth mentioning that the inotify limit is a property of the host system, not the Docker image itself. So if you get this error, increase the inotify limit on your host system, not inside the Docker image.
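
A quick way to convince yourself of this (assuming Docker is available and no separate user namespace is in play): the value read inside a container matches the host's, because both read the same kernel knob.

cat /proc/sys/fs/inotify/max_user_watches
docker run --rm busybox cat /proc/sys/fs/inotify/max_user_watches   # prints the same number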

@yousong

yousong commented Nov 14, 2018

In case it helps, I just wrote a short shell script to count the inotify watches used by each inotify instance. We have a kubelet installation using up almost all of the max_user_watches quota:

https://github.com/yousong/gists/blob/master/shell/inotify_watchers.sh
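
For reference, a simplified sketch of the same idea (not the linked script itself): find every inotify file descriptor and count the 'inotify wd:' lines in its fdinfo, i.e. the watches held by that instance. Run it as root to see other users' processes.

for fd in /proc/*/fd/*; do
    if [ "$(readlink "$fd" 2>/dev/null)" = "anon_inode:inotify" ]; then
        pid=${fd#/proc/}; pid=${pid%%/*}    # extract the PID from the /proc path
        watches=$(grep -c '^inotify' "/proc/$pid/fdinfo/${fd##*/}" 2>/dev/null)
        echo "$watches $pid $(ps -p "$pid" -o comm= 2>/dev/null)"
    fi
done | sort -nr | head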

@gxin0426

gxin0426 commented Apr 7, 2020

737 containers is a lot. Take a look at this SO answer: http://unix.stackexchange.com/a/13757/68061

I'm curious what the result of the suggested command is:

sudo find /proc/*/fd -lname anon_inode:inotify |
   cut -d/ -f3 |
   xargs -I '{}' -- ps --no-headers -o '%p %U %c' -p '{}' |
   uniq -c |
   sort -nr

After running the suggested command:

sudo find /proc/*/fd -lname anon_inode:inotify |
   cut -d/ -f3 |
   xargs -I '{}' -- ps --no-headers -o '%p %U %c' -p '{}' |
   uniq -c |
   sort -nr

the output is:

  3    5452 root     kubelet
  3       1 root     systemd
  2    9086 root     NetworkManager
  2    9060 polkitd  polkitd
  1    9425 root     rsyslogd
  1    9110 root     crond
  1    9069 dbus     dbus-daemon
  1    4728 root     systemd-udevd

@zoux86

zoux86 commented Oct 14, 2020

737 containers is a lot. Take a look at this SO answer: http://unix.stackexchange.com/a/13757/68061

@timstclair I hit the same problem with 468 containers, with fs.inotify.max_user_watches set to 8192, so I want to know why you think "737 containers is a lot". Is 468 containers a lot or not?
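
Rough arithmetic suggests why. If the watcher places one watch per container per cgroup hierarchy (the raw.go behavior discussed above), and a typical host exposes around ten hierarchies, then 737 containers need roughly 7,400 watches, already close to the 8,192 default before any other process takes its share. By the same estimate 468 containers need roughly 4,700, which still leaves little headroom if anything else on the host is watch-hungry. You can count the hierarchies on your host with:

ls /sys/fs/cgroup | wc -l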
