
Failed to watch directory ... no space left on device #1581

Open · liftedkilt opened this issue Jan 26, 2017 · 13 comments

Comments

@liftedkilt

When trying to start cadvisor as either a container or as the standalone binary, it fails with the error:

Failed to watch directory "/sys/fs/cgroup/memory/system.slice": inotify_add_watch /sys/fs/cgroup/memory/system.slice/run-docker-netns-c6d57b04b0f8.mount: no space left on device

There is over 1.5 TB of free space, so it clearly isn't a disk space problem on the host. Any thoughts?

@timstclair
Contributor

This error (ENOSPC) comes from the inotify_add_watch syscall and actually has multiple meanings (the message text comes from Go). Most likely the problem is that the maximum number of inotify watches has been exceeded, not that the disk is full. The limit can be raised with the fs.inotify.max_user_watches sysctl, but I would first investigate what else is creating so many watches. How many containers are you running?
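
For reference, checking and temporarily raising the limit looks roughly like this (524288 is just an example value, not a recommendation from this thread):

cat /proc/sys/fs/inotify/max_user_watches        # current per-user watch limit
sudo sysctl fs.inotify.max_user_watches=524288   # takes effect immediately, lost on reboot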

@liftedkilt
Author

737 containers, 'fs.inotify.max_user_watches' is set to 8192

@timstclair
Contributor

737 containers is a lot. Take a look at this SO answer: http://unix.stackexchange.com/a/13757/68061

I'm curious what the result of the suggested command is:

sudo find /proc/*/fd -lname anon_inode:inotify |
   cut -d/ -f3 |
   xargs -I '{}' -- ps --no-headers -o '%p %U %c' -p '{}' |
   uniq -c |
   sort -nr
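
Note that this counts inotify instances (open inotify file descriptors) per process, not individual watches; a single instance can hold many watches. On reasonably recent kernels you can count the watches held by one instance via its fdinfo (the PID and FD below are placeholders):

grep -c '^inotify' /proc/<PID>/fdinfo/<FD>   # each 'inotify wd:' line is one active watch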

@liftedkilt
Author

  5  875708 root     java
  5  681845 root     java
  5   63880 root     java
  5   61450 root     java
  5   61448 root     java
  5   57620 root     java
  5   52067 root     java
  5   49636 root     java
  5   47877 root     java
  5   45639 root     java
  5   43724 root     java
  5   41675 root     java
  5   40566 root     java
  5   40553 root     java
  5 3951445 root     java
  5 3938788 root     java
  5 3626989 root     java
  5 3582307 root     java
  5   34833 root     java
  5   34818 root     java
  5 3362735 root     java
  5  254863 root     java
  5  250666 root     java
  5  244775 root     java
  5  234905 root     java
  5  224418 root     java
  5       1 root     systemd
  5  185994 root     java
  5 1827734 root     java
  5 1326212 root     java
  4   63878 root     java
  2 2722168 root     cadvisor
  2    1791 root     polkitd
  1    9828 root     agetty
  1     906 root     systemd-udevd
  1 3010109 wgf      systemd
  1  182943 root     java
  1    1794 dnsmasq  dnsmasq
  1    1737 message+ dbus-daemon
  1    1720 root     acpid
  1    1718 root     accounts-daemon
  1    1564 systemd+ systemd-timesyn

@cirocosta

Hey @tallclair, I just ran into the same issue that @liftedkilt reported, on many of the machines we're running (all very densely packed for their size). By default they're all running with the 8K watch limit set in sysctl.conf.

Naturally one can fix that by raising the limit, but I was wondering whether we could reduce the watch count by removing unnecessary watches placed in raw.go - do we really need to keep track of each cgroup subsystem in order to discover whether a container has been created/removed/...?

Do you think there's room for making this more lightweight? I'd be willing to take a shot at implementing it.

Thx!

@tallclair
Contributor

Usually it should be OK to watch a single common cgroup subsystem (e.g. cpu), but that would risk missing containers that don't use that subsystem. That is probably a rare case that could be addressed with a configuration option though. I'm not super familiar with this part of the code, so perhaps someone more familiar can chime in. @vishh ?
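
(For intuition: the same container shows up as a directory under every mounted hierarchy, so watching all of them multiplies the watch count by the number of hierarchies. The path below is illustrative and assumes the cgroupfs driver; the container ID is a placeholder.)

ls -d /sys/fs/cgroup/*/docker/<container-id>   # one entry per subsystem hierarchy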

@wangzi19870227

Maybe your inotify resources are exhausted; increasing max_user_watches works for me.

$ cat /proc/sys/fs/inotify/max_user_watches         # default is 8192
$ sudo sysctl fs.inotify.max_user_watches=1048576   # increase to 1048576
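
To make the change survive a reboot, the usual approach is to persist it in sysctl configuration (the value mirrors the one above):

echo fs.inotify.max_user_watches=1048576 | sudo tee -a /etc/sysctl.conf
sudo sysctl -p   # apply without rebooting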

@zuozuo

zuozuo commented Feb 24, 2018

@wangzi19870227 thanks, works for me

@bamb00

bamb00 commented Aug 13, 2018

@timstclair Is there any way I can get a listing of the actual culprit? I cannot tell whether the kubelet process is taking up the majority of the watches, even though the error comes from kubelet.

# cat /proc/sys/fs/inotify/max_user_watches
8192

# tail -f /var/log/messages
tail: inotify resources exhausted
tail: inotify cannot be used, reverting to polling

# journalctl -u kubelet | grep device
Aug 13 12:48:05 sys-multinode-minion-1 kubelet[81153]: W0813 12:48:05.682241   81153 raw.go:87] Error while processing event ("/sys/fs/cgroup/cpuset/kubepods/burstable/podd405e4ff-9aab-11e8-aaea-70695a988249/485ae7dc7e34b76666815053f75a705d03ca2c0ee3ba2d2edbb6fa4128ec7810": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/cpuset/kubepods/burstable/podd405e4ff-9aab-11e8-aaea-70695a988249/485ae7dc7e34b76666815053f75a705d03ca2c0ee3ba2d2edbb6fa4128ec7810: no space left on device


# find /proc/*/fd -lname anon_inode:inotify |
   cut -d/ -f3 |
   xargs -I '{}' -- ps --no-headers -o '%p %U %c' -p '{}' |
   uniq -c |
   sort -nr
find: ‘/proc/81153/fd/40’: No such file or directory
find: ‘/proc/81153/fd/88’: No such file or directory
	      6      1 root     systemd
	      4  81153 root     kubelet
	      2   4605 gdm      gnome-shell
	      2   1882 root     NetworkManager
	      2   1805 polkitd  polkitd
	      1  83112 root     prometheus-conf
	      1   6340 gdm      ibus-engine-sim
	      1   6065 colord   colord
	      1   6044 gdm      gsd-sound
	      1   5983 gdm      gsd-color
	      1   5964 gdm      gsd-xsettings
	      1   5952 root     packagekitd
	      1   5876 gdm      ibus-portal
	      1   5871 gdm      ibus-x11
	      1   5862 gdm      ibus-dconf
	      1   5825 gdm      ibus-daemon
	      1   5057 gdm      pulseaudio
	      1   4480 gdm      dbus-daemon
	      1   4409 gdm      dbus-daemon
	      1   4398 gdm      gnome-session-b
	      1   4047 nobody   dnsmasq
	      1   2459 root     crond
	      1   2450 root     rsyslogd
	      1   1846 avahi    avahi-daemon
	      1   1841 root     abrt-watch-log
	      1   1838 root     abrt-watch-log
	      1   1837 root     abrtd
	      1   1813 dbus     dbus-daemon
	      1   1802 root     accounts-daemon
	      1   1312 root     systemd-udevd

@machinekoder

machinekoder commented Nov 6, 2018

I think it's worth mentioning that the inotify limit is a property of the host system, not the Docker image itself. So if you get this error, increase the inotify limit on your host system, not inside the Docker image.
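
A quick way to convince yourself of this (assuming Docker is available and no separate user namespace is in play): the value read inside a container matches the host's, because both read the same kernel knob.

cat /proc/sys/fs/inotify/max_user_watches
docker run --rm busybox cat /proc/sys/fs/inotify/max_user_watches   # prints the same number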

@yousong

yousong commented Nov 14, 2018

In case it helps, I just wrote a short shell script to count the inotify watches used by each inotify instance. We have a kubelet installation using up almost all of the max_user_watches quota:

https://github.com/yousong/gists/blob/master/shell/inotify_watchers.sh
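
For reference, a simplified sketch of the same idea (not the linked script itself): find every inotify file descriptor and count the 'inotify wd:' lines in its fdinfo, i.e. the watches held by that instance. Run it as root to see other users' processes.

for fd in /proc/*/fd/*; do
    if [ "$(readlink "$fd" 2>/dev/null)" = "anon_inode:inotify" ]; then
        pid=${fd#/proc/}; pid=${pid%%/*}    # extract the PID from the /proc path
        watches=$(grep -c '^inotify' "/proc/$pid/fdinfo/${fd##*/}" 2>/dev/null)
        echo "$watches $pid $(ps -p "$pid" -o comm= 2>/dev/null)"
    fi
done | sort -nr | head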

@gxin0426

gxin0426 commented Apr 7, 2020

737 containers is a lot. Take a look at this SO answer: http://unix.stackexchange.com/a/13757/68061

I'm curious what the result of the suggested command is:

sudo find /proc/*/fd -lname anon_inode:inotify |
   cut -d/ -f3 |
   xargs -I '{}' -- ps --no-headers -o '%p %U %c' -p '{}' |
   uniq -c |
   sort -nr

After running the suggested command:

sudo find /proc/*/fd -lname anon_inode:inotify |
   cut -d/ -f3 |
   xargs -I '{}' -- ps --no-headers -o '%p %U %c' -p '{}' |
   uniq -c |
   sort -nr

the output is:

  3    5452 root     kubelet
  3       1 root     systemd
  2    9086 root     NetworkManager
  2    9060 polkitd  polkitd
  1    9425 root     rsyslogd
  1    9110 root     crond
  1    9069 dbus     dbus-daemon
  1    4728 root     systemd-udevd

@zoux86

zoux86 commented Oct 14, 2020

737 containers is a lot. Take a look at this SO answer: http://unix.stackexchange.com/a/13757/68061

@timstclair I hit the same problem with 468 containers, with fs.inotify.max_user_watches set to 8192, so I want to know why you think "737 containers is a lot". Is 468 containers a lot or not?
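
Rough arithmetic suggests why. If the watcher places one watch per container per cgroup hierarchy (the raw.go behavior discussed above), and a typical host exposes around ten hierarchies, then 737 containers need roughly 7,400 watches, already close to the 8,192 default before any other process takes its share. By the same estimate 468 containers need roughly 4,700, which still leaves little headroom if anything else on the host is watch-hungry. You can count the hierarchies on your host with:

ls /sys/fs/cgroup | wc -l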
