dnsmasq failed to create inotify #2709

pierreozoux opened this issue Jun 12, 2017 · 3 comments


commented Jun 12, 2017

kops version: 1.6
kubernetes version: 1.6.1
Networking: canal
Cloud: AWS
Node age: 26d

Here are the logs we saw:

I0612 11:07:05.749783       1 main.go:76] opts: {{/usr/sbin/dnsmasq [-k --cache-size=1000 --log-facility=- --server=/cluster.local/ --server=/ --server=/] true} /etc/k8s/dns/dnsmasq-nanny 10000000000}
I0612 11:07:05.750284       1 nanny.go:86] Starting dnsmasq [-k --cache-size=1000 --log-facility=- --server=/cluster.local/ --server=/ --server=/]
I0612 11:07:05.822074       1 nanny.go:108] 
I0612 11:07:05.822091       1 nanny.go:108] dnsmasq: failed to create inotify: No file descriptors available
I0612 11:07:05.822117       1 nanny.go:111] 
W0612 11:07:05.822124       1 nanny.go:112] Got EOF from stderr
I0612 11:07:05.822148       1 nanny.go:111] 
W0612 11:07:05.822161       1 nanny.go:112] Got EOF from stdout
F0612 11:07:05.822175       1 nanny.go:182] dnsmasq exited: exit status 5

We upgraded the cluster, and the error is gone.

This looks related to kubernetes/kubernetes#32526.

Kops indeed has this setting on the node:

cat /proc/sys/fs/inotify/max_user_instances

Would it be beneficial to update this number? We can PR if necessary?
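Before picking a new value, it helps to see how close the node actually is to the limit. The sketch below is a hypothetical helper, not part of kops or Kubernetes. It assumes a Linux `/proc` layout: the limits live under `/proc/sys/fs/inotify/`, and each open inotify instance appears as an fd symlink whose target is `anon_inode:inotify`. Because `max_user_instances` is charged per user (assumed here to be the real UID from `/proc/<pid>/status`), containers that all run as root share one budget. The counts are an approximation, since an fd inherited by another process can be seen twice.

```python
import os
from collections import Counter


def read_inotify_limits(proc_root="/proc"):
    """Read the inotify sysctls from <proc_root>/sys/fs/inotify (None if unreadable)."""
    base = os.path.join(proc_root, "sys", "fs", "inotify")
    limits = {}
    for name in ("max_user_instances", "max_user_watches", "max_queued_events"):
        try:
            with open(os.path.join(base, name)) as f:
                limits[name] = int(f.read().strip())
        except (OSError, ValueError):
            limits[name] = None  # missing on this kernel or not readable
    return limits


def _real_uid(pid_dir):
    """Return the real UID from the 'Uid:' line of <pid_dir>/status, or None."""
    try:
        with open(os.path.join(pid_dir, "status")) as f:
            for line in f:
                if line.startswith("Uid:"):
                    # Fields after 'Uid:' are real, effective, saved, filesystem UID.
                    return int(line.split()[1])
    except (OSError, ValueError, IndexError):
        pass
    return None


def count_inotify_instances(proc_root="/proc"):
    """Count inotify fds per PID and per real UID by scanning <proc_root>/<pid>/fd."""
    per_pid = Counter()
    per_uid = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'sys' or 'self'
        pid_dir = os.path.join(proc_root, entry)
        fd_dir = os.path.join(pid_dir, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission (run as root)
        n = 0
        for fd in fds:
            try:
                if os.readlink(os.path.join(fd_dir, fd)) == "anon_inode:inotify":
                    n += 1
            except OSError:
                continue  # fd was closed while scanning
        if n:
            per_pid[int(entry)] = n
            per_uid[_real_uid(pid_dir)] += n
    return per_pid, per_uid
```

Run as root on an affected node, comparing `count_inotify_instances()[1][0]` (instances held by root) with `read_inotify_limits()["max_user_instances"]`. If root is at or near the limit, raising the sysctl is a plausible fix; if not, the failure more likely comes from the per-process fd limit, which produces the same `EMFILE` error.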

As a side note, ever since the cluster was created I haven't been able to tail pod logs; it fails with this error message:

failed to create fsnotify watcher: too many open files

I'm not sure it is related, but I thought it was worth mentioning.

As a second side note, there are 4.33M free inodes.



commented Jun 12, 2017

Just for information: to finish the upgrade, we had to kill the failing pod each time the next node was recreated, since kops was reporting:
0612 14:00:48.645469 41005 rollingupdate_cluster.go:430] Cluster did not validate, and waiting longer: your kube-system pods are NOT healthy

Once the failing pod was deleted, the rolling update could be started again.
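The manual workaround (delete the crashing kube-system pod so its controller recreates it, then resume the rolling update) can be scripted. This is a minimal sketch, assuming `kubectl` is on the PATH and pointed at the cluster; the helper names and the choice to treat only `CrashLoopBackOff` as "stuck" are illustrative assumptions, not kops behaviour. It defaults to a dry run that only prints the delete commands.

```python
import json
import subprocess

# Container waiting reasons treated as "stuck" (an assumption; adjust as needed).
STUCK_REASONS = {"CrashLoopBackOff"}


def stuck_pods(pod_list):
    """Return names of pods in `kubectl get pods -o json` output with a stuck container."""
    names = []
    for pod in pod_list.get("items", []):
        statuses = pod.get("status", {}).get("containerStatuses", [])
        for cs in statuses:
            waiting = cs.get("state", {}).get("waiting") or {}
            if waiting.get("reason") in STUCK_REASONS:
                names.append(pod["metadata"]["name"])
                break  # one stuck container is enough to flag the pod
    return names


def delete_stuck_pods(namespace="kube-system", dry_run=True):
    """Delete stuck pods so their Deployment/DaemonSet recreates them."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    names = stuck_pods(json.loads(out))
    for name in names:
        cmd = ["kubectl", "delete", "pod", "-n", namespace, name]
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return names
```

Running `delete_stuck_pods(dry_run=False)` after each node is replaced mirrors what was done by hand here; it only hides the symptom, so the inotify limit itself still needs addressing.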



commented Jun 12, 2017


Would it be beneficial to update this number? We can PR if necessary?

Does updating this number fix the issue? We support a bunch of different operating systems, and I am not sure which OSes this impacts.

If we can figure out a solution, please PR ;)


Contributor Author

commented Jul 18, 2017

Closing in favor of #2912
