
Ignore non socket files in the kubelet plugin watcher #70494

Merged
merged 2 commits into from
Nov 17, 2018

Conversation

RenaudWasTaken
Contributor

@RenaudWasTaken RenaudWasTaken commented Oct 31, 2018

What type of PR is this?
/kind bug
/sig node

What this PR does / why we need it: Makes sure the pluginwatcher ignores non socket files.

Which issue(s) this PR fixes : Fixes #69015

Special notes for your reviewer:
Will post a list of plugins using the pluginwatcher mechanism and where they place the socket.
cc @vladimirvivien @vikaschoudhary16 @figo

Does this PR introduce a user-facing change?:

Kubelet Device Plugin Registration directory changed from `{kubelet_root_dir}/plugins/` to `{kubelet_root_dir}/plugins_registry/`. Any drivers (CSI or device plugin) that were using the old path must be updated to work with this version.

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. sig/node Categorizes an issue or PR as relevant to SIG Node. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. area/kubelet labels Oct 31, 2018
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Oct 31, 2018
Contributor

@figo figo left a comment


Thanks for the PR, I have two comments below.

if !fi.IsDir() {
if !strings.HasSuffix(fi.Name(), ".sock") {

Instead of checking the name suffix, we can check the file mode to make sure it is a socket file.


I think suffix checking is still needed; can we do both?
Suffix checking is necessary for the DELETE event, so that we can skip DELETE events that come from non-socket files (unfortunately, we cannot check the file mode, since the file has already been deleted by the time we receive the event). Could you update handleDeleteEvent?
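Since a deleted file can no longer be stat'ed, only a name-based check is possible in the DELETE path. A minimal sketch (`shouldHandleDelete` is a hypothetical helper, not the kubelet's actual function):

```go
package main

import (
	"fmt"
	"strings"
)

// shouldHandleDelete sketches the suffix check for DELETE events: the
// file is already gone, so its mode cannot be inspected; only the name
// remains available for filtering.
func shouldHandleDelete(path string) bool {
	return strings.HasSuffix(path, ".sock")
}

func main() {
	fmt.Println(shouldHandleDelete("/var/lib/kubelet/plugins_registry/foo.sock"))          // true
	fmt.Println(shouldHandleDelete("/var/lib/kubelet/plugins/kubernetes.io/vol_data.json")) // false
}
```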

relative += "/"
}

if strings.Count(relative, "/") > 1 {
Contributor

@figo figo Oct 31, 2018


By reading the PR, I believe filtering for socket files is good enough to reduce the actual registration attempts. We will still see logs saying non-socket files have been ignored; we could increase the log verbosity level if we think too many logs are an issue. Similarly, I suggest increasing the log verbosity level in handleDeleteEvent().

Limiting the directory depth where a plugin can live has two drawbacks IMO:

  1. It is hard to enforce; not everyone will read the code/README here.
  2. It provides less flexibility to plugins if they want to organize many plugins in a directory structure.

Thanks

@figo figo mentioned this pull request Nov 8, 2018
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 10, 2018
@vladimirvivien
Member

vladimirvivien commented Nov 12, 2018

@RenaudWasTaken there are a couple of suggestions from the issue (#69015) that should help.

  • Keep recursive scanning as is, to avoid disruption
  • When a directory event shows up, determine whether it is mounted as a data dir using the mounter package (i.e. mounter.IsLikelyNotMountPoint(...))

While the above will reject more directories and avoid scanning them, another suggestion would be to start scanning at a directory one level below the Kubelet dir (i.e. $KUBELET_DIR/plugreg/). This would avoid all the false positives caused by scanning the existing in-tree volume plugins directory at KUBELET_DIR/plugin. It would, however, require existing Kubelet plugins to update where they place their socket file 😬 .

Thoughts ?

cc @saad-ali @figo @RenaudWasTaken

@RenaudWasTaken
Contributor Author

RenaudWasTaken commented Nov 12, 2018

Here is my take on the three possible solutions:

  • Solution 1: Keep recursive scanning as is, but when a directory event shows up, determine whether it is mounted as a data dir using the mounter package (i.e. mounter.IsLikelyNotMountPoint(...))

    • Issues: fd exhaustion, more CPU load on a lot of directories that will never have a plugin but will see a lot of activity
    • Pros: Solution that has the most guarantee to not break any plugin
  • Solution 2: Start scanning at a DIR dir one level below the Kubelet dir

    • Issues: Breaks existing plugins
    • Pros: Less CPU load, less fd exhaustion
  • Solution 3: Stop scanning one dir below the kubelet dir ($KUBELET_DIR/foo.io/socket-name.sock)

    • Issues: Possibility of breaking existing plugins (we would need to check if there are plugins not using this scheme)
    • Pros: Less CPU load, less fd exhaustion

I believe solution 3 makes the most sense, but would like to have support from others :)
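Solution 3's depth limit can be sketched as follows (`tooDeep` is a hypothetical helper; the PR itself uses a similar `strings.Count` on the relative path):

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// tooDeep reports whether path sits more than maxDepth levels below
// root, i.e. whether the watcher should stop following it.
func tooDeep(root, path string, maxDepth int) bool {
	rel, err := filepath.Rel(root, path)
	if err != nil {
		return true
	}
	// Depth = number of path separators in the relative path, plus one.
	return strings.Count(rel, string(filepath.Separator))+1 > maxDepth
}

func main() {
	fmt.Println(tooDeep("/var/lib/kubelet", "/var/lib/kubelet/foo.io/plugin.sock", 2))      // false
	fmt.Println(tooDeep("/var/lib/kubelet", "/var/lib/kubelet/foo.io/deep/plugin.sock", 2)) // true
}
```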

@vladimirvivien
Member

@RenaudWasTaken what do you mean by Stop scanning?

Solution 3: Stop scanning at on dir below the kubelet dir

@RenaudWasTaken
Contributor Author

@RenaudWasTaken what do you mean by Stop scanning?

By that I mean that any directory created under $KUBELET_DIR/foo.io/ isn't followed.

@vladimirvivien
Member

@RenaudWasTaken for 3, scanning would always happen at $KUBELET_DIR/ but skip anything below $KUBELET_DIR/foo.io, and use the mounter function to skip mounted inline plugin dirs?

@vladimirvivien
Member

@RenaudWasTaken Let's go with Option 3 along with the mounter suggestion. I think that would work.

@figo
Contributor

figo commented Nov 12, 2018

Solution 1: Keep recursive scanning as is, but when a directory event shows up, determine whether it is mounted as a data dir using the mounter package (i.e. mounter.IsLikelyNotMountPoint(...))

  • Issues: fd exhaustion, more CPU load on a lot of directories that will never have a plugin but will see a lot of activity
  • Pros: Solution that has the most guarantee to not break any plugin

I don't understand why option 1 will cause more CPU load.
I am assuming that when a mounted directory shows up, we will NOT add it to the watcher's watch list;
then events for its sub-directories/files will not trigger anything.

@RenaudWasTaken @vladimirvivien

@vladimirvivien
Member

@figo I think @RenaudWasTaken was referring to unnecessary processing required to trigger the file event. The trigger will happen before the mounted status can be determined. However, it will stop further walk down the directory if it is mounted.

At this point, the mounted check is a good solution. Maybe it can be combined with option 3?

@RenaudWasTaken
Contributor Author

I don't understand why option 1 will cause more CPU load.

Reading your comment a bit more, it seems I lack understanding of what the CSI plugins do. Are you suggesting that most directories in the plugin tree are mounted directories?

Looking at the logs posted on the original issue, I expected that most of the metadata files (and dirs) weren't mount points:

E0921 18:42:24.539612   81286 plugin_watcher.go:120] error could not find plugin for deleted file /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8/globalmount when handling delete event: "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8/globalmount": REMOVE
E0921 18:42:28.592695   81286 plugin_watcher.go:120] error could not find plugin for deleted file /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8/globalmount when handling delete event: "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8/globalmount": REMOVE
E0921 18:42:28.592728   81286 plugin_watcher.go:120] error could not find plugin for deleted file /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8/globalmount when handling delete event: "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8/globalmount": REMOVE
E0921 18:42:28.592752   81286 plugin_watcher.go:120] error could not find plugin for deleted file /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8 when handling delete event: "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8": REMOVE
E0921 18:42:28.592767   81286 plugin_watcher.go:120] error could not find plugin for deleted file /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8/vol_data.json when handling delete event: "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8/vol_data.json": REMOVE
E0921 18:42:28.592967   81286 plugin_watcher.go:120] error could not find plugin for deleted file /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8 when handling delete event: "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8": REMOVE
E0921 18:42:36.611609   81286 plugin_watcher.go:120] error could not find plugin for deleted file /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8/vol_data.json when handling delete event: "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8/vol_data.json": REMOVE

Because of that, I was referring to increased CPU load: looking at fsnotify's (and inotify's) implementation, more watched files and directories mean more work for the CPU/kernel (given the current behavior, I can clearly see this growing to hundreds of files or more), and more fds used.

Do you mind detailing a bit where the mountpoint is?
Thanks!

@vladimirvivien
Member

@RenaudWasTaken

Reading a bit more your comment, it seems I lack understanding of what the CSI plugins does. Are you suggesting that most directories in the plugin tree are mounted directories?

Unfortunately there is a mix of mount points and metadata file nodes in $KUBELET_DIR/plugins. Here is another set of suggestions that I think could fix things even more when scanning $KUBELET_DIR/plugins, while keeping backward compatibility with existing drivers:

  • Skip scanning $KUBELET_DIR/plugins/kubernetes.io - this is where metadata, mount points, etc are kept for in-tree and CSI drivers.
  • Next, check that the file node is not a mount point; if it is, skip it
  • Add next directory for scanning
  • If .sock file is found, dispatch event for registration

This should avoid hitting all plugins (inline and csi) thus reducing scanning overhead.

Thoughts?

@AishSundar
Contributor

/milestone v1.13
/priority important-soon

@k8s-ci-robot k8s-ci-robot added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Nov 14, 2018
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 17, 2018
Member

@saad-ali saad-ali left a comment


/lgtm
/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RenaudWasTaken, saad-ali, vishh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@saad-ali
Member

/priority critical-urgent

@k8s-ci-robot k8s-ci-robot added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Nov 17, 2018
@RenaudWasTaken
Contributor Author

/retest

@RenaudWasTaken
Contributor Author

RenaudWasTaken commented Nov 17, 2018 via email

@saad-ali
Member

/hold

Double checking GCE PD tests

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 17, 2018
@saad-ali
Member

Ok GCE PD CSI tests look good. Removing hold

/hold cancel

/test pull-kubernetes-e2e-gce-100-performance
/test pull-kubernetes-kubemark-e2e-gce-big
/test pull-kubernetes-integration

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 17, 2018
@k8s-ci-robot k8s-ci-robot merged commit e3420cc into kubernetes:master Nov 17, 2018
tamalsaha pushed a commit to tamalsaha/docs that referenced this pull request Dec 19, 2018
The registration directory has changed from
`{kubelet_root_dir}/plugins/` to `{kubelet_root_dir}/plugins_registry/`.

see: kubernetes/kubernetes#70494
@ffilippopoulos

We recently upgraded our Kubernetes cluster from 1.11.5 to 1.12.3. We are running on AWS with nodes spread across 3 AZs, using calico-node as the CNI and EBS gp2 volumes for our PVCs. Since the upgrade we have noticed strange kubelet behaviour where certain kubelet instances fail to restart due to errors like the following:

Dec 19 09:40:21 ip-10-66-21-48 docker[7529]: F1219 09:40:21.316608    7563 kubelet.go:1376] failed to start Plugin Watcher. err: failed to traverse plugin socket path, err: failed to watch /var/lib/kubelet
/plugins/kubernetes.io/aws-ebs/mounts/aws/eu-west-1a/vol-0c4593cba89d43513/nodes/0/indices/Z9M9TmglRHqsKkUOxAy_Xw/6/_state, err: no space left on device

The node has plenty of free space, so we turned our efforts to tracing the problem to inotify watchers. We noticed that there was excessive use of them:

$ sudo bash watchers | column -t
kubelet        /hyperkube                        8851   /proc/8851/fdinfo/5    1
kubelet        /hyperkube                        8851   /proc/8851/fdinfo/9    1
kubelet        /hyperkube                        8851   /proc/8851/fdinfo/18   1
kubelet        /hyperkube                        8851   /proc/8851/fdinfo/25   13932
kubelet        /hyperkube                        8851   /proc/8851/fdinfo/36   0
kubelet        /hyperkube                        8851   /proc/8851/fdinfo/37   245

and increasing the maximum fs.inotify.max_user_watches allowed kubelet to start (and also log plenty of the following:

Dec 19 09:48:03 ip-10-66-21-48 docker[14332]: E1219 09:48:03.524173   14365 plugin_watcher.go:120] error could not find plugin for deleted file /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/eu-
west-1a/vol-0c4593cba89d43513/nodes/0/indices/f5rScFZ5SZeyNPwomrzM1g/0/index/_gk9_Lucene50_0.doc when handling delete event: "/var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/eu-west-1a/vol-0c4593
cba89d43513/nodes/0/indices/f5rScFZ5SZeyNPwomrzM1g/0/index/_gk9_Lucene50_0.doc": REMOVE

)
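As a stopgap, the inotify watch limit can be inspected and raised via sysctl (the value below is illustrative, not a recommendation; raising it requires root and is Linux-only):

```shell
# Inspect the current inotify watch limit; the plugin watcher consumes one
# watch per directory it recursively follows, so deep trees can exhaust it.
cat /proc/sys/fs/inotify/max_user_watches

# Raising the limit (illustrative value; requires root):
#   sysctl -w fs.inotify.max_user_watches=1048576
```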
We dug into that further and discovered that the cause is our elasticsearch pods, which we know make heavy use of file descriptors and file handles (by the nature of the app). By moving the elasticsearch deployment off the affected node, we see that kubelet's watcher usage decreases and kubelet is able to start fine.
The issue here is that all of this used to run smoothly on previous kube versions and only showed this "breaking" behaviour when we upgraded to 1.12.3. That looks related to this PR, so has anyone experienced similar issues and managed to resolve them using the changes here?

@msau42
Member

msau42 commented Dec 19, 2018

For 1.12, we should backport the change that blacklisted the kubernetes.io subdirectory.

@RenaudWasTaken
Contributor Author

@msau42 unfortunately that change is breaking, backporting it would require all the plugins to backport their updates.

I don't really think that's possible...

@RenaudWasTaken
Contributor Author

A solution could be to limit the search depth for 1.12, what do you think @msau42 @vladimirvivien @saad-ali

@msau42
Member

msau42 commented Dec 19, 2018

Limiting the search depth is technically breaking too, but unlikely.

I don't know of any users of plugin registration other than csi, and at least our csi documentation does not suggest a kubernetes.io subdirectory

@saad-ali
Member

I agree that we can't backport this PR, which changes the plugin dir, to 1.12. But we have to fix this for 1.12. So, let's back port the fix we made for this in #71314 -- i.e. blacklist the kubernetes.io dir (https://github.com/kubernetes/kubernetes/pull/71314/files#diff-d169099e060a585d9b613ccb3b43dc31R446).

@RenaudWasTaken can you prepare a PR?

@RenaudWasTaken
Contributor Author

I can do that, I should be able to take care of that today.

@saad-ali
Member

Great, thanks!!

@vreon

vreon commented Jan 18, 2019

I'm encountering the issue reported here by @ffilippopoulos, but on v1.13.2, and with the ceph.rook.io plugin instead of EBS; should I open a different issue?

@msau42
Member

msau42 commented Jan 22, 2019

@vreon can you open a new issue and provide some logs?

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Successfully merging this pull request may close these issues.

Plugin watcher getting notified of every Kubelet directory event