Plugin watcher getting notified of every Kubelet directory event #69015

Open
vladimirvivien opened this Issue Sep 25, 2018 · 10 comments

Member

vladimirvivien commented Sep 25, 2018

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug
This is more of an enhancement.

What happened:
The Plugin Watcher is set up to trigger on file activity in the directory that it scans. During startup, the Kubelet passes the Watcher its root plugin directory as the location to scan for newly added plugins:

klet.pluginWatcher = pluginwatcher.NewWatcher(klet.getPluginsDir())

Because the kubelet's plugins directory is also used by existing in-tree drivers (including CSI) as a location to store plugin artifacts, the Plugin Watcher is triggered by directory/file activity in that directory that has nothing to do with plugin registration, as shown in the output below:

> cat kubelog.txt | grep "plugin_watcher"
E0921 18:41:49.164687   81286 plugin_watcher.go:115] error dial failed at socket /var/lib/kubelet/plugins/csi-hostpath/csi.sock, err: failed to dial socket /var/lib/kubelet/plugins/csi-hostpath/csi.sock, err: context deadline exceeded when handling create event: "/var/lib/kubelet/plugins/csi-hostpath/csi.sock": CREATE
E0921 18:42:12.126656   81286 plugin_watcher.go:120] error could not find plugin for deleted file /var/lib/kubelet/plugins/csi-hostpath/csi.sock when handling delete event: "/var/lib/kubelet/plugins/csi-hostpath/csi.sock": REMOVE
E0921 18:42:12.127706   81286 plugin_watcher.go:115] error failed to get plugin info using RPC GetInfo at socket /var/lib/kubelet/plugins/csi-hostpath/csi.sock, err: rpc error: code = Unimplemented desc = unknown service pluginregistration.Registration when handling create event: "/var/lib/kubelet/plugins/csi-hostpath/csi.sock": CREATE
E0921 18:42:20.988913   81286 plugin_watcher.go:115] error error accessing path: /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8 error: lstat /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8: no such file or directory when handling create event: "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8": CREATE
E0921 18:42:20.989211   81286 plugin_watcher.go:120] error could not find plugin for deleted file /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8 when handling delete event: "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8": REMOVE
E0921 18:42:21.495810   81286 plugin_watcher.go:115] error failed to watch /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8, err: no such file or directory when handling create event: "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8": CREATE
E0921 18:42:21.495960   81286 plugin_watcher.go:120] error could not find plugin for deleted file /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8 when handling delete event: "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8": REMOVE
E0921 18:42:22.512498   81286 plugin_watcher.go:115] error failed to watch /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8, err: no such file or directory when handling create event: "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8": CREATE
E0921 18:42:22.512767   81286 plugin_watcher.go:120] error could not find plugin for deleted file /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8 when handling delete event: "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8": REMOVE
E0921 18:42:24.539679   81286 plugin_watcher.go:120] error could not find plugin for deleted file /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8/globalmount when handling delete event: "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8/globalmount": REMOVE
E0921 18:42:24.539725   81286 plugin_watcher.go:120] error could not find plugin for deleted file /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8 when handling delete event: "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8": REMOVE
E0921 18:42:24.539743   81286 plugin_watcher.go:120] error could not find plugin for deleted file /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8/vol_data.json when handling delete event: "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8/vol_data.json": REMOVE
E0921 18:42:24.539780   81286 plugin_watcher.go:120] error could not find plugin for deleted file /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8 when handling delete event: "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8": REMOVE
E0921 18:42:24.539612   81286 plugin_watcher.go:120] error could not find plugin for deleted file /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8/globalmount when handling delete event: "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8/globalmount": REMOVE
E0921 18:42:28.592695   81286 plugin_watcher.go:120] error could not find plugin for deleted file /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8/globalmount when handling delete event: "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8/globalmount": REMOVE
E0921 18:42:28.592728   81286 plugin_watcher.go:120] error could not find plugin for deleted file /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8/globalmount when handling delete event: "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8/globalmount": REMOVE
E0921 18:42:28.592752   81286 plugin_watcher.go:120] error could not find plugin for deleted file /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8 when handling delete event: "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8": REMOVE
E0921 18:42:28.592767   81286 plugin_watcher.go:120] error could not find plugin for deleted file /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8/vol_data.json when handling delete event: "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8/vol_data.json": REMOVE
E0921 18:42:28.592967   81286 plugin_watcher.go:120] error could not find plugin for deleted file /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8 when handling delete event: "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8": REMOVE
E0921 18:42:36.611609   81286 plugin_watcher.go:120] error could not find plugin for deleted file /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8/vol_data.json when handling delete event: "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8/vol_data.json": REMOVE
E0921 18:42:36.611485   81286 plugin_watcher.go:115] error error accessing path: /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8/globalmount error: lstat /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8/globalmount: no such file or directory when handling create event: "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8": CREATE
E0921 18:42:36.611575   81286 plugin_watcher.go:120] error could not find plugin for deleted file /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8 when handling delete event: "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8": REMOVE
E0921 18:42:36.611699   81286 plugin_watcher.go:120] error could not find plugin for deleted file /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8 when handling delete event: "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8": REMOVE
E0921 18:42:52.623501   81286 plugin_watcher.go:120] error could not find plugin for deleted file /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8 when handling delete event: "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8": REMOVE
E0921 18:42:52.623535   81286 plugin_watcher.go:115] error stat file /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8 failed: stat /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8: no such file or directory when handling create event: "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8": CREATE
E0921 18:43:24.698730   81286 plugin_watcher.go:120] error could not find plugin for deleted file /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8 when handling delete event: "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8": REMOVE

What you expected to happen:
The Plugin Watcher should not be triggered by file/directory activity unrelated to plugin registration/unregistration. One possibility is to have the Watcher scan a directory one level below plugins, such as /var/lib/kubelet/plugins/registration or something similar.

How to reproduce it (as minimally and precisely as possible):
Run plugin watcher with a CSI driver.

cc @RenaudWasTaken @vikaschoudhary16

Member

vladimirvivien commented Sep 25, 2018

/sig node

k8s-ci-robot added sig/node and removed needs-sig labels Sep 25, 2018

Contributor

tossmilestone commented Sep 25, 2018

@vladimirvivien I think you should place only the plugin socket file in the plugin registration directory, not the plugin data.

Contributor

tianshapjq commented Sep 25, 2018

Agree with @tossmilestone: we can place only the registration file in the specified path instead of placing all data into it. On the other hand, you could still place data into /var/lib/kubelet/plugins/registration even if a registration directory is created, so that's an endless loop.

Member

vladimirvivien commented Sep 25, 2018

@tossmilestone
As I mentioned in the issue, existing in-tree volume plugins (not just CSI) were already using that directory as a location to store storage operation artifacts. Here is a shortened list of driver locations that already use /var/lib/kubelet/plugins in their code:

And several more locations.

The fix would be to have plugin watcher scan a subdirectory of plugins (say /var/lib/kubelet/plugins/registration) instead of plugins directly as it is doing today.

The watcher functions as intended, except that it now picks up all directory activity, whether or not it is related to plugin registration.
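
To make the subdirectory idea concrete, here is a rough sketch (not the actual plugin watcher code) of how event paths could be scoped to a dedicated registration directory. The helper names `registrationDir` and `isUnder` and the socket name `example.sock` are made up for illustration; the directory name `registration` is only the suggestion from this thread.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// registrationDir returns the hypothetical dedicated directory the watcher
// would scan, one level below the kubelet plugins directory.
func registrationDir(pluginsDir string) string {
	return filepath.Join(pluginsDir, "registration")
}

// isUnder reports whether path is located inside dir (at any depth).
func isUnder(dir, path string) bool {
	rel, err := filepath.Rel(dir, path)
	if err != nil {
		return false
	}
	// A relative path that is "." or climbs out via ".." is not inside dir.
	if rel == "." || rel == ".." {
		return false
	}
	return !strings.HasPrefix(rel, ".."+string(filepath.Separator))
}

func main() {
	reg := registrationDir("/var/lib/kubelet/plugins")
	paths := []string{
		// Hypothetical socket placed in the registration directory.
		"/var/lib/kubelet/plugins/registration/example.sock",
		// Volume metadata path taken from the log output in this issue.
		"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8/vol_data.json",
	}
	for _, p := range paths {
		fmt.Printf("%s relevant=%v\n", p, isUnder(reg, p))
	}
}
```

With this scoping, activity from in-tree volume plugins elsewhere under plugins would never reach the registration handlers.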

Member

RenaudWasTaken commented Sep 25, 2018

Well, given that the pluginwatcher is beta, this is a bit of an issue. My understanding of the "beta" guarantees is that we want to be backwards compatible for all plugins published with the beta API.

Meaning that even if we move the registration directory to a new location, we'll need to keep scanning the old directory (to pick up v1beta plugins).

From a 10,000-foot point of view, it seems easier to me to move the in-tree volume plugins' data, though it might be a lot of work.

Another less painful option, though a bit grey, is to require plugin sockets to end with .sock. This suffix is recommended for device plugins (though not enforced), and I believe it is a convention on Linux.
This is a bit grey because we would still be breaking v1beta users, but since we're moving a convention from "recommended" to "enforced", it seems "less" breaking to me (and I haven't seen device plugins not using that extension).

What do you think @vladimirvivien ?
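
As a rough illustration of the .sock option (a sketch only, not the watcher's real code), a name-based filter could look like this. `hasSockSuffix` is a made-up helper name; the sample paths are taken from the logs in this issue.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// hasSockSuffix reports whether the file name ends with the ".sock" extension.
// It only looks at the name and does not check that the file is a real socket.
func hasSockSuffix(path string) bool {
	return filepath.Ext(path) == ".sock"
}

func main() {
	paths := []string{
		"/var/lib/kubelet/plugins/csi-hostpath/csi.sock",
		"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8/vol_data.json",
		"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8/globalmount",
	}
	for _, p := range paths {
		// Only paths that pass the filter would be dialed for registration.
		fmt.Printf("%s handle=%v\n", p, hasSockSuffix(p))
	}
}
```

A filter like this would skip the data files, but a socket with the right suffix that does not implement the registration service (like csi.sock in the first log lines) would still be dialed.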

Member

vladimirvivien commented Oct 8, 2018

@RenaudWasTaken I like the .sock suffix idea. It can be a recommendation (which CSI is already following).

cpressland commented Oct 9, 2018

Hmm, I've just deployed a new k8s cluster on 1.12.0 and noticed this repeating in the logs on a few of our pods. Just for my own sanity, is this something I should be worried about or can I ignore it until a fix is released?

Oct 09 06:37:46 westeurope-worker-09 kubelet[19149]: E1009 06:37:46.003633   19149 plugin_watcher.go:115] error dial failed at socket /var/lib/kubelet/plugins/kubernetes.io/azure-disk/mounts/m856950613/diagnostic.data/metrics.interim.temp, err: failed to dial socket /var/lib/kubelet/plugins/kubernetes.io/azure-disk/mounts/m856950613/diagnostic.data/metrics.interim.temp, err: context deadline exceeded when handling create event: "/var/lib/kubelet/plugins/kubernetes.io/azure-disk/mounts/m856950613/diagnostic.data/metrics.interim.temp": CREATE
Oct 09 06:37:46 westeurope-worker-09 kubelet[19149]: E1009 06:37:46.003717   19149 plugin_watcher.go:115] error dial failed at socket /var/lib/kubelet/plugins/kubernetes.io/azure-disk/mounts/m856950613/diagnostic.data/metrics.interim, err: failed to dial socket /var/lib/kubelet/plugins/kubernetes.io/azure-disk/mounts/m856950613/diagnostic.data/metrics.interim, err: context deadline exceeded when handling create event: "/var/lib/kubelet/plugins/kubernetes.io/azure-disk/mounts/m856950613/diagnostic.data/metrics.interim": CREATE

Member

RenaudWasTaken commented Oct 9, 2018

@cpressland you can ignore these errors.
@vladimirvivien if it's a recommendation then we'll keep spamming the logs with all the data files.

Another thing to keep in mind: it seems like CSI plugins put a lot of data (and create a lot of directories) there. It might be worth considering limiting the recursion to one level.

I'm concerned we might exhaust the available fds by tracking the plugins' data.
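
To sketch what limiting the recursion could mean in practice (an illustration under assumptions, not the actual watcher logic): only add watches for directories at most one level below the plugins root. `depthBelow` and `shouldWatchDir` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// depthBelow returns how many path elements separate path from root,
// 0 if they are the same directory, or -1 if path is not inside root.
func depthBelow(root, path string) int {
	sep := string(filepath.Separator)
	rel, err := filepath.Rel(root, path)
	if err != nil || rel == ".." || strings.HasPrefix(rel, ".."+sep) {
		return -1
	}
	if rel == "." {
		return 0
	}
	return len(strings.Split(rel, sep))
}

// shouldWatchDir reports whether a watch should be added for dir, given a
// maximum depth below root (1 means root and its immediate subdirectories).
func shouldWatchDir(root, dir string, maxDepth int) bool {
	d := depthBelow(root, dir)
	return d >= 0 && d <= maxDepth
}

func main() {
	root := "/var/lib/kubelet/plugins"
	dirs := []string{
		root,
		"/var/lib/kubelet/plugins/csi-hostpath",
		"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ff803a4bdef11e8",
	}
	for _, d := range dirs {
		fmt.Printf("%s watch=%v\n", d, shouldWatchDir(root, d, 1))
	}
}
```

Capping the depth keeps the number of watched directories bounded by the number of top-level plugin directories instead of growing with every volume.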

Member

vladimirvivien commented Oct 10, 2018

@RenaudWasTaken yeah CSI keeps track of volume operation metadata in the plugins dir. This probably will not change.

I understand this is beta. However, besides CSI and device plugins, are there any other implementers out there for this feature? Would it be possible to propose a new location (e.g. /reg/)?

Member

vladimirvivien commented Oct 18, 2018

@RenaudWasTaken let me know if you open a PR for this as discussed.
