Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dynamic Flexvolume plugin discovery, probing with filesystem watch. #50031

Merged
merged 1 commit into from
Aug 26, 2017

Conversation

verult
Copy link
Contributor

@verult verult commented Aug 2, 2017

What this PR does / why we need it: Enables dynamic Flexvolume plugin discovery. This model uses a filesystem watch (fsnotify library), which notifies the system that a probe is necessary only if something changes in the Flexvolume plugin directory.

This PR uses the dependency injection model in #49668.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #51470

Release Note:

Dynamic Flexvolume plugin discovery. Flexvolume plugins can now be discovered on the fly rather than only at system initialization time.

/sig-storage

/assign @jsafrane @saad-ali
/cc @bassam @chakri-nelluri @kokhang @liggitt @thockin

@k8s-ci-robot
Copy link
Contributor

@verult: GitHub didn't allow me to request PR reviews from the following users: kokhang, bassam, chakri-nelluri.

Note that only kubernetes members can review this PR, and authors cannot review their own PRs.

In response to this:

What this PR does / why we need it: Enables dynamic Flexvolume plugin discovery. This model reloads Flexvolume plugins every time the system tries to match a volume with a plugin (e.g. during attach/detach, mount/unmount), which occurs very frequently.

This PR uses the dependency injection model in #49668.

do-not-merge

/release-note-none

/sig-storage

/assign @jsafrane @saad-ali
/cc @bassam @chakri-nelluri @kokhang @liggitt @thockin

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 2, 2017
@k8s-github-robot k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note-label-needed labels Aug 2, 2017
@@ -90,7 +90,7 @@ func NewController(p ControllerParameters) (*PersistentVolumeController, error)
resyncPeriod: p.SyncPeriod,
}

if err := controller.volumePluginMgr.InitPlugins(p.VolumePlugins, controller); err != nil {
if err := controller.volumePluginMgr.InitPlugins(p.VolumePlugins, nil, controller); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add volume.DynamicPluginProber to controllerparameters?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK PV controller was never aware of Flexvolume.

@verult verult changed the title WIP: Dynamic Flexvolume plugin discovery, probing with polling. WIP: Dynamic Flexvolume plugin discovery, probing with filesystem watch. Aug 5, 2017
@verult verult force-pushed the ConnectedProbe branch 2 times, most recently from b2accc5 to 7737dfb Compare August 5, 2017 04:37
@verult
Copy link
Contributor Author

verult commented Aug 5, 2017

Added filesystem watch implementation of the prober.

@kokhang
Copy link
Contributor

kokhang commented Aug 8, 2017

@verult I tested this locally and it did work. Kubelet was able to load the Rook flexvolume without having to restart. Here is the log:
I0808 02:22:21.564582 17759 plugins.go:420] Loaded volume plugin "flexvolume-kubernetes.io/rook"

However there are still some concerns that i have listed in kubernetes/community#833. Namely about how to obtain or construct a client config so that the driver can talk to the K8S api.

@verult
Copy link
Contributor Author

verult commented Aug 8, 2017

/cc @msau42

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 9, 2017
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 9, 2017
@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 12, 2017
Copy link
Member

@saad-ali saad-ali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a first pass

@@ -90,7 +90,7 @@ func NewController(p ControllerParameters) (*PersistentVolumeController, error)
resyncPeriod: p.SyncPeriod,
}

if err := controller.volumePluginMgr.InitPlugins(p.VolumePlugins, controller); err != nil {
if err := controller.volumePluginMgr.InitPlugins(p.VolumePlugins, nil, controller); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Whenever you pass in a parameter where it is not obvious what it is, add a small inline comment with the paramter name, for clarity:

	if err := controller.volumePluginMgr.InitPlugins(p.VolumePlugins, nil /* prober */, controller); err != nil {

Also add an inline comment explaining why you are leaving the value nil. e.g. PV controller was never aware of Flexvolume...

}
}

if time.Since(prober.lastUpdated) > time.Second { // Reduce the frequency of probes.
Copy link
Member

@saad-ali saad-ali Aug 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add to const at top of file.

}

if eventOpIs(event, fsnotify.Remove) && eventPathAbs == pluginDirAbs {
// pluginDir needs to exist in order to be watched.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Post a warning to log.

return err
}

if eventOpIs(event, fsnotify.Remove) && eventPathAbs == pluginDirAbs {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment above this block indicating what you're doing and why.

prober.lastUpdated = time.Now()
}

return nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a race here if files are dropped quickly in succession, and the read happens before the 2nd event is processed.

Let's make sure to add testing to see how often this might happen, and if it is infrequent put a comment indicating you know about it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixing this by adding a limit on number of events processed per second, and ignoring events from .dot files.

Users need to install drivers atomically, so if there are ever multiple files in an installation, O(1) events are triggered regardless. Multiple events are triggered only when multiple Flexvolume drivers are installed.

@kokhang
Copy link
Contributor

kokhang commented Aug 17, 2017

I know this is unrelated but what do you recommend to deploy the flexvolume drivers on nodes that are not configured to have the same flexvolumeDriverDir? I followed the proposed example using daemon set. But i was thinking that this would not work in the case where the driverDir is different in other nodes.

@verult
Copy link
Contributor Author

verult commented Aug 17, 2017

@kokhang There was some discussion about this topic here: kubernetes/community#833 (comment)

@k8s-github-robot k8s-github-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 18, 2017
@liggitt
Copy link
Member

liggitt commented Aug 24, 2017

I would recommend reporting the issue to kubeadm as a bug… I wouldn't expect that to be a desired deployment configuration

edit: is the controller-manager making use of flex plugins net new to this PR? If so, it wouldn't be a bug, it would be part of adding this enhancement

@kokhang
Copy link
Contributor

kokhang commented Aug 24, 2017

@liggitt ill contact to the kubeadm folks. But im also wondering if the controller-manager needs to probe for the flexvolume driver as well? Or is this case already handled by this kubelet probe?

@verult
Copy link
Contributor Author

verult commented Aug 24, 2017

In order to perform attach-detach operations, the controller-manager would have to probe for Flexvolume driver. One possible solution for now, while waiting for a response from the kubeadm team, is to use one of controller-manager pod's existing hostpath. I'm not sure if the kubeadm installation uses this but, /etc/srv/kubernetes might be a possibility.

@kokhang
Copy link
Contributor

kokhang commented Aug 24, 2017

In order to perform attach-detach operations, the controller-manager would have to probe for Flexvolume driver.

Does this mean that this dynamic flexvolume discovery fix is only applicable for non attacher-detacher-controller based flexvolume plugins?

@kokhang
Copy link
Contributor

kokhang commented Aug 24, 2017

I just saw the code. It seems you are doing the probe in the controller-manager as well

@saad-ali
Copy link
Member

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 25, 2017
@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 25, 2017
@k8s-github-robot k8s-github-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Aug 25, 2017
@saad-ali
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 25, 2017
@saad-ali saad-ali added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 25, 2017
@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: saad-ali, verult
We suggest the following additional approvers: derekwaynecarr, mikedanese

Assign the PR to them by writing /assign @derekwaynecarr @mikedanese in a comment when ready.

No associated issue. Update pull-request body to add a reference to an issue, or get approval with /approve no-issue

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 51054, 51101, 50031, 51296, 51173)

@k8s-github-robot k8s-github-robot merged commit 932e07a into kubernetes:master Aug 26, 2017
k8s-github-robot pushed a commit that referenced this pull request Sep 3, 2017
Automatic merge from submit-queue (batch tested with PRs 51805, 51725, 50925, 51474, 51638)

Flexvolume dynamic plugin discovery: Prober unit tests and basic e2e test.

**What this PR does / why we need it**: Tests for changes introduced in PR #50031 .
As part of the prober unit test, I mocked filesystem, filesystem watch, and Flexvolume plugin initialization.
Moved the filesystem event goroutine to watcher implementation.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #51147

**Special notes for your reviewer**:
First commit contains added functionality of the mock filesystem.
Second commit is the refactor for moving mock filesystem into a common util directory.
Third commit is the unit and e2e tests.

**Release note**:

```release-note
NONE
```
/release-note-none
/sig storage
/assign @saad-ali @liggitt 
/cc @mtaufen @chakri-nelluri @wongma7
dims pushed a commit to dims/kubernetes that referenced this pull request Feb 8, 2018
Automatic merge from submit-queue (batch tested with PRs 51054, 51101, 50031, 51296, 51173)

Dynamic Flexvolume plugin discovery, probing with filesystem watch.

**What this PR does / why we need it**: Enables dynamic Flexvolume plugin discovery. This model uses a filesystem watch (fsnotify library), which notifies the system that a probe is necessary only if something changes in the Flexvolume plugin directory.

This PR uses the dependency injection model in kubernetes#49668.

**Release Note**:
```release-note
Dynamic Flexvolume plugin discovery. Flexvolume plugins can now be discovered on the fly rather than only at system initialization time.
```

/sig-storage

/assign @jsafrane @saad-ali 
/cc @bassam @chakri-nelluri @kokhang @liggitt @thockin
@verult verult deleted the ConnectedProbe branch March 24, 2018 01:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flexvolume Dynamic Plugin Discovery
8 participants