New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CSI: Skip attach/detach for non-attachable volumes #2523
Conversation
466cc38
to
cf360f6
Compare
|
||
In all cases listed above, a non-attachable CSI driver becomes "attachable" (i.e. requires external attacher). Downgrade does not affect attachable CSI drivers, both "old" and "new" Kubernetes processes them in the same way. | ||
|
||
For non-attachable volumes, if the volume was mounted by "new" Kubernetes, it has no VolumeAttachment instance. "Old" A/D controller does not know about it. However, it will periodically call plugin's `VolumesAreAttached()` that checks for `VolumeAttachment` presence. Volumes without `VolumeAttachment` will be reported as not attached and A/D controller will call `Attach()` on these. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, the VolumeAttachment object will be created, but the CSI driver does not have external-attacher running, so we will be calling Attach() forever?
Or do we require that if you downgrade Kubernetes, you also downgrade the CSI driver?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you downgrade Kubernetes (or delete CSIDriver instance or disable the alpha feature), you're required to run an external CSI attacher for non-attachable CSI drivers, just like before this proposal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a note about external attacher.
|
||
* A/D controller will not create `VolumeAttachment` instances for non-attachable volumes. Etcd load will be reduced. | ||
|
||
* On the other hand, all CSI volumes still must go though A/D controller. A/D controller **must** process every CSI volume and kubelet **must** wait until A/D controller marks a volume as attached, even if A/D controller basically does nothing. All CSI volumes must be added to `Node.Status.VolumesInUse` and `Node.Status.VolumesAttached`. This does not introduce any new API calls, all this is already implemented, however this proposal won't reduce `Node.Status` update frequency in any way. If *all* volumes will move to CSI eventually, pod startup will be slower than when using in-tree volume plugins that don't go through A/D controller. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Node object will also get much bigger.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a note about it.
Fixed the comments. And renamed |
|
||
### Kubelet / VolumeManager | ||
* Kubelet must create a (shared) informer to watch CSIDriver and pass it to CSI volume plugin in [`ProbeVolumePlugins`](https://github.com/kubernetes/kubernetes/blob/8db5328c4c1f9467ab0d70ccb991a12d4675b6a7/cmd/kubelet/app/plugins.go#L101). | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may have to update RBAC policies so as kubelet can read that CR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and NodeAuthorizer. if the access is read-only, updating the system:node
role's rules would be sufficient
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kubelet needs only read-only access to CSIDrivers, IMO we don't need to update NodeAuthorizer.
|
||
In all cases listed above, an "attachable" CSI driver becomes non-attachable. Upgrade does not affect attachable CSI drivers, both "old" and "new" Kubernetes processes them in the same way. | ||
|
||
For non-attachable volumes, if the volume was attached by "old" Kubernetes (or "new" Kubernetes before CSIDriver instance was created), it has `VolumeAttachment` instance. It will be deleted by `Detach()`, as it deletes `VolumeAttachment` instance also for non-attachable volumes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a problem when controller-manager is newer but kubelet is older? The controller-manager will not create VolumeAttachment
object but kubelet will be stuck waiting for WaitForAttach
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Admin should not enable the alpha feature in this case, I think. Can we solve it by documentation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about for beta/GA? The standard upgrade procedure is to update master first
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or is the requirement to not update your drivers until both master+nodes are beyond some version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that whatever creates CSIDriver instances should create it only when both A/D controller and kubelet understand it. I.e. they're both at the right version and they have necessary alpha features enabled.
On downgrade, A/D controller gets updated first and it starts creating VolumeAttachment instances. It's up to the admin to start external attacher in this case, because CSI required it at that old version (or version with disabled alpha features).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a few questions. Main concerns is complexity introduced to avoid racing conditions and up/down grades.
* In addition to the above, `Detach()` removes `VolumeAttachment` instance even if the volume is not attachable. This deletes `VolumeAttachment` instances created by old A/D controller or before `CSIDriver` instance was created. | ||
|
||
### A/D controller | ||
* A/D controller must pass a (shared) informer to watch CSIDriver and pass it to CSI volume plugin in [`ProbeVolumePlugins`](https://github.com/kubernetes/kubernetes/blob/8db5328c4c1f9467ab0d70ccb991a12d4675b6a7/cmd/kube-controller-manager/app/plugins.go#L82). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you saying to inject an informer from the A/D controller loop into the CSIPlugin.NewAttacher
during plugin boot up? If so, that sounds a bit leaky where controller concerns making their way to the plugin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll rephrase it, it does not need to be a/d controller, it can be kube-controller-manager itself. And I tried to add a parameter with the informer to ProbeVolumePlugins()
and it's too early. kubelet does not have fully established API client at that time. So I'll either create a new non-shared informer in Init()
of the plugin or add it to VolumeHost
.
* A/D controller must pass a (shared) informer to watch CSIDriver and pass it to CSI volume plugin in [`ProbeVolumePlugins`](https://github.com/kubernetes/kubernetes/blob/8db5328c4c1f9467ab0d70ccb991a12d4675b6a7/cmd/kube-controller-manager/app/plugins.go#L82). | ||
|
||
### Kubelet / VolumeManager | ||
* Kubelet must create a (shared) informer to watch CSIDriver and pass it to CSI volume plugin in [`ProbeVolumePlugins`](https://github.com/kubernetes/kubernetes/blob/8db5328c4c1f9467ab0d70ccb991a12d4675b6a7/cmd/kubelet/app/plugins.go#L101). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, the Kubelet is creating the informer and pass it to the CSI volume plugin. I am guessing the informer is created in a place like CSIVolumePlugin.Init()
and made available to ProbeVolumePlugin
. If so, that same informer could be made available for the CSIPlugin.{Attacher|Detacher}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the informer will be in csiPlugin
struct, so it's available for Mounter and Attacher. I need to figure out who creates the informer, see above.
* If CSIDriver for the driver does not exist, perform usual logic (i.e. treat the volume as attachable). | ||
* This keeps the behavior the same as in old Kubernetes version without CSIDriver object. | ||
* This also happens when CSIDriver informer has not been quick enough. It is suggested that CSIDriver instance is created **before** any pod that uses corresponding CSI driver can run. | ||
* In case that CSIDriver informer (or user) is too slow, CSI volume plugin `Attach()` will create `VolumeAttachment` instance and wait for (non-existing) external attacher to fulfill it. The CSI plugin shall recover when `CSIDriver` instance is created and skip attach. Any `VolumeAttachment` instance created here will be deleted on `Detach()`, see the next bullet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What situation would cause a driver not to have a CSIDriver object representing it? We should try not to have this racing and synchronization case. Resolution seems complex. Now I understand why @saad-ali was thinking of having a CSIDriver.Status.Ready
flag.
Also, maybe the ideal thing to have is the CSI registrar create the CSIDriver
, instead of Human, which would guarantee that CSIDriver object does not show up before CSI Driver is ready. Once feature is introduced, I think CSIDriver object should always be coupled with a running/deployed driver it describes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@saad-ali wants CSIDriver instances optional. So if admin decides that they don't want the instance for a driver, Kubernetes must work. Admin can create the instance at any later time and kubernetes must carry on. Admin can even delete the instance and kubernetes and all running pods must survive.
I'd like to have the instance mandatory and immutable (users can't switch driver from attachable to non-attachable), but that's perhaps unrealistic requirement.
|
||
## Alternatives considered | ||
A/D controller and kubelet can be easily extended to check if a given volume is attachable. This would make mounting of non-attachable volumes easier, as kubelet would not need to wait for A/D controller to mark the volume as attached. However, there would be issues when upgrading or downgrading Kubernetes (or marking CSIDriver as attachable or non-attachable, which has basically the same handling). | ||
* On upgrade (i.e. a previously attachable CSI volume becomes non-attachable), A/D controller could discover that an attached volume is not attachable any longer. A/D controller could clean up `Node.Status.VolumesAttached`, but since A/D controller does not know anything about `VolumeAttachment`, we would either need to introduce a new volume plugin call to clean it up in CSI volume plugin, or something else would need to clean it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a realistic scenario ? Can a driver that was attachable becomes non-attachable? That sounds like a misconfiguration. Again, maybe the feature should be gated during alpha/beta forcing users to buy into the CSIDriver registry thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can a driver that was attachable becomes non-attachable?
Yes, admin creates / modifies CSIDriver for the driver.
Or admin enables alpha feature that gates the CSIDriver logic in Kubernetes. This one is particularly interesting, because there will be some (short?) time when the feature is enabled on some components and disabled on others.
## Alternatives considered | ||
A/D controller and kubelet can be easily extended to check if a given volume is attachable. This would make mounting of non-attachable volumes easier, as kubelet would not need to wait for A/D controller to mark the volume as attached. However, there would be issues when upgrading or downgrading Kubernetes (or marking CSIDriver as attachable or non-attachable, which has basically the same handling). | ||
* On upgrade (i.e. a previously attachable CSI volume becomes non-attachable), A/D controller could discover that an attached volume is not attachable any longer. A/D controller could clean up `Node.Status.VolumesAttached`, but since A/D controller does not know anything about `VolumeAttachment`, we would either need to introduce a new volume plugin call to clean it up in CSI volume plugin, or something else would need to clean it. | ||
* On downgrade (i.e. a previously non-attachable CSI volume becomes attachable), kubelet must discover that already mounted volume has changed from non-attachable to attachable and put it into `Node.Status.VolumesInUse`. This would race with A/D controller detaching the volume when a pod was deleted at the same time a CSIDriver instance was made attachable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question here, is that possible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, admin deletes CSIDriver instance. Or changes it to "attachable". Or disables the alpha feature. Or downgrades Kubernetes to version that does not implement CSIDriver logic. From our point of view (A/D controller + kubelet) , a driver that was not attachable becomes attachable
* GA: 1.14 | ||
|
||
## Alternatives considered | ||
A/D controller and kubelet can be easily extended to check if a given volume is attachable. This would make mounting of non-attachable volumes easier, as kubelet would not need to wait for A/D controller to mark the volume as attached. However, there would be issues when upgrading or downgrading Kubernetes (or marking CSIDriver as attachable or non-attachable, which has basically the same handling). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this issue is introduced as alpha
and feature-gated
, that should help avoid misconfigurations right? Meaning, this feature will only work when enabled
along with a properly-configured CSIDriver object (and supporting code) is running.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am sure we need to support downgrade to older Kubernetes that did not have this feature. And IMO we should also support disabling the feature in a running cluster.
Added some clarification about informer creation and upgrade/downgrade in "Alternatives considered". |
Is there anything missing in this proposal? I'm working on proof-of-concept and it looks good so far. Creating / deleting CSIDriver instances while pod with non-attachable CSI volumes (surprisingly ;-) ) works. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: saad-ali The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Automatic merge from submit-queue (batch tested with PRs 68161, 68023, 67909, 67955, 67731). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md. CSI: skip attach for non-attachable drivers **What this PR does / why we need it**: This is implementation of kubernetes/community#2523. CSI volumes that don't need attach/detach now don't need external attacher running. WIP: * contains #67803 to get CSIDriver API. Ignore the first commit. * ~~missing e2e test~~ /sig storage cc: @saad-ali @vladimirvivien @verult @msau42 @gnufied @davidz627 **Release note**: ```release-note CSI volume plugin does not need external attacher for non-attachable CSI volumes. ```
CSI: Skip attach/detach for non-attachable volumes
In this proposal, we describe a way how to skip creation of
VolumeAttachments
for non-attachable CSI drivers.Benefits:
This PR depends on
CSIPlugin
CSIDriver
introduced in #2514