-
Notifications
You must be signed in to change notification settings - Fork 39.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add logic for initializing CSINode on Node startup for CSI Migration [Replaces #70909] #74835
Conversation
Please only look at most recent commit, the others are dependencies from xing that will be merged before this one It is only |
/remove-kind api-change |
e214c4e
to
495e78b
Compare
/cc @cheftako |
495e78b
to
f251b08
Compare
f251b08
to
537b8e1
Compare
adcHost, ok := og.volumePluginMgr.Host.(volume.AttachDetachVolumeHost) | ||
if !ok { | ||
// This function is running not on the AttachDetachController | ||
// We assume that Kubelet is servicing this function and therefore is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this saying that if the Host doesn't implement CSINodeLister()
we assume we're running on the kubelet? that seems like a leap, for a few reasons:
- the
AttachDetachVolumeHost
interface could easily be satisfied by things that are not the attachdetachcontroller. if we want positive identification you need a method that is attachdetach-specific - not implementing
AttachDetachVolumeHost
doesn't mean we're a kubelet, it just means we're not the attachdetach host.
can you clarify what the intended checks/responses are?
- if we don't have a node name, return false?
- if the volume can't be migrated, return false?
- if the feature is disabled, return false?
- if we're on the kubelet, return true to defer to
useCSIPlugin
? - if we're not on the kubelet, but have access to CSINodes, check that (and return false on lookup errors)?
- if we're not on the kubelet, and don't have access to CSINodes, ... do what? I'd have expected to return false to be safe, but this code appears to be returning true in this case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah good point. It always slips my mind that go interfaces are not strict when it comes to who is implementing what interfaces.
The function's intention is to run (and be useful on) ONLY the AttachDetachController to check whether the Kubelet has the volume plugin migrated on that particular Node for that particular volume.
intended checks and reasoning:
- If we don't have Node Name -> return an error as this is unexpected
- If the volume can't be migrated -> return false because it is assumed Kubelet volume also can't be migrated (if this is not true it's undefined behavior we do not support Kubelet version > master version)
- If the feature is disabled -> return false because it is assumed Kubelet volume also should have feature disabled (if this is not true it's undefined behavior we do not support Kubelet version > master version or this skew)
- If we're on Kubelet [Unclear whether
kubeClient== nil
is the perfect check for that] -> return true because we do not need to "reach from ADC to Kubelet" and defer touseCSIPlugin
which will check feature flags/translation lib etc. - If we're on ADC and have access to CSINodes -> Check the annotation, return error on API lookup (will retry in outer loop), return false if plugin is not found in annotation
- If we're not on the Kubelet and don't have access to CSINodes -> Should be returning an error
I think the most suspect area (which is why I commented it heavily) is the intended behavior here:
kubeClient := og.volumePluginMgr.Host.GetKubeClient()
if kubeClient == nil {
// Assume standalone kubelet mode. In which case the we assume there is no CSINode
// to check and defer to useCSIPlugin for picking the backend
return true, nil
}
adcHost, ok := og.volumePluginMgr.Host.(volume.AttachDetachVolumeHost)
if !ok {
// This function is running not on the AttachDetachController
// We assume that Kubelet is servicing this function and therefore is
// defer to useCSIPlugin for picking the backend
return true, nil
}
csiNode, err := adcHost.CSINodeLister().Get(string(nodeName))
if err != nil {
return false, err
}
What is meant to be captured is:
- If in standalone Kubelet mode, then
nodeUsingCSIPlugin == useCSIPlugin
since we are on the node (except we don't want to use CSINode to check) so we just return true and defer touseCSIPlugin
. - The sketchy one: If for some reason we're not running on the AttachDetachController (determined by type assertion) we again defer to
useCSIPlugin
for the same reason.- I guess the sketchy assumption here is that implicitly we know this function will only be called during Attach/Detach which can only be serviced by either ADC (in normal case) or Kubelet (if
enableControllerAttachDetach
is enabled)
- I guess the sketchy assumption here is that implicitly we know this function will only be called during Attach/Detach which can only be serviced by either ADC (in normal case) or Kubelet (if
- If all of the above is not true (we know we are running in Attach or Detach on some entity with access to CSINode) and yet we cannot
GET
theCSINode
we care about, this is considered an error and will need to be retried in an outer loop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would _, isKubelet := og.volumePluginMgr.Host.(volume.KubeletVolumeHost)
be a more definitive check of whether we were on the kubelet?
If for some reason we're not running on the AttachDetachController (determined by type assertion) we again defer to use CSIPlugin for the same reason.
It seems like if we're running anywhere other than the kubelet, we need to have access to check CSINodes. The only place other than the kubelet happens to be the attachdetachcontroller today, but would it be more robust to require CSINode for all non-kubelet callers?
if _, isKubelet := og.volumePluginMgr.Host.(volume.KubeletVolumeHost); isKubelet {
// We assume that Kubelet is servicing this function and therefore
// defer to useCSIPlugin for picking the backend
return true, nil
}
csiNodeLister, ok := og.volumePluginMgr.Host.(volume.CSINodeListerHost)
if !ok {
return false, fmt.Errorf("need access to CSINode objects to determine if the node is migrated to CSI")
}
csiNode, err := csiNodeLister.CSINodeLister().Get(string(nodeName))
if err != nil {
return false, err
}
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, if we are on the kubelet, does useCSIPlugin
include checks of whether the driver is installed/available (since returning early skips those checks in nodeUsingCSIPlugin)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like if we're running anywhere other than the kubelet, we need to have access to check CSINodes.
Don't think this is true actually. It's more of an ADC specific thing. Other storage components that run outside of Kubelet (pv controller, expand controller) will not need to make this check (as it is just for syncing attach/detach with mount/unmount). Therefore, I think this check needs to be done exclusively on the Attach Detach Controller, not not on the Kubelet.
also, if we are on the kubelet, does useCSIPlugin include checks of whether the driver is installed/available (since returning early skips those checks in nodeUsingCSIPlugin)?
This is a good point. Kubelet will not check whether the driver is installed, only the whether the feature flags are flipped. The only benefit of checking for driver installation on the Kubelet is to be able to throw a better error message if the driver is not installed like this one: fmt.Errorf("in-tree plugin %s is migrated on node %s but driver %s is not installed", pluginName, string(nodeName), driverName)
.
Right now without the check the error will just be the generic CSI Driver not installed mount error.
However, it is questionable to me whether it's worth adding all the infrastructure for a Kubelet CSINode informer and extra logic right now in order to facilitate that. I could add a TODO to follow up on that? LMK if you think it is necessary to get in right now. @msau42 for opinions as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the existing "csi driver not found" error during mount is sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other storage components that run outside of Kubelet (pv controller, expand controller) will not need to make this check (as it is just for syncing attach/detach with mount/unmount). Therefore, I think this check needs to be done exclusively on the Attach Detach Controller, not not on the Kubelet.
ok, then I think the current code is reasonable, but would recommend making the AttachDetachVolumeHost interface stronger (adding a marker IsAttachDetachController()
method or something)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done. Wish there was a better way to do strict interface assertions but this is an ok workaround. Thanks for digging deeper into the logic here 👍
a91ec25
to
f97807f
Compare
/approve lgtm otherwise, will let sig-storage have final tag |
b2105ea
to
84176b6
Compare
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: davidz627, liggitt, msau42, Random-Liu The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest |
/lgtm |
/retest |
84176b6
to
41b3579
Compare
/lgtm |
@davidz627: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
This PR adds the logic for initializing CSINode with an annotation including migrated plugins used for migration during node startup. This is necessary so that the Attach Detach controller can see the availability/pluginmechanism matrix when making decisions about how to deal with volumes.
It also adds logic in a seperate commit for the ADC to fallback to using the in-tree mechanism if the annotation is not present
Replaces: #70909
I've also addressed the outstanding comments on the previous PR, resolved most of the questions in code
/kind feature
/assign @jsafrane @liggitt @yujuhong
/cc @leakingtapan @ddebroy
Fixes #70514