Fixes zone/region label setup and kubelet getting stuck on startup when credentials are stored in a secret for the legacy vSphere cloud provider. #101028
|
Hi @lobziik. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/ok-to-test |
| func (vs *VSphere) GetZone(ctx context.Context) (cloudprovider.Zone, error) { | ||
| nodeName, err := vs.CurrentNodeName(ctx, vs.hostName) | ||
| if err != nil { | ||
| klog.Errorf("Cannot get node name.") |
| klog.Errorf("Cannot get node name.") | |
| klog.ErrorS(nil, "Cannot get node name.") |
|
Is a structured logging migration necessary here? |
|
@yangjunmyfm192085 Not sure. In my opinion, a log format migration should be done for the entire module at once. I would prefer to keep this consistent with the rest of the provider's code for now. |
|
/retest |
|
@andrewsykim @cheftako, Hello! Could you please take a brief look at this? Is this approach reasonable? :) |
| @@ -894,7 +901,16 @@ func (vs *VSphere) LoadBalancer() (cloudprovider.LoadBalancer, bool) { | |||
| } | |||
|
|
|||
| func (vs *VSphere) isZoneEnabled() bool { | |||
| return vs.cfg != nil && vs.cfg.Labels.Zone != "" && vs.cfg.Labels.Region != "" | |||
| isEnabled := vs.cfg != nil && vs.cfg.Labels.Zone != "" && vs.cfg.Labels.Region != "" | |||
Do both zone and region really need to be set, or is setting just one of them enough? (I see that was the previous logic, so no problem keeping it that way, just checking.)
https://vmware.github.io/vsphere-storage-for-kubernetes/documentation/zones.html says that both need to be set. The case with only one parameter set is not described at all.
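To make the "both must be set" rule concrete, here is a minimal standalone sketch of the check discussed above. `labelsConfig` and `zoneEnabled` are hypothetical stand-ins written for illustration, not the provider's actual types; only the boolean condition mirrors the diff.

```go
package main

import "fmt"

// labelsConfig is a hypothetical stand-in for the [Labels] section of the
// vSphere cloud provider config. It is not the provider's real type.
type labelsConfig struct {
	Zone   string
	Region string
}

// zoneEnabled mirrors the condition from the diff: zone support counts as
// enabled only when the config exists and BOTH Zone and Region are non-empty.
func zoneEnabled(l *labelsConfig) bool {
	return l != nil && l.Zone != "" && l.Region != ""
}

func main() {
	fmt.Println(zoneEnabled(&labelsConfig{Zone: "k8s-zone", Region: "k8s-region"})) // true
	fmt.Println(zoneEnabled(&labelsConfig{Zone: "k8s-zone"}))                       // false: region missing
	fmt.Println(zoneEnabled(nil))                                                   // false: no config
}
```

With this rule, a config that sets only one of the two values behaves the same as a config with zones disabled.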
|
Happy to approve but would like to see one of the primary vsphere contributors lgtm first. |
|
@divyenpatel @andrewsykim @SandeepPissay Hello! Could you please take a look at this? It would be awesome to get feedback from the vSphere maintainers :) |
|
@lobziik unfortunately I do not have the bandwidth to review code changes in the legacy vSphere cloud provider. It has also been a while since I looked at that code. |
@SandeepPissay our mutual customers would like to be able to use zones. If you are unable to review, is there someone else in the reviewer/approver list who can? |
|
/assign |
2030d91 to bfdbfa2
|
Updated the PR a bit. What was done:
@andrewsykim, official, I think. I introduced client instantiation here: https://github.com/kubernetes/kubernetes/pull/101028/files#diff-5000d45019379218cc35bc5f967f2731b4dbff230399361c8f6b9a8af3e7e1f0R277 |
|
/test pull-kubernetes-integration |
Fwiw I think this is just used as the user-agent in the client and not for RBAC |
|
Thanks for the reminder, Andrew!
OK, so that means the client will still use the kubelet's service account? Let me check again. |
|
Looking at the code of ClientOrDie here, it looks like the input is used as the service account name when creating the config: kubernetes/staging/src/k8s.io/controller-manager/pkg/clientbuilder/client_builder_dynamic.go Lines 121 to 129 in bfdbfa2
|
|
Ah you're right @lubronzhan, I was thinking of the simple client builder: kubernetes/staging/src/k8s.io/controller-manager/pkg/clientbuilder/client_builder.go Lines 44 to 48 in bfdbfa2
|
We no longer support bootstrapping role/rolebindings specific to cloud providers, these should be handled externally |
|
If the role/rolebinding is handled externally, could we run into this issue?
|
|
@lubronzhan, I went through the code again and yes, it seems this will engage only once, during new Node handling within KCM. I will need to check this again from the kubelet's perspective. I poked around quite a while ago and don't remember exactly how it behaves with a secret there. |
|
I dug into the kubelet code and, as far as I can tell, there is only one place where zone labels are populated: kubernetes/pkg/kubelet/kubelet_node_status.go Line 406 in bad4faf
In other words, yes, with insufficient permissions this is exactly what would happen.
|
|
I attempted to fix the insufficient-permissions case in 3dfc011 |
|
Hi @lobziik. Right now the node labels will be set every time the node is updated. |
Hi! For some reason I thought this handler would be triggered from time to time, during status updates/heartbeats, but it seems not. So yes, in the current variant a node update needs to be triggered somehow to reconcile the zone labels. P.S. I made this handler run every 5 minutes in cd1d530 to avoid the need to trigger a node update. WDYT @lubronzhan? |
If the `vsphere-legacy-cloud-provider` client has insufficient permissions to update a Node after it is added, this handler will attempt to populate the Node's topology labels later.
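The retry behaviour described here can be illustrated with a small standard-library sketch. This is not the code from cd1d530; `reconcileFunc`, `runPeriodically`, and `countAttempts` are hypothetical names, and the real controller may well use a different scheduling helper. The sketch only shows the idea: run the label reconciliation immediately, then again on every tick, so that a failure caused by missing permissions is retried once the permissions are granted.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// reconcileFunc is a hypothetical callback that tries to set topology labels
// on nodes and returns an error when it cannot (e.g. missing RBAC permissions).
type reconcileFunc func(ctx context.Context) error

// runPeriodically calls reconcile once immediately and then once per interval
// until ctx is cancelled. Failures are logged and retried on the next tick.
func runPeriodically(ctx context.Context, interval time.Duration, reconcile reconcileFunc) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		if err := reconcile(ctx); err != nil {
			fmt.Println("label reconcile failed, will retry:", err)
		}
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
	}
}

// countAttempts runs the loop for timeoutMs milliseconds with a reconcile
// function that fails twice (simulating missing permissions) and then
// succeeds. It returns how many times reconcile was called.
func countAttempts(timeoutMs, intervalMs int) int {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(timeoutMs)*time.Millisecond)
	defer cancel()
	attempts := 0
	runPeriodically(ctx, time.Duration(intervalMs)*time.Millisecond, func(context.Context) error {
		attempts++
		if attempts < 3 {
			return errors.New("forbidden: cannot update node")
		}
		return nil
	})
	return attempts
}

func main() {
	// A short interval stands in for the 5-minute period mentioned above.
	fmt.Println("reconcile attempts:", countAttempts(200, 10))
}
```

The trade-off of a fixed period is some redundant work when labels are already correct, in exchange for not depending on an external node update to trigger the handler.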
3dfc011 to cd1d530
|
Gentle ping @lubronzhan, @andrewsykim :) |
|
Sorry I missed the email notification. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cheftako, lobziik, lubronzhan The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Starting with Kubernetes 1.24, the legacy vSphere cloud provider requires permission to update nodes in order to set topology-related labels. For more info see kubernetes/kubernetes#101028
What this PR does / why we need it:
Disables attempts by the vSphere provider to obtain zones within the kubelet during initial node registration if credentials are stored in a secret.
In that case, setting the zone and region labels for the node moves to KCM.
If credentials are stored in the cloud-provider config file as plaintext, the current behaviour does not change.
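The resulting split of responsibilities can be summarised in a small decision function. This is an illustrative sketch only: `credentialSource` and `labelSetter` are hypothetical names that do not exist in the provider, and the function encodes nothing beyond the three statements above plus the existing rule that zones must be configured at all.

```go
package main

import "fmt"

// credentialSource is a hypothetical enum describing where the vSphere
// credentials come from.
type credentialSource int

const (
	credentialsInConfigFile credentialSource = iota // plaintext in the cloud-provider config
	credentialsInSecret                             // stored in a Kubernetes secret
)

// labelSetter names the component expected to populate zone/region labels,
// following the behaviour described in this PR.
func labelSetter(zonesConfigured bool, src credentialSource) string {
	switch {
	case !zonesConfigured:
		return "none" // zone support is not enabled at all
	case src == credentialsInSecret:
		return "kube-controller-manager" // kubelet skips zone lookup
	default:
		return "kubelet" // unchanged behaviour for plaintext credentials
	}
}

func main() {
	fmt.Println(labelSetter(true, credentialsInSecret))     // kube-controller-manager
	fmt.Println(labelSetter(true, credentialsInConfigFile)) // kubelet
	fmt.Println(labelSetter(false, credentialsInSecret))    // none
}
```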
Which issue(s) this PR fixes:
Fixes #75175. With this patch, the kubelet no longer gets stuck on startup. Zone labels are populated for nodes during KCM startup.
Notes:
For this to work properly, a ClusterRole and a ClusterRoleBinding need to be created if RBAC is in use. This should be documented somewhere; it would be great if anybody could point me to a good place for it. If the permissions are missing, the labels will not be set.
What type of PR is this?
/kind bug
/sig cloud-provider
/assign @andrewsykim
Does this PR introduce a user-facing change?