
Prevent reconciliation if CSINodeTopology instance is already at Success state #1906

Merged

Conversation

@shalini-b (Collaborator) commented Aug 4, 2022

What this PR does / why we need it: When the syncer container restarts, it reconciles CSINodeTopology instances that are already in the Success state. If a node has migrated to another topology domain in the meantime, this reconciliation can modify the instance, causing confusion in the environment and crashing nodes, since node labels cannot be updated once created.
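The fix is a small early-return guard in the CSINodeTopology reconciler (the PR is size/XS). Below is a minimal, self-contained sketch of that guard, not the driver's exact code: the type names, the Success constant, and the surrounding wiring are simplified assumptions for illustration (the real logic lives in csinodetopology_controller.go).

package main

import "fmt"

// Simplified stand-ins for the driver's CRD types; these names are
// assumptions for illustration only.
type CRDStatus string

const CSINodeTopologySuccess CRDStatus = "Success"

type CSINodeTopologyStatus struct {
	Status CRDStatus
}

type CSINodeTopology struct {
	Name   string
	Status CSINodeTopologyStatus
}

// reconcile sketches the guard this PR adds: if an instance has already
// reached the Success state, skip reconciliation so that a syncer restart
// cannot rewrite a node's topology labels.
func reconcile(instance *CSINodeTopology) {
	if instance.Status.Status == CSINodeTopologySuccess {
		fmt.Printf("CSINodeTopology instance with name %q is already at %q state. No need to reconcile further.\n",
			instance.Name, instance.Status.Status)
		return
	}
	// ... otherwise discover the node's topology and update the status ...
}

func main() {
	inst := &CSINodeTopology{Name: "worker2"}
	inst.Status.Status = CSINodeTopologySuccess
	reconcile(inst) // prints the "already at Success" message and returns
}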

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #

Testing done:
Verified that the node DaemonSet pods do not crash and that the syncer emits the following logs:

{"level":"info","time":"2022-08-04T17:30:35.672631716Z","caller":"csinodetopology/csinodetopology_controller.go:250","msg":"CSINodeTopology instance with name \"worker2\" is already at \"Success\" state. No need to reconcile further."}
{"level":"info","time":"2022-08-04T17:30:35.674919416Z","caller":"csinodetopology/csinodetopology_controller.go:250","msg":"CSINodeTopology instance with name \"worker3\" is already at \"Success\" state. No need to reconcile further."}
{"level":"info","time":"2022-08-04T17:30:35.6759921Z","caller":"csinodetopology/csinodetopology_controller.go:250","msg":"CSINodeTopology instance with name \"master1\" is already at \"Success\" state. No need to reconcile further."}
{"level":"info","time":"2022-08-04T17:30:35.676984253Z","caller":"csinodetopology/csinodetopology_controller.go:250","msg":"CSINodeTopology instance with name \"master2\" is already at \"Success\" state. No need to reconcile further."}
{"level":"info","time":"2022-08-04T17:30:35.677868338Z","caller":"csinodetopology/csinodetopology_controller.go:250","msg":"CSINodeTopology instance with name \"master3\" is already at \"Success\" state. No need to reconcile further."}
{"level":"info","time":"2022-08-04T17:30:35.680529663Z","caller":"csinodetopology/csinodetopology_controller.go:250","msg":"CSINodeTopology instance with name \"worker1\" is already at \"Success\" state. No need to reconcile further."}

Thanks to @divyenpatel for helping with the testing.

Tested deleting CSINodeTopology instances and deleting all CSI pods. New CSI pods came up without any issue.

$ kubectl delete csinodetopology --all
csinodetopology.cns.vmware.com "master1" deleted
csinodetopology.cns.vmware.com "master2" deleted
csinodetopology.cns.vmware.com "master3" deleted
csinodetopology.cns.vmware.com "worker1" deleted
csinodetopology.cns.vmware.com "worker2" deleted
csinodetopology.cns.vmware.com "worker3" deleted

$ kubectl delete pods --all --namespace=vmware-system-csi
pod "vsphere-csi-controller-59f7cc8d7f-498v9" deleted
pod "vsphere-csi-controller-59f7cc8d7f-bq5fc" deleted
pod "vsphere-csi-controller-59f7cc8d7f-fcpfb" deleted
pod "vsphere-csi-node-9pttq" deleted
pod "vsphere-csi-node-b7fk2" deleted
pod "vsphere-csi-node-h5vcb" deleted
pod "vsphere-csi-node-jc9z8" deleted
pod "vsphere-csi-node-rz4rb" deleted
pod "vsphere-csi-node-zkpcb" deleted

$ kubectl get pods -o wide --namespace=vmware-system-csi
NAME                                      READY   STATUS    RESTARTS       AGE     IP               NODE      NOMINATED NODE   READINESS GATES
vsphere-csi-controller-59f7cc8d7f-gdv2p   7/7     Running   0              6m39s   10.244.0.137     master2   <none>           <none>
vsphere-csi-controller-59f7cc8d7f-nrjs9   7/7     Running   2 (5m1s ago)   6m39s   10.244.0.7       master3   <none>           <none>
vsphere-csi-controller-59f7cc8d7f-xzglb   7/7     Running   0              6m39s   10.244.0.70      master1   <none>           <none>
vsphere-csi-node-2zkcp                    3/3     Running   0              6m6s    10.191.163.130   master3   <none>           <none>
vsphere-csi-node-54nn2                    3/3     Running   0              6m6s    10.191.162.199   worker1   <none>           <none>
vsphere-csi-node-bkqzd                    3/3     Running   0              6m6s    10.191.170.77    master2   <none>           <none>
vsphere-csi-node-dwlnf                    3/3     Running   0              6m5s    10.191.171.97    worker2   <none>           <none>
vsphere-csi-node-r6tzj                    3/3     Running   0              6m6s    10.191.172.134   worker3   <none>           <none>
vsphere-csi-node-spgzf                    3/3     Running   0              6m7s    10.191.174.245   master1   <none>           <none>

Special notes for your reviewer:

Release note:

Prevent reconciliation if CSINodeTopology instance is already at Success state

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Aug 4, 2022
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 4, 2022
@divyenpatel (Member)

/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Aug 4, 2022
@divyenpatel (Member)

In an environment where the datastore is not replicated and no shared datastores are available across both zones, a node that fails over to a different AZ while still reporting that it belongs to its original AZ will cause volume provisioning failures after the failover.

So we need to document that, unless the customer has this specific setup, a node should never fail over to another AZ.

@divyenpatel (Member)

/approve
/lgtm

@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: divyenpatel, shalini-b

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [divyenpatel,shalini-b]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 5, 2022
@k8s-ci-robot k8s-ci-robot merged commit 31b2939 into kubernetes-sigs:master Aug 5, 2022
shalini-b added a commit to shalini-b/vsphere-csi-driver that referenced this pull request Aug 5, 2022
shalini-b added a commit to shalini-b/vsphere-csi-driver that referenced this pull request Aug 10, 2022
shalini-b added a commit to shalini-b/vsphere-csi-driver that referenced this pull request Aug 10, 2022
shalini-b added a commit to shalini-b/vsphere-csi-driver that referenced this pull request Aug 10, 2022
adikul30 pushed a commit to adikul30/vsphere-csi-driver that referenced this pull request Sep 22, 2022