
Prevent reconciliation if CSINodeTopology instance is already at Success state #1906

Merged

Conversation

@shalini-b (Collaborator) commented Aug 4, 2022

What this PR does / why we need it: When the syncer container restarts, it reconciles CSINodeTopology instances that are already in the Success state. If a node has migrated to another topology domain in the meantime, this reconciliation can modify the instance, causing confusion in the environment and crashing nodes, since node labels cannot be updated once created.
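The fix is a small early-return guard in the CSINodeTopology reconciler (the PR is size/XS). Below is a minimal, self-contained sketch of that guard, not the driver's exact code: the type names, the Success constant, and the surrounding wiring are simplified assumptions for illustration (the real logic lives in csinodetopology_controller.go).

package main

import "fmt"

// Simplified stand-ins for the driver's CRD types; these names are
// assumptions for illustration only.
type CRDStatus string

const CSINodeTopologySuccess CRDStatus = "Success"

type CSINodeTopologyStatus struct {
	Status CRDStatus
}

type CSINodeTopology struct {
	Name   string
	Status CSINodeTopologyStatus
}

// reconcile sketches the guard this PR adds: if an instance has already
// reached the Success state, skip reconciliation so that a syncer restart
// cannot rewrite a node's topology labels.
func reconcile(instance *CSINodeTopology) {
	if instance.Status.Status == CSINodeTopologySuccess {
		fmt.Printf("CSINodeTopology instance with name %q is already at %q state. No need to reconcile further.\n",
			instance.Name, instance.Status.Status)
		return
	}
	// ... otherwise discover the node's topology and update the status ...
}

func main() {
	inst := &CSINodeTopology{Name: "worker2"}
	inst.Status.Status = CSINodeTopologySuccess
	reconcile(inst) // prints the "already at Success" message and returns
}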

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #

Testing done:
Verified that the node DaemonSet pods do not crash and that the syncer emits the following logs:

{"level":"info","time":"2022-08-04T17:30:35.672631716Z","caller":"csinodetopology/csinodetopology_controller.go:250","msg":"CSINodeTopology instance with name \"worker2\" is already at \"Success\" state. No need to reconcile further."}
{"level":"info","time":"2022-08-04T17:30:35.674919416Z","caller":"csinodetopology/csinodetopology_controller.go:250","msg":"CSINodeTopology instance with name \"worker3\" is already at \"Success\" state. No need to reconcile further."}
{"level":"info","time":"2022-08-04T17:30:35.6759921Z","caller":"csinodetopology/csinodetopology_controller.go:250","msg":"CSINodeTopology instance with name \"master1\" is already at \"Success\" state. No need to reconcile further."}
{"level":"info","time":"2022-08-04T17:30:35.676984253Z","caller":"csinodetopology/csinodetopology_controller.go:250","msg":"CSINodeTopology instance with name \"master2\" is already at \"Success\" state. No need to reconcile further."}
{"level":"info","time":"2022-08-04T17:30:35.677868338Z","caller":"csinodetopology/csinodetopology_controller.go:250","msg":"CSINodeTopology instance with name \"master3\" is already at \"Success\" state. No need to reconcile further."}
{"level":"info","time":"2022-08-04T17:30:35.680529663Z","caller":"csinodetopology/csinodetopology_controller.go:250","msg":"CSINodeTopology instance with name \"worker1\" is already at \"Success\" state. No need to reconcile further."}

Thanks to @divyenpatel for helping with the testing.

Tested deleting CSINodeTopology instances and deleting all CSI pods. New CSI pods came up without any issue.

$ kubectl delete csinodetopology --all
csinodetopology.cns.vmware.com "master1" deleted
csinodetopology.cns.vmware.com "master2" deleted
csinodetopology.cns.vmware.com "master3" deleted
csinodetopology.cns.vmware.com "worker1" deleted
csinodetopology.cns.vmware.com "worker2" deleted
csinodetopology.cns.vmware.com "worker3" deleted

$ kubectl delete pods --all --namespace=vmware-system-csi
pod "vsphere-csi-controller-59f7cc8d7f-498v9" deleted
pod "vsphere-csi-controller-59f7cc8d7f-bq5fc" deleted
pod "vsphere-csi-controller-59f7cc8d7f-fcpfb" deleted
pod "vsphere-csi-node-9pttq" deleted
pod "vsphere-csi-node-b7fk2" deleted
pod "vsphere-csi-node-h5vcb" deleted
pod "vsphere-csi-node-jc9z8" deleted
pod "vsphere-csi-node-rz4rb" deleted
pod "vsphere-csi-node-zkpcb" deleted

$ kubectl get pods -o wide --namespace=vmware-system-csi
NAME                                      READY   STATUS    RESTARTS       AGE     IP               NODE      NOMINATED NODE   READINESS GATES
vsphere-csi-controller-59f7cc8d7f-gdv2p   7/7     Running   0              6m39s   10.244.0.137     master2   <none>           <none>
vsphere-csi-controller-59f7cc8d7f-nrjs9   7/7     Running   2 (5m1s ago)   6m39s   10.244.0.7       master3   <none>           <none>
vsphere-csi-controller-59f7cc8d7f-xzglb   7/7     Running   0              6m39s   10.244.0.70      master1   <none>           <none>
vsphere-csi-node-2zkcp                    3/3     Running   0              6m6s    10.191.163.130   master3   <none>           <none>
vsphere-csi-node-54nn2                    3/3     Running   0              6m6s    10.191.162.199   worker1   <none>           <none>
vsphere-csi-node-bkqzd                    3/3     Running   0              6m6s    10.191.170.77    master2   <none>           <none>
vsphere-csi-node-dwlnf                    3/3     Running   0              6m5s    10.191.171.97    worker2   <none>           <none>
vsphere-csi-node-r6tzj                    3/3     Running   0              6m6s    10.191.172.134   worker3   <none>           <none>
vsphere-csi-node-spgzf                    3/3     Running   0              6m7s    10.191.174.245   master1   <none>           <none>

Special notes for your reviewer:

Release note:

Prevent reconciliation if CSINodeTopology instance is already at Success state

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Aug 4, 2022
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 4, 2022
@divyenpatel (Member)

/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Aug 4, 2022
@divyenpatel (Member)

In an environment where the datastore is not replicated and no shared datastores are available across both zones, a node that fails over to a different AZ while still reporting that it belongs to its original AZ will cause volume provisioning failures after the failover.

So we need to document that, unless the customer has this specific setup, a node should never fail over to another AZ.

@divyenpatel (Member)

/approve
/lgtm

@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: divyenpatel, shalini-b

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [divyenpatel,shalini-b]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 5, 2022
@k8s-ci-robot k8s-ci-robot merged commit 31b2939 into kubernetes-sigs:master Aug 5, 2022
shalini-b added a commit to shalini-b/vsphere-csi-driver that referenced this pull request Aug 5, 2022
shalini-b added a commit to shalini-b/vsphere-csi-driver that referenced this pull request Aug 10, 2022
shalini-b added a commit to shalini-b/vsphere-csi-driver that referenced this pull request Aug 10, 2022
shalini-b added a commit to shalini-b/vsphere-csi-driver that referenced this pull request Aug 10, 2022
adikul30 pushed a commit to adikul30/vsphere-csi-driver that referenced this pull request Sep 22, 2022