Set volume health to inaccessible when PVC not found in CNS #944
jtest wcp
jtest gc
Started WCP block pipeline...
Started GC block pipeline...
@xing-yang Could you run a test to make sure that when the PVC is not found in CNS, the volume health status is set to inaccessible?
This is a rare corner case, and I don't know how to reproduce it. Any suggestions?
In the code comment, I saw "When a Datastore is removed from VC (like vSAN direct disk decommission with noAction does)", in which case the PVC will not be found in CNS. So is there a bug that this PR tries to fix? Maybe we can delete the FCD disk out of band to simulate the case where the PVC is not found in CNS?
We have two cases to verify manually. Assuming you can change the volume health poller interval from the default of 5 minutes, run tests for both cases as follows:
Manually ran the following test. The default volume health sync interval is 2 min, and the full sync interval is also 2 min.
1. Created a new volume; no health status yet.
   kubectl describe pvc -n e2e-test-namespace
   Name: block-pvc
   Normal  ExternalProvisioning  25s  persistentvolume-controller  waiting for a volume to be created, either by external provisioner "csi.vsphere.vmware.com" or manually created by system administrator
2. The health status shows up.
   kubectl describe pvc -n e2e-test-namespace
   Name: block-pvc
   Corresponding logs:
   2021-06-07T19:38:08.528Z DEBUG syncer/volume_health.go:122 updateVolumeHealthStatus: update volume health annotation for pvc e2e-test-namespace/block-pvc from old value to new value accessible {"TraceId": "3f7dc2ef-843a-476a-b1cb-0ec6515b55fc"}
3. Deleted the FCD at 19:44:23 UTC 2021. The health status stayed "accessible" until it was updated later.
   kubectl describe pvc -n e2e-test-namespace
   Name: block-pvc
4. The health status changed to "inaccessible" about 4 minutes after the FCD was deleted, coinciding with the end of a FullSync period.
   kubectl describe pvc -n e2e-test-namespace
   Name: block-pvc
   Corresponding logs:
   2021-06-07T19:48:07.994Z WARN syncer/fullsync.go:367 could not find any volume which is present in both k8s and in CNS {"TraceId": "0d895233-57fa-4639-9db9-d4201a3a8572"}
   2021-06-07T19:48:07.998Z INFO syncer/fullsync.go:430 FullSync: Volume with id: "bd9536ef-6696-4383-a5ba-aefa2a56dd84" and name: "pvc-af778591-5423-43ea-8731-99142edb54b9" is added to cnsCreationMap {"TraceId": "0d895233-57fa-4639-9db9-d4201a3a8572"}
   2021-06-07T19:48:08.000Z INFO syncer/fullsync.go:137 FullSync: end {"TraceId": "0d895233-57fa-4639-9db9-d4201a3a8572"}
   2021-06-07T19:48:08.006Z DEBUG syncer/volume_health.go:122 updateVolumeHealthStatus: update volume health annotation for pvc e2e-test-namespace/block-pvc from old value accessible to new value inaccessible {"TraceId": "0d895233-57fa-4639-9db9-d4201a3a8572"}
@subramanian-neelakantan @SandeepPissay @lipingxue Comments are addressed. PTAL.
Looks good to me. /approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: subramanian-neelakantan, xing-yang
The changes look good to me. I have a few comments on the logging.
@SandeepPissay Comments are addressed. PTAL.
/lgtm
What this PR does / why we need it:
Set volume health to inaccessible when PVC not found in CNS
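A minimal sketch of the rule in the PR title, under a simplified model in which the syncer knows each PVC's CNS volume ID, the health CNS reports for the volumes it still has, and the health value each PVC currently carries. All type and function names here (`pvcInfo`, `desiredHealth`, `healthUpdates`) are hypothetical and are not the driver's actual code; the point is only that a PVC whose volume is missing from CNS is marked inaccessible instead of being skipped, and that an update is issued only when the value changes (matching the "from old value ... to new value ..." log lines).

```go
package main

import (
	"fmt"
	"sort"
)

// Health values as they appear in the syncer log lines quoted in this PR.
const (
	healthAccessible   = "accessible"
	healthInaccessible = "inaccessible"
)

// pvcInfo is a hypothetical, simplified view of a bound PVC.
type pvcInfo struct {
	Namespace string
	Name      string
	VolumeID  string // CNS volume ID taken from the bound PV
}

// desiredHealth returns the health a PVC should report. A volume ID that is
// absent from cnsHealth was not found in CNS, so it is reported as inaccessible
// instead of being skipped.
func desiredHealth(volumeID string, cnsHealth map[string]string) string {
	if status, ok := cnsHealth[volumeID]; ok {
		return status
	}
	return healthInaccessible
}

// healthUpdates returns, keyed by "namespace/name", only the PVCs whose health
// value must change. current holds each PVC's present value ("" if none yet).
func healthUpdates(pvcs []pvcInfo, cnsHealth, current map[string]string) map[string]string {
	updates := make(map[string]string)
	for _, pvc := range pvcs {
		key := pvc.Namespace + "/" + pvc.Name
		if want := desiredHealth(pvc.VolumeID, cnsHealth); current[key] != want {
			updates[key] = want
		}
	}
	return updates
}

func main() {
	pvcs := []pvcInfo{
		{Namespace: "e2e-test-namespace", Name: "block-pvc", VolumeID: "vol-1"},
		{Namespace: "e2e-test-namespace", Name: "other-pvc", VolumeID: "vol-2"},
	}
	// vol-1 has been deleted out of band; only vol-2 is still known to CNS.
	cnsHealth := map[string]string{"vol-2": healthAccessible}
	current := map[string]string{
		"e2e-test-namespace/block-pvc": healthAccessible,
		"e2e-test-namespace/other-pvc": healthAccessible,
	}

	updates := healthUpdates(pvcs, cnsHealth, current)
	keys := make([]string, 0, len(updates))
	for k := range updates {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s: %q -> %q\n", k, current[k], updates[k])
	}
}
```

Running it prints a single update, for e2e-test-namespace/block-pvc from "accessible" to "inaccessible", mirroring the transition seen in the manual test logs; other-pvc is left alone because its value does not change.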
Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #
Testing done:
https://container-dp.svc.eng.vmware.com/view/Pre-Checkin-CSI/job/csi-wcp-pre-check-in/29/testReport/(root)/CNS%20CSI%20Driver%20End-to-End%20Tests/
13 tests passed, 1 failed.
CNS CSI Driver End-to-End Tests.Basic Static Provisioning [csi-supervisor] Verify static provisioning workflow - when DuplicateFCD is used
/home/worker/workspace/csi-wcp-pre-check-in@2/Results/29/vsphere-csi-driver/tests/e2e/csi_static_provisioning_basic.go:882
Unexpected error:
<*errors.errorString | 0xc000eaed10>: {
s: "PersistentVolume static-pv-561ef975-ada8-4c07-a801-5a5a5c78f4ad still exists within 3m0s",
}
PersistentVolume static-pv-561ef975-ada8-4c07-a801-5a5a5c78f4ad still exists within 3m0s
occurred
/home/worker/workspace/csi-wcp-pre-check-in@2/Results/29/vsphere-csi-driver/tests/e2e/csi_static_provisioning_basic.go:959
The test failure is not caused by the change.
Special notes for your reviewer:
Release note: