Set health annotation to Inaccessible when it is not set #852

Merged
k8s-ci-robot merged 1 commit into kubernetes-sigs:master on May 17, 2021

xing-yang
Contributor

What this PR does / why we need it:
If SPBM does not set HealthStatus, the volume no longer exists.
Set the health annotation to "Inaccessible" in that case so that the caller can react appropriately to this status.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #

Special notes for your reviewer:

Release note:

Set the health annotation to Inaccessible when SPBM does not set the health status, since a missing status implies the volume no longer exists.

When health status is not set by SPBM, set health annotation
to Inaccessible.
@k8s-ci-robot added the cncf-cla: yes label on May 11, 2021
@k8s-ci-robot added the approved and size/XS labels on May 11, 2021
@xing-yang
Contributor Author

jtest wcp

@xing-yang
Contributor Author

jtest gc

@svcbot-qecnsdp

Started WCP block pipeline...

@xing-yang
Contributor Author

jtest block-vanilla

@svcbot-qecnsdp

Started GC block pipeline...

Member

@divyenpatel divyenpatel left a comment

/approve

@divyenpatel
Member

/ok-to-test

@k8s-ci-robot added the ok-to-test label on May 12, 2021
@@ -351,6 +351,10 @@ func ConvertVolumeHealthStatus(volHealthStatus string) (string, error) {
 	case string(pbmtypes.PbmHealthStatusForEntityUnknown):
 		return string(pbmtypes.PbmHealthStatusForEntityUnknown), nil
 	default:
-		return "", fmt.Errorf("cannot convert invalid volume health status %s", volHealthStatus)
+		// NOTE: volHealthStatus is not set by SPBM in this case.
+		// This implies the volume does not exist any more.
Contributor

A volume that does not exist is different from a volume whose health is inaccessible. Isn't this a bug in SPBM?

Contributor Author

I discussed this with @subramanian-neelakantan. In this case the FCD is missing from the inventory, so SPBM does not populate the health status field. According to Subbu, this is a permanent failure, and the caller would want to take corrective action because it effectively means the volume is "inaccessible". It should therefore be treated differently from "Unknown", which is a temporary failure.
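
To make the distinction concrete, here is a minimal sketch of the mapping discussed in this thread. It is not the driver's `ConvertVolumeHealthStatus`: the helper name `healthAnnotation` and the literal strings `"red"`, `"unknown"`, and `"inaccessible"` are assumptions chosen for illustration, and any other status is passed through untouched because the thread does not cover it.

```go
package main

import "fmt"

// healthAnnotation is a hypothetical helper, not the driver's real function.
// The string values are assumptions for illustration only.
func healthAnnotation(status string) string {
	switch status {
	case "", "red":
		// An empty status means SPBM did not populate the field (for
		// example, the FCD is missing from inventory). The thread treats
		// this as a permanent failure, the same as "red".
		return "inaccessible"
	case "unknown":
		// "Unknown" is described as a temporary failure, so it stays distinct.
		return "unknown"
	default:
		// Other statuses are outside the scope of this sketch.
		return status
	}
}

func main() {
	for _, s := range []string{"", "red", "unknown"} {
		fmt.Printf("%q -> %q\n", s, healthAnnotation(s))
	}
}
```

Keeping the empty status and "unknown" in separate cases lets a caller react to the permanent case without being triggered by a transient one.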

Contributor

SPBM is the component that should report the volume health correctly. I did not quite understand why SPBM should not be fixed.

Contributor

Here are the notes from the offline meeting on this topic. We agreed that the volume health can be set to "inaccessible" whenever CNS reports it as "red" or "" (missing). PSP operators expect the volume health to be eventually consistent. They also have the health timestamp, which lets them wait for a "long" time (like an hour) before taking corrective action based on the volume health. So reflecting the current CNS health snapshot on the PVCs is a correct design choice.

@svcbot-qecnsdp

Started Vanilla block pipeline...

@svcbot-qecnsdp

Block vanilla build status: FAILURE 
Stage before exit: testbed-deploy 

@xing-yang
Contributor Author

jtest block-vanilla

@svcbot-qecnsdp

Started Vanilla block pipeline...

@svcbot-qecnsdp

GC build status: FAILURE 
Stage before exit: deploy-gc-testbed 

@svcbot-qecnsdp

WCP build status: SUCCESS 
Stage before exit: finally 
Jenkins E2E Test Results: 
SSS
Ran 32 of 173 Specs in 6243.299 seconds
SUCCESS! -- 32 Passed | 0 Failed | 0 Pending | 141 Skipped
PASS

Ginkgo ran 1 suite in 1h44m26.095101518s
Test Suite Passed
make: Leaving directory `/home/worker/workspace/github-csi-wcp-CICD/Results/641/vsphere-csi-driver`

@svcbot-qecnsdp

Block vanilla build status: FAILURE 
Stage before exit: e2e-tests 
Jenkins E2E Test Results: 
Ran 43 of 173 Specs in 16461.073 seconds
FAIL! -- 31 Passed | 12 Failed | 0 Pending | 130 Skipped
--- FAIL: TestE2E (16461.14s)
FAIL

Ginkgo ran 1 suite in 4h34m43.444085418s
Test Suite Failed
make: Leaving directory `/home/worker/workspace/github-csi-block-vanilla/Results/508/vsphere-csi-driver`

@xing-yang
Contributor Author

jtest gc

@svcbot-qecnsdp

Started GC block pipeline...

@xing-yang
Contributor Author

jtest block-vanilla

@svcbot-qecnsdp

Started Vanilla block pipeline...

@svcbot-qecnsdp

GC build status: FAILURE 
Stage before exit: deploy-gc-testbed 

@xing-yang
Contributor Author

jtest gc

@svcbot-qecnsdp

Started GC block pipeline...

@svcbot-qecnsdp

GC build status: FAILURE 
Stage before exit: deploy-gc-testbed 

@svcbot-qecnsdp

Block vanilla build status: FAILURE 
Stage before exit: e2e-tests 
Jenkins E2E Test Results: 
Ran 43 of 173 Specs in 11693.760 seconds
FAIL! -- 42 Passed | 1 Failed | 0 Pending | 130 Skipped
--- FAIL: TestE2E (11693.84s)
FAIL

Ginkgo ran 1 suite in 3h15m22.370581471s
Test Suite Failed
make: Leaving directory `/home/worker/workspace/github-csi-block-vanilla@2/Results/511/vsphere-csi-driver`

@xing-yang
Contributor Author

jtest gc

@svcbot-qecnsdp

Started GC block pipeline...

@svcbot-qecnsdp

GC build status: FAILURE 
Stage before exit: deploy-gc-testbed 

@subramanian-neelakantan
Contributor

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: divyenpatel, subramanian-neelakantan, xing-yang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [divyenpatel,xing-yang]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@divyenpatel
Member

/lgtm

@k8s-ci-robot added the lgtm label on May 17, 2021
@k8s-ci-robot merged commit 713f989 into kubernetes-sigs:master on May 17, 2021
Labels
approved (Indicates a PR has been approved by an approver from all required OWNERS files.); cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.); lgtm ("Looks good to me", indicates that a PR is ready to be merged.); ok-to-test (Indicates a non-member PR verified by an org member that is safe to test.); size/XS (Denotes a PR that changes 0-9 lines, ignoring generated files.)

6 participants