Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature: add CSIVolumeHealth feature and gate #99284

Merged

Conversation

fengzixu
Copy link
Contributor

@fengzixu fengzixu commented Feb 21, 2021

What type of PR is this?

/kind feature

What this PR does / why we need it:

why we need it

According to the change of kep, we decide to move the responsibility (checking volume condition and reporting abnormal volume event to pod ) from external-health-monitor-agent to kubelet.

what this PR does

  1. Modify GetMetrics function to return the volumeStats objet which includes the volumeCondition info
  2. Sending event to the corresponding pod which is using an abnormal volume
  3. Make compatibility change on functions which called GetMetrics function

Which issue(s) this PR fixes:

Not yet.

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Enables Kubelet to check volume condition and log events to corresponding pods.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/pull/2286

@k8s-ci-robot k8s-ci-robot added release-note kind/feature size/L cncf-cla: yes do-not-merge/needs-sig needs-triage needs-ok-to-test needs-priority labels Feb 21, 2021
@k8s-ci-robot
Copy link
Contributor

@k8s-ci-robot k8s-ci-robot commented Feb 21, 2021

Hi @fengzixu. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added area/kubelet sig/apps sig/instrumentation sig/node sig/storage and removed do-not-merge/needs-sig labels Feb 21, 2021
@k8s-ci-robot k8s-ci-robot requested review from brendandburns and dims Feb 21, 2021
@fengzixu
Copy link
Contributor Author

@fengzixu fengzixu commented Feb 21, 2021

/assign @xing-yang @NickrenREN @fengzixu

@NickrenREN
Copy link
Member

@NickrenREN NickrenREN commented Feb 22, 2021

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted and removed needs-triage labels Feb 22, 2021
@xing-yang
Copy link
Contributor

@xing-yang xing-yang commented Feb 22, 2021

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test and removed needs-ok-to-test labels Feb 22, 2021
@fengzixu fengzixu force-pushed the support-external-health-monitor branch from e862e69 to bd8748c Compare Feb 22, 2021
@ehashman ehashman added this to Triage in SIG Node PR Triage Feb 22, 2021
@fengzixu
Copy link
Contributor Author

@fengzixu fengzixu commented Feb 23, 2021

/test pull-kubernetes-e2e-kind

@k8s-ci-robot k8s-ci-robot added size/L size/XXL and removed size/L labels Mar 9, 2021
@fengzixu
Copy link
Contributor Author

@fengzixu fengzixu commented Mar 9, 2021

This was hard to review for me because there is so much noise in the diff due to adding the EventRecorder and renaming stat to metrics. It is pretty difficult to extract the actual KEP related functional change from this PR.

Please break this out into 2-3 commits which isolates the actual KEP work from the non-functional changes.

Maybe:

  • rename stat to metrics
  • add EventRecorder to ResourceAnalyzer
  • add CSIVolumeHealth feature

Just thinking about the feature versions of ourselves and having a useful change history

Thanks for your suggestion. I split this PR into 2 commits. Because add CSIVolumeHealth feature needs eventRecorder support.

@fengzixu
Copy link
Contributor Author

@fengzixu fengzixu commented Mar 9, 2021

@sjenning You can check it now

@fengzixu fengzixu force-pushed the support-external-health-monitor branch from 866f5d3 to f87d229 Compare Mar 9, 2021
@k8s-ci-robot k8s-ci-robot added size/L and removed size/XXL labels Mar 9, 2021
@@ -74,38 +86,53 @@ func (c *fakeCsiDriverClient) NodeGetInfo(ctx context.Context) (
}

func (c *fakeCsiDriverClient) NodeGetVolumeStats(ctx context.Context, volID string, targetPath string) (
usageCountMap *volume.Metrics, err error) {
usageCountMap *volume.Stats, err error) {
Copy link
Contributor

@xing-yang xing-yang Mar 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's still renaming in the first commit

@fengzixu fengzixu force-pushed the support-external-health-monitor branch from f87d229 to fdc348a Compare Mar 9, 2021
1. add EventRecorder to ResourceAnalyzer
2. add CSIVolumeHealth feature and gate
@fengzixu fengzixu force-pushed the support-external-health-monitor branch from fdc348a to edc1c62 Compare Mar 9, 2021
@sjenning
Copy link
Contributor

@sjenning sjenning commented Mar 9, 2021

/approve

@k8s-ci-robot
Copy link
Contributor

@k8s-ci-robot k8s-ci-robot commented Mar 9, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fengzixu, msau42, sjenning

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved label Mar 9, 2021
@msau42
Copy link
Member

@msau42 msau42 commented Mar 9, 2021

/milestone v1.21

@k8s-ci-robot k8s-ci-robot added this to the v1.21 milestone Mar 9, 2021
@xing-yang
Copy link
Contributor

@xing-yang xing-yang commented Mar 9, 2021

/test pull-kubernetes-integration

@k8s-ci-robot k8s-ci-robot removed the needs-rebase label Mar 9, 2021
@xing-yang
Copy link
Contributor

@xing-yang xing-yang commented Mar 9, 2021

/test pull-kubernetes-integration

@k8s-ci-robot
Copy link
Contributor

@k8s-ci-robot k8s-ci-robot commented Mar 9, 2021

@fengzixu: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
pull-kubernetes-e2e-azure-file-windows-containerd fd4d3b1 link /test pull-kubernetes-e2e-azure-file-windows-containerd

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@xing-yang
Copy link
Contributor

@xing-yang xing-yang commented Mar 9, 2021

/test pull-kubernetes-integration

@xing-yang
Copy link
Contributor

@xing-yang xing-yang commented Mar 9, 2021

Test failure is not related to this change.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Mar 9, 2021
@xing-yang
Copy link
Contributor

@xing-yang xing-yang commented Mar 9, 2021

@fengzixu Please rename the PR to: feature: add CSIVolumeHealth feature and gate

@fengzixu fengzixu changed the title feature: sending corresponding event of abnormal volume in kubelet feature: add CSIVolumeHealth feature and gate Mar 9, 2021
@k8s-ci-robot k8s-ci-robot merged commit dcb3c56 into kubernetes:master Mar 10, 2021
18 checks passed
SIG Node PR Triage automation moved this from Needs Reviewer to Done Mar 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved area/kubelet cncf-cla: yes kind/feature lgtm ok-to-test priority/important-soon release-note sig/apps sig/instrumentation sig/node sig/storage size/L triage/accepted
Projects
Development

Successfully merging this pull request may close these issues.

None yet