Add metrics for cleanup controller #399
Conversation
Hi @justinblalock87. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/cc @msau42
/ok-to-test
pkg/metrics/node-cleanup/metrics.go
var (
	// APIServerRequestsTotal is used to collect accumulated count of apiserver requests.
	APIServerRequestsTotal = prometheus.NewCounterVec(
It looks like client-go may already have metrics we can leverage: https://github.com/kubernetes/client-go/blob/master/tools/metrics/metrics.go. workqueue also has metrics that may be good to gather.
I'm not 100% sure, but I think you can enable those + other standard metrics with k8s.io/component-base/metrics. Example: https://github.com/kubernetes-csi/csi-lib-utils/blob/85029276ff37ed0f2dce8fd14353f058bc527dc9/metrics/metrics.go#L30C3-L30C32
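A rough sketch of what that could look like (illustrative only; the blank imports mirror the kube-proxy pattern linked below, and the `serveMetrics` helper is hypothetical, not part of this PR):

```go
package metrics

import (
	"net/http"

	"k8s.io/component-base/metrics/legacyregistry"

	// Blank imports register the standard client-go and workqueue
	// metric providers with the legacy registry.
	_ "k8s.io/component-base/metrics/prometheus/clientgo"
	_ "k8s.io/component-base/metrics/prometheus/workqueue"
)

// serveMetrics exposes everything registered with the legacy registry,
// including the client-go and workqueue metrics pulled in above.
func serveMetrics(addr string) error {
	mux := http.NewServeMux()
	mux.Handle("/metrics", legacyregistry.Handler())
	return http.ListenAndServe(addr, mux)
}
```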
Done.
If you import client-go metrics I think you can remove this one:
https://github.com/kubernetes/kubernetes/blob/99190634ab252604a4496882912ac328542d649d/cmd/kube-proxy/proxy.go#L24
Done.
	},
	)
	// PersistentVolumeClaimDeleteFailedTotal is used to collect accumulated count of persistent volume claim delete failed attempts.
	PersistentVolumeClaimDeleteFailedTotal = prometheus.NewCounter(
Do you also want a pvc delete total metric?
There is a PVC delete total metric above; do you mean PV delete total? I don't have a PV delete total because the PV Deleter runs on a set interval: if the interval is too low, it calls the apiserver to delete the same PV multiple times, skewing the metric.
Sorry, I meant PV delete total. I guess the issue is that the informer lister may be delayed in removing the entry after the delete API call.
One solution we have used in the past is to create a second-level cache: when we delete the API objects, we delete them from the second-level cache too: https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner/blob/master/pkg/cache/cache.go
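A minimal sketch of that idea (illustrative only; the real implementation is the cache.go linked above, and the names here are assumptions):

```go
// A second-level cache that shadows the informer: the deleter consults and
// updates this cache directly, so a PV that was just deleted via the API is
// not picked up again before the informer catches up.
package cache

import (
	"sync"

	v1 "k8s.io/api/core/v1"
)

type VolumeCache struct {
	mu  sync.RWMutex
	pvs map[string]*v1.PersistentVolume
}

func NewVolumeCache() *VolumeCache {
	return &VolumeCache{pvs: map[string]*v1.PersistentVolume{}}
}

// AddPV records a PV observed from the informer.
func (c *VolumeCache) AddPV(pv *v1.PersistentVolume) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.pvs[pv.Name] = pv
}

// DeletePV is called right after a successful delete API call, so the entry
// disappears immediately instead of waiting for the informer to catch up.
func (c *VolumeCache) DeletePV(name string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.pvs, name)
}

// GetPV returns the cached PV, if present.
func (c *VolumeCache) GetPV(name string) (*v1.PersistentVolume, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	pv, ok := c.pvs[name]
	return pv, ok
}
```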
Given that the cache itself may take a while to implement, and given the current time crunch, can I add a TODO in the controller to implement the cache instead? I'm not sure implementing it now is feasible within the time constraints.
Sounds fine. In the meantime, I would go ahead and add the metric and document the caveat of using a short sync period.
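For example, the metric and its caveat could be documented along these lines (a sketch; the metric name and help text are assumptions, not the PR's final code):

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

// PersistentVolumeDeleteTotal counts PersistentVolume delete calls issued by
// the cleanup controller. Caveat: with a short sync period, the controller may
// issue more than one delete call for the same PV before the informer cache
// catches up, so this metric can over-count actual deletions.
var PersistentVolumeDeleteTotal = prometheus.NewCounter(
	prometheus.CounterOpts{
		Name: "persistentvolume_delete_total",
		Help: "Total number of PersistentVolume delete calls issued by the cleanup controller. May over-count when the sync period is short.",
	},
)
```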
Done.
/kind feature
Force-pushed from 258c1f6 to 5f99ede
/remove-kind bug
Force-pushed from 5f99ede to c2b32d4
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: justinblalock87, msau42. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
What type of PR is this?
/kind feature
What this PR does / why we need it:
Adds metrics to the local PV node cleanup controller.
Release note: