Add metrics for aggregated discovery #115630
/triage accepted
Force-pushed from 4b1a190 to 3cbb5cd.
The changelog entry doesn't look right. Try this:
-Yes, additional metrics are published for aggregated discovery
+Changed metrics for aggregated discovery to publish new time series (alpha).
Ack, updated.
/cc @deads2k
staging/src/k8s.io/apiserver/pkg/endpoints/discovery/aggregated/etag.go (outdated, resolved)
staging/src/k8s.io/apiserver/pkg/endpoints/discovery/aggregated/metrics.go (outdated, resolved)
Force-pushed from a48d511 to 610d1d0.
Force-pushed from 92a29e5 to 29e6d68.
			StabilityLevel: metrics.ALPHA,
		},
	)
	regenerationDurationGauge = metrics.NewGauge(
Do we need both? Doesn't the gauge also count the number of occurrences? I think we discussed that, sorry if I keep forgetting 😅
Yes, both are needed. A Gauge (https://pkg.go.dev/github.com/prometheus/client_golang/prometheus?utm_source=godoc#Gauge) holds only a single current value; I was planning to use it to record only the latest aggregation duration.
You might be thinking of a histogram, which does track the number of observations, but I'm not sure one is needed here. It seems better suited to request count/duration than to aggregation count/duration.
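To make the gauge/histogram distinction concrete, here is a minimal sketch using hypothetical, simplified `gauge` and `histogram` types. They are not the Prometheus client_golang or k8s.io/component-base/metrics APIs; they only mimic the semantics: `Set` on a gauge overwrites the previous value, while `Observe` on a histogram accumulates a count, a sum and cumulative buckets.

```go
package main

import "fmt"

// gauge is a hypothetical stand-in for a Prometheus Gauge:
// it holds one value, and each Set overwrites the previous one.
type gauge struct{ value float64 }

func (g *gauge) Set(v float64) { g.value = v }

// histogram is a hypothetical stand-in for a Prometheus Histogram:
// every observation is kept in aggregate form (count, sum, buckets).
type histogram struct {
	upperBounds []float64 // bucket upper bounds, ascending
	buckets     []uint64  // cumulative count of observations <= each bound
	count       uint64
	sum         float64
}

func newHistogram(bounds ...float64) *histogram {
	return &histogram{upperBounds: bounds, buckets: make([]uint64, len(bounds))}
}

func (h *histogram) Observe(v float64) {
	h.count++
	h.sum += v
	for i, ub := range h.upperBounds {
		if v <= ub {
			h.buckets[i]++
		}
	}
}

func main() {
	g := &gauge{}
	h := newHistogram(0.01, 0.1, 1)
	// Three hypothetical aggregation durations, in seconds.
	for _, d := range []float64{0.02, 0.05, 0.4} {
		g.Set(d)
		h.Observe(d)
	}
	fmt.Println(g.value)   // 0.4: only the latest duration survives
	fmt.Println(h.count)   // 3: the histogram counts every observation
	fmt.Println(h.buckets) // [0 2 3]
}
```

The gauge answers "how long did the most recent aggregation take", while only the histogram can answer "how many aggregations happened and how were their durations distributed".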
A gauge seems like the wrong choice here, doesn't it? Aren't we gathering durations so that we can compute average/median times?
In a sense. Gathering durations for something like API requests makes sense because there are many requests for the same discovery document, so they can be bucketed.
For reaggregation, an unchanged discovery document has only one data point. Realistically, reaggregation only happens when CRDs are modified, and its duration should be proportional to the number of custom resources. I think we really only care about the latest duration after all the CRDs are installed (the final state). Taking the average doesn't make as much sense, because we don't really care about the duration of partial states (e.g. when only half of the CRDs have been applied).
(I'm using CRDs and aggregated API servers interchangeably here; the same logic applies to both.)
Discussed offline: we're dropping the duration metric, since the duration is negligible and the regeneration counter plus the existing request-duration instrumentation already give us good visibility.
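For intuition on why a counter is enough here, the sketch below uses a hypothetical `regenerationCounter` type (not the k8s.io/component-base/metrics Counter) that only ever increases. A monitoring system that samples it twice can derive how many regenerations happened in between, which is the idea behind taking the increase of a counter over a time window. Counter resets, such as an apiserver restart, are deliberately ignored to keep it short.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// regenerationCounter is a hypothetical monotonic counter.
type regenerationCounter struct{ n uint64 }

// Inc records one regeneration of the aggregated discovery document.
func (c *regenerationCounter) Inc() { atomic.AddUint64(&c.n, 1) }

// Value returns the total number of regenerations recorded so far.
func (c *regenerationCounter) Value() uint64 { return atomic.LoadUint64(&c.n) }

// increase returns the number of regenerations between two samples,
// assuming the counter was not reset in between.
func increase(before, after uint64) uint64 { return after - before }

func main() {
	c := &regenerationCounter{}
	first := c.Value() // first scrape
	// Hypothetically, three CRD changes each trigger one regeneration.
	for i := 0; i < 3; i++ {
		c.Inc()
	}
	second := c.Value() // second scrape
	fmt.Println(increase(first, second)) // 3
}
```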
Force-pushed from aee1c51 to 02a110f.
Thanks.
LGTM label has been added. Git tree hash: 8dcb607fdeed872020cfdeec3a8729728d3bd392
/test pull-kubernetes-integration
staging/src/k8s.io/apiserver/pkg/endpoints/discovery/aggregated/handler.go (outdated, resolved)
Just the question about the component; LGTM otherwise.
Force-pushed from 02a110f to 387d976.
Updated, thanks!
/lgtm
LGTM label has been added. Git tree hash: e7418e0975caaf033e2c21ba45eea3de81c045f2
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: apelisse, deads2k, Jefftree.
@Jefftree: A test failed; say /retest to rerun it.
/retest
/kind feature
What this PR does / why we need it:
Adds metrics for aggregated discovery: the number of requests, split by status code; the number of times the discovery cache was aggregated; and the aggregation duration.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
/cc @apelisse @alexzielenski