-
Notifications
You must be signed in to change notification settings - Fork 38.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added requestSloLatencies metric #105890
Added requestSloLatencies metric #105890
Conversation
Hi @pawbana. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
f4104f6
to
7efe5d4
Compare
532d6e7
to
5d0a3d7
Compare
0aef7dc
to
0c54cfe
Compare
@@ -263,7 +264,13 @@ func (a *mutatingDispatcher) callAttrMutatingHook(ctx context.Context, h *admiss | |||
} | |||
} | |||
|
|||
if err := r.Do(ctx).Into(response); err != nil { | |||
wd, ok := endpointsrequest.WebhookDurationFrom(ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about:
do := func() { err = r.Do(ctx).Into(response)
if wd, ok := endpointsrequest.WebhookDurationFrom(ctx); ok {
tmp := do
do = func() { wd.AdmitTracker.Track(do())
}
do()
} | ||
|
||
func WithWebhookDurationAndCustomClock(parent context.Context, c clock.Clock) context.Context { | ||
return WithValue(parent, webhookDurationKey, WebhookDuration{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would actually store a pointer in the context.
[And then below you just return "nil, false" in WebhookDurationFrom below]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
b1ff221
to
d6bc59e
Compare
staging/src/k8s.io/apiserver/pkg/admission/plugin/webhook/validating/plugin_test.go
Show resolved
Hide resolved
type durationTracker struct { | ||
clock clock.Clock | ||
latency time.Duration | ||
mu *sync.RWMutex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't make it a pointer - just make it sync.Lock
[and don't use RWLock - we don't expect any contention on read here and RWLock is more expensive than Lock]
f5918c2
to
dc4a8df
Compare
wd, ok := request.WebhookDurationFrom(ctx) | ||
if !ok { | ||
if test.InitContext { | ||
t.Errorf("expected webhook duraiton to be initialized") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: duration (same below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, fixed.
7ad0c53
to
fc0f7be
Compare
staging/src/k8s.io/apiserver/pkg/admission/plugin/webhook/testing/testcase.go
Show resolved
Hide resolved
staging/src/k8s.io/apiserver/pkg/admission/plugin/webhook/testing/webhook_server.go
Show resolved
Hide resolved
@@ -468,6 +482,10 @@ func MonitorRequest(req *http.Request, verb, group, version, resource, subresour | |||
} | |||
} | |||
requestLatencies.WithContext(req.Context()).WithLabelValues(reportedVerb, dryRun, group, version, resource, subresource, scope, component).Observe(elapsedSeconds) | |||
wd, ok := request.WebhookDurationFrom(req.Context()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit;
if wd, ok := ... ; ok {
@@ -468,6 +482,10 @@ func MonitorRequest(req *http.Request, verb, group, version, resource, subresour | |||
} | |||
} | |||
requestLatencies.WithContext(req.Context()).WithLabelValues(reportedVerb, dryRun, group, version, resource, subresource, scope, component).Observe(elapsedSeconds) | |||
wd, ok := request.WebhookDurationFrom(req.Context()) | |||
if ok { | |||
requestSloLatencies.WithContext(req.Context()).WithLabelValues(reportedVerb, group, version, resource, subresource, scope).Observe(elapsedSeconds - (wd.AdmitTracker.GetLatency() + wd.ValidateTracker.GetLatency()).Seconds()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
sloLatency := elapsedSecond - (wd.AdmiTracker.GetLatency() + wd.ValidateTracker.GetLatency()).Seconds()
requestSLOLatencies...
fc0f7be
to
2446939
Compare
2446939
to
0afa569
Compare
/lgtm Thanks! |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: pawbana, wojtek-t The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
…available in <1.23.0-0 clusters In upstream Kubernetes (specifically in this commit kubernetes/apiserver@9144ea1), starting k8s 1.23 a new version of the apiserver_request_duration_seconds metric was released called apiserver_request_slo_duration_seconds. As noted in the pull request kubernetes/kubernetes#105890, the fundamental difference between the newer and older metric is that the newer metric excludes the processing time spent by webhooks that the API server calls out to when configured as part of a MutatingWebhookConfiguration or ValidatingWebhookConfiguration. However, since this metric was only introduced in k8s 1.23, this breaks the API Server dashboards in clusters <1.23.0 since that metric does not exist. Therefore, to fix this issue in older clusters using the latest Monitoring chart, specific logic has been added to the chart to utilize the older version of the metric when the Kubernetes version is detected to be <1.23.
…available in <1.23.0-0 clusters In upstream Kubernetes (specifically in this commit kubernetes/apiserver@9144ea1), starting k8s 1.23 a new version of the apiserver_request_duration_seconds metric was released called apiserver_request_slo_duration_seconds. As noted in the pull request kubernetes/kubernetes#105890, the fundamental difference between the newer and older metric is that the newer metric excludes the processing time spent by webhooks that the API server calls out to when configured as part of a MutatingWebhookConfiguration or ValidatingWebhookConfiguration. However, since this metric was only introduced in k8s 1.23, this breaks the API Server dashboards in clusters <1.23.0 since that metric does not exist. Therefore, to fix this issue in older clusters using the latest Monitoring chart, specific logic has been added to the chart to utilize the older version of the metric when the Kubernetes version is detected to be <1.23.
What type of PR is this?
/kind feature
What this PR does / why we need it:
Current apiserver_request_duration_* metrics measure whole request duration including time spent processing webhooks. Webhook duration is mostly dependant on user configuration de so they should not be counted when considering request latency SLOs.
Which issue(s) this PR fixes:
This PR introduces better metric for measuring request latency SLOs.
Special notes for your reviewer:
Does this PR introduce a user-facing change?