apf: print watch initialization latency in httplog #105403
/assign @wojtek-t @MikeSpreitzer
if requestInfo, ok := apirequest.RequestInfoFrom(ctx); ok {
	watch = strconv.FormatBool(requestInfo.Verb == "watch")
}
apiserverRequestExecutionSeconds.WithContext(ctx).WithLabelValues(priorityLevel, flowSchema, watch).Observe(executionTime.Seconds())
we could include the verb in the metric, but that would increase the cardinality. I think for this metric we probably don't care beyond whether it's a watch.
There are other long-running requests, but they currently do not contribute to this metric --- right? Perhaps we should future-proof by calling this label "long-running"?
You mean as a label in the metric? Yeah - that sgtm.
@tkashem - also - it might be useful to split the metrics PR from the logs PR (the metrics will have a release note, as suggested above, etc.).
if we want to future-proof it (assuming that at some point apf may account for long-running requests), then i would suggest renaming the label to type with values {'regular'|'watch'|'non-watch-long-running'} (for lack of better terms on my part) so we can differentiate between watch and non-watch long running requests.
I like it - let's change the label name to "type" and for now use:
- "regular" [I'm not a fan of this name - but I don't have a better alternative]
- "watch"
If at some point we start supporting other calls, we can introduce new values.
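For illustration only, the agreed labeling could be sketched like this. `requestType` and the two constants are hypothetical names, not code from this PR; the only thing taken from the discussion is that the "type" label is "watch" for watch requests and "regular" otherwise.

```go
package main

import "fmt"

// Hypothetical constants for the two "type" label values agreed on above.
const (
	typeRegular = "regular"
	typeWatch   = "watch"
)

// requestType is a hypothetical helper that maps a request verb to the
// value of the proposed "type" label. Every verb other than "watch" is
// reported as "regular", which keeps the label at two possible values.
func requestType(verb string) string {
	if verb == "watch" {
		return typeWatch
	}
	return typeRegular
}

func main() {
	for _, verb := range []string{"get", "list", "watch"} {
		fmt.Printf("verb=%s type=%s\n", verb, requestType(verb))
	}
}
```

A new value such as 'non-watch-long-running' would then be one more branch in the helper, with no change to the label name.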
I split the metrics into a separate PR - #105517 with release notes
@@ -270,7 +272,7 @@ var (
 		Buckets:        requestDurationSecondsBuckets,
 		StabilityLevel: compbasemetrics.ALPHA,
 	},
-	[]string{priorityLevel, flowSchema},
+	[]string{priorityLevel, flowSchema, "watch"},
@logicalhan - I'm assuming we can still modify alpha-stability metrics?
This should be fine, but we should add a release note about it.
@@ -176,6 +177,15 @@ func WithPriorityAndFairness(
		}()

		execute := func() {
			startedAt := time.Now()
			defer func() {
				// TODO: Once PR#104557 merges, we can remove the 'klog.V(3).Enabled()' check.
Hmm - I'm not sure it will ever merge, because of: #104557 (comment)
now it's not static - the filter is always enabled, and i check at runtime for each request whether log level 3 is enabled; only then is the httplog construct initialized.
once #104920 merges I will ask David to remove the hold.
Yes, I think leaving the V(3).Enabled() check here permanently is fine.
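As a rough sketch of the pattern being discussed (time the execution in a deferred function, and record the result only when verbosity 3 is enabled at runtime), using only the standard library: `enabled` stands in for `klog.V(3).Enabled()`, and `annotations` and the "elapsed" key are hypothetical stand-ins for the httplog key/value mechanism, not the PR's actual code.

```go
package main

import (
	"fmt"
	"time"
)

// verbosity is a stand-in for the process-wide log level
// (an assumption; the real code consults klog).
var verbosity = 3

// enabled is a hypothetical stand-in for klog.V(level).Enabled().
func enabled(level int) bool { return verbosity >= level }

// annotations is a hypothetical stand-in for the key/value pairs
// that end up on the httplog line.
type annotations map[string]string

// execute runs handler and, only if level 3 is enabled when the
// handler returns, records how long the handler took.
func execute(handler func(), notes annotations) {
	startedAt := time.Now()
	defer func() {
		// The check happens per request at runtime, so changing the
		// verbosity takes effect without rebuilding the handler chain.
		if enabled(3) {
			notes["elapsed"] = time.Since(startedAt).String()
		}
	}()
	handler()
}

func main() {
	notes := annotations{}
	execute(func() { time.Sleep(10 * time.Millisecond) }, notes)
	fmt.Println(notes)
}
```

With `verbosity` below 3 the map stays empty, so the only per-request cost is the `time.Now()` call and the level check.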
With #104557 merged - let's address this now.
(1) I am starting to get concerned about how much debug information we are putting in the httplog line. Perhaps only do this at a higher
(2) I also wonder if we would prefer to get this information as a metric (a histogram of latencies) rather than in log messages.
it would be nice to have it at all log levels where httplog is enabled, this is the only place where watch initialization latency is recorded in log. (We found an issue with apf watch accounting with aggregated types - please see #105409)
well
/test pull-kubernetes-e2e-gce-ubuntu-containerd
If nobody else complains about this, it is good enough for me.
Thanks.
/LGTM
/triage accepted
c9cabc8 to a18e1c5
oops, bad rebase, fixing it
a18e1c5 to 9b21e11
it's ready for another pass
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: MikeSpreitzer, tkashem, wojtek-t
/test pull-kubernetes-unit
/retest
What type of PR is this?
/kind bug
What this PR does / why we need it:
show watch initialization latency in httplog for watch requests
Which issue(s) this PR fixes:
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: