Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

apf: print watch initialization latency in httplog #105403

Merged
merged 1 commit into from Oct 7, 2021

Conversation

tkashem
Copy link
Contributor

@tkashem tkashem commented Oct 1, 2021

What type of PR is this?

/kind bug

What this PR does / why we need it:

show watch initialization latency in httplog for watch requests

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Oct 1, 2021
@tkashem tkashem changed the title apf: print watch init latency in httplog apf: print watch initialization latency in httplog Oct 1, 2021
@k8s-ci-robot k8s-ci-robot added area/apiserver sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 1, 2021
@tkashem
Copy link
Contributor Author

tkashem commented Oct 1, 2021

/assign @wojtek-t @MikeSpreitzer

if requestInfo, ok := apirequest.RequestInfoFrom(ctx); ok {
watch = strconv.FormatBool(requestInfo.Verb == "watch")
}
apiserverRequestExecutionSeconds.WithContext(ctx).WithLabelValues(priorityLevel, flowSchema, watch).Observe(executionTime.Seconds())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could include the verb in the metric, but that would increase the cardinality. I think for this metric we probably don't care beyond whether it's a watch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are other long-running requests, but they currently do not contribute to this metric --- right? Perhaps we should future-proof by calling this label "long-running"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean as a label in the metric? Yeah - that sgtm.

@tkashem - also - might be useful to split the metrics PR from the logs PR (the metrics will have release note, as suggest above, etc.).

Copy link
Contributor Author

@tkashem tkashem Oct 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we want to future-proof it (assuming that at some point apf may account for long-running requests), then i would suggest renaming the label to type with values {'regular'|'watch'|'non-watch-long-running'} (for lack of better terms on my part) so we can differentiate between watch and non-watch long running requests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it - let's change the label name to "type" and for now use:

  • "regular" [I'm not a fan of this name - but I don't have better alternative]
  • "watch"

If at some point we start supporting other calls, we can introduce new values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I split the metrics into a separate PR - #105517 with release notes

@@ -270,7 +272,7 @@ var (
Buckets: requestDurationSecondsBuckets,
StabilityLevel: compbasemetrics.ALPHA,
},
[]string{priorityLevel, flowSchema},
[]string{priorityLevel, flowSchema, "watch"},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@logicalhan - I'm assuming we can still modify alpha-stability metrics?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be fine, but we should add a release note about it.

@@ -176,6 +177,15 @@ func WithPriorityAndFairness(
}()

execute := func() {
startedAt := time.Now()
defer func() {
// TODO: Once PR#104557 merges, we can remove the 'klog.V(3).Enabled()' check.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm - I'm not sure it will ever merge, because of : #104557 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now, it's not static - the filter is always enabled and i am checking runtime for each request if log level 3 is enabled, only then the httplog construct is initialized.

once #104920 merges I will ask David to remove the hold.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think leaving the V(3).Enabled() check here permanently is fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With #104557 merged - let's address this now.

@MikeSpreitzer
Copy link
Member

MikeSpreitzer commented Oct 1, 2021

(1) I am starting to get concerned about how much debug information we are putting in the httplog line. Perhaps only do this at a higher -v level (perhaps 4)?

(2) I also wonder if we would prefer to get this information as a metric (a histogram of latencies) rather than in log messages.

@tkashem
Copy link
Contributor Author

tkashem commented Oct 1, 2021

(1) I am starting to get concerned about how much debug information we are putting in the httplog line. Perhaps only do this at a higher -v level (perhaps 4)?

it would be nice to have it at all log levels where httplog is enabled, this is the only place where watch initialization latency is recorded in log. (We found an issue with apf watch accounting with aggregated types - please see #105409)

(2) I also wonder if we would prefer to get this information as a metric (a histogram of latencies) rather than in log messages.

well apiserver_flowcontrol_request_execution_seconds records watch initialization latency, but it is currently indistinguishable from latency of regular requests, this PR adds a new label named watch, it is set to true if it is from watch initialization.

@tkashem
Copy link
Contributor Author

tkashem commented Oct 1, 2021

/test pull-kubernetes-e2e-gce-ubuntu-containerd

Copy link
Member

@MikeSpreitzer MikeSpreitzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If nobody else complains about this, it is good enough for me.
Thanks.
/LGTM

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 4, 2021
@fedebongio
Copy link
Contributor

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 5, 2021
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 6, 2021
@tkashem
Copy link
Contributor Author

tkashem commented Oct 6, 2021

oops, bad rebase, fixing it

@k8s-ci-robot k8s-ci-robot added the sig/auth Categorizes an issue or PR as relevant to SIG Auth. label Oct 6, 2021
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. release-note-none Denotes a PR that doesn't merit a release note. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Oct 6, 2021
@tkashem
Copy link
Contributor Author

tkashem commented Oct 6, 2021

it's ready for another pass

@wojtek-t
Copy link
Member

wojtek-t commented Oct 6, 2021

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 6, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: MikeSpreitzer, tkashem, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 6, 2021
@tkashem
Copy link
Contributor Author

tkashem commented Oct 6, 2021

/test pull-kubernetes-unit

@MikeSpreitzer
Copy link
Member

/retest

@k8s-ci-robot k8s-ci-robot merged commit 5c3a789 into kubernetes:master Oct 7, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone Oct 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/auth Categorizes an issue or PR as relevant to SIG Auth. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants