Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

track latency incurred in various layers of apiserver #107910

Merged
merged 2 commits into from Mar 25, 2022

Conversation

tkashem
Copy link
Contributor

@tkashem tkashem commented Feb 1, 2022

What type of PR is this?

/kind feature

What this PR does / why we need it:

  • track latency incurred in various components within the apiserver, for example, storage layer, transformation of response object, http.ResponseWriter.Write and admission webhooks
  • and annotate the latency in audit event
{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "Metadata",
  "stage": "ResponseComplete",
  
  "requestReceivedTimestamp": "2022-03-24T21:38:52.741392Z",
  "stageTimestamp": "2022-03-24T21:38:52.743090Z",
  "annotations": {
    "apiserver.latency.k8s.io/etcd": "1.138415ms",
    "apiserver.latency.k8s.io/response-write": "415ns",
    "apiserver.latency.k8s.io/serialize-response-object": "28.677µs",
    "apiserver.latency.k8s.io/total": "1.698553ms",
    "apiserver.latency.k8s.io/transform-response-object": "1.41µs",
}

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. kind/feature Categorizes issue or PR as related to a new feature. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/apiserver labels Feb 1, 2022
@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 1, 2022
@tkashem tkashem force-pushed the latency-breakdown branch 3 times, most recently from d5cd2b2 to 740bb81 Compare February 2, 2022 19:43
@tkashem tkashem changed the title [WIP] track latency track latency incurred in various layers of apiserver Feb 2, 2022
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 2, 2022
@tkashem
Copy link
Contributor Author

tkashem commented Feb 2, 2022

/assign @deads2k @wojtek-t @dgrisonnet

@tkashem
Copy link
Contributor Author

tkashem commented Feb 2, 2022

@wojtek-t Is it possible to decorate from the watch cache layer and down?

@tkashem tkashem force-pushed the latency-breakdown branch 2 times, most recently from bc6611c to 7eb7f54 Compare February 3, 2022 17:34
@fedebongio
Copy link
Contributor

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 3, 2022
@tkashem
Copy link
Contributor Author

tkashem commented Feb 3, 2022

/retest

// NewETCDLatencyTracker returns an implementation of
// clientv3.KV that times the calls from the specified
// 'delegate' KV instance in order to track latency incurred.
func NewETCDLatencyTracker(delegate clientv3.KV) clientv3.KV {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of tapping into storage.Interface, I am now tapping into etcd3 client KV so we can measure latency from etcd3 round trips only

}

type clientV3TxnTracker struct {
ctx context.Context
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blech. Exists in the etcd client though, so I think we're stuck.

@deads2k
Copy link
Contributor

deads2k commented Mar 25, 2022

/lgtm

@deads2k
Copy link
Contributor

deads2k commented Mar 25, 2022

/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 25, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, tkashem

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 25, 2022
@tkashem
Copy link
Contributor Author

tkashem commented Mar 25, 2022

Test started today at 9:59 AM failed after 46m30s.
133/133 Tests Passed!

a timeout in some other layer, may be in collection of artifatcs?

/test pull-kubernetes-integration

@aojea
Copy link
Member

aojea commented Mar 25, 2022

a timeout in some other layer, may be in collection of artifatcs?

I think this PR is doing something to the etcd client that is making it fail

/hold

https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/107910/pull-kubernetes-integration/1507356118538522624/build-log.txt

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 25, 2022
@aojea
Copy link
Member

aojea commented Mar 25, 2022

a timeout in some other layer, may be in collection of artifatcs?

I think this PR is doing something to the etcd client that is making it fail

/hold

https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/107910/pull-kubernetes-integration/1507356118538522624/build-log.txt

now it went through, let me give it another shot to see if that was unrelated to this PR, I will unhold

Sorry for holding it but this job is very sensible to things that impact apiserver resource usage, I just wanted to be completely sure 😄

/test pull-kubernetes-integration


annotations := map[string]string{}
if latency := tracker.TransformTracker.GetLatency(); latency != 0 {
annotations["apiserver.latency.k8s.io/transform-response-object"] = latency.String()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for the late nit: can we use consts for these strings and use those when we retrieve values? I see spots in the PR where we assume that the key is there if the annotations map is returned but a small bug in the key could cause a NPE.

Copy link
Contributor Author

@tkashem tkashem Mar 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@logicalhan we don't retrieve by key name where the annotations are returned. it added "apiserver.latency.k8s.io/total" key to the annotations map if it wan non-empty, but I changed it a bit so that the annotations map returned is not mutated (that seems a better pattern)
With this i don't see any need to use const since these key names are are not referred to anywhere else.

        // record the total latency for this request, for convenience.
-       layerLatencies["apiserver.latency.k8s.io/total"] = latency.String()
+       audit.LogAnnotation(ev, "apiserver.latency.k8s.io/total", latency.String())
        for k, v := range layerLatencies {
                audit.LogAnnotation(ev, k, v)
        }

Copy link
Contributor Author

@tkashem tkashem Mar 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but if you think having these key names in one place is more readable, I can make this change :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nah it's fine then, thanks!
/lgtm

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@logicalhan the keys have been moved to consts now, ptal

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!!

@aojea
Copy link
Member

aojea commented Mar 25, 2022

@tkashem but you have a new comment #107910 (comment)

you are free to unhold, the job has started to be unstable since a few hours , so is not because of this PR
Thanks

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 25, 2022
@tkashem
Copy link
Contributor Author

tkashem commented Mar 25, 2022

i did a force push, this is the only change.

        // record the total latency for this request, for convenience.
-       layerLatencies["apiserver.latency.k8s.io/total"] = latency.String()
+       audit.LogAnnotation(ev, "apiserver.latency.k8s.io/total", latency.String())
        for k, v := range layerLatencies {
                audit.LogAnnotation(ev, k, v)
        }

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 25, 2022
@aojea
Copy link
Member

aojea commented Mar 25, 2022

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 25, 2022
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 25, 2022
Copy link
Member

@logicalhan logicalhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 25, 2022
@k8s-ci-robot k8s-ci-robot merged commit 84bd51c into kubernetes:master Mar 25, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.24 milestone Mar 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants