Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add authentication overall latency metrics #82409

Conversation

@RainbowMango
Copy link
Member

RainbowMango commented Sep 6, 2019

What type of PR is this?
/kind feature

What this PR does / why we need it:

Which issue(s) this PR fixes:
Part of #81028.

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Added metrics 'authentication_latency_seconds' that can be used to understand the latency of authentication.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

/assign @mikedanese
/cc @enj
/sig auth

@RainbowMango

This comment has been minimized.

Copy link
Member Author

RainbowMango commented Sep 6, 2019

/retest

@RainbowMango RainbowMango force-pushed the RainbowMango:pr_add_authentication_overall_latency_metrics branch from 48ac11b to 0ea2476 Sep 6, 2019
Copy link
Member

mikedanese left a comment

Minor comments, otherwise this looks good. Thanks!

@RainbowMango RainbowMango force-pushed the RainbowMango:pr_add_authentication_overall_latency_metrics branch 3 times, most recently from 65a03ec to e71827b Sep 9, 2019
&metrics.HistogramOpts{
Name: "authentication_duration_seconds",
Help: "Authentication duration in seconds broken out by result.",
Buckets: prometheus.ExponentialBuckets(0.001, 2, 10),

This comment has been minimized.

Copy link
@enj

enj Sep 9, 2019

Member

This means the largest bucket is only just over a second right? I can see webhook tail latency being much higher. 15 steps would give you over 30 seconds which is generally the timeout for a request. Granted I forget what we set the timeout for webhooks to.

This comment has been minimized.

Copy link
@RainbowMango

RainbowMango Sep 9, 2019

Author Member

Thanks for your information.
I can't agree more.

@RainbowMango RainbowMango force-pushed the RainbowMango:pr_add_authentication_overall_latency_metrics branch 2 times, most recently from 79aacf2 to 34c28f2 Sep 9, 2019
Copy link
Contributor

logicalhan left a comment

/lgtm

on the metrics-side of things.

Add alpha tags to authentication_attempts explicitly.
@RainbowMango RainbowMango force-pushed the RainbowMango:pr_add_authentication_overall_latency_metrics branch from 34c28f2 to 0c0d69e Sep 18, 2019
@k8s-ci-robot k8s-ci-robot removed the lgtm label Sep 18, 2019
@RainbowMango

This comment has been minimized.

Copy link
Member Author

RainbowMango commented Sep 18, 2019

@enj @mikedanese
Please take a look.

Minor update after last time review:
Update prometheus.ExponentialBuckets to metrics.ExponentialBuckets, because metrics has provided a wrapper for bucket functionality.

@RainbowMango

This comment has been minimized.

Copy link
Member Author

RainbowMango commented Sep 18, 2019

/test pull-kubernetes-integration

@RainbowMango

This comment has been minimized.

Copy link
Member Author

RainbowMango commented Sep 18, 2019

/test pull-kubernetes-integration

fake test.

@mikedanese

This comment has been minimized.

Copy link
Member

mikedanese commented Sep 19, 2019

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm label Sep 19, 2019
&metrics.HistogramOpts{
Name: "authentication_duration_seconds",
Help: "Authentication duration in seconds broken out by result.",
Buckets: metrics.ExponentialBuckets(0.001, 2, 15),

This comment has been minimized.

Copy link
@mikedanese

mikedanese Sep 19, 2019

Member

This bucketing might be a little abrasive depending on the actual latency. We may need to adjust.

This comment has been minimized.

Copy link
@RainbowMango

RainbowMango Sep 20, 2019

Author Member

Could you give me some specific instructions?

This comment has been minimized.

Copy link
@mikedanese

mikedanese Sep 20, 2019

Member

If I read this right, the buckets are:

n | 0.001 n^2
1 | 0.001
2 | 0.004
3 | 0.009
4 | 0.016
5 | 0.025
6 | 0.036
7 | 0.049
8 | 0.064
9 | 0.081
10 | 0.1
11 | 0.121
12 | 0.144
13 | 0.169
14 | 0.196
15 | 0.225

What if authentication takes a second or ten seconds? It'll get put in the 15th bucket along with the ones the requests that took 0.2 seconds. In my experience, I've never been thrilled with exponential bucketing. Unless you actually have a good idea of performance before setting the parameters, it tends to cause weird data issues when you actually want to know how long stuff takes.

This comment has been minimized.

Copy link
@RainbowMango

RainbowMango Sep 20, 2019

Author Member

The buckets seem not what you think. Wait a moment, I try to dump from my local cluster.

This comment has been minimized.

Copy link
@RainbowMango

RainbowMango Sep 20, 2019

Author Member

This comes from my local cluster.

# HELP authentication_duration_seconds [ALPHA] Authentication duration in seconds broken out by result.
# TYPE authentication_duration_seconds histogram
authentication_duration_seconds_bucket{result="success",le="0.001"} 1143
authentication_duration_seconds_bucket{result="success",le="0.002"} 1143
authentication_duration_seconds_bucket{result="success",le="0.004"} 1144
authentication_duration_seconds_bucket{result="success",le="0.008"} 1144
authentication_duration_seconds_bucket{result="success",le="0.016"} 1144
authentication_duration_seconds_bucket{result="success",le="0.032"} 1144
authentication_duration_seconds_bucket{result="success",le="0.064"} 1144
authentication_duration_seconds_bucket{result="success",le="0.128"} 1144
authentication_duration_seconds_bucket{result="success",le="0.256"} 1144
authentication_duration_seconds_bucket{result="success",le="0.512"} 1144
authentication_duration_seconds_bucket{result="success",le="1.024"} 1144
authentication_duration_seconds_bucket{result="success",le="2.048"} 1144
authentication_duration_seconds_bucket{result="success",le="4.096"} 1144
authentication_duration_seconds_bucket{result="success",le="8.192"} 1144
authentication_duration_seconds_bucket{result="success",le="16.384"} 1144
authentication_duration_seconds_bucket{result="success",le="+Inf"} 1144
authentication_duration_seconds_sum{result="success"} 0.14704390199999995
authentication_duration_seconds_count{result="success"} 1144

This comment has been minimized.

Copy link
@RainbowMango

RainbowMango Sep 20, 2019

Author Member

@enj has pointed the similar issue. So, I enlarged the bucketed number from 10 to 15.

This comment has been minimized.

Copy link
@mikedanese

mikedanese Sep 20, 2019

Member

Oops, I did n^2 instead of 2^n. Those look fine for now although we may want higher density above 1 second in the future.

This comment has been minimized.

Copy link
@RainbowMango

RainbowMango Sep 20, 2019

Author Member

OK. By the way, most latency metric using the same bucket.

Now let's call @liggitt help review and approve.

This comment has been minimized.

Copy link
@liggitt

liggitt Sep 24, 2019

Member

cc @brancz @logicalhan on bucket count/size

This comment has been minimized.

Copy link
@ehashman

ehashman Sep 25, 2019

Member

Bucket count/sizes look okay to me.

@RainbowMango

This comment has been minimized.

Copy link
Member Author

RainbowMango commented Sep 20, 2019

/assign @liggitt

@mikedanese

This comment has been minimized.

Copy link
Member

mikedanese commented Sep 20, 2019

/lgtm

@liggitt

This comment has been minimized.

Copy link
Member

liggitt commented Sep 24, 2019

I'll defer to @kubernetes/sig-instrumentation-pr-reviews on bucket size/count

@liggitt

This comment has been minimized.

Copy link
Member

liggitt commented Sep 24, 2019

/approve

/hold
for @logicalhan or @brancz or delegate to take a look at the bucketing

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

k8s-ci-robot commented Sep 24, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liggitt, mikedanese, RainbowMango

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@brancz

This comment has been minimized.

Copy link
Member

brancz commented Sep 24, 2019

I don’t know the surrounding code too well. Is this metric describing a network request or largely in process? If it’s a network request then these buckets are fine.

@liggitt

This comment has been minimized.

Copy link
Member

liggitt commented Sep 24, 2019

Is this metric describing a network request or largely in process? If it’s a network request then these buckets are fine.

it is recording metrics for the authentication filter, which can involve a network request (to check a credential stored in etcd or verify a token against a remote webhook)

/hold
cancel

@RainbowMango

This comment has been minimized.

Copy link
Member Author

RainbowMango commented Sep 25, 2019

/hold cancel
:)

@k8s-ci-robot k8s-ci-robot merged commit 07025a5 into kubernetes:master Sep 25, 2019
25 checks passed
25 checks passed
cla/linuxfoundation RainbowMango authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-conformance-image-test Skipped.
pull-kubernetes-conformance-kind-ipv6 Skipped.
pull-kubernetes-cross Skipped.
pull-kubernetes-dependencies Job succeeded.
Details
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-100-performance Job succeeded.
Details
pull-kubernetes-e2e-gce-alpha-features Skipped.
pull-kubernetes-e2e-gce-csi-serial Skipped.
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-gce-iscsi Skipped.
pull-kubernetes-e2e-gce-iscsi-serial Skipped.
pull-kubernetes-e2e-gce-storage-slow Skipped.
pull-kubernetes-godeps Skipped.
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
Details
pull-kubernetes-local-e2e Skipped.
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-node-e2e-containerd Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details
pull-publishing-bot-validate Skipped.
tide In merge pool.
Details
@k8s-ci-robot k8s-ci-robot added this to the v1.17 milestone Sep 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.