
Add latency metric for CR Webhook Conversion #74376

Merged: 3 commits merged into kubernetes:master from mbohlool:gimli on Mar 8, 2019

Conversation

@mbohlool (Member) commented Feb 21, 2019

This PR adds a latency histogram metric for Custom Resource conversion webhook calls. Each observation is labeled with the name of the parent CRD, the from and to versions, and whether the call succeeded.

- Add duration metric for CRD webhook converters

@sttts @liggitt
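
As a rough sketch only (not the PR's actual code), such a histogram could be declared with the Prometheus Go client roughly as follows; the metric name and label names here are illustrative placeholders, and the bucket sizing mirrors the ExponentialBuckets(1, 2, 5) call visible in the diff excerpt below:

package conversion

import (
	"github.com/prometheus/client_golang/prometheus"
)

// conversionWebhookLatency is a hypothetical histogram of CR conversion
// webhook call latency, labeled by the parent CRD name, the from/to
// versions, and whether the call succeeded.
var conversionWebhookLatency = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name:    "conversion_webhook_duration_seconds", // placeholder name
		Help:    "Latency of CR conversion webhook calls, in seconds.",
		Buckets: prometheus.ExponentialBuckets(1, 2, 5), // initial sizing, revisited in review below
	},
	[]string{"crd_name", "from_version", "to_version", "succeeded"},
)

func init() {
	// Register with the default Prometheus registry so the metric is scraped.
	prometheus.MustRegister(conversionWebhookLatency)
}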

Three resolved review threads on ...rc/k8s.io/apiextensions-apiserver/pkg/apiserver/conversion/monitoring.go (outdated).
)

var (
latencyBuckets = prometheus.ExponentialBuckets(1, 2, 5)

@liggitt (Member) commented Feb 22, 2019

what do these boundaries represent?

@mbohlool (Author, Member) commented Feb 25, 2019

Actually I chose them arbitrarily. Thinking more about this, maybe we want to start with smaller numbers:

< 50ms: very good converter
< 100ms: good converter
< 200ms: fair
< 400ms: ok
< 1s: slow
< 5s: very slow
otherwise: very, very slow

wdyt?
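
For reference, a minimal sketch of what those proposed boundaries could look like as an explicit Prometheus bucket slice, assuming the metric is recorded in seconds (the unit is an assumption here; the comment above only gives thresholds):

// Hypothetical explicit bucket boundaries (in seconds) matching the
// thresholds proposed above: 50ms, 100ms, 200ms, 400ms, 1s and 5s.
// Prometheus adds a +Inf bucket automatically, catching anything slower.
var proposedLatencyBuckets = []float64{0.05, 0.1, 0.2, 0.4, 1, 5}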

@brancz (Member) commented Feb 26, 2019

Since webhook latencies can vary wildly and are hard to predict (we don't have control over them), a common solution is to use a summary when interacting with third-party providers. I'm not sure aggregating client-side latencies against webhooks fleet-wide is a very strong use case; I'd rather expect the webhook server to expose those metrics as a histogram and do the fleet-wide analysis on that side.

@mbohlool (Author, Member) commented Feb 27, 2019

First, we may start providing webhooks ourselves as standard types eventually move to CRDs (or at least new standard types). Also, I think a summary metric can be derived from a histogram metric, right?
Depending on how often we call conversion webhooks, we may want their latency to stay below some threshold to make sure the API call is fast enough. I think these buckets can change in the future, but a histogram makes more sense. Also, we already have histogram metrics for admission webhooks, so if we decide we want a summary here, we should be consistent.
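
As a rough sketch of the observation side (assuming the hypothetical conversionWebhookLatency histogram above, not the PR's actual code), each webhook call would be timed and recorded together with its success state; because a histogram keeps per-bucket counts, summary-style quantile estimates can still be computed from it at query time:

// Hypothetical helper, assuming the conversionWebhookLatency histogram
// sketched earlier and the "time" package imported; the webhook call itself
// is passed in as a closure so this fragment stays independent of the real
// conversion client code.
func observeConversion(crdName, fromVersion, toVersion string, call func() error) error {
	start := time.Now()
	err := call()
	// The success state becomes a label value on the observation.
	succeeded := "true"
	if err != nil {
		succeeded = "false"
	}
	// Record the call duration in seconds for this CRD and version pair.
	conversionWebhookLatency.
		WithLabelValues(crdName, fromVersion, toVersion, succeeded).
		Observe(time.Since(start).Seconds())
	return err
}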

@liggitt (Member) commented Mar 6, 2019

I can't tell if this discussion got resolved or not. Given the discussion of stability guarantees around metrics, how sure are we that this metric (name and structure) will not change?

@logicalhan (Contributor) commented Mar 7, 2019

I think it is generally preferable to use a histogram rather than summary metrics. With respect to the aggregation issue, while I agree there is probably only a marginal use case for aggregating this across a fleet, it feels safer to me to leave that door open and just opt to use a histogram.

As for the bucket sizing, I agree with starting with smaller numbers. It seems like we tend to use this bucket sizing for latency measurements (but obviously it should be shaped to the thing being measured).

@mbohlool (Author, Member) commented Mar 7, 2019

Changed the bucket sizing to what @logicalhan suggested.

@brancz (Member) commented Mar 7, 2019

That bucketing sounds good to me as well.

Resolved review thread on ...rc/k8s.io/apiextensions-apiserver/pkg/apiserver/conversion/monitoring.go (outdated).
@roycaihw (Member) commented Feb 25, 2019

@DirectXMan12 (Contributor) left a comment

minor nits

mbohlool force-pushed the mbohlool:gimli branch from dae1cad to 36bd57d on Feb 28, 2019

k8s-ci-robot added the size/XL label and removed the size/L label on Feb 28, 2019

@mbohlool (Author, Member) commented Feb 28, 2019

@mbohlool (Author, Member) commented Feb 28, 2019

/test pull-kubernetes-godeps

@mbohlool (Author, Member) commented Mar 6, 2019

@logicalhan (Contributor) left a comment

/lgtm

k8s-ci-robot added the lgtm label on Mar 7, 2019

mbohlool force-pushed the mbohlool:gimli branch from a06c788 to 9be562b on Mar 7, 2019

k8s-ci-robot removed the lgtm label on Mar 7, 2019

@mbohlool (Author, Member) commented Mar 7, 2019

/retest

@liggitt (Member) commented Mar 8, 2019

/lgtm
/approve

@k8s-ci-robot (Contributor) commented Mar 8, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liggitt, mbohlool

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

mbohlool added this to the v1.14 milestone on Mar 8, 2019

mbohlool force-pushed the mbohlool:gimli branch from 8f74879 to 18be830 on Mar 8, 2019

k8s-ci-robot removed the lgtm label on Mar 8, 2019

@dims (Member) commented Mar 8, 2019

Re-applying lgtm from @liggitt after rebase.

/lgtm

@spiffxp (Member) commented Mar 8, 2019

/milestone v1.14
Discussed in #sig-release. Given this was already approved, reviewed, and passing pre-freeze, this seems reasonable. If this breaks things, we reserve the right to revert.

@mbohlool (Author, Member) commented Mar 8, 2019

/retest

1 similar comment
@mbohlool (Author, Member) commented Mar 8, 2019

/retest

k8s-ci-robot merged commit 349ce4c into kubernetes:master on Mar 8, 2019

17 checks passed

cla/linuxfoundation: mbohlool authorized
pull-kubernetes-bazel-build: Job succeeded.
pull-kubernetes-bazel-test: Job succeeded.
pull-kubernetes-conformance-image-test: Skipped.
pull-kubernetes-cross: Skipped.
pull-kubernetes-e2e-gce: Job succeeded.
pull-kubernetes-e2e-gce-100-performance: Job succeeded.
pull-kubernetes-e2e-gce-device-plugin-gpu: Job succeeded.
pull-kubernetes-godeps: Job succeeded.
pull-kubernetes-integration: Job succeeded.
pull-kubernetes-kubemark-e2e-gce-big: Job succeeded.
pull-kubernetes-local-e2e: Skipped.
pull-kubernetes-node-e2e: Job succeeded.
pull-kubernetes-typecheck: Job succeeded.
pull-kubernetes-verify: Job succeeded.
pull-publishing-bot-validate: Skipped.
tide: In merge pool.