Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add admission metrics for webhooks #55183

Merged
merged 5 commits into from Nov 14, 2017

Conversation

jpbetz
Copy link
Contributor

@jpbetz jpbetz commented Nov 6, 2017

Implements the Admission Webhooks: Prometheus Metrics design.

Fixes: #55030

ref: kubernetes/enhancements#492

Metrics have been added for monitoring admission plugins, including the new dynamic (webhook-based) ones.

@jpbetz jpbetz added kind/feature Categorizes issue or PR as related to a new feature. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. labels Nov 6, 2017
@jpbetz jpbetz added this to the v1.9 milestone Nov 6, 2017
@jpbetz jpbetz self-assigned this Nov 6, 2017
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Nov 6, 2017
@jpbetz jpbetz added the release-note-none Denotes a PR that doesn't merit a release note. label Nov 6, 2017
@jpbetz jpbetz mentioned this pull request Nov 6, 2017

// NewChainHandler creates a new chain handler from an array of handlers. Used for testing.
func NewChainHandler(handlers ...Interface) chainAdmissionHandler {
func NewChainHandler(handlers ...NamedHandler) chainAdmissionHandler {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you'll need to update a couple unit tests:

chainHandler := admission.NewChainHandler(handler)

handler := admission.NewChainHandler(pvHandler)

(See https://github.com/kubernetes/kubernetes/pull/55086/files#diff-97778d709e3ef8a841d3749d2c16072a)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tallclair! I've fixed the tests, making heavy use of the work you did for #55086.

@k8s-ci-robot k8s-ci-robot removed the release-note-none Denotes a PR that doesn't merit a release note. label Nov 6, 2017
// namedHandler requires each admission.Interface be named, primarly for metrics tracking purposes.
type NamedHandler interface {
Interface
GetName() string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it much harder to just add Name() to the existing interface?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please correct me if I'm wrong, but my current understanding is that since we don't guarantee a 1:1 mapping of Interface to plugin, we can't name the interface implementations directly, but must instead supplement them with a name only after they are bound to a plugin, which is what I've gone for here.

latencyBuckets = prometheus.ExponentialBuckets(10000, 2.0, 8)
latencySummaryMaxAge = 5 * time.Hour

// Admission step metrics. Each step is identified by a distinct type label value.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Define 'step'?

stepMetrics.observe(elapsed, rejected, string(attr.GetOperation()), gvr.Group, gvr.Version, gvr.Resource, attr.GetSubresource(), stepType)
}

// ObserveAdmissionController records admission related metrics for a build-in admission controller, identified by it's plugin handler name.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/build-in/built-in/

prometheus.MustRegister(m.latenciesSummary)
}

func (m *admissionMetrics) observe(elapsed time.Duration, rejected bool, labels ...string) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the order of the labels necessary?

Copy link
Contributor Author

@jpbetz jpbetz Nov 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the labels are positional. They're defined using a []string in calls to functions like NewCounterVec and are assigned values via the varadic WithLabelValues function. It's not type safe, which is why I've hidden this observe method behind strongly typed Observe...() functions-- it contains the unchecked positional ordering to a small region of code.

}

// ObserveExternalWebhook records admission related metrics for a external admission webhook.
func ObserveExternalWebhook(elapsed time.Duration, rejected bool, hook *v1alpha1.ExternalAdmissionHook, attr Attributes) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if you make this a method of the admissionMetrics object?

Then you can expose the global one by saying

var (
  ObserveExternalWebhook = externalWebhookMetrics.observeExternal
  ...
)

Then you can test the internal method and be confident that the exposed one works, and you don't have to reference the global in the test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea.

The catch here is that admissionMetrics is common to all the different use cases (step, controller, external webhook), but the Observe*() functions customized for each of those use cases for type safety purposes:

ObserveAdmissionStep(elapsed time.Duration, rejected bool, attr Attributes, stepType string)
ObserveAdmissionController(elapsed time.Duration, rejected bool, handler NamedHandler, attr Attributes)

This is why only the observe() function, which is common to all the use cases, is on the admissionMetrics type.

But I should be able to restructure this to get the test-ability improvement you're suggesting. I'll work on it.

m.latenciesSummary.WithLabelValues(labels...).Observe(elapsedMicroseconds)
}

func newAdmissionMetrics(name string, labels []string, helpTemplate string) *admissionMetrics {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for making this a constructor which emits a testable thing! I have one suggestion above to make it even more testable.

@lavalamp
Copy link
Member

lavalamp commented Nov 6, 2017

Can you add a test? Want to make sure that, if anyone ever changes the set of labels, a test at least warns them that monitoring might depend on them.

@lavalamp lavalamp assigned lavalamp and unassigned jpbetz Nov 6, 2017
@tallclair
Copy link
Member

Testing prometheus metrics is a bit messy, but see https://github.com/kubernetes/kubernetes/pull/55086/files#diff-486060c5d6c3f596cc0ea1d054b9a2d6R191 for a starting point.

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 7, 2017
@jpbetz jpbetz force-pushed the webhook-metrics branch 2 times, most recently from d41fb11 to 8be4872 Compare November 8, 2017 00:52
@jpbetz jpbetz added the release-note-none Denotes a PR that doesn't merit a release note. label Nov 8, 2017
@jpbetz
Copy link
Contributor Author

jpbetz commented Nov 8, 2017

/test pull-kubernetes-bazel-test

@crassirostris
Copy link

Metric names seem more understandable now, at least to me, a person without much context, thanks a lot!

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 14, 2017
@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: crassirostris, jpbetz, lavalamp, tallclair

Associated issue: 55030

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot
Copy link

[MILESTONENOTIFIER] Milestone Pull Request Current

@crassirostris @jpbetz @lavalamp @tallclair

Pull Request Labels
  • sig/api-machinery: Pull Request will be escalated to these SIGs if needed.
  • priority/important-soon: Escalate to the pull request owners and SIG owner; move out of milestone after several unsuccessful escalation attempts.
  • kind/feature: New functionality.
Help

@k8s-github-robot
Copy link

/test all [submit-queue is verifying that this PR is safe to merge]

@jpbetz
Copy link
Contributor Author

jpbetz commented Nov 14, 2017

/test pull-kubernetes-kubemark-e2e-gce

@k8s-github-robot
Copy link

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit 02b3928 into kubernetes:master Nov 14, 2017

// chainAdmissionHandler is an instance of admission.NamedHandler that performs admission control using
// a chain of admission handlers
type chainAdmissionHandler []NamedHandler
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpbetz This type is a union type of Interfaces to produce a union Interface. This change breaks the composition model of having a union type that unions the types it is exposing. Please make a followup that restores the composition model.

You can do this by creating an Interface wrapper which provides both Admit and Validate, and does type casting on the delegate. That will preserve the abstractions and allows layering.

err = chainHandler.Admit(admission.NewAttributesRecord(&pod, nil, api.Kind("Pod").WithVersion("version"), pod.Namespace, pod.Name, api.Resource("pods").WithVersion("version"), "", admission.Update, nil))
if err != nil {
t.Errorf("unexpected error returned from admission handler")
if handler.Handles(admission.Update) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpbetz This check isn't the same as the Admit call that was previously done. You need a followup to restore the test.

@deads2k
Copy link
Contributor

deads2k commented Nov 15, 2017

@jpbetz looks like your forgot to tag @kubernetes/sig-api-machinery-pr-reviews and/or myself, @sttts, or @derekwaynecarr (owners of the local admission area). I've made a couple comments that need to be addressed. Please make the requested changes and copy us into the followup pulls.

@jpbetz
Copy link
Contributor Author

jpbetz commented Nov 15, 2017

@deads2k, my mistake, I'll submit a follow up PR shortly.

jpbetz added a commit to jpbetz/kubernetes that referenced this pull request Nov 17, 2017
sttts pushed a commit to sttts/kubernetes that referenced this pull request Nov 20, 2017
k8s-github-robot pushed a commit that referenced this pull request Nov 21, 2017
…trics

Automatic merge from submit-queue (batch tested with PRs 55938, 56055, 53385, 55796, 55922). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

admission: make admission metrics compositional

Metrics emission of admission plugins and the admission chain can be implemented compositionally, i.e. completely independently from the chain logic. This PR does that, moves the whole metrics code into a sub-package to contain complexity. The plumbing logic for the emitted metrics finally is cleanly done in the apiserver bootstrapping code, instead of being totally interleaved with the core admission logic.

Ratio:
- considerably less complexity
- admission plugins are compositional, including the chain. We cannot assume that there is only one chain at the outside of the admission plugin structure. Downstream projects might have more complex admission chains, i.e. multiple chain object nested.
- addition of metrics is plumbing and should be in the apiserver plumbing code. This makes it much easier to reason about the security critical admission chain.

Follow-up of #55183 and based on #55919.
@smarterclayton
Copy link
Contributor

This is O(M*N) with resources and admission controllers. On a large cluster with lots of CRDs this causes a significant amount of memory use and churn. On a smaller cluster with lots of admission controllers and CRDs, 75% of all apiserver prometheus metrics (38k series out of 46k series) were this metric.

The primary dimension that is arguably not useful is by resource type and sub resource, since the vast majority of these are not matched.

I think we should drop the resource and subresource latency tracking.

@tallclair
Copy link
Member

@smarterclayton There are 3 different metrics added in this PR: "step", "controller", and "webhook". I believe controller & webhook are only recorded when the admission controller handles the resource type. Step is for all resources, but only not by controller. Do you know what the breakdown of these 3 metrics are in the clusters you're looking at? Are they all problematic, or just one?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/admission-control cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants