New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add admission metrics for webhooks #55183
Conversation
3105c9c
to
8cb5a8e
Compare
|
||
// NewChainHandler creates a new chain handler from an array of handlers. Used for testing. | ||
func NewChainHandler(handlers ...Interface) chainAdmissionHandler { | ||
func NewChainHandler(handlers ...NamedHandler) chainAdmissionHandler { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you'll need to update a couple unit tests:
chainHandler := admission.NewChainHandler(handler) |
handler := admission.NewChainHandler(pvHandler) |
(See https://github.com/kubernetes/kubernetes/pull/55086/files#diff-97778d709e3ef8a841d3749d2c16072a)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @tallclair! I've fixed the tests, making heavy use of the work you did for #55086.
// namedHandler requires each admission.Interface be named, primarly for metrics tracking purposes. | ||
type NamedHandler interface { | ||
Interface | ||
GetName() string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it much harder to just add Name()
to the existing interface?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please correct me if I'm wrong, but my current understanding is that since we don't guarantee a 1:1 mapping of Interface
to plugin, we can't name the interface implementations directly, but must instead supplement them with a name only after they are bound to a plugin, which is what I've gone for here.
latencyBuckets = prometheus.ExponentialBuckets(10000, 2.0, 8) | ||
latencySummaryMaxAge = 5 * time.Hour | ||
|
||
// Admission step metrics. Each step is identified by a distinct type label value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Define 'step'?
stepMetrics.observe(elapsed, rejected, string(attr.GetOperation()), gvr.Group, gvr.Version, gvr.Resource, attr.GetSubresource(), stepType) | ||
} | ||
|
||
// ObserveAdmissionController records admission related metrics for a build-in admission controller, identified by it's plugin handler name. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/build-in/built-in/
prometheus.MustRegister(m.latenciesSummary) | ||
} | ||
|
||
func (m *admissionMetrics) observe(elapsed time.Duration, rejected bool, labels ...string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the order of the labels necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the labels are positional. They're defined using a []string
in calls to functions like NewCounterVec
and are assigned values via the varadic WithLabelValues
function. It's not type safe, which is why I've hidden this observe
method behind strongly typed Observe...()
functions-- it contains the unchecked positional ordering to a small region of code.
} | ||
|
||
// ObserveExternalWebhook records admission related metrics for a external admission webhook. | ||
func ObserveExternalWebhook(elapsed time.Duration, rejected bool, hook *v1alpha1.ExternalAdmissionHook, attr Attributes) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if you make this a method of the admissionMetrics
object?
Then you can expose the global one by saying
var (
ObserveExternalWebhook = externalWebhookMetrics.observeExternal
...
)
Then you can test the internal method and be confident that the exposed one works, and you don't have to reference the global in the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea.
The catch here is that admissionMetrics
is common to all the different use cases (step, controller, external webhook), but the Observe*()
functions customized for each of those use cases for type safety purposes:
ObserveAdmissionStep(elapsed time.Duration, rejected bool, attr Attributes, stepType string)
ObserveAdmissionController(elapsed time.Duration, rejected bool, handler NamedHandler, attr Attributes)
This is why only the observe()
function, which is common to all the use cases, is on the admissionMetrics
type.
But I should be able to restructure this to get the test-ability improvement you're suggesting. I'll work on it.
m.latenciesSummary.WithLabelValues(labels...).Observe(elapsedMicroseconds) | ||
} | ||
|
||
func newAdmissionMetrics(name string, labels []string, helpTemplate string) *admissionMetrics { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for making this a constructor which emits a testable thing! I have one suggestion above to make it even more testable.
Can you add a test? Want to make sure that, if anyone ever changes the set of labels, a test at least warns them that monitoring might depend on them. |
Testing prometheus metrics is a bit messy, but see https://github.com/kubernetes/kubernetes/pull/55086/files#diff-486060c5d6c3f596cc0ea1d054b9a2d6R191 for a starting point. |
d41fb11
to
8be4872
Compare
/test pull-kubernetes-bazel-test |
Metric names seem more understandable now, at least to me, a person without much context, thanks a lot! /lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: crassirostris, jpbetz, lavalamp, tallclair Associated issue: 55030 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
[MILESTONENOTIFIER] Milestone Pull Request Current @crassirostris @jpbetz @lavalamp @tallclair Pull Request Labels
|
/test all [submit-queue is verifying that this PR is safe to merge] |
/test pull-kubernetes-kubemark-e2e-gce |
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here. |
|
||
// chainAdmissionHandler is an instance of admission.NamedHandler that performs admission control using | ||
// a chain of admission handlers | ||
type chainAdmissionHandler []NamedHandler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jpbetz This type is a union type of Interfaces
to produce a union Interface
. This change breaks the composition model of having a union type that unions the types it is exposing. Please make a followup that restores the composition model.
You can do this by creating an Interface
wrapper which provides both Admit
and Validate
, and does type casting on the delegate. That will preserve the abstractions and allows layering.
err = chainHandler.Admit(admission.NewAttributesRecord(&pod, nil, api.Kind("Pod").WithVersion("version"), pod.Namespace, pod.Name, api.Resource("pods").WithVersion("version"), "", admission.Update, nil)) | ||
if err != nil { | ||
t.Errorf("unexpected error returned from admission handler") | ||
if handler.Handles(admission.Update) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jpbetz This check isn't the same as the Admit
call that was previously done. You need a followup to restore the test.
@jpbetz looks like your forgot to tag @kubernetes/sig-api-machinery-pr-reviews and/or myself, @sttts, or @derekwaynecarr (owners of the local admission area). I've made a couple comments that need to be addressed. Please make the requested changes and copy us into the followup pulls. |
@deads2k, my mistake, I'll submit a follow up PR shortly. |
…trics Automatic merge from submit-queue (batch tested with PRs 55938, 56055, 53385, 55796, 55922). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. admission: make admission metrics compositional Metrics emission of admission plugins and the admission chain can be implemented compositionally, i.e. completely independently from the chain logic. This PR does that, moves the whole metrics code into a sub-package to contain complexity. The plumbing logic for the emitted metrics finally is cleanly done in the apiserver bootstrapping code, instead of being totally interleaved with the core admission logic. Ratio: - considerably less complexity - admission plugins are compositional, including the chain. We cannot assume that there is only one chain at the outside of the admission plugin structure. Downstream projects might have more complex admission chains, i.e. multiple chain object nested. - addition of metrics is plumbing and should be in the apiserver plumbing code. This makes it much easier to reason about the security critical admission chain. Follow-up of #55183 and based on #55919.
This is O(M*N) with resources and admission controllers. On a large cluster with lots of CRDs this causes a significant amount of memory use and churn. On a smaller cluster with lots of admission controllers and CRDs, 75% of all apiserver prometheus metrics (38k series out of 46k series) were this metric. The primary dimension that is arguably not useful is by resource type and sub resource, since the vast majority of these are not matched. I think we should drop the resource and subresource latency tracking. |
@smarterclayton There are 3 different metrics added in this PR: "step", "controller", and "webhook". I believe controller & webhook are only recorded when the admission controller handles the resource type. Step is for all resources, but only not by controller. Do you know what the breakdown of these 3 metrics are in the clusters you're looking at? Are they all problematic, or just one? |
Implements the Admission Webhooks: Prometheus Metrics design.
Fixes: #55030
ref: kubernetes/enhancements#492