feat: Add webhooks client config service metrics #2114

Merged
merged 1 commit into from
Jul 26, 2023

dgrisonnet
Member

What this PR does / why we need it:

To better identify, classify, and debug webhook latency issues, it is important to have a metric that points to the resource a webhook is responsible for. However, Kubernetes cannot expose that dimension in its own metrics because such a label would have unbounded cardinality.

The name of the webhook could be an alternative, since it usually contains some information about the resource that the webhook targets. However, this is not very practical in multi-tenant environments.

A solution for this kind of platform is to tie each webhook to a namespace, so that it is possible to know which tenant manages it and to act accordingly. This is achievable by leveraging the client config information of webhooks configured via WebhookConfiguration resources, since Services are namespaced objects.
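As a rough illustration of the idea above, the sketch below derives one info-style series per webhook from the Service referenced in its client config. The struct types are simplified, hypothetical stand-ins for the Kubernetes admission registration types, and the metric and label names are placeholders rather than the names this PR introduces.

```go
package main

import "fmt"

// ServiceReference, ClientConfig, and Webhook are simplified, hypothetical
// stand-ins for the Kubernetes admission registration types.
type ServiceReference struct {
	Namespace string
	Name      string
}

type ClientConfig struct {
	// Service is nil when the webhook is reached through a URL instead.
	Service *ServiceReference
}

type Webhook struct {
	Name         string
	ClientConfig ClientConfig
}

// serviceSeries renders one series per webhook that is backed by a Service,
// so the number of series grows with the number of webhooks.
// The label names are assumptions made for this sketch.
func serviceSeries(metricName string, hooks []Webhook) []string {
	var out []string
	for _, h := range hooks {
		svc := h.ClientConfig.Service
		if svc == nil {
			continue // URL-based webhooks carry no namespace information
		}
		out = append(out, fmt.Sprintf(
			"%s{webhook_name=%q,service_namespace=%q,service_name=%q} 1",
			metricName, h.Name, svc.Namespace, svc.Name))
	}
	return out
}

func main() {
	hooks := []Webhook{
		{Name: "validate.tenant-a.example.com", ClientConfig: ClientConfig{
			Service: &ServiceReference{Namespace: "tenant-a", Name: "admission"},
		}},
		{Name: "external.example.com"}, // no Service reference, skipped
	}
	for _, line := range serviceSeries("example_webhook_clientconfig_service", hooks) {
		fmt.Println(line)
	}
}
```

Joining such a series with the webhook latency metrics on the webhook name is what would let a namespace label be attached to them.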

With these new metrics, users will be able to split the alerting severity of webhook latency and rejection rate per namespace, in addition to splitting it by webhook name. This is key in environments where administrators don't have control over the webhooks installed by the various tenants.
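One way to picture the per-namespace severity split is the small sketch below. It assumes a hypothetical set of administrator-owned namespaces and hypothetical severity names; in practice this decision would more likely live in alerting rules than in Go code.

```go
package main

import "fmt"

// severityFor picks an alert severity from the namespace of the Service
// backing a webhook. platformNamespaces is a hypothetical set of namespaces
// owned by cluster administrators; webhooks served from anywhere else are
// treated as tenant-managed and alert at a lower severity.
func severityFor(serviceNamespace string, platformNamespaces map[string]bool) string {
	if platformNamespaces[serviceNamespace] {
		return "critical"
	}
	return "warning"
}

func main() {
	platform := map[string]bool{"kube-system": true}
	for _, ns := range []string{"kube-system", "tenant-a"} {
		fmt.Printf("%s -> %s\n", ns, severityFor(ns, platform))
	}
}
```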

How does this change affect the cardinality of KSM: (increases, decreases or does not change cardinality)

O(webhooks)

To better identify, prioritize, and debug webhook latency issues, it is
important to have a metric that points to the resource a webhook is
responsible for. However, Kubernetes cannot expose that dimension in its
own metrics because such a label would have unbounded cardinality.

The name of the webhook could be an alternative, since it usually
contains some information about the resource that the webhook targets.
However, this is not very practical in multi-tenant environments.

A solution for this kind of platform is to tie each webhook to a
namespace, so that it is possible to know which tenant manages it and to
act accordingly. This is achievable by leveraging the client config
information of webhooks configured via WebhookConfiguration resources,
since Services are namespaced objects.

With these new metrics, users will be able to split the alerting
severity of webhook latency and rejection rate per namespace, in
addition to splitting it by webhook name. This is key in environments
where administrators don't have control over the webhooks installed by
the various tenants.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
@dgrisonnet dgrisonnet added kind/feature Categorizes issue or PR as related to a new feature. triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Jul 4, 2023
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jul 4, 2023
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 4, 2023
@dgrisonnet dgrisonnet changed the title Add webhooks client config service metrics feat: Add webhooks client config service metrics Jul 4, 2023
@mrueg
Member

mrueg commented Jul 13, 2023

/hold
for others to review, feel free to cancel if nobody else gets around to it
/lgtm

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 13, 2023
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 13, 2023
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dgrisonnet, mrueg

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mrueg
Member

mrueg commented Jul 26, 2023

/hold cancel

2 weeks have passed, let's get this change in.

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 26, 2023
@k8s-ci-robot k8s-ci-robot merged commit 61c9de0 into kubernetes:main Jul 26, 2023
12 of 13 checks passed
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
3 participants