jsonnet: Decrease KubeClientCertificateExpiration expiration threshold #275
assets/prometheus-k8s/rules.yaml
Outdated
@@ -800,17 +800,17 @@ spec:
  - alert: KubeClientCertificateExpiration
  annotations:
  message: A client certificate used to authenticate to the apiserver is expiring
- in less than 7 days.
+ in less than 0 days.
We should change the alert message format to use hours rather than days, as "0 days" is not helpful right now.
Sure, I will prepare a patch upstream in the kubernetes-mixins.
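A minimal sketch of why the message reads "0 days", assuming the message is produced by formatting the threshold (in seconds) as a whole number of days. The function name is hypothetical and is not taken from kubernetes-mixin; the 1.5h value is the warning threshold named in the PR description.

```python
SECONDS_PER_DAY = 24 * 3600


def threshold_in_whole_days(threshold_seconds: int) -> int:
    """Truncate a threshold to whole days, as an integer format would."""
    return threshold_seconds // SECONDS_PER_DAY


# The upstream default of 7 days survives the truncation unchanged.
default_threshold = 7 * SECONDS_PER_DAY
# A 1.5h threshold is only a small fraction of a day, so it truncates to 0.
short_threshold = 90 * 60

print(f"in less than {threshold_in_whole_days(default_threshold)} days.")
print(f"in less than {threshold_in_whole_days(short_threshold)} days.")
print(f"fraction of a day: {short_threshold / SECONDS_PER_DAY}")
```

Any threshold below 24h collapses to 0 under this assumption, which is why a smaller unit is needed for the message.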
Given that the lowest bucket of the
@mxinden since upstream is merged, can you create a downstream cherry-pick too?
@s-urbaniak Downstream is already merged here: openshift/origin#22205, I am just waiting for kubernetes-monitoring/kubernetes-mixin#168 currently.
This just needs a regenerate of bindata.go, then good to go.
Given that OpenShift rotates certificates, each issued for ~4h, the default rule threshold is not correct. With this patch, a warning fires 1.5h before expiration and a critical alert fires 1h before expiration.
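The thresholds described above can be sketched as follows. This is a simplified illustration, not the actual Prometheus rule: the real alert evaluates a query over apiserver metrics, and the warning and critical rules are separate, so both would match below 1h. The constant and function names are hypothetical; the 7-day default is the value shown in the alert message diff.

```python
from typing import Optional

CERT_LIFETIME_SECONDS = 4 * 3600                  # ~4h certificate lifetime
UPSTREAM_DEFAULT_WARNING_SECONDS = 7 * 24 * 3600  # "7 days" default
WARNING_THRESHOLD_SECONDS = 90 * 60               # warning: 1.5h before expiration
CRITICAL_THRESHOLD_SECONDS = 60 * 60              # critical: 1h before expiration


def severity_for(remaining_seconds: float) -> Optional[str]:
    """Return the highest severity that applies, or None if no alert applies."""
    if remaining_seconds < CRITICAL_THRESHOLD_SECONDS:
        return "critical"
    if remaining_seconds < WARNING_THRESHOLD_SECONDS:
        return "warning"
    return None


# With the 7-day default, even a freshly issued ~4h certificate is below
# the threshold, so the alert condition would hold permanently.
always_firing = CERT_LIFETIME_SECONDS < UPSTREAM_DEFAULT_WARNING_SECONDS
print(always_firing)

for remaining in (4 * 3600, 80 * 60, 30 * 60):
    print(remaining, severity_for(remaining))
```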
963ec88 to e76ee4c
  for: 10m
  labels:
  severity: warning
  - alert: KubeClientCertificateExpiration
  annotations:
  message: A client certificate used to authenticate to the apiserver is expiring
- in less than 7 days.
+ in less than 1 hours.
Once kubernetes-monitoring/kubernetes-mixin#179 is merged, this will read "1.5 hours". Even though this is a bit confusing, I would like to continue here and get kubernetes-monitoring/kubernetes-mixin#179 into cluster-monitoring-operator in a follow-up patch. What do you think?
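A minimal sketch of the difference between "1 hours" and "1.5 hours", assuming the current message truncates the 1.5h warning threshold to whole hours. Both helpers are hypothetical and do not reproduce how kubernetes-monitoring/kubernetes-mixin#179 implements the formatting.

```python
SECONDS_PER_HOUR = 3600


def hours_truncated(threshold_seconds: int) -> str:
    """Format a threshold as whole hours, dropping any fractional part."""
    return f"{threshold_seconds // SECONDS_PER_HOUR} hours"


def hours_fractional(threshold_seconds: int) -> str:
    """Format a threshold as hours, keeping the fractional part if present."""
    return f"{threshold_seconds / SECONDS_PER_HOUR:g} hours"


warning_threshold = 90 * 60  # 1.5h, the warning threshold from this PR

print("in less than " + hours_truncated(warning_threshold) + ".")
print("in less than " + hours_fractional(warning_threshold) + ".")
```

With truncation, the warning (1.5h) and critical (1h) messages would both read "1 hours", which is the confusing part noted above.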
I just merged kubernetes-monitoring/kubernetes-mixin#179. We can continue with this PR as it is or get 179 in here as well. You decide.
As it involves a patch on Prometheus Operator as well, let's move on here and do the update as a follow-up. Thanks @metalmatze.
@squat @s-urbaniak @metalmatze would you mind taking another look? Sorry for the noise on the manifests. Given that I had to update kube-prometheus, it includes unrelated changes as well.
I think I mentioned before, but the warning alert is probably not all that useful with such low rotation intervals. We could think about excluding it in the kubernetes-mixin if 0 or filter it out in here. I'm tending a little towards the latter I think, as it's quite specific. I'm happy with either being a follow up though, so this lgtm 👍 . /lgtm
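The "filter it out in here" option could look roughly like the sketch below, assuming the generated rules are available as a list of dictionaries with `alert` and `labels` keys (mirroring the YAML shown in the diff). The function name and the sample data are hypothetical; this is not how cluster-monitoring-operator actually filters rules.

```python
from typing import Any, Dict, List

Rule = Dict[str, Any]


def drop_warning_cert_alert(rules: List[Rule]) -> List[Rule]:
    """Drop only the warning-severity KubeClientCertificateExpiration rule."""
    return [
        rule
        for rule in rules
        if not (
            rule.get("alert") == "KubeClientCertificateExpiration"
            and rule.get("labels", {}).get("severity") == "warning"
        )
    ]


# Hypothetical sample: the same alert name at two severities.
sample_rules: List[Rule] = [
    {"alert": "KubeClientCertificateExpiration", "labels": {"severity": "warning"}},
    {"alert": "KubeClientCertificateExpiration", "labels": {"severity": "critical"}},
]

filtered = drop_warning_cert_alert(sample_rules)
print([rule["labels"]["severity"] for rule in filtered])
```

Matching on both the alert name and the severity label keeps the critical rule intact, since the two rules share the same name.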
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: brancz, mxinden
I think here the jsonnet mixins really shine. Very cool!