Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jsonnet: Decrease KubeClientCertificateExpiration expiration threshold #275

Merged
merged 2 commits into from
Apr 1, 2019

Conversation

mxinden
Copy link
Contributor

@mxinden mxinden commented Feb 28, 2019

Given that Openshift rotates certificates each issued for ~4h, the
default rule threshold is not correct. With this patch a warning is
fired 1.5h before expiration and a critical alert is fired 1h before
expiration.

I think here the jsonnet mixins really shine. Very cool!

@mxinden mxinden added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 28, 2019
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 28, 2019
@openshift-ci-robot openshift-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Feb 28, 2019
@@ -800,17 +800,17 @@ spec:
- alert: KubeClientCertificateExpiration
annotations:
message: A client certificate used to authenticate to the apiserver is expiring
in less than 7 days.
in less than 0 days.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should change the alert message format to use hours rather than days, as this is not quite helpful right now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I will prepare a patch upstream in the kubernetes-mixins.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mxinden
Copy link
Contributor Author

mxinden commented Feb 28, 2019

Given that the lowest bucket of the certificate_expiration_seconds histogram is 6h we need to make a patch upstream first. I will do that next.

@s-urbaniak
Copy link
Contributor

@mxinden since upstream is merged, can you create a downstream cherry-pick too?

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 8, 2019
@mxinden
Copy link
Contributor Author

mxinden commented Mar 11, 2019

@mxinden since upstream is merged, can you create a downstream cherry-pick too?

@s-urbaniak Downstream is already merged here: openshift/origin#22205, I am just waiting for kubernetes-monitoring/kubernetes-mixin#168 currently.

@brancz
Copy link
Contributor

brancz commented Mar 22, 2019

This just needs a regenerate of bindata.go, then good to go.

Given that Openshift rotates certificates each issued for ~4h, the
default rule threshold is not correct. With this patch a warning is
fired 1.5h before expiration and a critical alert is fired 1h before
expiration.
@openshift-ci-robot openshift-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Mar 29, 2019
for: 10m
labels:
severity: warning
- alert: KubeClientCertificateExpiration
annotations:
message: A client certificate used to authenticate to the apiserver is expiring
in less than 7 days.
in less than 1 hours.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once kubernetes-monitoring/kubernetes-mixin#179 merged this will be 1.5 hours. Even though this is a bit confusing I would like to continue here and get kubernetes-monitoring/kubernetes-mixin#179 into cluster-monitoring-operator in a follow up patch. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just merged kubernetes-monitoring/kubernetes-mixin#179. We can continue with this PR as it is or get 179 in here as well. You decide.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it involves a patch on Prometheus Operator as well, let's move on here and do the update as a follow up. Thanks @metalmatze.

@mxinden
Copy link
Contributor Author

mxinden commented Mar 29, 2019

@squat @s-urbaniak @metalmatze would you mind taking another look? Sorry for the noise on the manifests. Given that I had to update kube-prometheus, it includes unrelated changes as well.

@brancz
Copy link
Contributor

brancz commented Apr 1, 2019

I think I mentioned before, but the warning alert is probably not all that useful with such low rotation intervals. We could think about excluding it in the kubernetes-mixin if 0 or filter it out in here. I'm tending a little towards the latter I think, as it's quite specific. I'm happy with either being a follow up though, so this lgtm 👍 .

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Apr 1, 2019
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: brancz, mxinden

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mxinden mxinden removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 1, 2019
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants