Add PrometheusRule and basic alert. Fixes #3342 #3368
Hi @PaulusTM. Thanks for your PR. I'm waiting for a jetstack or cert-manager member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
app.kubernetes.io/name: {{ include "cert-manager.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
app.kubernetes.io/component: "controller"
This label can be removed as it applies to the deployment
Only the app.kubernetes.io/component label, or should all labels be removed?
I think only the app.kubernetes.io/component label should be removed.
Looks good to me! Just a few nits about naming. I wouldn't yet say that it fully solves #3342, but it is a great start! /ok-to-test
enabled: false
labels: {}
rules:
- alert: CertManagerAbsent
It seems weird to me that we are basically embedding the PrometheusRule spec here. Why would a user choose to embed their AlertManager alerts into our values.yaml, when it has no schema and gives them less control over the resulting resource? To me at least, it seems like it'd make more sense to have your own standalone chart/set of YAML for this. It doesn't seem like this provides a meaningful/useful abstraction to users...
As a user, I expect the cert-manager Helm chart to provide me with basic alerts grouped per application (with a feature flag, so I can enable or disable them). This is a good starting point for creating a basic set of alerts.
It can also be used to find alerts that make sense for cert-manager that I can use in my prometheus operator setup.
Sure, but I don't think exposing the entirety of the PrometheusRule spec is desirable, as this basically makes the cert-manager Helm chart a 'deployment tool' for PrometheusRule resources. I agree that it's great if we can provide some out of the box configuration to guide people & to get them started though.
Instead of exposing all this in values.yaml, can we instead define this as a resource in the templates/ directory and gate it behind a simple boolean prometheus.alerts.enabled: true/false? (with a default of false, as a lot of users won't have PrometheusRule as a CRD installed?)
I think that'd make this a lot more palatable to accept, and if in future there is demand for users being able to configure their own custom alerts via our Helm chart, we are not 'boxing ourselves in' and it'd be possible to still add prometheus.alerts.customRules or something?
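For illustration, the gated-template idea above could look roughly like the sketch below. This is not the content of the PR: the file name, the scrape job label, the `for` duration, the severity, and the summary text are all assumptions, and values.yaml would need to define prometheus.alerts.enabled (defaulting to false) for the condition to resolve.

```yaml
# templates/prometheusrule.yaml (hypothetical file name)
# Assumes values.yaml contains:
#   prometheus:
#     alerts:
#       enabled: false
{{- if .Values.prometheus.alerts.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  # Resource name reuses the chart's name helper; illustrative only.
  name: {{ include "cert-manager.name" . }}
  labels:
    app.kubernetes.io/name: {{ include "cert-manager.name" . }}
    app.kubernetes.io/instance: {{ .Release.Name }}
    app.kubernetes.io/managed-by: {{ .Release.Service }}
spec:
  groups:
    - name: cert-manager
      rules:
        - alert: CertManagerAbsent
          # Assumption: the Prometheus scrape job is called "cert-manager".
          expr: absent(up{job="cert-manager"})
          for: 10m
          labels:
            severity: critical
          annotations:
            summary: cert-manager has disappeared from Prometheus service discovery.
{{- end }}
```

With this shape, users who do not have the PrometheusRule CRD installed are unaffected by default, and a later prometheus.alerts.customRules value could be appended to the same rule group without changing the flag.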
Usually, charts ship default rules (with prometheusRule.enabled: false by default) that use specific selectors and templating so that they are NOT triggered by another cert-manager installation, and the whole PrometheusRule value is exposed. Of course, this has weak validation from the chart's point of view, but the Prometheus validating webhook is able to do the validation anyway.
See bitnami for example: https://github.com/bitnami/charts/blob/master/bitnami/redis/values.yaml#L1222
It would be very useful to have such default alerts, so that alerting works out of the box. Today, everybody has to re-implement them.
Default alerts such as not-ready Certificates (certmanager_certificate_ready_status{condition!="True"} > 0 for more than 30 minutes?) would be helpful.
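As a rough sketch of the not-ready alert suggested above, written as a standalone PrometheusRule: the expression and the 30 minute window come from the comment, while the resource name, group name, alert name, severity, and summary text are hypothetical.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cert-manager-certificates  # hypothetical name
spec:
  groups:
    - name: cert-manager.certificates  # hypothetical group name
      rules:
        - alert: CertManagerCertificateNotReady  # hypothetical alert name
          # Expression as proposed in the comment above.
          expr: certmanager_certificate_ready_status{condition!="True"} > 0
          # Only fire once the condition has held for 30 minutes.
          for: 30m
          labels:
            severity: warning
          annotations:
            summary: A cert-manager Certificate has not been Ready for more than 30 minutes.
```

If several cert-manager installations are scraped by the same Prometheus, an extra label matcher (for example on namespace or job) would likely be needed so that one release's rule does not fire for another.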
I understand your remarks @munnerz. I'll update the PR to reflect your idea.
Force-pushed from 81a2891 to 51e3e86.
I added this PrometheusRule to our clusters to check for Certificates that can't be issued (in state
Would love to see this PR getting on the road again! :)
Signed-off-by: Daniel Paulus <d.paulus@gmail.com>
Force-pushed from 789fbc5 to eae4239.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: PaulusTM. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
@PaulusTM: The following tests failed, say
Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
@PaulusTM: PR needs rebase.
Issues go stale after 90d of inactivity.
Stale issues rot after 30d of inactivity.
Rotten issues close after 30d of inactivity.
@jetstack-bot: Closed this PR. In response to this:
/reopen
@sebastiangaiser: You can't reopen an issue/PR unless you authored it or you are a collaborator. In response to this:
This PR adds a PrometheusRule with a basic alert to detect whether cert-manager is running.
Let's discuss whether more alerts should be included.
Note: This is my first PR to cert-manager, please let me know if I missed any important steps.