-
Notifications
You must be signed in to change notification settings - Fork 363
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add alert for no routes configured in alertmanager #585
Add alert for no routes configured in alertmanager #585
Conversation
d2ef574
to
b518d04
Compare
/lgtm |
/retest Please review the full test history for this PR and help us cut down flakes. |
7 similar comments
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
/hold I expected the e2e test (only 2 firing alerts) to fail. |
jsonnet/rules.jsonnet
Outdated
expr: 'cluster:alertmanager_routing_enabled:max > 0', | ||
alert: 'AlertmanagerReceiversNotConfigured', | ||
annotations: { | ||
message: 'Alerts are not configured to be sent to a notification system. Configuring an alert route will allow you to be notified when important failures occur.', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Configuring an alert route" -> "Configuring a notification receiver"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need at least one route configured to send alerts and while configuring route you need to configure any receiver either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The confusion for me is that there's always at least one route in the routing tree (the top-level route) but this isn't really important. As @lilic commented, it might be good to provide guidance. What about?
Alerts are not configured to be sent to a notification system, meaning that you may not be notified in a timely fashion when important failures occur. Check the OpenShift documentation to learn how to configure notifications with Alertmanager.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally we would point to a playbook, but the above msg sgtm. 👍
/retest |
Can you rebase? |
b518d04
to
8fa59bc
Compare
rebased |
/hold cancel e2e shouldn't fail since openshift/origin#24276 is merged |
Rebased, checked if alert severity is taken into consideration when rendering alert in openshift console (doesn't matter), changed alert message. |
5ebeef0
to
798e265
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
this lgtm, letting @smarterclayton have a final look
ping @smarterclayton for a final review pass. |
@paulfantom lets just rebase and merge this, if Clayton wants something changed we can do that afterwards as well, just so we don't forget to ship this. SGTY? |
…cal alert when no routes are configured
798e265
to
a1fcadf
Compare
Rebased. Let's merge it asap as it seems people are under impression that this already landed (https://coreos.slack.com/archives/C0VMT03S5/p1579040352101100). We can later tweak messages as those are less important than telemetry data this provides. /cc @lilic @s-urbaniak |
Warning is fine, console can deliberately elevate this with more priority /cherry-pick release-4.3 ^ since all the other pieces are in |
@smarterclayton: once the present PR merges, I will cherry-pick it on top of release-4.3 in a new PR and assign it to you. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: LiliC, paulfantom, s-urbaniak The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest Please review the full test history for this PR and help us cut down flakes. |
2 similar comments
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
@smarterclayton: #585 failed to apply on top of branch "release-4.3":
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Hrm, I’m not seeing this be reported at all: sort_desc(count by (alertname) (count by (_id,alertname) (alerts{alertstate="firing",alertname="AlertmanagerReceiversNotConfigured"}))) This should be firing on a default cluster, is it not? |
Follow up to #554
Requested in #554 (comment)
Requested in https://coreos.slack.com/archives/C0VMT03S5/p1575909697091600
Fixed recording rule name as requested in #554 (comment)
/cc @s-urbaniak @smarterclayton