Add alert for no routes configured in alertmanager #585

Merged
openshift-merge-robot merged 2 commits into openshift:master on Jan 15, 2020

Conversation

paulfantom
Contributor

@paulfantom paulfantom commented Dec 9, 2019

Follow up to #554

Requested in #554 (comment)
Requested in https://coreos.slack.com/archives/C0VMT03S5/p1575909697091600

Fixed recording rule name as requested in #554 (comment)

/cc @s-urbaniak @smarterclayton

@openshift-ci-robot openshift-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Dec 9, 2019
@s-urbaniak
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Dec 9, 2019
@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

7 similar comments

@smarterclayton
Contributor

/hold

I expected the e2e test (only 2 firing alerts) to fail.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 10, 2019
```jsonnet
expr: 'cluster:alertmanager_routing_enabled:max > 0',
alert: 'AlertmanagerReceiversNotConfigured',
annotations: {
  message: 'Alerts are not configured to be sent to a notification system. Configuring an alert route will allow you to be notified when important failures occur.',
```
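For context on the `expr` above: in PromQL, a comparison such as `<vector> > 0` written without the `bool` modifier acts as a filter, and an alerting rule produces one alert for each series left in the result. The Python snippet below is a minimal model of only that filtering step, under simplifying assumptions: series are hypothetical `(labels, value)` pairs, the helper names are made up for illustration, and pending state and `for` durations are ignored. It is not how Prometheus evaluates rules internally.

```python
import operator
from typing import Callable, FrozenSet, List, Tuple

# Hypothetical model of an instant vector: each series is (labels, value).
Labels = FrozenSet[Tuple[str, str]]
Series = Tuple[Labels, float]
Comparison = Callable[[float, float], bool]


def compare_filter(vector: List[Series], op: Comparison, scalar: float) -> List[Series]:
    """Model `vector <op> scalar` without the `bool` modifier: series that fail
    the comparison are dropped and the survivors keep their original value."""
    return [(labels, value) for labels, value in vector if op(value, scalar)]


def alerts_for(vector: List[Series], op: Comparison, scalar: float) -> List[Labels]:
    """Return one alert (its label set) per surviving series.
    Pending state and `for` durations are deliberately not modelled."""
    return [labels for labels, _ in compare_filter(vector, op, scalar)]


if __name__ == "__main__":
    # Hypothetical sample: a single label-less series whose value is 0.
    sample: List[Series] = [(frozenset(), 0.0)]
    print(len(alerts_for(sample, operator.gt, 0)))  # 0
    print(len(alerts_for(sample, operator.eq, 0)))  # 1
```

With a hypothetical value of `0`, the `> 0` filter leaves an empty vector (no alert), whereas an `== 0` comparison keeps the series and yields one alert.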
Contributor

"Configuring an alert route" -> "Configuring a notification receiver"?

Contributor Author

I think you need at least one route configured to send alerts, and when configuring a route you need to configure a receiver either way.

Contributor

The confusion for me is that there's always at least one route in the routing tree (the top-level route), but this isn't really important. As @lilic commented, it might be good to provide guidance. What about:

Alerts are not configured to be sent to a notification system, meaning that you may not be notified in a timely fashion when important failures occur. Check the OpenShift documentation to learn how to configure notifications with Alertmanager.

Contributor

Ideally we would point to a playbook, but the above msg sgtm. 👍
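The distinction drawn in this thread, that the routing tree always has a top-level route but its receiver may have nothing to deliver to, can be illustrated with a small check over a parsed Alertmanager configuration. This is a hypothetical Python sketch, not how the alert is computed (the alert is driven by the recording rule in the diff): the function names and the dict-based input are assumptions. It walks `route` and its nested `routes`, then reports whether any referenced receiver defines a non-empty `*_configs` integration list such as `webhook_configs` or `email_configs`.

```python
from typing import Any, Dict, Iterator, Set

Config = Dict[str, Any]


def referenced_receivers(route: Config) -> Iterator[str]:
    """Yield receiver names used by a route and, recursively, its child routes.
    Children without their own receiver inherit the parent's, already yielded."""
    if "receiver" in route:
        yield route["receiver"]
    for child in route.get("routes", []) or []:
        yield from referenced_receivers(child)


def has_notification_integration(config: Config) -> bool:
    """True if some receiver reachable from the routing tree defines a
    non-empty `*_configs` list (e.g. `webhook_configs`, `email_configs`)."""
    used: Set[str] = set(referenced_receivers(config.get("route", {})))
    for receiver in config.get("receivers", []) or []:
        if receiver.get("name") not in used:
            continue  # defined but never routed to, so it cannot notify anyone
        if any(key.endswith("_configs") and value for key, value in receiver.items()):
            return True
    return False


if __name__ == "__main__":
    # Hypothetical config: a top-level route exists, but its receiver is empty.
    default_only = {"route": {"receiver": "default"}, "receivers": [{"name": "default"}]}
    print(has_notification_integration(default_only))  # False
```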

@paulfantom
Contributor Author

/retest

@smarterclayton
Contributor

Can you rebase?

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 12, 2019
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Dec 16, 2019
@paulfantom
Contributor Author

paulfantom commented Dec 16, 2019

rebased
/unhold

@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 16, 2019
@paulfantom
Contributor Author

/hold cancel

e2e shouldn't fail now that openshift/origin#24276 is merged.

@openshift-ci-robot openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Dec 19, 2019
@paulfantom
Copy link
Contributor Author

Rebased, checked whether alert severity is taken into account when the alert is rendered in the OpenShift console (it doesn't matter), and changed the alert message.

@openshift-ci-robot openshift-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 19, 2019
Contributor

@lilic lilic left a comment

/approve

this lgtm, letting @smarterclayton have a final look

@s-urbaniak
Contributor

ping @smarterclayton for a final review pass.

@lilic
Contributor

lilic commented Jan 13, 2020

@paulfantom let's just rebase and merge this; if Clayton wants something changed we can do that afterwards as well, just so we don't forget to ship this. SGTY?

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 13, 2020
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 14, 2020
@paulfantom
Contributor Author

Rebased. Let's merge it asap, as it seems people are under the impression that this already landed (https://coreos.slack.com/archives/C0VMT03S5/p1579040352101100). We can tweak the messages later, as those are less important than the telemetry data this provides.

/cc @lilic @s-urbaniak

@smarterclayton
Contributor

Warning is fine; the console can deliberately elevate this to a higher priority.

/cherry-pick release-4.3

^ since all the other pieces are in

@openshift-cherrypick-robot

@smarterclayton: once the present PR merges, I will cherry-pick it on top of release-4.3 in a new PR and assign it to you.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Contributor

@lilic lilic left a comment

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 15, 2020
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: LiliC, paulfantom, s-urbaniak

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [LiliC,paulfantom,s-urbaniak]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

2 similar comments

@openshift-merge-robot openshift-merge-robot merged commit fb0882f into openshift:master Jan 15, 2020
@openshift-cherrypick-robot

@smarterclayton: #585 failed to apply on top of branch "release-4.3":

error: Failed to merge in the changes.
Using index info to reconstruct a base tree...
M	jsonnet/rules.jsonnet
Falling back to patching base and 3-way merge...
Auto-merging jsonnet/rules.jsonnet
CONFLICT (content): Merge conflict in jsonnet/rules.jsonnet
Patch failed at 0001 jsonnet: fix alertmanager recording rule name; add alertmanager critical alert when no routes are configured

@smarterclayton
Contributor

Hrm, I'm not seeing this reported at all:

```promql
sort_desc(count by (alertname) (count by (_id,alertname) (alerts{alertstate="firing",alertname="AlertmanagerReceiversNotConfigured"})))
```

This should be firing on a default cluster, is it not?
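The query above uses a nested aggregation: the inner `count by (_id, alertname)` collapses the firing `alerts` series to one per cluster and alert name (assuming, as seems to be the case in OpenShift Telemetry, that `_id` carries the cluster ID), and the outer `count by (alertname)` then counts how many clusters have the alert firing. A minimal Python model of that two-step count, with made-up label sets and hypothetical helper names:

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

LabelSet = Dict[str, str]


def count_by(series: Iterable[LabelSet], keys: Tuple[str, ...]) -> Counter:
    """Model PromQL `count by (keys) (...)`: group series by the given labels
    and count group members. A missing label is treated as the empty string."""
    return Counter(tuple(s.get(k, "") for k in keys) for s in series)


def clusters_firing(series: List[LabelSet], alertname: str) -> Dict[str, int]:
    """Model the query: keep firing series for one alert, count per
    (_id, alertname), then count those groups per alertname."""
    selected = [s for s in series
                if s.get("alertstate") == "firing" and s.get("alertname") == alertname]
    inner = count_by(selected, ("_id", "alertname"))
    # Each inner group becomes one series; the outer count ignores its value.
    outer = Counter(alert for (_cluster, alert) in inner)
    return dict(outer)
```

An empty result from this model, like an empty result from the real query, simply means no series matched `alertstate="firing"` for that alert name.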

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.

9 participants