Bug 1992555: Comply with Openshift alerting guidelines #288
@lmzuccarelli: This pull request references Bugzilla bug 1992555, which is invalid:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/bugzilla refresh
@lmzuccarelli: An error was encountered querying GitHub for users with public email (hongli@redhat.com) for bug 1992555 on the Bugzilla server at https://bugzilla.redhat.com. No known errors were detected, please see the full error message for details. Full error message.
non-200 OK status code: 403 Forbidden body: "{\n \"documentation_url\": \"https://docs.github.com/en/free-pro-team@latest/rest/overview/resources-in-the-rest-api#secondary-rate-limits\",\n \"message\": \"You have exceeded a secondary rate limit. Please wait a few minutes before you try again.\"\n}\n"
Please contact an administrator to resolve this issue, then request a bug refresh with /bugzilla refresh.
message: "{{ $value }} CoreDNS panics observed on {{ $labels.instance }}"
summary: CoreDNS panic
description: "{{ $value }} CoreDNS panics observed on {{ $labels.instance }}"
Does `description` supersede `message`? The alerting-consistency enhancement document doesn't say whether `message` is still needed or not, so if you don't know the answer, we can ask the document's author for clarification or examples.
Good point. I used the cluster-etcd-operator as an example, and its alerts did not have `message`. I will look into the document and ask if need be.
The documentation mentions this:
Documentation Required
1. The name of the alerting rule should clearly identify the component impacted by the issue (for example etcdInsufficientMembers instead of InsufficientMembers, MachineConfigDaemonDrainError instead of MCDDrainError). It should be camel case, without whitespace, starting with a capital letter. The first part of the alert name should be the same for all alerts originating from the same component.
2. Alerting rules should have a "severity" label whose value is either info, warning or critical (matching what we have today and staying aside from the discussion whether we want minor or not).
3. Alerting rules should have a description annotation providing details about what is happening and how to resolve the issue.
4. Alerting rules should have a summary annotation providing a high-level description (similar to the first line of a commit message or email subject).
5. If there's a runbook in https://github.com/openshift/runbooks, it should be linked in the runbook_url annotation.
So it looks like `summary` & `description` are mandatory. I will ask about `message`.
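For illustration, the quoted guidelines could be turned into a small lint check. This is only a sketch in Python, not tooling that this repository or the enhancement document provides: the function name `check_alert_rule`, the dict layout, and the alert name `CoreDNSPanicking` are assumptions, while the annotation values come from the diff under review. The name check is deliberately loose (letters and digits only) because the guideline's own example, etcdInsufficientMembers, starts with a lowercase letter.

```python
import re

# Severity values allowed by guideline 2.
ALLOWED_SEVERITIES = {"info", "warning", "critical"}

# Loose approximation of guideline 1: starts with a letter, no whitespace
# or punctuation. It does not verify the component prefix.
NAME_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9]*$")


def check_alert_rule(rule):
    """Return a list of guideline violations for one alerting rule (a dict)."""
    problems = []

    if not NAME_PATTERN.match(rule.get("alert", "")):
        problems.append("name should be camel case without whitespace")

    severity = rule.get("labels", {}).get("severity")
    if severity not in ALLOWED_SEVERITIES:
        problems.append("severity label must be info, warning or critical")

    # Guidelines 3 and 4: both annotations should be present and non-empty.
    annotations = rule.get("annotations", {})
    for key in ("summary", "description"):
        if not annotations.get(key):
            problems.append(f"missing {key} annotation")

    return problems


# Hypothetical rule: the alert name and severity are assumed for the example;
# the annotations are the ones shown in the diff.
example_rule = {
    "alert": "CoreDNSPanicking",
    "labels": {"severity": "warning"},
    "annotations": {
        "summary": "CoreDNS panic",
        "description": "{{ $value }} CoreDNS panics observed on {{ $labels.instance }}",
    },
}

print(check_alert_rule(example_rule))
```

A rule that only carries a `message` annotation would be reported as missing both `summary` and `description`.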
From #forum-monitoring: "it hasn't been superseded, but there was a consensus upstream to use summary and description in favor of message in order to have multiple levels of abstraction". This is what upstream recommends: https://github.com/monitoring-mixins/docs#guidelines-for-alert-names-labels-and-annotations. So I think we can drop the `message` annotation.
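As a rough sketch of what dropping `message` amounts to for a rule's annotations, the helper below removes the key and keeps `summary` and `description`. The function name `drop_message_annotation` is hypothetical, and the fallback that reuses the old `message` text when no `description` exists is an assumption made for this example, not something the upstream guidelines prescribe.

```python
def drop_message_annotation(annotations):
    """Return a copy of the annotations without the 'message' key.

    Assumption for this sketch: if no 'description' is present, the old
    'message' text is reused as the description so no detail is lost.
    """
    updated = dict(annotations)
    message = updated.pop("message", None)
    if message is not None and "description" not in updated:
        updated["description"] = message
    return updated


# Annotations as they appear in the diff before the message line is removed.
before = {
    "message": "{{ $value }} CoreDNS panics observed on {{ $labels.instance }}",
    "summary": "CoreDNS panic",
    "description": "{{ $value }} CoreDNS panics observed on {{ $labels.instance }}",
}

after = drop_message_annotation(before)
print(sorted(after))
```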
/lgtm
@Miciah: This pull request references Bugzilla bug 1992555, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact:
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: lmzuccarelli, Miciah. The full list of commands accepted by this bot can be found here. The pull request process is described here.
@lmzuccarelli: All pull requests linked via external trackers have merged: Bugzilla bug 1992555 has been moved to the MODIFIED state.
This PR addresses the BZ https://bugzilla.redhat.com/show_bug.cgi?id=1992555