Bug 1992555: Comply with Openshift alerting guidelines #288

Merged
merged 1 commit into from
Aug 19, 2021

Conversation

lmzuccarelli

@lmzuccarelli lmzuccarelli changed the title Comply with Openshift alerting guidelines Bug 1992555: Comply with Openshift alerting guidelines Aug 16, 2021
@openshift-ci openshift-ci bot added the bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. label Aug 16, 2021
@openshift-ci

openshift-ci bot commented Aug 16, 2021

@lmzuccarelli: This pull request references Bugzilla bug 1992555, which is invalid:

  • expected the bug to target the "4.9.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1992555: Comply with Openshift alerting guidelines

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label Aug 16, 2021
@lmzuccarelli

/bugzilla refresh

@openshift-ci

openshift-ci bot commented Aug 17, 2021

@lmzuccarelli: An error was encountered querying GitHub for users with public email (hongli@redhat.com) for bug 1992555 on the Bugzilla server at https://bugzilla.redhat.com. No known errors were detected, please see the full error message for details.

Full error message. non-200 OK status code: 403 Forbidden body: "{\n \"documentation_url\": \"https://docs.github.com/en/free-pro-team@latest/rest/overview/resources-in-the-rest-api#secondary-rate-limits\",\n \"message\": \"You have exceeded a secondary rate limit. Please wait a few minutes before you try again.\"\n}\n"

Please contact an administrator to resolve this issue, then request a bug refresh with /bugzilla refresh.

In response to this:

/bugzilla refresh
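
The error body above can be recognized programmatically before deciding whether to retry. The following is a minimal sketch, not the bot's actual logic: the function names `is_secondary_rate_limit` and `backoff_delays`, and the 60-second base delay, are assumptions made for illustration. It only inspects the status code and the `message` field shown in the log.

```python
import json
from typing import List


def is_secondary_rate_limit(status_code: int, body: str) -> bool:
    """Return True if the response looks like the 403 shown in the log above."""
    if status_code != 403:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # A non-JSON body cannot be the error shown in the log.
        return False
    if not isinstance(payload, dict):
        return False
    message = str(payload.get("message", ""))
    return "secondary rate limit" in message.lower()


def backoff_delays(attempts: int, base_seconds: float = 60.0) -> List[float]:
    """Hypothetical retry policy: double the wait before each new attempt."""
    return [base_seconds * (2 ** i) for i in range(attempts)]
```

With such a check, a caller could wait for the computed delays before repeating `/bugzilla refresh` instead of retrying immediately.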

@lmzuccarelli

/bugzilla refresh

@openshift-ci

openshift-ci bot commented Aug 17, 2021

@lmzuccarelli: An error was encountered querying GitHub for users with public email (hongli@redhat.com) for bug 1992555 on the Bugzilla server at https://bugzilla.redhat.com. No known errors were detected, please see the full error message for details.

Full error message. non-200 OK status code: 403 Forbidden body: "{\n \"documentation_url\": \"https://docs.github.com/en/free-pro-team@latest/rest/overview/resources-in-the-rest-api#secondary-rate-limits\",\n \"message\": \"You have exceeded a secondary rate limit. Please wait a few minutes before you try again.\"\n}\n"

Please contact an administrator to resolve this issue, then request a bug refresh with /bugzilla refresh.

In response to this:

/bugzilla refresh

Comment on lines -22 to +23
- message: "{{ $value }} CoreDNS panics observed on {{ $labels.instance }}"
+ summary: CoreDNS panic
+ description: "{{ $value }} CoreDNS panics observed on {{ $labels.instance }}"
Contributor

Does description supersede message? The alerting-consistency enhancement document doesn't say whether message is still needed or not, so if you don't know the answer, we can ask the document's author for clarification or examples.

Contributor Author

Good point. I used the cluster-etcd-operator as an example, and its alerts do not have a 'message' annotation. I will look into the document and ask if need be.

Contributor Author

@lmzuccarelli lmzuccarelli Aug 17, 2021

The documentation says the following:

Documentation Required

1. The name of the alerting rule should clearly identify the component impacted by the issue (for example etcdInsufficientMembers instead of InsufficientMembers, MachineConfigDaemonDrainError instead of MCDDrainError). It should be camel case, without whitespace, starting with a capital letter. The first part of the alert name should be the same for all alerts originating from the same component.
2. Alerting rules should have a "severity" label whose value is either info, warning or critical (matching what we have today and staying aside from the discussion whether we want minor or not).
3. Alerting rules should have a description annotation providing details about what is happening and how to resolve the issue.
4. Alerting rules should have a summary annotation providing a high-level description (similar to the first line of a commit message or email subject).
5. If there's a runbook in https://github.com/openshift/runbooks, it should be linked in the runbook_url annotation.

So it looks like summary and description are mandatory. I will ask about message.
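
The requirements quoted above can be turned into a small lint check. This is a sketch under stated assumptions, not tooling from this repository: the function name `check_alert_rule` is hypothetical, the rule is assumed to be a parsed Prometheus alerting rule with `alert`, `labels`, and `annotations` keys, and the alert name and severity in the usage note are illustrative only. The name check is a literal reading of "camel case, without whitespace, starting with a capital letter" and cannot tell whether the name identifies the right component.

```python
import re
from typing import Dict, List

ALLOWED_SEVERITIES = {"info", "warning", "critical"}
# Literal reading of the naming rule: a capital letter first, then
# letters and digits only (so no whitespace or punctuation).
NAME_PATTERN = re.compile(r"^[A-Z][A-Za-z0-9]*$")


def check_alert_rule(rule: Dict) -> List[str]:
    """Return a list of guideline violations for one alerting rule."""
    problems = []
    name = rule.get("alert", "")
    if not NAME_PATTERN.match(name):
        problems.append(f"alert name {name!r} is not camel case starting with a capital letter")

    labels = rule.get("labels") or {}
    if labels.get("severity") not in ALLOWED_SEVERITIES:
        problems.append("severity label must be info, warning or critical")

    annotations = rule.get("annotations") or {}
    for key in ("summary", "description"):
        if not annotations.get(key):
            problems.append(f"missing {key} annotation")
    return problems
```

For example, a rule named `CoreDNSPanicExample` (a made-up name) with `severity: warning` and both annotations set yields an empty list, while a rule that carries only a `message` annotation is reported as missing both `summary` and `description`.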

Contributor Author

From #forum-monitoring: "it hasn't been superseded, but there was a consensus upstream to use summary and description in favor of message in order to have multiple levels of abstraction". This is what upstream recommends: https://github.com/monitoring-mixins/docs#guidelines-for-alert-names-labels-and-annotations. So I think we can drop the message annotation.
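
The change discussed in this thread, dropping `message` in favor of `summary` and `description`, can be sketched as a transformation on an annotations mapping. This is an illustration only: the helper name `replace_message` is hypothetical, and it assumes the old `message` text is reused as the `description`, which is what the diff under review does.

```python
from typing import Dict


def replace_message(annotations: Dict[str, str], summary: str) -> Dict[str, str]:
    """Return a copy with `message` replaced by `summary` and `description`."""
    updated = dict(annotations)  # leave the caller's mapping untouched
    message = updated.pop("message", None)
    if message is not None and "description" not in updated:
        # Reuse the detailed message text as the description.
        updated["description"] = message
    # Keep an existing summary if one is already present.
    updated.setdefault("summary", summary)
    return updated
```

Applied to the CoreDNS annotations from the diff, `replace_message(old, "CoreDNS panic")` produces a mapping with `summary` and `description` and no `message` key.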

@Miciah

Miciah commented Aug 19, 2021

/lgtm
/bugzilla refresh

@openshift-ci

openshift-ci bot commented Aug 19, 2021

@Miciah: This pull request references Bugzilla bug 1992555, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.9.0) matches configured target release for branch (4.9.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @lihongan

In response to this:

/lgtm
/bugzilla refresh
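
The three validations reported in this comment (open state, target release, allowed bug state) can be sketched as follows. This is not the bot's implementation: the `Bug` dataclass, its field names, and `validate_bug` are assumptions made for illustration. Only the wording of the target-release failure is taken from the earlier bot comment on this pull request.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Bug:
    """Hypothetical, minimal view of a Bugzilla bug."""
    is_open: bool
    target_release: str
    state: str


def validate_bug(bug: Bug, target_release: str, valid_states: Sequence[str]) -> List[str]:
    """Return failure messages; an empty list means the bug is valid."""
    failures = []
    if not bug.is_open:
        failures.append("expected the bug to be open, but it isn't")
    if bug.target_release != target_release:
        failures.append(
            f'expected the bug to target the "{target_release}" release, '
            f'but it targets "{bug.target_release}" instead'
        )
    if bug.state not in valid_states:
        failures.append(f"bug is in the state {bug.state}, which is not a valid state")
    return failures
```

A bug with target release `---` fails the second check, which mirrors the first bot response on this pull request; once the target release is `4.9.0` and the state is `POST`, the list of failures is empty.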

@openshift-ci openshift-ci bot added bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Aug 19, 2021
@openshift-ci openshift-ci bot requested a review from lihongan August 19, 2021 18:14
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 19, 2021
@openshift-ci

openshift-ci bot commented Aug 19, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lmzuccarelli, Miciah

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 19, 2021
@openshift-merge-robot openshift-merge-robot merged commit 083d37c into openshift:master Aug 19, 2021
@openshift-ci

openshift-ci bot commented Aug 19, 2021

@lmzuccarelli: All pull requests linked via external trackers have merged:

Bugzilla bug 1992555 has been moved to the MODIFIED state.

In response to this:

Bug 1992555: Comply with Openshift alerting guidelines

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.