install/0000_90_cluster-version-operator_02_servicemonitor: Alert summary/descriptions #547
CC @openshift/sre-alert-sme, in case you want to review the proposed changes.
install/0000_90_cluster-version-operator_02_servicemonitor.yaml
    expr: |
      max by (name, condition, endpoint) (cluster_operator_conditions{name="version", condition="Upgradeable", endpoint="metrics"} == 0)
    for: 60m
    labels:
      severity: warning
  - alert: ClusterOperatorDown
    annotations:
      message: Cluster operator {{ "{{ $labels.name }}" }} has not been available for 10 minutes. Operator may be down or disabled, cluster will not be kept up to date and upgrades will not be possible.
      summary: Cluster operator has not been available for 10 minutes.
There is an inconsistency here between "Cluster version operator" and "Cluster operator": https://github.com/openshift/cluster-version-operator/pull/547/files#diff-fabad9e1d73a4f70c3d47836ed62e1982b1c6fbb947fce9a633b9cb0a98ecb24R43

I would suggest updating all references to use "Cluster version operator".
ClusterVersionOperatorDown is only about the cluster-version operator deployment, so "Cluster version operator" is appropriate there. ClusterOperatorDown and ClusterOperatorDegraded are about the conditions on ClusterVersion and ClusterOperators, so the more generic "Cluster operator" is appropriate there.
/lgtm
/hold for SRE folks approval
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dofinn, LalatenduMohanty, wking
/hold cancel
/retest
/hold
No bug, so we need to wait until 4.8 forks off and master opens for 4.9.
@wking: PR needs rebase.
…mary/descriptions

Catching up with upstream's move from 'message', as documented in [1].

[1]: https://github.com/monitoring-mixins/docs/pull/10/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R183
Rebased onto master now that master is open for 4.9.
/hold cancel
/lgtm
/override ci/prow/e2e-agnostic
@wking: Overrode contexts on behalf of wking: ci/prow/e2e-agnostic
/override ci/prow/e2e-agnostic-upgrade
@wking: Overrode contexts on behalf of wking: ci/prow/e2e-agnostic-upgrade
gather-audit-logs flake is unrelated:
/override ci/prow/e2e-agnostic-operator
@wking: Overrode contexts on behalf of wking: ci/prow/e2e-agnostic-operator
/override ci/prow/e2e-agnostic-upgrade
@wking: Overrode contexts on behalf of wking: ci/prow/e2e-agnostic-upgrade
/override ci/prow/e2e-agnostic
We've passed that one before, and this change is unrelated to other recent master changes.
@wking: Overrode contexts on behalf of wking: ci/prow/e2e-agnostic
Catching up with upstream's move from `message`, as documented here.
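For readers unfamiliar with the convention, here is a rough sketch of what moving one rule from `message` to `summary`/`description` could look like. The alert name, the `summary` text, and the old `message` text are taken from the diff excerpt quoted in this conversation; the `expr`, `for`, and `severity` values and the exact `description` wording are assumptions for illustration, not copied from the merged change.

```yaml
# Illustrative sketch only: expr, for, and severity are assumed placeholders.
- alert: ClusterOperatorDown
  annotations:
    # Before: a single free-form annotation.
    # message: Cluster operator {{ "{{ $labels.name }}" }} has not been available for 10 minutes. Operator may be down or disabled, cluster will not be kept up to date and upgrades will not be possible.
    # After: a short, label-free summary plus a longer, templated description.
    summary: Cluster operator has not been available for 10 minutes.
    # Assumed wording, derived from the old message text.
    description: Cluster operator {{ "{{ $labels.name }}" }} may be down or disabled, cluster will not be kept up to date and upgrades will not be possible.
  expr: |
    cluster_operator_up{job="cluster-version-operator"} == 0  # assumed expression
  for: 10m              # assumed, matching the "10 minutes" in the summary
  labels:
    severity: critical  # assumed
```

The split keeps `summary` short enough to serve as a notification title, while `description` carries the per-operator detail via `$labels.name`.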