Bug 1795617: Remove entirely deprecated alerts #741
Conversation
@soltysh: This pull request references Bugzilla bug 1793850, which is invalid:
In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@soltysh: This pull request references Bugzilla bug 1795617, which is invalid:
apiserver_request_total{
  group="extensions",
  version="v1beta1",
  resource!~"ingresses|",
Do we want to add ingress here as well? It is deprecated, just still served upstream.
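For context, combining the two selector fragments quoted in this review, the alert expression under discussion likely has roughly this shape. This is a hedged reconstruction, not the exact rule from this PR: only the label selector and the `[24h]` window appear in the quoted diff, and the use of `rate` is an assumption.

```promql
# Hedged sketch of the full expression; rate() is assumed, the
# selector and [24h] window come from the quoted diff fragments.
rate(apiserver_request_total{
  group="extensions",
  version="v1beta1",
  resource!~"ingresses|",
  client!~"hyperkube/.*|cluster-policy-controller/.*|velero-server/.*"
}[24h]) > 0
```

The `resource!~"ingresses|"` matcher is what excludes ingresses from firing, which is what the question above is about.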
/approve

Please clean up the commits; modulo the ingress question it lgtm.
Force-pushed from 0b91f37 to a6de32f.
/lgtm
/hold
  version="v1beta1",
  client!~"hyperkube/.*|cluster-policy-controller/.*|velero-server/.*"
}[24h]
) > 0
labels:
  severity: warning
The severity level hasn't changed (it should be info, not warning).
Nope, that won't change as stated before.
/hold cancel
@soltysh I'm confused as here (#730 (review)) you stated that "info" was okay, and I had updated my original PR to reflect this.
Warning alerts should indicate that cluster health is degraded in some way and that action is required. In the case of OSD, warning alerts are sent to SREs. This alert is not actionable in that way and should not be at warning level. It's a bad experience for RH SREs and for both OSD and OCP customers, as it's a piece of information on which they should take action before the next minor version upgrade.
Seems like info would work just fine for all known use-cases. Can we go with that, please?
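For concreteness, the change being requested is a one-line edit to the alert's `labels` block, as shown in the PrometheusRule fragment quoted earlier in this review (a sketch of the requested state, not a diff from this PR):

```yaml
# Requested change: this PR currently sets severity: warning.
labels:
  severity: info
```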
Note that regardless of the severity level it will still show up in the main console view.
Also:

This alert is not actionable in this way, and should not be at a warning level.

This is exactly why I think we should reconsider it: it can induce alert fatigue, as it will just sit there. Are these all components users can control and migrate away from this client?
@soltysh pinging again here - request to move to info level.
@soltysh can you please set the severity of these alerts to info rather than warning?
Should this now change to info, to be in line with the master PR (https://github.com/openshift/cluster-kube-apiserver-operator/pull/742/files#diff-9cc7eba97b17dc12b44afb453dc9e69cL126) ?
The master PR is about a different resource that is still present in Kubernetes.
These ones are already removed in the Kubernetes version we are shipping, and we'll drop the hack re-enabling them in the next release, so warning seems pretty warranted to me. There should be action taken to hunt down the team using it. I think most of our customers use their clusters internally and care about apps that will stop working after the next upgrade. There is also the other case of clusters like Online, where you provision for outside users and don't care about what they run, so I'd just suppress that particular warning there unless there is a better way.
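One way the per-cluster suppression mentioned above could be done is an Alertmanager route that sends the alert to a null receiver on clusters like Online. This is a hedged sketch: the alert name `DeprecatedAPIRequestsInUse` is hypothetical, as the actual rule name in this PR is not shown in the quoted fragments.

```yaml
# Hedged sketch: suppress the deprecated-API alert on clusters where
# outside-user workloads are not our concern.
# The alertname "DeprecatedAPIRequestsInUse" is hypothetical.
route:
  receiver: default
  routes:
    - match:
        alertname: DeprecatedAPIRequestsInUse
      receiver: "null"
receivers:
  - name: default
  - name: "null"
```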
/retest

/retest Please review the full test history for this PR and help us cut down flakes.
8 similar comments
/lgtm
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: cblecker, soltysh, tnozicka. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/bugzilla refresh
@soltysh: This pull request references Bugzilla bug 1795617, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Skipping this one for approval this week anyway due to the number of other changes already accepted.

@knobunc This is a low-risk PR (it removes a PrometheusRule) and is actively causing customer pain (an unfixable, never-resolving alert in all 4.3 clusters). Asking for consideration for this week's cherry-pick batch per your e-mail.
/retest Please review the full test history for this PR and help us cut down flakes.
2 similar comments
/retest
1 similar comment
The upgrade job does not fail; there is a problem with the test platform. The logs say it passes.
/retest
CI should be fixed now.
/retest |
/test e2e-aws-operator-encryption |
/test e2e-aws-operator |
@soltysh: All pull requests linked via external trackers have merged. Bugzilla bug 1795617 has been moved to the MODIFIED state. In response to this:
This picks the commit from #730 and applies the same pattern to the remaining alerts.
/assign @tnozicka
/cc @lilic @cblecker