test/e2e/upgrade/alert: Allow AggregatedAPIDown to unblock 4.7->4.8 CI #26220

@wking wking commented Jun 10, 2021

We're getting:

alert AggregatedAPIDown fired for 210 seconds with labels: {name="v1beta1.metrics.k8s.io", namespace="default", severity="warning"}

and such pretty consistently in those jobs. Tracked in rhbz#1970624. Until that gets fixed, ignore the alert, so we are more likely to notice other breakage.

openshift-ci bot commented Jun 10, 2021

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: wking
To complete the pull request process, please assign bparees after the PR has been reviewed.
You can assign the PR to them by writing /assign @bparees in a comment when ready.

@openshift-ci openshift-ci bot requested review from bparees and mfojtik June 10, 2021 21:07
@@ -143,7 +143,7 @@ func (t *UpgradeTest) Test(f *framework.Framework, done <-chan struct{}, upgrade
 // Invariant: No non-info level alerts should have fired during the upgrade
 firingAlertQuery := fmt.Sprintf(`
 sort_desc(
-count_over_time(ALERTS{alertstate="firing",severity!="info",alertname!~"Watchdog|AlertmanagerReceiversNotConfigured"}[%[1]s:1s])
+count_over_time(ALERTS{alertstate="firing",severity!="info",alertname!~"Watchdog|AlertmanagerReceiversNotConfigured|AggregatedAPIDown"}[%[1]s:1s])
Contributor

Please make this a known violation rather than an exclusion, and add more label selectors

Member Author

Done in a965a20 -> ea88b65. Would be good to tighten this down with minor-update Matches, which I'm still working on, but I dunno if we need to block on that.

@wking wking force-pushed the exception-for-AggregatedAPIDown-in-update-jobs branch from a965a20 to ea88b65 Compare June 10, 2021 21:44
We're getting:

  alert AggregatedAPIDown fired for 210 seconds with labels: {name="v1beta1.metrics.k8s.io", namespace="default", severity="warning"}

and such pretty consistently in those jobs.  Tracked in [1].  Until
that gets fixed, ignore the alert, so we are more likely to notice
other breakage.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1970624
@wking wking force-pushed the exception-for-AggregatedAPIDown-in-update-jobs branch from ea88b65 to c26cfb5 Compare June 10, 2021 22:09
openshift-ci bot commented Jun 11, 2021

@wking: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-aws-disruptive ea88b65 link /test e2e-aws-disruptive
ci/prow/e2e-gcp-disruptive c26cfb5 link /test e2e-gcp-disruptive
ci/prow/e2e-aws-csi c26cfb5 link /test e2e-aws-csi
ci/prow/e2e-gcp-upgrade c26cfb5 link /test e2e-gcp-upgrade

@smarterclayton smarterclayton merged commit be813ed into openshift:master Jun 11, 2021
@smarterclayton smarterclayton added approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged. labels Jun 11, 2021
@wking wking deleted the exception-for-AggregatedAPIDown-in-update-jobs branch June 11, 2021 20:25