test: Move the prometheus alerts test to the beginning of the suite #24499

Merged

smarterclayton
Contributor

Otherwise some tests trigger temporary alerts

@openshift-ci-robot added the size/XS label (denotes a PR that changes 0-9 lines, ignoring generated files) on Feb 5, 2020
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: smarterclayton

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Feb 5, 2020
@smarterclayton added the lgtm label (indicates that a PR is ready to be merged) on Feb 5, 2020
@smarterclayton
Contributor Author

/retest

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

Contributor

@lilic left a comment

Hmm, what is a temporary alert? If it’s valid for an alert to be temporary, we should just add it to the list of excluded alerts we check against.

I think the alert check should run at the end and do a rate; I am already working on that. We want to make sure no alerts are firing at any point while the cluster is up. Some alerts might still be in the pending state when a check at the beginning runs, so that check won’t catch them.
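
To illustrate the concern about pending alerts, here is a minimal sketch in Python. It is not the test code from this PR or from the repository; the `Sample` type, the helper names, the alert name `ExampleAlert`, and the timestamps are all hypothetical. It only shows that a point-in-time check at the start can pass while a check over the whole run catches the same alert once it moves from pending to firing.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    """One observation of an alert's state (hypothetical data model)."""
    timestamp: int  # seconds since the cluster came up
    alertname: str
    state: str      # "pending" or "firing"


def firing_at(samples, timestamp):
    """Names of alerts that are firing at exactly the given timestamp."""
    return {
        s.alertname
        for s in samples
        if s.timestamp == timestamp and s.state == "firing"
    }


def fired_between(samples, start, end):
    """Names of alerts that were firing at any sample in [start, end]."""
    return {
        s.alertname
        for s in samples
        if start <= s.timestamp <= end and s.state == "firing"
    }


# Hypothetical timeline: the alert is pending when the suite starts
# and only begins firing ten minutes later.
samples = [
    Sample(0, "ExampleAlert", "pending"),
    Sample(300, "ExampleAlert", "pending"),
    Sample(600, "ExampleAlert", "firing"),
]

start_of_suite = firing_at(samples, 0)         # check at the beginning
whole_run = fired_between(samples, 0, 900)     # check covering the whole run
```

In this made-up timeline the start-of-suite check sees nothing firing, while the whole-run check reports `ExampleAlert`.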

@paulfantom
Contributor

/hold

Testing for firing alerts shouldn't be done at the beginning of the test suite. Doing so can lead to shipping an OpenShift installation with firing alerts.
If there are any "temporary" alerts, each of them should get one of the following:

  • an extended for clause, if the alert quickly goes away
  • info severity, if we want it to be surfaced only in the dashboard
  • an exclusion by alert name in the test expression

The second solution would require tweaking the test expression to allow info alerts to fire.
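
As a rough sketch of the last two options, the Python helper below builds a PromQL selector over Prometheus's built-in `ALERTS` series that skips named alerts and, optionally, info-severity alerts. This is an assumption about what such an expression could look like, not the expression used by the test in this repository; the function name and the alert names are hypothetical.

```python
import re

# Restrict names to plain identifier characters so they can be joined
# into a regex matcher without any escaping.
_ALERT_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def firing_alerts_query(excluded_alerts=(), allow_info=False):
    """Build a PromQL selector for firing alerts.

    excluded_alerts: alert names that are allowed to fire.
    allow_info: when True, info-severity alerts are allowed to fire.
    """
    matchers = ['alertstate="firing"']

    names = sorted(set(excluded_alerts))
    for name in names:
        if not _ALERT_NAME.fullmatch(name):
            raise ValueError(f"unsupported alert name: {name!r}")
    if names:
        # PromQL regex matchers are fully anchored, so this excludes
        # exactly the listed names.
        matchers.append('alertname!~"' + "|".join(names) + '"')

    if allow_info:
        matchers.append('severity!="info"')

    return "ALERTS{" + ",".join(matchers) + "}"


# Hypothetical alert names, for illustration only.
query = firing_alerts_query(
    ["ExampleTemporaryAlert", "AnotherExampleAlert"], allow_info=True
)
```

A test could then fail if this selector returns any series. The first option, a longer for clause, lives in the alerting rule itself rather than in the test expression, so it is not covered here.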

@openshift-ci-robot added the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) on Feb 5, 2020
@lilic
Contributor

lilic commented Feb 5, 2020

Note that I have already started working on end-of-run tests for alert checking in #24492. It correctly detects the OLM alert that is firing every time, whereas we have been seeing flakes, which is why I suspect this PR was opened. (Note that the PR is WIP. Feedback welcome. :)

@paulfantom
Contributor

Excluding info alerts is done in #24500

@smarterclayton added the bugzilla/valid-bug label (indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting) and removed the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) on Feb 7, 2020
@smarterclayton
Contributor Author

Going to merge this now while we sort out how to suppress temporary alerts during runs in serial, disruptive, and upgrade tests (which will always generate some number of fired alerts).

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot merged commit a7570dd into openshift:master on Feb 8, 2020
Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • bugzilla/valid-bug: Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting.
  • lgtm: Indicates that a PR is ready to be merged.
  • size/XS: Denotes a PR that changes 0-9 lines, ignoring generated files.
6 participants