Revert "test/extended/prometheus: Validate alerting rules" #26499
Revert "test/extended/prometheus: Validate alerting rules" #26499
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: stbenjam
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
Manually merging this revert because of permafails across numerous jobs (to be listed by Stephen). Before being un-reverted, the permafails must be eliminated. As with product code, we are biased toward reverting rather than fixing forward. This is the third such revert in the past week.
I haven't looked at all the failures yet, but I think it would have been better to update the exception list rather than revert the whole test, since the test already ships with one.
The unrevert can add an exception list and reach a working state before merging. With large areas of CI stuck, the priority is getting back to a functional state. The unrevert plus exceptions can come in another PR that gets the exceptions right before it merges.
I don't see this as the case. If you want to make a non-enforcing test to find offenders first, that's a very good idea. If you want to run all the jobs before merging, that's another approach. There is no reason to believe this cannot be done as a PR without sacrificing payload promotion and entire platforms of CI jobs.
Looking at the job failures, they all relate to missing annotations (description, summary, or both). We would have added exceptions if we had known about these failures, but unfortunately they weren't exercised in #26476. @deads2k is there any way this could have been prevented?
For OVN:
For vSphere:
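For context, the failing check essentially asserts that every alerting rule carries `summary` and `description` annotations. A minimal Go sketch of that idea, using hypothetical types and alert names rather than the actual helpers in test/extended/prometheus:

```go
package main

import "fmt"

// Rule is a minimal stand-in for a Prometheus alerting rule; the real test
// reads live rules from the in-cluster Prometheus rules API.
type Rule struct {
	Name        string
	Annotations map[string]string
}

// missingAnnotations returns the required annotations a rule lacks. The
// alerting-consistency style guide treats "summary" and "description" as
// required.
func missingAnnotations(r Rule) []string {
	var missing []string
	for _, key := range []string{"summary", "description"} {
		if _, ok := r.Annotations[key]; !ok {
			missing = append(missing, key)
		}
	}
	return missing
}

func main() {
	// Hypothetical alert names, for illustration only.
	rules := []Rule{
		{Name: "ExampleOVNAlert", Annotations: map[string]string{"summary": "..."}},
		{Name: "ExampleCompliantAlert", Annotations: map[string]string{"summary": "...", "description": "..."}},
	}
	for _, r := range rules {
		if m := missingAnnotations(r); len(m) > 0 {
			fmt.Printf("alert %s is missing annotations: %v\n", r.Name, m)
		}
	}
}
```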
Yes. You could create a non-failing test to gather data. Or you could manually run vSphere and OVN jobs in the origin repo.
Ack, we didn't anticipate the side effect and were too confident about the signal we got from CI on that particular PR. Lesson learned for the future...
What's the best way to make a non-failing test? Marking it as a flake means it shows as passing, right? Do we do that, then just manually inspect the logs for runs over the next few days? |
@bison my understanding is that we should use the approach at origin/test/extended/prometheus/prometheus.go, line 208 in e000efe.
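The flake approach being discussed amounts to running the check but reporting violations without failing the job. A rough, self-contained sketch with hypothetical helper names (the real reporting hooks in the origin test framework differ):

```go
package main

import (
	"fmt"
	"os"
)

// failf and flakef are hypothetical stand-ins for the origin test framework's
// failure and flake reporting around test/extended/prometheus/prometheus.go;
// the real helpers have different names and signatures.
func failf(format string, args ...interface{}) {
	fmt.Fprintf(os.Stderr, "FAIL: "+format+"\n", args...)
	os.Exit(1)
}

func flakef(format string, args ...interface{}) {
	// Reported in the output, but the test still passes; useful for
	// gathering data before the check becomes enforcing.
	fmt.Printf("FLAKE: "+format+"\n", args...)
}

// validateAlertingRules reports violations as hard failures when enforcing,
// and as flakes otherwise.
func validateAlertingRules(violations []string, enforcing bool) {
	if len(violations) == 0 {
		return
	}
	if enforcing {
		failf("alerting rule violations: %v", violations)
		return
	}
	flakef("alerting rule violations (not yet enforced): %v", violations)
}

func main() {
	validateAlertingRules([]string{"HypotheticalAlert: missing description"}, false)
}
```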
Yeah, that makes sense. I just wasn't sure exactly how to search for flakes. Anyway, I'll get a new PR up this morning. |
The [OpenShift Alerting Consistency][1] enhancement defines a style guide for the alerts shipped as part of OpenShift. This adds a test validating some of the guidelines considered required. This was originally added in openshift#26476, but was reverted in openshift#26499 due to failures with OVN and vSphere clusters. This adds the tests back, but adds exceptions for the non-compliant alerts as well as marking the failing tests as flakes for now. We'll gather data and make the tests required once we're reasonably sure things are passing with all the existing alerts.

[1]: https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md
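For illustration, the exception list described there could be as simple as a set of known non-compliant alert names that the validation skips; a hedged Go sketch under that assumption (the alert names below are hypothetical, not the actual list in the new PR):

```go
package main

import "fmt"

// knownViolations is a hypothetical exception set: alerts that predate the
// style guide and are temporarily allowed to miss required annotations.
var knownViolations = map[string]bool{
	"ExampleLegacyOVNAlert":     true,
	"ExampleLegacyVSphereAlert": true,
}

// filterExceptions drops violations that are on the exception list, so only
// new or unexpected non-compliant alerts fail the check.
func filterExceptions(violations []string) []string {
	var remaining []string
	for _, name := range violations {
		if knownViolations[name] {
			continue // tolerated for now; fix the alert, then remove it from the list
		}
		remaining = append(remaining, name)
	}
	return remaining
}

func main() {
	fmt.Println(filterExceptions([]string{"ExampleLegacyOVNAlert", "NewNonCompliantAlert"}))
}
```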
New PR: #26504
Reverts #26476
Many jobs are failing (OVN and vSphere) due to this new test. The test should be changed to a flake, or given a list of exceptions, while the offending alerts are being fixed.