Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test/extended/prometheus: better check for firing alerts #24005

Merged
merged 1 commit into from Oct 28, 2019

Conversation

paulfantom
Copy link
Contributor

Revert #23995 and improve reporting of which alerts are firing.

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Oct 23, 2019
@openshift-ci-robot openshift-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Oct 23, 2019
@brancz
Copy link
Contributor

brancz commented Oct 23, 2019

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 23, 2019
@brancz
Copy link
Contributor

brancz commented Oct 23, 2019

/hold

Putting on hold as WIP, but feel free to remove.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 23, 2019
@paulfantom
Copy link
Contributor Author

flaky infra

/retest

@paulfantom paulfantom changed the title [WIP] test/extended/prometheus: sum alerts by name when getting all alerts test/extended/prometheus: sum alerts by name when getting all alerts Oct 23, 2019
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 23, 2019
Copy link
Member

@soltysh soltysh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@paulfantom paulfantom changed the title test/extended/prometheus: sum alerts by name when getting all alerts test/extended/prometheus: better check for firing alerts Oct 23, 2019
@paulfantom
Copy link
Contributor Author

/test e2e-aws

@soltysh
Copy link
Member

soltysh commented Oct 23, 2019

/retest

1 similar comment
@tnozicka
Copy link
Contributor

/retest

// Checking for specific alert is done in "should have a Watchdog alert in firing state".
`ALERTS{alertstate="firing"}`: {metricTest{greaterThanEqual: false, value: 2}},
// Checking Watchdog alert state is done in "should have a Watchdog alert in firing state".
`ALERTS{alertname!="Watchdog",alertstate="firing"}`: {metricTest{greaterThanEqual: true, value: 1}},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{greaterThanEqual: true, value: 1} will this output success on 1? I thought if there is 1 for particular alert record it should fail, I don't know much about metrics though

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like the alerts outputted for me are:

ALERTS{alertname="UsingDeprecatedAPIExtensionsV1Beta1",alertstate="firing",client="cluster-policy-controller/v0.0.0 (linux/amd64) kubernetes/$Format",code="0",component="apiserver",contentType="application/vnd.kubernetes.protobuf;stream=watch",endpoint="https",group="extensions",instance="10.0.138.54:6443",job="apiserver",namespace="default",resource="daemonsets",scope="cluster",service="kubernetes",severity="warning",verb="WATCH",version="v1beta1"}
| 1 @1571823124.618   1 @1571823154.618   1 @1571823184.618   1 @1571823214.618  ....|

@openshift-ci-robot openshift-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed lgtm Indicates that a PR is ready to be merged. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Oct 24, 2019
@paulfantom
Copy link
Contributor Author

I had to extend test framework to be able to fail if any result is returned. This is necessary as ALERTS{} metrics are dynamically created only when alert is fired.

I think what we have here now is quite unreadable and some features aren't even used. @s-urbaniak @brancz wdyt about refactoring our tests to just execute promQL and only expect if there is anything returned or not? So instead of doing:

`ALERTS{alertname!="Watchdog",alertstate="firing"}`: {metricTest{greaterThanEqual: true, value: 1, nodata: true}},

We could just forward ALERTS{alertname!="Watchdog",alertstate="firing"} >= 1 and fail if anything is returned.

@paulfantom
Copy link
Contributor Author

flakes...

/retest

@tnozicka
Copy link
Contributor

I think the CI cluster is broken, second time connect to base-4-3-rhel8.ocp.svc port 80: No route to host

@brancz
Copy link
Contributor

brancz commented Oct 24, 2019

We just saw this being fixed on other builds, so retrying.

/retest

@paulfantom
Copy link
Contributor Author

/retest

@tnozicka
Copy link
Contributor

the borked pods (with SDN) were deleted few minutes back and should be working now

@paulfantom
Copy link
Contributor Author

let's try again, but it seems to be broken

/retest

@paulfantom
Copy link
Contributor Author

/test images

…lerts

This extends test framework by adding a way to expect no metrics being
returned. Additionally it should improve testing if no alerts are
firing, apart from Watchdog.
@paulfantom
Copy link
Contributor Author

/retest

1 similar comment
@paulfantom
Copy link
Contributor Author

/retest

Copy link

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 28, 2019
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: brancz, bwplotka, paulfantom, soltysh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

if tcs[j].nodata && len(metrics) == 0 {
tcs[j].success = true
break
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you quickly elaborate why this is needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a check for when no metrics were reported and that was expected by us.

@paulfantom
Copy link
Contributor Author

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 28, 2019
@openshift-merge-robot openshift-merge-robot merged commit 2b4c6af into openshift:master Oct 28, 2019
@paulfantom paulfantom deleted the revert_23995 branch October 28, 2019 21:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants