Allow TechPreviewNoUpgrade alerts when running on a TechPreview cluster #26393

Merged
merged 1 commit into from Aug 17, 2021
38 changes: 36 additions & 2 deletions test/extended/prometheus/prometheus.go
@@ -19,6 +19,8 @@ import (

	v1 "k8s.io/api/core/v1"

	configv1 "github.com/openshift/api/config/v1"

	kapierrs "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/sets"
@@ -105,6 +107,14 @@ var _ = g.Describe("[sig-instrumentation][Late] Alerts", func() {
			},
		}

		if isTechPreviewCluster(oc) {
			allowedFiringAlerts = append(allowedFiringAlerts, helper.MetricCondition{
				Selector: map[string]string{"alertname": "TechPreviewNoUpgrade"},
				Text:     "Allow testing of TechPreviewNoUpgrade clusters, this will only fire when a FeatureGate has been installed",
			},
			)
		}

		pendingAlertsWithBugs := helper.MetricConditions{}
		allowedPendingAlerts := helper.MetricConditions{
			{
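
To illustrate how an allow-list entry like the `TechPreviewNoUpgrade` condition in this hunk could be applied to a firing alert, here is a minimal sketch. The `metricCondition` type and the `matches` function are local stand-ins written for this example, not the `helper` package's actual API. The assumed semantics are that an entry applies when every label in its selector is present on the alert with the same value.

```go
package main

import "fmt"

// metricCondition is a hypothetical stand-in for helper.MetricCondition.
type metricCondition struct {
	Selector map[string]string
	Text     string
}

// matches reports whether every selector label is present on the alert
// with an identical value. These semantics are an assumption made for
// illustration only.
func matches(c metricCondition, alertLabels map[string]string) bool {
	for key, want := range c.Selector {
		got, ok := alertLabels[key]
		if !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	allowed := metricCondition{
		Selector: map[string]string{"alertname": "TechPreviewNoUpgrade"},
		Text:     "expected on TechPreviewNoUpgrade clusters",
	}

	// An alert carrying extra labels still matches, because only the
	// selector's labels are compared.
	fmt.Println(matches(allowed, map[string]string{
		"alertname": "TechPreviewNoUpgrade",
		"severity":  "warning",
	}))

	// An alert with a different name does not match.
	fmt.Println(matches(allowed, map[string]string{"alertname": "SomeOtherAlert"}))
}
```

Under these assumptions the allow list only narrows what the test reports. It does not assert that the alert is firing.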
@@ -503,9 +513,21 @@ var _ = g.Describe("[sig-instrumentation] Prometheus", func() {
				oc.AdminKubeClient().CoreV1().Pods(ns).Delete(context.Background(), execPod.Name, *metav1.NewDeleteOptions(1))
			}()

			// Checking Watchdog alert state is done in "should have a Watchdog alert in firing state".
			allowedAlertNames := []string{
				"Watchdog",
				"AlertmanagerReceiversNotConfigured",
				"PrometheusRemoteWriteDesiredShards",
			}

			if isTechPreviewCluster(oc) {
Contributor

Why do you need this and allowedFiringAlerts above?

Contributor Author

As far as I can tell, these are two separate test cases that cover slightly different things. The first excludes these two alerts from its check, while this one checks that they are firing when they're supposed to, IIUC.

Contributor

Oh, so it's not allowedAlertNames, it's requiredAlertNames? If so, can I get an update?

Contributor Author
@JoelSpeed, Aug 16, 2021

Nope, I got that wrong. The difference is that one is in the normal tests and one is in the late tests. They both check that no alerts are firing apart from those on an allow list, so "allowed" is the correct name.

@dgrisonnet Do you know what the concrete difference is between this test, "shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early]", and the late one, "shouldn't report any alerts in firing or pending state apart from Watchdog and AlertmanagerReceiversNotConfigured and have no gaps in Watchdog firing"?

Edit: Or the motivation for having both?

Member

The Early one makes sure that any alerts that are supposed to fire instantly, during cluster initialization, are disregarded, whereas the other one disregards alerts that may fire during the e2e tests.
In your case, I suppose that TechPreviewNoUpgrade fires instantly if any tech preview features are enabled in the cluster, right? If so, then it is correct to allow the alert in both cases, since it can fire at any time.

Contributor Author

Yep, that's correct: it fires instantly, as soon as the cluster comes up, and continuously throughout the cluster's life.

Member

Everything is good then 👍

				// On a TechPreviewNoUpgrade cluster we must ignore the TechPreviewNoUpgrade alert
				// fired by the Kube API Operator. This alert is expected in this case.
				allowedAlertNames = append(allowedAlertNames, "TechPreviewNoUpgrade")
			}

			tests := map[string]bool{
-				// Checking Watchdog alert state is done in "should have a Watchdog alert in firing state".
-				`ALERTS{alertname!~"Watchdog|AlertmanagerReceiversNotConfigured|PrometheusRemoteWriteDesiredShards",alertstate="firing",severity!="info"} >= 1`: false,
+				fmt.Sprintf(`ALERTS{alertname!~"%s",alertstate="firing",severity!="info"} >= 1`, strings.Join(allowedAlertNames, "|")): false,
			}
			err := helper.RunQueries(tests, oc, ns, execPod.Name, url, bearerToken)
			o.Expect(err).NotTo(o.HaveOccurred())
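
The query construction in this hunk can be tried in isolation. The following is a minimal sketch, not code from the PR: `firingAlertsQuery` is a hypothetical helper name, and the `techPreview` boolean stands in for the result of `isTechPreviewCluster(oc)`.

```go
package main

import (
	"fmt"
	"strings"
)

// firingAlertsQuery is a hypothetical helper that builds the PromQL
// expression used in the hunk above from a list of allowed alert names.
// The names are joined with "|" to form a regex alternation for the
// negative matcher alertname!~"...".
func firingAlertsQuery(allowedAlertNames []string) string {
	return fmt.Sprintf(
		`ALERTS{alertname!~"%s",alertstate="firing",severity!="info"} >= 1`,
		strings.Join(allowedAlertNames, "|"),
	)
}

func main() {
	allowed := []string{
		"Watchdog",
		"AlertmanagerReceiversNotConfigured",
		"PrometheusRemoteWriteDesiredShards",
	}

	// Stand-in for isTechPreviewCluster(oc); assumed true for this example.
	techPreview := true
	if techPreview {
		allowed = append(allowed, "TechPreviewNoUpgrade")
	}

	fmt.Println(firingAlertsQuery(allowed))
}
```

Prometheus regex matchers are fully anchored, so the alternation only excludes alerts whose name equals one of the listed entries. The alert names here contain no regex metacharacters; if they could, wrapping each name in `regexp.QuoteMeta` before joining would be a reasonable precaution.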
@@ -804,3 +826,15 @@ func hasPullSecret(client clientset.Interface, name string) bool {
	}
	return len(ps.Auths[name].Auth) > 0
}

func isTechPreviewCluster(oc *exutil.CLI) bool {
	featureGate, err := oc.AdminConfigClient().ConfigV1().FeatureGates().Get(context.Background(), "cluster", metav1.GetOptions{})
	if err != nil {
		if kapierrs.IsNotFound(err) {
			return false
		}
		e2e.Failf("could not retrieve feature-gate: %v", err)
	}

	return featureGate.Spec.FeatureSet == configv1.TechPreviewNoUpgrade
}