New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changing the severity of "missing runbook_url annotation for critical alerts" test case from flaky to failure #27987
Changing the severity of "missing runbook_url annotation for critical alerts" test case from flaky to failure #27987
Conversation
should the check for invalid |
/retest |
violations := sets.NewString() | ||
runbook := string(alert.Annotations["runbook_url"]) | ||
|
||
if runbook != "" { | ||
// If there's a 'runbook_url' annotation, make sure it's a | ||
// valid URL and that we can fetch the contents. | ||
if err := helper.ValidateURL(runbook, 10*time.Second); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would separate the test to perform 1) the semantic validation of the URL (e.g. it can be parsed by url.Parse() and the scheme is "http" or "https") and 2) the actual verification that the URL returns a 200 response code. The former can fail the test if it detects an invalid URL and the latter should only flake as the target might be temporarily unavailable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The helper.ValidateURL
function uses url.Parse(rawURL)
and returns error if it fails to parse.
can i add check for "https" scheme in the same validation function?
https://github.com/openshift/origin/blob/master/test/extended/util/prometheus/helpers.go#L355
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can i add check for "https" scheme in the same validation function?
yes and I would split the actual request to the URL into another function (e.g. QueryURL()).
03bd912
to
a9b3aa7
Compare
/retest |
return err | ||
} | ||
|
||
if u.Scheme != "https" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"http" should be fine too.
} | ||
|
||
if u.Scheme != "https" { | ||
errstr := fmt.Sprintf("ValidateURL(%q) URL scheme is not https", rawURL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
errstr := fmt.Sprintf("ValidateURL(%q) URL scheme is not https", rawURL) | |
errstr := fmt.Sprintf("%q: URL scheme is invalid, it should be 'http' or 'https'", rawURL) |
return fmt.Errorf(errstr) | ||
} | ||
if u.Host == "" { | ||
errstr := fmt.Sprintf("ValidateURL(%q) URL does not have a host part", rawURL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
errstr := fmt.Sprintf("ValidateURL(%q) URL does not have a host part", rawURL) | |
errstr := fmt.Sprintf("%q: host should not be empty", rawURL) |
|
||
if u.Scheme != "https" { | ||
errstr := fmt.Sprintf("ValidateURL(%q) URL scheme is not https", rawURL) | ||
framework.Logf(errstr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to log it here since it's already done by the caller?
} | ||
if u.Host == "" { | ||
errstr := fmt.Sprintf("ValidateURL(%q) URL does not have a host part", rawURL) | ||
framework.Logf(errstr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same question here
@@ -174,8 +253,6 @@ var _ = g.Describe("[sig-instrumentation][Late] OpenShift alerting rules [apigro | |||
}) | |||
|
|||
if err != nil { | |||
// We are still gathering data on how many alerts need to | |||
// be fixed, so this is marked as a flake for now. | |||
testresult.Flakef(err.Error()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
testresult.Flakef(err.Error()) | |
// We can't fail the test because the runbook URLs might be temporarily unavailable. | |
// At least we can manually check the CI logs to identify buggy URLs. | |
testresult.Flakef(err.Error()) |
} | ||
}) | ||
|
||
g.It("should have valid runbook URLs", func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
g.It("should have valid runbook URLs", func() { | |
g.It("should link to a valid URL if the runbook_url annotation is defined", func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit) to be more specific and distinguish from the next test
g.It("should have valid runbook URLs", func() { | |
g.It("should link to an HTTP(S) location if the runbook_url annotation is defined", func() { |
violations := sets.NewString() | ||
runbook_url := string(alert.Annotations["runbook_url"]) | ||
|
||
if runbook_url != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit) easier to read.
if runbook_url != "" { | |
if runbook_url == "" { | |
return nil | |
} |
// If there's a 'runbook_url' annotation, make sure that it is a valid URL | ||
if err := helper.ValidateURL(runbook_url); err != nil { | ||
violations.Insert( | ||
fmt.Sprintf("WARNING: Alert %q has an 'runbook_url' annotation: %q which is not valid. Error: %v", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ForEachAlertingRule() always include the name of the alert.
fmt.Sprintf("WARNING: Alert %q has an 'runbook_url' annotation: %q which is not valid. Error: %v", | |
fmt.Sprintf("has an 'runbook_url' annotation which is not valid: %v", |
violations.Insert( | ||
fmt.Sprintf("WARNING: Alert %q has an invalid 'runbook_url' annotation: %v", | ||
alert.Name, err), | ||
fmt.Sprintf("WARNING: Alert %q has an 'runbook_url' annotation: %q which cannot be fetched. Error: %v", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment about the alert name already being included
fmt.Sprintf("WARNING: Alert %q has an 'runbook_url' annotation: %q which cannot be fetched. Error: %v", | |
fmt.Sprintf("has a runbook URL which cannot be fetched: %v", |
a9b3aa7
to
fd08894
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one last suggestion
} | ||
}) | ||
|
||
g.It("should have valid runbook URLs", func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit) to be more specific and distinguish from the next test
g.It("should have valid runbook URLs", func() { | |
g.It("should link to an HTTP(S) location if the runbook_url annotation is defined", func() { |
fd08894
to
34c3ae0
Compare
@vimalk78 you need to regenerate some files to fix |
… alerts" test case from flaky to failure The following critical alerts do not have have runbook_url annotation added as exceptions: - OVNKubernetesNorthboundDatabaseClusterIDError - OVNKubernetesSouthboundDatabaseClusterIDError - OVNKubernetesNorthboundDatabaseLeaderError - OVNKubernetesSouthboundDatabaseLeaderError - OVNKubernetesNorthboundDatabaseMultipleLeadersError - OVNKubernetesSouthboundDatabaseMultipleLeadersError - OVNKubernetesNorthdInactive - KubeSchedulerDown - MultipleDefaultStorageClasses - MachineAPIOperatorMetricsCollectionFailing - MCDRebootError - ExtremelyHighIndividualControlPlaneMemory - HAProxyDown - ClusterOperatorDown - ClusterVersionOperatorDown The check for runbook_url validity has been separated out into two new tests - test which checks for URL validity. This fails the test if URL is invalid - test which fetches the URL to check its existance. This check makes test flaky, and does not fail the test if a URL cannot be fetched.
34c3ae0
to
681db23
Compare
/skip |
/retest-required |
@vimalk78: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: simonpasquier, vimalk78 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
The following critical alerts do not have have runbook_url annotation added as exceptions:
The check for runbook_url validity has been separated out into a new flaky test