USHIFT-741: Microshift invariant adaptations #27647

pacevedom · 2023-01-10T15:36:08Z

No description provided.

openshift-ci · 2023-01-10T15:36:27Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

pacevedom · 2023-01-10T16:30:02Z

/cc @ingvagabund

ingvagabund · 2023-01-12T10:14:07Z

pkg/synthetictests/disruption.go

@@ -33,6 +35,11 @@ func testServerAvailability(
 	backendName := fmt.Sprintf("%s-%s-connections", disruptionName, connType)
 	jobType, err := platformidentification.GetJobType(context.TODO(), restConfig)
 	if err != nil {
+		if apierrors.IsNotFound(err) {


A NotFound error can, in some cases, be a transient error, and I don't think testServerAvailability takes that into account. In any case, treating platformidentification.GetJobType as a black box, it's not safe to deduce the non-existence of a certain API from a NotFound error. It's worth considering a different approach, or refactoring platformidentification.GetJobType to provide more output.

Changed it to be a bit more robust now. But could that happen? The tests only start after the cluster has been deemed ready, which means all resources and components are ready too: https://github.com/openshift/release/blob/master/ci-operator/step-registry/openshift/e2e/test/openshift-e2e-test-commands.sh#L294-L418
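
To make the concern above concrete: instead of inferring a missing API from a NotFound error returned by an opaque helper, the caller could first ask which API groups the cluster serves. The following is a minimal sketch using only the standard library. `groupLister`, `requiredGroups`, `fakeLister` and `canExtractJobType` are hypothetical stand-ins and not the code in this PR; the group name `config.openshift.io` is an assumption, and a real check would presumably go through the cluster's discovery client.

```go
package main

import "errors"

// groupLister is a hypothetical stand-in for an API discovery client.
type groupLister interface {
	ServerGroups() ([]string, error)
}

// requiredGroups lists the API groups assumed to be needed to determine
// the job type. The value is an assumption made for this sketch.
var requiredGroups = []string{"config.openshift.io"}

// canExtractJobType reports whether every required group is served.
// A discovery error is returned as an error, so "API is absent" and
// "lookup failed" stay distinguishable for the caller.
func canExtractJobType(l groupLister) (bool, error) {
	groups, err := l.ServerGroups()
	if err != nil {
		return false, err
	}
	served := make(map[string]bool, len(groups))
	for _, g := range groups {
		served[g] = true
	}
	for _, want := range requiredGroups {
		if !served[want] {
			return false, nil
		}
	}
	return true, nil
}

// fakeLister is a test double that returns canned discovery results.
type fakeLister struct {
	groups []string
	err    error
}

func (f fakeLister) ServerGroups() ([]string, error) { return f.groups, f.err }

// errUnavailable simulates a transient discovery failure.
var errUnavailable = errors.New("discovery unavailable")
```

With this shape a transient failure surfaces as an error, while a cluster that genuinely lacks the group yields `false, nil`.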

ingvagabund · 2023-01-12T10:16:07Z

pkg/synthetictests/networking.go

@@ -104,6 +93,19 @@ func testPodSandboxCreation(events monitorapi.Intervals, clientConfig *rest.Conf
 			continue
 		}
 		if strings.Contains(event.Message, "pinging container registry") && strings.Contains(event.Message, "i/o timeout") {
+			if platform == "" {


platform will be always an empty string here.

It is defined here and then gets a value here, which is reused in the following loop iterations.

Right. It's populated at most once.
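
The pattern discussed in this thread (a variable declared before the loop and filled in lazily on the first matching event) can be sketched as follows. This is an illustration only: `event`, `countRegistryTimeouts` and the `lookupPlatform` callback are hypothetical names, and the real test works on monitorapi intervals and a cluster client.

```go
package main

import "strings"

// event is a simplified stand-in for a monitor interval.
type event struct {
	Message string
}

// countRegistryTimeouts counts registry i/o timeout events. The platform
// is looked up lazily on the first matching event and reused afterwards,
// provided the lookup returns a non-empty value.
func countRegistryTimeouts(events []event, lookupPlatform func() (string, error)) (platform string, matches int, err error) {
	for _, e := range events {
		if !strings.Contains(e.Message, "pinging container registry") ||
			!strings.Contains(e.Message, "i/o timeout") {
			continue
		}
		if len(platform) == 0 {
			platform, err = lookupPlatform()
			if err != nil {
				return "", matches, err
			}
		}
		matches++
	}
	return platform, matches, nil
}
```

If no event matches, the lookup is never made, which avoids an API call on clusters where the check is irrelevant.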

pacevedom · 2023-01-13T15:06:58Z

/retest-required

pacevedom · 2023-01-16T11:16:31Z

/retest-required

ingvagabund · 2023-01-16T11:50:15Z

pkg/synthetictests/alerts.go

 	"github.com/openshift/origin/pkg/monitor/monitorapi"
 	"github.com/openshift/origin/pkg/synthetictests/allowedalerts"
 )

 func testAlerts(events monitorapi.Intervals, restConfig *rest.Config, duration time.Duration, recordedResource *monitorapi.ResourcesMap) []*junitapi.JUnitTestCase {
 	ret := []*junitapi.JUnitTestCase{}

+	kubeClient, err := kubernetes.NewForConfig(restConfig)
+	if err != nil {
+		ret = append(ret, &junitapi.JUnitTestCase{


nit: you could return the slice directly instead of appending to ret, i.e.

return []*junitapi.JUnitTestCase{
	{
		Name: "Alert setup, kube client",
		FailureOutput: &junitapi.FailureOutput{
			Output: err.Error(),
		},
		SystemOut: err.Error(),
	},
}

ingvagabund · 2023-01-16T11:55:47Z

pkg/synthetictests/disruption.go

@@ -26,6 +26,22 @@ func testServerAvailability(

 	testName := fmt.Sprintf("[%s] %s should be available throughout the test", owner, locator)

+	skip, err := platformidentification.CanExtractPlatform(restConfig)


s/CanExtractPlatform/CanExtractJobType/ is preferable, so it's clear why the check is done in the first place.

This change is important as well, so that it's more obvious to a reader seeing the code for the first time why we check this here.

ingvagabund · 2023-01-16T12:01:41Z

pkg/synthetictests/disruption.go

+		}
+	}
+	if !skip {
+		return []*junitapi.JUnitTestCase{}


return []*junitapi.JUnitTestCase{
	{
		Name: testName,
		Duration: jobRunDuration.Seconds(),
		FailureOutput: &junitapi.FailureOutput{
			Output: fmt.Sprintf("skipping test due to missing API groups in either of clusterversions, infrastructures or networks groups"),
		},
	},
}

Just realized the name of the skip variable means the opposite of what it holds. Changed it.
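
A small sketch of the naming point: with a positively named boolean, the guard reads the way it behaves. `testCase`, `failureOutput` and `jobTypeGuard` are simplified, hypothetical stand-ins for the junitapi types and the code in this PR, and the message text is illustrative.

```go
package main

// failureOutput and testCase are simplified stand-ins for the junitapi types.
type failureOutput struct {
	Output string
}

type testCase struct {
	Name          string
	FailureOutput *failureOutput
}

// jobTypeGuard returns the test cases to report and whether the caller
// should continue with the availability check.
func jobTypeGuard(testName string, canDetermineJobType bool) (cases []testCase, proceed bool) {
	if !canDetermineJobType {
		return []testCase{{
			Name: testName,
			FailureOutput: &failureOutput{
				Output: "skipping test: job type cannot be determined on this cluster",
			},
		}}, false
	}
	return nil, true
}
```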

ingvagabund · 2023-01-16T12:04:19Z

pkg/synthetictests/networking.go

@@ -104,6 +93,19 @@ func testPodSandboxCreation(events monitorapi.Intervals, clientConfig *rest.Conf
 			continue
 		}
 		if strings.Contains(event.Message, "pinging container registry") && strings.Contains(event.Message, "i/o timeout") {
+			if platform == "" {


Right. It's populated at most once.

pacevedom · 2023-01-18T08:30:05Z

/retest-required

ingvagabund · 2023-01-18T09:10:13Z

pkg/synthetictests/alerts.go

+
+	kubeClient, err := kubernetes.NewForConfig(restConfig)
+	if err != nil {
+		return &junitapi.JUnitTestCase{


This needs to be

return []*junitapi.JUnitTestCase{
	{
		Name: "Alert setup, kube client",
		FailureOutput: &junitapi.FailureOutput{
			Output: err.Error(),
		},
		SystemOut: err.Error(),
	},
}

Oops, done.

pacevedom · 2023-01-18T19:51:14Z

/retest-required

openshift-ci · 2023-01-18T22:25:39Z

@pacevedom: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/e2e-aws-ovn-cgroupsv2	`d27e199`	link	false	`/test e2e-aws-ovn-cgroupsv2`
ci/prow/e2e-gcp-ovn-etcd-scaling	`d27e199`	link	false	`/test e2e-gcp-ovn-etcd-scaling`
ci/prow/e2e-vsphere-ovn-etcd-scaling	`d27e199`	link	false	`/test e2e-vsphere-ovn-etcd-scaling`
ci/prow/e2e-aws-ovn-upgrade	`d27e199`	link	false	`/test e2e-aws-ovn-upgrade`
ci/prow/e2e-aws-ovn-etcd-scaling	`d27e199`	link	false	`/test e2e-aws-ovn-etcd-scaling`
ci/prow/e2e-aws-csi	`d27e199`	link	false	`/test e2e-aws-csi`
ci/prow/e2e-aws-ovn-single-node-serial	`d27e199`	link	false	`/test e2e-aws-ovn-single-node-serial`
ci/prow/e2e-azure-ovn-etcd-scaling	`d27e199`	link	false	`/test e2e-azure-ovn-etcd-scaling`
ci/prow/e2e-aws-ovn-single-node-upgrade	`d27e199`	link	false	`/test e2e-aws-ovn-single-node-upgrade`
ci/prow/e2e-metal-ipi-sdn	`d27e199`	link	false	`/test e2e-metal-ipi-sdn`
ci/prow/e2e-aws-ovn-single-node	`d27e199`	link	false	`/test e2e-aws-ovn-single-node`

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

ingvagabund · 2023-01-19T09:11:51Z

/lgtm

openshift-ci · 2023-01-19T09:14:47Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ingvagabund, pacevedom
Once this PR has been reviewed and has the lgtm label, please assign bparees for approval by writing /assign @bparees in a comment. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

pkg/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

deads2k · 2023-01-23T22:43:51Z

pkg/synthetictests/alerts.go

+	if err != nil {
+		return []*junitapi.JUnitTestCase{
+			{
+				Name: "Alert setup, kube client",


This results in an "only failed" test. It needs to be set to "pass" in the majority of cases so that aggregation works properly. I'm surprised we don't have clients we can pass in here.
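
One way to read the comment above: a test name that appears only when setup fails never records a pass, so an aggregator sees nothing but failures for it. A minimal sketch of reporting the same name on both paths follows. `testCase`, `failureOutput`, `setupResult` and `errNoConfig` are hypothetical stand-ins for illustration, not the junitapi types or the fix that was eventually applied.

```go
package main

import "errors"

// failureOutput and testCase are simplified stand-ins for the junitapi types.
type failureOutput struct {
	Output string
}

type testCase struct {
	Name          string
	FailureOutput *failureOutput
	SystemOut     string
}

const setupTestName = "Alert setup, kube client"

// setupResult always reports a case under the same name: a pass (no
// FailureOutput) when err is nil, and a failure otherwise.
func setupResult(err error) testCase {
	if err != nil {
		return testCase{
			Name:          setupTestName,
			FailureOutput: &failureOutput{Output: err.Error()},
			SystemOut:     err.Error(),
		}
	}
	return testCase{Name: setupTestName}
}

// errNoConfig is a sample error used to exercise the failure path.
var errNoConfig = errors.New("invalid rest config")
```

The caller would append `setupResult(err)` to its results unconditionally and stop early only when `err` is non-nil.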

deads2k · 2023-01-23T22:47:46Z

pkg/synthetictests/disruption.go

@@ -26,6 +26,22 @@ func testServerAvailability(

 	testName := fmt.Sprintf("[%s] %s should be available throughout the test", owner, locator)

+	canDetermineJobType, err := platformidentification.CanExtractJobType(restConfig)
+	if err != nil {
+		return []*junitapi.JUnitTestCase{


Same "only failure" test here.

deads2k · 2023-01-23T22:48:29Z

pkg/synthetictests/networking.go

@@ -104,6 +93,19 @@ func testPodSandboxCreation(events monitorapi.Intervals, clientConfig *rest.Conf
 			continue
 		}
 		if strings.Contains(event.Message, "pinging container registry") && strings.Contains(event.Message, "i/o timeout") {
+			if platform == "" {


`if len(platform) == 0` is the canonical form in kube and openshift code.

deads2k · 2023-01-23T22:51:10Z

pkg/synthetictests/networking.go

+						platform = infra.Status.PlatformStatus.Type
+					}
+				}
+			}
 			if platform == v1.AzurePlatformType {


I think you may be better off always failing if you cannot determine the platform, but this amounts to the same thing with a bad message if TRT can stomach the factorization.
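
The alternative suggested here, failing outright when the platform cannot be determined, might look like the sketch below. The flake-on-Azure behaviour is an assumption inferred from the surrounding diff, and `verdict`, `classifyRegistryTimeout`, `azurePlatform` and `errLookup` are hypothetical names introduced for illustration.

```go
package main

import "fmt"

// azurePlatform mirrors the string value of v1.AzurePlatformType.
const azurePlatform = "Azure"

type verdict int

const (
	verdictFailure verdict = iota
	verdictFlake
)

// classifyRegistryTimeout decides how a registry i/o timeout is reported.
// An unknown platform is always a failure, with a message that says so.
func classifyRegistryTimeout(platform string, lookupErr error) (verdict, string) {
	switch {
	case lookupErr != nil:
		return verdictFailure, fmt.Sprintf("platform lookup failed: %v", lookupErr)
	case len(platform) == 0:
		return verdictFailure, "platform is unknown"
	case platform == azurePlatform:
		// Assumption: timeouts on Azure are tolerated as flakes.
		return verdictFlake, "registry timeout on Azure"
	default:
		return verdictFailure, fmt.Sprintf("registry timeout on %s", platform)
	}
}

// errLookup is a sample error used to exercise the lookup-failure path.
var errLookup = fmt.Errorf("infrastructure resource not found")
```

Compared with silently leaving the platform empty, this keeps the outcome the same for non-Azure clusters but gives the failure an explicit message.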

openshift-bot · 2023-04-24T01:00:38Z

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-merge-robot · 2023-04-24T01:00:48Z

PR needs rebase.

openshift-bot · 2023-05-24T08:30:46Z

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

chiragkyal · 2023-06-21T06:13:44Z

/lifecycle frozen

openshift-ci · 2023-06-21T06:13:48Z

@chiragkyal: The lifecycle/frozen label cannot be applied to Pull Requests.

In response to this:

/lifecycle frozen

swghosh · 2023-06-21T14:03:10Z

/remove-lifecycle rotten

pacevedom · 2023-08-09T16:04:28Z

Included in #28136

openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 10, 2023

pacevedom changed the title ~~USHIFT-741: Microshift invariant adaptations~~ USHIFT-741: Microshift alert invariant adaptations Jan 10, 2023

pacevedom marked this pull request as ready for review January 10, 2023 16:17

openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 10, 2023

openshift-ci bot requested review from bparees and jwforres January 10, 2023 16:18

openshift-ci bot requested a review from ingvagabund January 10, 2023 16:30

pacevedom force-pushed the USHIFT-741 branch from 2ed4076 to 37d53b9 Compare January 11, 2023 09:35

pacevedom changed the title ~~USHIFT-741: Microshift alert invariant adaptations~~ USHIFT-741: Microshift invariant adaptations Jan 11, 2023

pacevedom force-pushed the USHIFT-741 branch from f3f04a4 to 4687e8e Compare January 11, 2023 12:05

ingvagabund reviewed Jan 12, 2023

pacevedom force-pushed the USHIFT-741 branch from 4687e8e to ebe5569 Compare January 13, 2023 10:03

ingvagabund suggested changes Jan 16, 2023

ushift: skip alert invariants if there is no monitoring

08c19d5

pacevedom force-pushed the USHIFT-741 branch from ebe5569 to 0599b50 Compare January 16, 2023 13:25

ushift: skip disruption invariants

83d0cc4

pacevedom force-pushed the USHIFT-741 branch from 0599b50 to 183dccf Compare January 16, 2023 16:56

ingvagabund reviewed Jan 18, 2023

ushift: adjust sandboxes synthetic test for azure and OCP platforms

d27e199

pacevedom force-pushed the USHIFT-741 branch from 183dccf to d27e199 Compare January 18, 2023 16:58

openshift-ci bot assigned ingvagabund Jan 19, 2023

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 19, 2023

deads2k reviewed Jan 23, 2023

openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 24, 2023

openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 24, 2023

openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 24, 2023

openshift-ci bot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jun 21, 2023

pacevedom closed this Aug 9, 2023
