Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

USHIFT-741: Microshift invariant adaptations #27647

Closed
wants to merge 3 commits into from

Conversation

pacevedom
Copy link
Contributor

No description provided.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 10, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 10, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@pacevedom pacevedom changed the title USHIFT-741: Microshift invariant adaptations USHIFT-741: Microshift alert invariant adaptations Jan 10, 2023
@pacevedom pacevedom marked this pull request as ready for review January 10, 2023 16:17
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 10, 2023
@pacevedom
Copy link
Contributor Author

/cc @ingvagabund

@pacevedom pacevedom changed the title USHIFT-741: Microshift alert invariant adaptations USHIFT-741: Microshift invariant adaptations Jan 11, 2023
@@ -33,6 +35,11 @@ func testServerAvailability(
backendName := fmt.Sprintf("%s-%s-connections", disruptionName, connType)
jobType, err := platformidentification.GetJobType(context.TODO(), restConfig)
if err != nil {
if apierrors.IsNotFound(err) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not found error can be in some cases a transitional error. I don't think testServerAvailability takes that into account. Nevertheless, seeing platformidentification.GetJobType as a black box, it's not safe to deduce non-existence of a certain API from a NotFoundErr. Worth considering a different approach or to refactore platformidentification.GetJobType to provide more output.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it to be a bit more robust now. But could that happen? The tests only start after the cluster has been deemed ready, which means all resources and componentes are ready too: https://github.com/openshift/release/blob/master/ci-operator/step-registry/openshift/e2e/test/openshift-e2e-test-commands.sh#L294-L418

@@ -104,6 +93,19 @@ func testPodSandboxCreation(events monitorapi.Intervals, clientConfig *rest.Conf
continue
}
if strings.Contains(event.Message, "pinging container registry") && strings.Contains(event.Message, "i/o timeout") {
if platform == "" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

platform will be always an empty string here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is defined here and then grabs a value here for the next for loop iterations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. It's populated at most once.

@pacevedom
Copy link
Contributor Author

/retest-required

1 similar comment
@pacevedom
Copy link
Contributor Author

/retest-required

"github.com/openshift/origin/pkg/monitor/monitorapi"
"github.com/openshift/origin/pkg/synthetictests/allowedalerts"
)

func testAlerts(events monitorapi.Intervals, restConfig *rest.Config, duration time.Duration, recordedResource *monitorapi.ResourcesMap) []*junitapi.JUnitTestCase {
ret := []*junitapi.JUnitTestCase{}

kubeClient, err := kubernetes.NewForConfig(restConfig)
if err != nil {
ret = append(ret, &junitapi.JUnitTestCase{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you could return ret directly. I.e.

return []*junitapi.JUnitTestCase{
	{
		Name: "Alert setup, kube client",
		FailureOutput: &junitapi.FailureOutput{
			Output: err.Error(),
		},
		SystemOut: err.Error(),
	},
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

pkg/synthetictests/alerts.go Show resolved Hide resolved
@@ -26,6 +26,22 @@ func testServerAvailability(

testName := fmt.Sprintf("[%s] %s should be available throughout the test", owner, locator)

skip, err := platformidentification.CanExtractPlatform(restConfig)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/CanExtractPlatform/CanExtractJobType/ is more preferable. So it's clear why the check is done at the first place.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is important as well. So it's more obvious for a reader who sees the code for the first time why we check this here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}
}
if !skip {
return []*junitapi.JUnitTestCase{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return []*junitapi.JUnitTestCase{
	{
		Name:     testName,
		Duration: jobRunDuration.Seconds(),
		FailureOutput: &junitapi.FailureOutput{
			Output: fmt.Sprintf("skipping test due to missing API groups in either of clusterversions, infrastructures or networks groups"),
		},
	},
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just realized the name of the skip variable is actually the opposite. Changed it.

@@ -104,6 +93,19 @@ func testPodSandboxCreation(events monitorapi.Intervals, clientConfig *rest.Conf
continue
}
if strings.Contains(event.Message, "pinging container registry") && strings.Contains(event.Message, "i/o timeout") {
if platform == "" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. It's populated at most once.

@pacevedom
Copy link
Contributor Author

/retest-required


kubeClient, err := kubernetes.NewForConfig(restConfig)
if err != nil {
return &junitapi.JUnitTestCase{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be

return []*junitapi.JUnitTestCase{
	{
		Name: "Alert setup, kube client",
		FailureOutput: &junitapi.FailureOutput{
			Output: err.Error(),
		},
		SystemOut: err.Error(),
	},
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, done.

@pacevedom
Copy link
Contributor Author

/retest-required

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 18, 2023

@pacevedom: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-ovn-cgroupsv2 d27e199 link false /test e2e-aws-ovn-cgroupsv2
ci/prow/e2e-gcp-ovn-etcd-scaling d27e199 link false /test e2e-gcp-ovn-etcd-scaling
ci/prow/e2e-vsphere-ovn-etcd-scaling d27e199 link false /test e2e-vsphere-ovn-etcd-scaling
ci/prow/e2e-aws-ovn-upgrade d27e199 link false /test e2e-aws-ovn-upgrade
ci/prow/e2e-aws-ovn-etcd-scaling d27e199 link false /test e2e-aws-ovn-etcd-scaling
ci/prow/e2e-aws-csi d27e199 link false /test e2e-aws-csi
ci/prow/e2e-aws-ovn-single-node-serial d27e199 link false /test e2e-aws-ovn-single-node-serial
ci/prow/e2e-azure-ovn-etcd-scaling d27e199 link false /test e2e-azure-ovn-etcd-scaling
ci/prow/e2e-aws-ovn-single-node-upgrade d27e199 link false /test e2e-aws-ovn-single-node-upgrade
ci/prow/e2e-metal-ipi-sdn d27e199 link false /test e2e-metal-ipi-sdn
ci/prow/e2e-aws-ovn-single-node d27e199 link false /test e2e-aws-ovn-single-node

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@ingvagabund
Copy link
Member

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 19, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 19, 2023

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ingvagabund, pacevedom
Once this PR has been reviewed and has the lgtm label, please assign bparees for approval by writing /assign @bparees in a comment. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

if err != nil {
return []*junitapi.JUnitTestCase{
{
Name: "Alert setup, kube client",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This results in an "only failed" test. it needs to set to "pass" in the majority of cases so that aggregation works properly. I'm surprised we don't have clients we can pass in here.

@@ -26,6 +26,22 @@ func testServerAvailability(

testName := fmt.Sprintf("[%s] %s should be available throughout the test", owner, locator)

canDetermineJobType, err := platformidentification.CanExtractJobType(restConfig)
if err != nil {
return []*junitapi.JUnitTestCase{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same "only failure" test here.

@@ -104,6 +93,19 @@ func testPodSandboxCreation(events monitorapi.Intervals, clientConfig *rest.Conf
continue
}
if strings.Contains(event.Message, "pinging container registry") && strings.Contains(event.Message, "i/o timeout") {
if platform == "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if len(platform) == 0 is kube and openshift canonical

platform = infra.Status.PlatformStatus.Type
}
}
}
if platform == v1.AzurePlatformType {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you may be better off always failing if you cannot determine the platform, but this amounts to the same thing with a bad message if TRT can stomach the factorization.

@openshift-bot
Copy link
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 24, 2023
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 24, 2023
@openshift-merge-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-bot
Copy link
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 24, 2023
@chiragkyal
Copy link
Member

/lifecycle frozen

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 21, 2023

@chiragkyal: The lifecycle/frozen label cannot be applied to Pull Requests.

In response to this:

/lifecycle frozen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@swghosh
Copy link
Member

swghosh commented Jun 21, 2023

/remove-lifecycle rotten

@openshift-ci openshift-ci bot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jun 21, 2023
@pacevedom
Copy link
Contributor Author

Included in #28136

@pacevedom pacevedom closed this Aug 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm Indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants