OCPBUGS-9925: Fix tests for OpenStack platform #306

mandre · 2023-12-21T07:27:39Z

There were several issues. The most serious was that cleanup would wipe all pre-existing machinesets (those present before the tests ran) when the tests were skipped.

Moving the cleanup code to a DeferCleanup() function fixes it.

openshift-ci-robot · 2023-12-21T07:27:43Z

@mandre: This pull request references Jira Issue OCPBUGS-9925, which is invalid:

expected the bug to target either version "4.16." or "openshift-4.16.", but it targets "4.15.0" instead
expected the bug to be in one of the following states: NEW, ASSIGNED, POST, but it is MODIFIED instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

There were several issues. The most serious was that cleanup would wipe all pre-existing machinesets (those present before the tests ran) when the tests were skipped.

Moving the cleanup code to a DeferCleanup() function fixes it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

mandre · 2023-12-21T08:28:01Z

/jira refresh

openshift-ci-robot · 2023-12-21T08:28:04Z

@mandre: This pull request references Jira Issue OCPBUGS-9925, which is invalid:

expected the bug to be in one of the following states: NEW, ASSIGNED, POST, but it is MODIFIED instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/jira refresh

mandre · 2023-12-21T08:28:40Z

/jira refresh

openshift-ci-robot · 2023-12-21T08:28:45Z

@mandre: This pull request references Jira Issue OCPBUGS-9925, which is valid.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (4.16.0) matches configured target version for branch (4.16.0)
bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @eurijon

In response to this:

/jira refresh

mandre · 2023-12-21T13:45:23Z

The CI is finding real bugs now 😅

Fix at kubernetes-sigs/cluster-api-provider-openstack#1803

EmilienM · 2023-12-21T14:02:19Z

/lgtm

mandre · 2024-01-08T08:00:20Z

/test ci/prow/e2e-openstack-operator

openshift-ci · 2024-01-08T08:00:30Z

@mandre: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

/test e2e-aws-operator
/test e2e-azure-operator
/test e2e-vsphere-operator
/test images
/test lint
/test unit

The following commands are available to trigger optional jobs:

/test e2e-aws-periodic-pre
/test e2e-gcp-operator
/test e2e-gcp-periodic-pre
/test e2e-openstack-operator

Use /test all to run the following jobs that were automatically triggered:

pull-ci-openshift-cluster-api-actuator-pkg-master-e2e-aws-operator
pull-ci-openshift-cluster-api-actuator-pkg-master-e2e-azure-operator
pull-ci-openshift-cluster-api-actuator-pkg-master-e2e-gcp-operator
pull-ci-openshift-cluster-api-actuator-pkg-master-e2e-openstack-operator
pull-ci-openshift-cluster-api-actuator-pkg-master-e2e-vsphere-operator
pull-ci-openshift-cluster-api-actuator-pkg-master-images
pull-ci-openshift-cluster-api-actuator-pkg-master-lint
pull-ci-openshift-cluster-api-actuator-pkg-master-unit

In response to this:

/test ci/prow/e2e-openstack-operator

mandre · 2024-01-08T08:05:08Z

Hey @elmiko, since you inquired about the openstack jobs, PTAL whenever you get a chance.

/assign elmiko

mandre · 2024-01-08T08:05:11Z

/test e2e-openstack-operator

mandre · 2024-01-16T08:12:59Z

/test e2e-vsphere-operator

mandre · 2024-01-16T09:34:37Z

e2e-vsphere-operator has never passed, and fails on cluster installation, which is outside of the scope of this PR.

elmiko · 2024-01-16T14:42:51Z

thanks @mandre, updates look nice to me, and that is wild about the machinesets getting deleted on skips.

/approve
/lgtm

openshift-ci · 2024-01-16T14:43:58Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: elmiko

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [elmiko]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

elmiko · 2024-01-16T14:48:27Z

/override ci/prow/e2e-vsphere-operator

openshift-ci · 2024-01-16T14:49:16Z

@elmiko: Overrode contexts on behalf of elmiko: ci/prow/e2e-vsphere-operator

In response to this:

/override ci/prow/e2e-vsphere-operator

JoelSpeed · 2024-01-16T16:32:38Z

/hold

I want to review this

JoelSpeed

I agree in general with this change; however, I think that in some places it makes us more prone to failure.

For example, previously, if the BeforeEach failed midway through, the AfterEach would still run, so we would always clean up.

With the DeferCleanup calls where they are now, in some cases we risk leaking resources from a test if something breaks midway through the BeforeEach, and those leaked resources may then interfere with later test cases.

We are swinging from being overly aggressive on the cleanup to not quite aggressive enough. I think with the changes I've suggested we can get this just right though (goldilocks zone for cleanup?)

/approve cancel

JoelSpeed · 2024-01-16T16:50:30Z

pkg/infra/infra.go

+		DeferCleanup(func() {
+			if machineSet != nil {
+				By("Deleting the new MachineSet")
+				Expect(client.Delete(ctx, machineSet)).To(Succeed(), "MachineSet should be able to be deleted")
+				framework.WaitForMachineSetsDeleted(ctx, client, machineSet)
+			}
+		})
+


This still has a chance to error. machineSet is not cleared by the BeforeEach, so it could be left over from a previous test and be non-nil. Will a delete call fail if the resource doesn't exist? I expect so.

Wouldn't changing this to the below make it safer?

Suggested change

-		DeferCleanup(func() {
-			if machineSet != nil {
-				By("Deleting the new MachineSet")
-				Expect(client.Delete(ctx, machineSet)).To(Succeed(), "MachineSet should be able to be deleted")
-				framework.WaitForMachineSetsDeleted(ctx, client, machineSet)
-			}
-		})
+		// Reset the machineSet between each test
+		machineSet = nil
+
+		// Make sure to clean up the machineSet, if we create one.
+		DeferCleanup(func() {
+			if machineSet != nil {
+				By("Deleting the new MachineSet")
+				Expect(client.Delete(ctx, machineSet)).To(Succeed(), "MachineSet should be able to be deleted")
+				framework.WaitForMachineSetsDeleted(ctx, client, machineSet)
+			}
+		})
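
To illustrate the stale-variable concern, here is a minimal standard-library sketch. It is not the repository's code and does not use Ginkgo: `fakeCluster`, `errNotFound`, and `runSpecs` are hypothetical stand-ins, and it assumes (as the comment above expects) that deleting an object that no longer exists returns an error.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound mimics the error returned when deleting an object that no
// longer exists. Hypothetical stand-in, not a real API client error.
var errNotFound = errors.New("not found")

// fakeCluster is a toy object store keyed by name.
type fakeCluster map[string]bool

// Delete removes the named object, or fails if it is already gone.
func (c fakeCluster) Delete(name string) error {
	if !c[name] {
		return errNotFound
	}
	delete(c, name)
	return nil
}

// runSpecs runs two specs that share one machineSet variable. Only the first
// spec creates a MachineSet. It returns the cleanup error of each spec.
func runSpecs(resetBeforeEach bool) []error {
	cluster := fakeCluster{}
	var machineSet *string // shared across specs, like a closure variable

	var results []error
	for i, creates := range []bool{true, false} {
		if resetBeforeEach {
			machineSet = nil // the reset proposed in the suggestion
		}
		if creates {
			name := fmt.Sprintf("machineset-%d", i)
			cluster[name] = true
			machineSet = &name
		}
		// Cleanup, guarded by the same nil check as the snippet under review.
		var err error
		if machineSet != nil {
			err = cluster.Delete(*machineSet)
		}
		results = append(results, err)
	}
	return results
}

func main() {
	fmt.Println(runSpecs(false)) // second spec trips over the stale pointer
	fmt.Println(runSpecs(true))  // reset makes the second cleanup a no-op
}
```

Without the reset, the second spec's nil check passes on the pointer left behind by the first spec and the delete fails; with the reset, cleanup only acts on what the current spec created.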

JoelSpeed · 2024-01-16T16:53:10Z

pkg/infra/lifecyclehooks.go

+		DeferCleanup(func() {
+			By("Deleting the machineset")
+			cascadeDelete := metav1.DeletePropagationForeground
+			Expect(client.Delete(context.Background(), machineSet, &runtimeclient.DeleteOptions{
+				PropagationPolicy: &cascadeDelete,
+			})).To(Succeed(), "MachineSet should be able to be deleted")
+
+			By("Waiting for the MachineSet to be deleted...")
+			framework.WaitForMachineSetsDeleted(ctx, client, machineSet)
+
+			By("Deleting workload job")
+			Expect(client.Delete(context.Background(), workload, &runtimeclient.DeleteOptions{
+				PropagationPolicy: &cascadeDelete,
+			})).To(Succeed(), "Workload job should be able to be deleted")
+		})


It would be preferable to put each cleanup next to the thing it cleans up. I would split this into several cleanups so that, if something breaks mid-setup, we still tear down the bits we have already created.

The flow should be:

Create MachineSet

Defer remove MachineSet

Wait for MachineSet

Create workload Job

Defer remove workload Job

Wait for workload job running
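
The flow above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the PR's code: `cleanupStack` is a hypothetical stand-in for a DeferCleanup-style registry (assumed here to run callbacks in last-in, first-out order), and the step names are just labels recorded in an event log.

```go
package main

import (
	"errors"
	"fmt"
)

// cleanupStack is a hypothetical stand-in for a DeferCleanup-style registry:
// registered functions run in reverse order once the spec finishes.
type cleanupStack struct {
	fns []func()
}

func (c *cleanupStack) Defer(fn func()) {
	c.fns = append(c.fns, fn)
}

func (c *cleanupStack) Run() {
	for i := len(c.fns) - 1; i >= 0; i-- {
		c.fns[i]()
	}
	c.fns = nil
}

// setup follows the suggested flow. failAt names the step that should fail
// ("" means no failure); events records what happened, in order.
func setup(failAt string, c *cleanupStack, events *[]string) error {
	step := func(name string) error {
		if name == failAt {
			return errors.New(name + " failed")
		}
		*events = append(*events, name)
		return nil
	}
	record := func(name string) func() {
		return func() { *events = append(*events, name) }
	}

	if err := step("create MachineSet"); err != nil {
		return err
	}
	// Register the teardown immediately, before any wait that might fail.
	c.Defer(record("delete MachineSet"))
	if err := step("wait for MachineSet"); err != nil {
		return err
	}
	if err := step("create workload Job"); err != nil {
		return err
	}
	c.Defer(record("delete workload Job"))
	return step("wait for workload Job")
}

// runScenario runs setup, then the registered cleanups, and returns the log.
func runScenario(failAt string) []string {
	var events []string
	c := &cleanupStack{}
	_ = setup(failAt, c, &events) // an error would fail the spec; cleanup still runs
	c.Run()
	return events
}

func main() {
	fmt.Printf("%q\n", runScenario("wait for MachineSet"))
	fmt.Printf("%q\n", runScenario(""))
}
```

Because each teardown is registered right after its create step, a failed wait still deletes the MachineSet that was already created, and nothing is attempted for the workload Job that never existed.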

JoelSpeed · 2024-01-16T16:54:30Z

pkg/infra/spot.go

-		if specReport.Failed() {
-			Expect(gatherer.WithSpecReport(specReport).GatherAll()).To(Succeed(), "StateGatherer should be able to gather resources")
-		}
+		DeferCleanup(func() {


For the sake of maintainability, would it make sense to pair this with the line where we clear delObjects at the beginning of the BeforeEach? Otherwise, do we risk not always cleaning up if setup breaks midway through?

JoelSpeed · 2024-01-16T16:57:08Z

pkg/machinehealthcheck/machinehealthcheck.go

+		DeferCleanup(func() {
+			By("Deleting the MachineHealthCheck resource")
+			Expect(client.Delete(context.Background(), machinehealthcheck)).To(Succeed(), "failed to delete MHC")
+
+			By("Deleting the new MachineSet")
+			Expect(client.Delete(context.Background(), machineSet)).To(Succeed(), "failed to delete machineSet")
+
+			framework.WaitForMachineSetsDeleted(ctx, client, machineSet)
+		})


This should come before the WaitForMachineSet; otherwise it won't run if the wait fails.

Unlike AfterEach(), DeferCleanup() is not executed when the test is skipped. Previously, the cleanup in the webhook tests would wipe all the machinesets existing in the cluster on unsupported platforms (OpenStack) due to an uninitialized selector, even when the tests were skipped.
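
A small standard-library simulation of why the skip case was destructive. Everything here is hypothetical: it does not use Ginkgo or the real client, the `e2e: webhook` label is invented, and it assumes the cleanup selected machinesets by a label selector that was only initialised once the test actually ran.

```go
package main

import "fmt"

// matches reports whether labels satisfy selector. A nil or empty selector
// matches every object, which is what made the old cleanup dangerous.
func matches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// deletedOnRun simulates one spec and returns how many pre-existing
// machinesets its cleanup would delete.
func deletedOnRun(supported, useAfterEach bool) int {
	cluster := map[string]map[string]string{
		"existing-a": {"zone": "a"},
		"existing-b": {"zone": "b"},
	}
	var selector map[string]string // stays nil when the spec is skipped
	deleted := 0
	cleanup := func() {
		for _, labels := range cluster {
			if matches(selector, labels) {
				deleted++
			}
		}
	}

	var deferred []func()
	beforeEach := func() {
		if !supported {
			return // stands in for a Skip on unsupported platforms
		}
		selector = map[string]string{"e2e": "webhook"} // hypothetical label
		if !useAfterEach {
			// Deferred cleanup is only registered if we get this far.
			deferred = append(deferred, cleanup)
		}
	}

	beforeEach()
	for _, fn := range deferred {
		fn()
	}
	if useAfterEach {
		cleanup() // models an AfterEach that runs even after a skip
	}
	return deleted
}

func main() {
	fmt.Println("skipped, AfterEach cleanup:", deletedOnRun(false, true))
	fmt.Println("skipped, deferred cleanup: ", deletedOnRun(false, false))
	fmt.Println("ran, deferred cleanup:     ", deletedOnRun(true, false))
}
```

In this model the skipped spec with an AfterEach-style cleanup matches both pre-existing machinesets, while the deferred variant never registers a cleanup and touches nothing.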

mandre · 2024-01-18T13:41:23Z

Thanks for the thorough review @JoelSpeed. I believe I have addressed all your comments, PTAL.

JoelSpeed · 2024-01-18T14:03:32Z

/lgtm
/hold cancel

Thanks for fixing that up

openshift-ci · 2024-01-18T16:03:17Z

@mandre: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/e2e-openstack-operator	`d009a74`	link	false	`/test e2e-openstack-operator`

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

openshift-ci-robot · 2024-01-18T16:05:57Z

@mandre: Jira Issue OCPBUGS-9925: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-9925 has been moved to the MODIFIED state.

In response to this:

There were several issues. The most serious was that cleanup would wipe all pre-existing machinesets (those present before the tests ran) when the tests were skipped.

Moving the cleanup code to a DeferCleanup() function fixes it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

mandre · 2024-04-16T14:53:22Z

/cherry-pick release-4.15

openshift-cherrypick-robot · 2024-04-16T14:54:13Z

@mandre: new pull request created: #313

In response to this:

/cherry-pick release-4.15

openshift-ci-robot added the jira/valid-reference (Indicates that this PR references a valid Jira ticket of any type.) and jira/invalid-bug (Indicates that a referenced Jira bug is invalid for the branch this PR is targeting.) labels Dec 21, 2023

openshift-ci bot requested review from elmiko and racheljpg December 21, 2023 07:28

openshift-ci-robot added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label Dec 21, 2023

openshift-ci-robot removed the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Dec 21, 2023

openshift-ci bot requested a review from eurijon December 21, 2023 08:28

openshift-ci bot assigned EmilienM Dec 21, 2023

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 21, 2023

openshift-ci-robot mentioned this pull request Dec 21, 2023

OCPBUGS-9925: OpenStack: use compact clusters for cluster-api-actuator-pkg jobs openshift/release#47022

Merged

openshift-ci bot assigned elmiko Jan 8, 2024

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 16, 2024

openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 16, 2024

JoelSpeed reviewed Jan 16, 2024

JoelSpeed removed the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 16, 2024

mandre added 2 commits January 18, 2024 14:38

Skip webhook test on unsupported platforms

d009a74

mandre force-pushed the fix_tests branch from 601332b to d009a74 Compare January 18, 2024 13:39

openshift-ci bot added the approved (Indicates a PR has been approved by an approver from all required OWNERS files.) label and removed the lgtm (Indicates that a PR is ready to be merged.) label Jan 18, 2024

openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 18, 2024

openshift-ci bot assigned JoelSpeed Jan 18, 2024

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 18, 2024

openshift-merge-bot bot merged commit 5d1eca8 into openshift:master Jan 18, 2024
8 of 9 checks passed

mandre deleted the fix_tests branch April 11, 2024 09:40

openshift-cherrypick-robot mentioned this pull request Apr 16, 2024

[release-4.15] OCPBUGS-32310: Fix tests for OpenStack platform #313

Merged
