OCPBUGS-9925: Fix tests for OpenStack platform #306
Conversation
@mandre: This pull request references Jira Issue OCPBUGS-9925, which is invalid. The bug has been updated to refer to the pull request using the external bug tracker.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/jira refresh
@mandre: This pull request references Jira Issue OCPBUGS-9925, which is invalid.
/jira refresh
@mandre: This pull request references Jira Issue OCPBUGS-9925, which is valid. 3 validations were run on this bug.
Requesting review from QA contact.
The CI is finding real bugs now 😅
/lgtm
/test ci/prow/e2e-openstack-operator
@mandre: The specified target(s) for `/test` were not found. The following commands are available to trigger optional jobs:
Hey @elmiko, since you inquired about the openstack jobs, PTAL whenever you get a chance.
/assign elmiko
/test e2e-openstack-operator
/test e2e-vsphere-operator
e2e-vsphere-operator has never passed, and fails on cluster installation, which is outside the scope of this PR.
thanks @mandre, updates look nice to me and that is wild about the machinesets getting deleted on skips.
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: elmiko. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
/override ci/prow/e2e-vsphere-operator
@elmiko: Overrode contexts on behalf of elmiko: ci/prow/e2e-vsphere-operator
/hold
I want to review this
I agree in general with this change; however, I think in some places it makes us more prone to failure.
For example, if the `BeforeEach` fails midway through, the `AfterEach` would always run anyway, and we would always clean up. In the current scenario, with the `DeferCleanup` calls where they are, in some cases we risk leaking resources out of each test if something breaks midway through the `BeforeEach`, which may then interfere with later test cases.
We are swinging from being overly aggressive on the cleanup to not quite aggressive enough. I think with the changes I've suggested we can get this just right though (Goldilocks zone for cleanup?).
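The trade-off described above can be illustrated with a small stdlib-only Go sketch (not real Ginkgo; `setupWithCleanups`, `failAfter`, and the resource names are invented for illustration): when a cleanup is registered immediately after its resource is created, a setup that fails midway still tears down exactly the resources that exist, in LIFO order, and nothing else.

```go
package main

import (
	"errors"
	"fmt"
)

// setupWithCleanups models a BeforeEach that creates resources one by one,
// registering a cleanup right after each successful create. The deferred
// loop plays the role of DeferCleanup: it runs registered hooks in LIFO
// order on every exit, including a midway failure.
func setupWithCleanups(failAfter int) (created []string, cleaned []string, err error) {
	var cleanups []func()
	defer func() {
		for i := len(cleanups) - 1; i >= 0; i-- { // LIFO, like DeferCleanup
			cleanups[i]()
		}
	}()

	for i, res := range []string{"MachineSet", "workload Job"} {
		if i == failAfter {
			return created, cleaned, errors.New("setup failed midway")
		}
		created = append(created, res)
		res := res
		cleanups = append(cleanups, func() { cleaned = append(cleaned, res) })
	}
	return created, cleaned, nil
}

func main() {
	created, cleaned, err := setupWithCleanups(1)
	fmt.Println("created:", created, "cleaned:", cleaned, "err:", err)
}
```

Failing after the first create tears down only the MachineSet; a fully successful setup tears down both resources in reverse creation order.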
/approve cancel
pkg/infra/infra.go (Outdated)

```go
DeferCleanup(func() {
	if machineSet != nil {
		By("Deleting the new MachineSet")
		Expect(client.Delete(ctx, machineSet)).To(Succeed(), "MachineSet should be able to be deleted")
		framework.WaitForMachineSetsDeleted(ctx, client, machineSet)
	}
})
```
This still has a chance to error. `machineSet` is not cleared by the `BeforeEach`, so the `machineSet` could come from a previous test run and be non-nil. Will a delete call fail if the resource doesn't exist? I expect so.
Wouldn't changing this to the below make it safer?
Suggested change:

```go
{
	// Reset the machineSet between each test
	machineSet = nil
	// Make sure to clean up the machineSet, if we create one.
	DeferCleanup(func() {
		if machineSet != nil {
			By("Deleting the new MachineSet")
			Expect(client.Delete(ctx, machineSet)).To(Succeed(), "MachineSet should be able to be deleted")
			framework.WaitForMachineSetsDeleted(ctx, client, machineSet)
		}
	})
}
```
pkg/infra/lifecyclehooks.go (Outdated)

```go
DeferCleanup(func() {
	By("Deleting the machineset")
	cascadeDelete := metav1.DeletePropagationForeground
	Expect(client.Delete(context.Background(), machineSet, &runtimeclient.DeleteOptions{
		PropagationPolicy: &cascadeDelete,
	})).To(Succeed(), "MachineSet should be able to be deleted")

	By("Waiting for the MachineSet to be deleted...")
	framework.WaitForMachineSetsDeleted(ctx, client, machineSet)

	By("Deleting workload job")
	Expect(client.Delete(context.Background(), workload, &runtimeclient.DeleteOptions{
		PropagationPolicy: &cascadeDelete,
	})).To(Succeed(), "Workload job should be able to be deleted")
})
```
It would be preferable to put the cleanup next to the thing that creates it. I would have split this into several cleanups so that, if something breaks mid-setup, we tear down the bits we have already created.
The flow should be:
- Create MachineSet
- Defer removal of the MachineSet
- Wait for the MachineSet
- Create workload Job
- Defer removal of the workload Job
- Wait for the workload Job to be running
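The flow above would look roughly like this; a non-runnable, Ginkgo-style sketch only, reusing the helper names from the excerpts in this thread (`framework.WaitForMachineSet` is a hypothetical wait helper, not confirmed by the diff):

```go
// Sketch only: each DeferCleanup is registered immediately after the create
// it pairs with, so a failure at any later step still tears down everything
// created so far (cleanups run in LIFO order).
By("Creating the MachineSet")
Expect(client.Create(ctx, machineSet)).To(Succeed())
DeferCleanup(func() {
	By("Deleting the MachineSet")
	Expect(client.Delete(ctx, machineSet)).To(Succeed())
	framework.WaitForMachineSetsDeleted(ctx, client, machineSet)
})

framework.WaitForMachineSet(ctx, client, machineSet.Name) // hypothetical helper

By("Creating the workload Job")
Expect(client.Create(ctx, workload)).To(Succeed())
DeferCleanup(func() {
	By("Deleting the workload Job")
	Expect(client.Delete(ctx, workload)).To(Succeed())
})
```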
pkg/infra/spot.go (Outdated)

```go
if specReport.Failed() {
	Expect(gatherer.WithSpecReport(specReport).GatherAll()).To(Succeed(), "StateGatherer should be able to gather resources")
}
DeferCleanup(func() {
```
For the sake of making this maintainable, would it make sense to pair this with the line where we clear `delObjects` at the beginning of the `BeforeEach`? Do we risk otherwise not always cleaning up if setup breaks midway through?
```go
DeferCleanup(func() {
	By("Deleting the MachineHealthCheck resource")
	Expect(client.Delete(context.Background(), machinehealthcheck)).To(Succeed(), "failed to delete MHC")

	By("Deleting the new MachineSet")
	Expect(client.Delete(context.Background(), machineSet)).To(Succeed(), "failed to delete machineSet")

	framework.WaitForMachineSetsDeleted(ctx, client, machineSet)
})
```
This should come before the `WaitForMachineSet`, else it won't run if the wait fails.
As opposed to the `AfterEach()` function, `DeferCleanup()` is not executed when the test is skipped. Previously, the cleanup in the webhook tests would wipe all the machinesets existing in the cluster on unsupported platforms (OpenStack) due to an uninitialized selector, even when the tests were skipped.
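That skip behavior can be sketched in stdlib-only Go (not real Ginkgo; `runSpec` and the action strings are invented for illustration): an `AfterEach`-style hook runs even for a skipped spec, while a `DeferCleanup`-style hook only runs if the spec got far enough to register it.

```go
package main

import "fmt"

// runSpec simulates one spec run and returns the cleanup actions that
// executed, in order.
func runSpec(skipped bool) []string {
	var actions []string
	var cleanups []string

	if !skipped {
		// BeforeEach: create a MachineSet and immediately register its cleanup.
		cleanups = append(cleanups, "DeferCleanup: delete the MachineSet we created")
	}

	// Registered DeferCleanup hooks run in LIFO order at the end of the spec;
	// a skipped spec registered nothing, so nothing runs here.
	for i := len(cleanups) - 1; i >= 0; i-- {
		actions = append(actions, cleanups[i])
	}

	// AfterEach runs unconditionally, skipped specs included, which is how
	// the old cleanup could wipe machinesets on platforms where tests skip.
	actions = append(actions, "AfterEach: delete ALL MachineSets matching the selector")
	return actions
}

func main() {
	fmt.Println("skipped spec:", runSpec(true))
	fmt.Println("normal spec:", runSpec(false))
}
```

The skipped spec still fires the unconditional `AfterEach`-style wipe, which is exactly the failure mode the PR removes by moving cleanup into `DeferCleanup()`.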
Thanks for the thorough review @JoelSpeed. I believe I have addressed all your comments, PTAL.
/lgtm Thanks for fixing that up
@mandre: The following test failed, say `/retest` to rerun it:
Full PR test history. Your PR dashboard.
Merged commit 5d1eca8 into openshift:master
@mandre: Jira Issue OCPBUGS-9925: All pull requests linked via external trackers have merged. Jira Issue OCPBUGS-9925 has been moved to the MODIFIED state.
/cherry-pick release-4.15
@mandre: new pull request created: #313
There were several issues; the most serious was that cleanup would wipe all existing machinesets (the ones that existed before the tests ran) when the tests were skipped. Moving the cleanup code into a `DeferCleanup()` function fixes it.