fix: Eventually() missing Should() statement and sync error #11101

Merged 1 commit on Feb 3, 2024
22 changes: 15 additions & 7 deletions tests/vm_test.go
@@ -1957,24 +1957,32 @@ status:

Context("[Serial] when node becomes unhealthy", Serial, func() {
const componentName = "virt-handler"
var nodeName string

AfterEach(func() {
libpod.DeleteKubernetesApiBlackhole(getHandlerNodePod(virtClient, nodeName), componentName)
Eventually(func(g Gomega) {
g.Expect(getHandlerNodePod(virtClient, nodeName).Items[0]).To(HaveConditionTrue(k8sv1.PodReady))
}, 120*time.Second, time.Second).Should(Succeed())

tests.WaitForConfigToBePropagatedToComponent("kubevirt.io=virt-handler", util.GetCurrentKv(virtClient).ResourceVersion,
tests.ExpectResourceVersionToBeLessEqualThanConfigVersion, 120*time.Second)
})
Comment on lines +1962 to +1970
Member:

This is not flaky?

Contributor Author:

I'm trying to find out whether this may lead to flakiness. The purpose is to wait until the configuration versions are in sync, and I've run out of ideas in this regard. So I think we should rerun pull-kubevirt-check-tests-for-flakes a couple of times to find out. Sorry for the brute-force approach 😞


It("[Serial] the VMs running in that node should be respawned", func() {
By("Starting VM")
vm := startVM(virtClient, createVM(virtClient, libvmi.NewCirros()))
vmi, err := virtClient.VirtualMachineInstance(vm.Namespace).Get(context.Background(), vm.Name, &k8smetav1.GetOptions{})
Expect(err).ToNot(HaveOccurred())

nodeName := vmi.Status.NodeName
nodeName = vmi.Status.NodeName
oldUID := vmi.UID

By("Blocking virt-handler from reconciling the VMI")
libpod.AddKubernetesApiBlackhole(getHandlerNodePod(virtClient, nodeName), componentName)
Eventually(getHandlerNodePod(virtClient, nodeName).Items[0], 120*time.Second, time.Second, HaveConditionFalse(k8sv1.PodReady))
Contributor:

How about?

Eventually(func(g Gomega) {
	g.Expect(getHandlerNodePod(virtClient, nodeName).Items[0]).To(HaveConditionFalse(k8sv1.PodReady))
}, 120*time.Second, time.Second).Should(Succeed())

DeferCleanup(func() {
	libpod.DeleteKubernetesApiBlackhole(getHandlerNodePod(virtClient, nodeName), componentName)
	Eventually(func(g Gomega) {
		g.Expect(getHandlerNodePod(virtClient, nodeName).Items[0]).To(HaveConditionTrue(k8sv1.PodReady))
	}, 120*time.Second, time.Second).Should(Succeed())
})

Contributor Author:

The issue with DeferCleanup is that it is executed after the resetToDefaultConfig() inside the global AfterEach. So when the AfterEach tries to reset kubevirt, the virt-handler is offline (unable to communicate with the Kubernetes API). That produces this error: "resource & config versions (5548 and 4736 respectively) are not as expected. component: \"virt-handler\", pod: \"virt-handler-zdv7f\" ".

I like the Eventually proposal, it looks elegant. I also implemented it in the blackhole-deletion check.

Member:

Why did we revert the defer or JustAfterEach?

Contributor:

I don't get it.
The defer is something we can avoid as a cleanup solution, and JustAfterEach is generally used for different scopes.
Here, the problem was the DeferCleanup. After some searching I found onsi/ginkgo#1284 (comment):

In reality Ginkgo runs the After* family followed by the Defer* family.

which explains the source of the issue.
(We should check our other DeferCleanup usages, IMHO.)
Can you explain your doubts?
Thanks
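The ordering described above (After* family first, then Defer* family) can be sketched with a small stdlib-only simulation. This is a hypothetical model to show why the blackhole removal in DeferCleanup ran too late, not Ginkgo's actual implementation:

```go
package main

import "fmt"

// A toy model of Ginkgo's teardown ordering: the After* family runs first
// (innermost AfterEach before outer ones), and only afterwards does the
// Defer* family run, in LIFO order. Hypothetical structure.
type teardown struct {
	afterEach    []func() // registered outermost-first, run innermost-first
	deferCleanup []func() // run LIFO, after all AfterEach hooks
}

func (t *teardown) run() {
	for i := len(t.afterEach) - 1; i >= 0; i-- {
		t.afterEach[i]()
	}
	for i := len(t.deferCleanup) - 1; i >= 0; i-- {
		t.deferCleanup[i]()
	}
}

func main() {
	t := &teardown{}
	t.afterEach = append(t.afterEach,
		func() { fmt.Println("global AfterEach (resetToDefaultConfig)") })
	t.afterEach = append(t.afterEach,
		func() { fmt.Println("context AfterEach (where the blackhole removal now lives)") })
	t.deferCleanup = append(t.deferCleanup,
		func() { fmt.Println("DeferCleanup (runs last: too late to unblock virt-handler)") })
	t.run()
}
```

Running this prints the context AfterEach first, then the global one, and the DeferCleanup only at the end, which is why the reset saw a stale virt-handler.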

Member:

@jcanocan also observed that an AfterEach is only run after the global AfterEach, AFAIK.

Contributor Author:

Thanks @fossedihelm for this nice piece of information. This explains all the issues we faced.

As far as I was able to test, the nearest AfterEach (the one inside the Context) is always executed before the global AfterEach. However, we still need a way to wait until the handler and kubevirt config versions are in sync; that's why I observed the error mentioned in the description in the past. Sorry for the confusion.


DeferCleanup(func() {
libpod.DeleteKubernetesApiBlackhole(getHandlerNodePod(virtClient, nodeName), componentName)
Eventually(getHandlerNodePod(virtClient, nodeName).Items[0], 120*time.Second, time.Second, HaveConditionTrue(k8sv1.PodReady))
})
Eventually(func(g Gomega) {
g.Expect(getHandlerNodePod(virtClient, nodeName).Items[0]).To(HaveConditionFalse(k8sv1.PodReady))
}, 120*time.Second, time.Second).Should(Succeed())

pod, err := libvmi.GetPodByVirtualMachineInstance(vmi, vmi.Namespace)
Expect(err).ToNot(HaveOccurred())