Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[e2e]: fix CPU hotplug flakes #10835

Merged
merged 3 commits into from
Jan 30, 2024
Merged

Conversation

enp0s3
Copy link
Contributor

@enp0s3 enp0s3 commented Dec 5, 2023

Signed-off-by: Igor Bezukh ibezukh@redhat.com

What this PR does / why we need it:
CPU hotplug involves live-migration and since GetRunningPodByLabel was used in the test, sometimes
it returned the wrong running pod, thus resulted in a flake. The PR aims to fix GetRunningPodByLabel and
do some code cleanup on the way.

Fixes #
#10690

Checklist

This checklist is not enforcing, but it's a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.

Release note:

NONE

@kubevirt-bot kubevirt-bot added dco-signoff: yes Indicates the PR's author has DCO signed all their commits. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/S labels Dec 5, 2023
@kubevirt-bot kubevirt-bot added release-note-none Denotes a PR that doesn't merit a release note. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Dec 5, 2023
@enp0s3 enp0s3 marked this pull request as draft December 5, 2023 11:48
@kubevirt-bot kubevirt-bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 5, 2023
@enp0s3
Copy link
Contributor Author

enp0s3 commented Dec 5, 2023

/test pull-kubevirt-e2e-k8s-1.28-sig-compute-migrations

@enp0s3
Copy link
Contributor Author

enp0s3 commented Dec 5, 2023

/test pull-kubevirt-e2e-k8s-1.28-sig-compute-migrations

@enp0s3
Copy link
Contributor Author

enp0s3 commented Dec 5, 2023

/test all

@enp0s3 enp0s3 marked this pull request as ready for review December 11, 2023 08:23
@kubevirt-bot kubevirt-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 11, 2023
@fossedihelm
Copy link
Contributor

/cc

@fossedihelm
Copy link
Contributor

I think this is a great opportunity to fix this TODO

kubevirt/tests/utils.go

Lines 470 to 475 in 7e3aa9f

for _, pod := range pods.Items {
// TODO: This needs to be reworked.
// During migration there can be more than one pod
readyPod = &pod
break
}

I don't think its trivial to rework the GetRunningPodByLabel function to return the correct running
pod in case it was called during a live-migration, because there can be multiple migrations in the test
in the same namespace, and we can hit a race when we get 2 running pods, and all the listed migrations
are at succeeded phase.

IMHO there could not be multiple migrations in the test in the same namespace, because tests runs in parallel in different namespace. So I think this is a non-problem.
It should not be enough something like this?

        for _, pod := range pods.Items {
		// During migration there can be more than one pod
		// In this case we want the target pod
		if readyPod != nil || readyPod.CreationTimestamp.Before(&pod.CreationTimestamp) {
			continue
		}

		readyPod = &pod
	}

Thanks!
In any case, I agree with you that we can be more precise in the test. WDYT?

@fossedihelm
Copy link
Contributor

/kind flake

@kubevirt-bot kubevirt-bot added the kind/flake Categorizes issue or PR as related to a flaky test. label Dec 11, 2023
@xpivarc
Copy link
Member

xpivarc commented Dec 12, 2023

@fossedihelm
I think what Federico suggests, fixing the function, makes total sense. In this fix, we rely on implementation details and we will need to change the tests once again in the future.

@enp0s3 enp0s3 force-pushed the cpu-hotplug-flakes-fix branch 2 times, most recently from 74cfdcf to e220b23 Compare January 24, 2024 09:38
@enp0s3
Copy link
Contributor Author

enp0s3 commented Jan 24, 2024

@xpivarc @fossedihelm Can you PTAL?

@enp0s3 enp0s3 marked this pull request as ready for review January 24, 2024 13:54
@kubevirt-bot kubevirt-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 24, 2024
Comment on lines 61 to 65
if readyPod != nil && readyPod.CreationTimestamp.Before(&pod.CreationTimestamp) {
continue
}

readyPod = &pod
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad for the previous wrong suggestion.
Now that I am reading better, shouldn't this be reversed?

Suggested change
if readyPod != nil && readyPod.CreationTimestamp.Before(&pod.CreationTimestamp) {
continue
}
readyPod = &pod
if readyPod != nil && pod.CreationTimestamp.Before(&readyPod.CreationTimestamp) {
continue
}
readyPod = &pod

We want to skip this pod if the "current" readyPod was created before it.
Or am I totally confused?
Thanks

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xpivarc @fossedihelm Thanks folks!

@enp0s3 enp0s3 force-pushed the cpu-hotplug-flakes-fix branch 2 times, most recently from 909ac7c to b69d77e Compare January 24, 2024 14:46
Copy link
Contributor

@fossedihelm fossedihelm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patience!

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Jan 24, 2024
continue
}

readyPod = &pod
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't you do

pod := pod
readyPod = &pod

see https://go.dev/play/p/5EJYJjoIh3K

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting! Nice one!
but maybe is it better to index from range?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it does not matter much but I think pod := pod was mentioned in "effective Go".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fossedihelm @xpivarc Hey. I have added a function that sorts by creation ts, please share your thoughts about the updated approach

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My motivation was not re-inventing the wheel when its about a sorting, and eliminate redundant boilerplate with pointers

@kubevirt-bot kubevirt-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 28, 2024
this is done as part of the effort to reduce the utils.go
boilerplate.

Signed-off-by: Igor Bezukh <ibezukh@redhat.com>
the improvement gives the function a sense of library function.

Signed-off-by: Igor Bezukh <ibezukh@redhat.com>
@kubevirt-bot kubevirt-bot removed lgtm Indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jan 28, 2024
import v1 "k8s.io/api/core/v1"

// PodsByCreationTimestamp sorts a list of Pods by creation timestamp, using their names as a tie breaker.
type PodsByCreationTimestamp []v1.Pod
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! But honestly I am wondering if it is worth it to make it more general that can be used elsewhere, with smth like:

import (
	k8smetav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// KMetaObject sorts a list of k8s meta objects by creation timestamp.
type KMetaObject []k8smetav1.ObjectMeta

func (o KMetaObject) Len() int      { return len(o) }
func (o KMetaObject) Swap(i, j int) { o[i], o[j] = o[j], o[i] }
func (o KMetaObject) Less(i, j int) bool {
	if o[i].CreationTimestamp.Equal(&o[j].CreationTimestamp) {
		return o[i].Name < o[j].Name
	}
	return o[i].CreationTimestamp.Before(&o[j].CreationTimestamp)
}

I see that in the migration controller, we do basically the same thing, but for migrations

type vmimCollection []*virtv1.VirtualMachineInstanceMigration
func (c vmimCollection) Len() int {
return len(c)
}
func (c vmimCollection) Less(i, j int) bool {
t1 := &c[i].CreationTimestamp
t2 := &c[j].CreationTimestamp
return t1.Before(t2)
}
func (c vmimCollection) Swap(i, j int) {
c[i], c[j] = c[j], c[i]
}

I don't know if it is worth it. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fossedihelm Thanks for the feedback. The problem here is that the casting won't work: sort.Reverse(PodsByCreationTimestamp(pods.Items)

But we can solve it with Go generics. However IMO we would need some place like test-infra to put generic stuff there, and the libs would implement their things on top of that, it would require a separate PR though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine! Thanks

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: It would be good to backport this as well.

Copy link
Member

@xpivarc xpivarc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 3rd commit is now redundant, the sort is a little more than we need but fine by me.

tests/libpod/query.go Outdated Show resolved Hide resolved
it wasn't compatible with VM migration case where there
could be 2 running virt-laucnher pods. With the fix the
function will return the target pod when its ready.

also added library function that sorts pods by creation ts

Signed-off-by: Igor Bezukh <ibezukh@redhat.com>
Copy link
Contributor

@fossedihelm fossedihelm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Jan 29, 2024
@enp0s3
Copy link
Contributor Author

enp0s3 commented Jan 29, 2024

/retest-required

Copy link
Member

@xpivarc xpivarc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@kubevirt-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: xpivarc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 29, 2024
@kubevirt-commenter-bot
Copy link

Required labels detected, running phase 2 presubmits:
/test pull-kubevirt-e2e-kind-1.27-sriov
/test pull-kubevirt-e2e-k8s-1.29-ipv6-sig-network
/test pull-kubevirt-e2e-k8s-1.27-sig-network
/test pull-kubevirt-e2e-k8s-1.27-sig-storage
/test pull-kubevirt-e2e-k8s-1.27-sig-compute
/test pull-kubevirt-e2e-k8s-1.27-sig-operator
/test pull-kubevirt-e2e-k8s-1.28-sig-network
/test pull-kubevirt-e2e-k8s-1.28-sig-storage
/test pull-kubevirt-e2e-k8s-1.28-sig-compute
/test pull-kubevirt-e2e-k8s-1.28-sig-operator

@kubevirt-bot
Copy link
Contributor

kubevirt-bot commented Jan 30, 2024

@enp0s3: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kubevirt-e2e-k8s-1.29-sig-monitoring b69d77e link true /test pull-kubevirt-e2e-k8s-1.29-sig-monitoring
pull-kubevirt-check-tests-for-flakes 0991bae link false /test pull-kubevirt-check-tests-for-flakes
pull-kubevirt-e2e-k8s-1.29-sig-storage 0991bae link unknown /test pull-kubevirt-e2e-k8s-1.29-sig-storage

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@kubevirt-commenter-bot
Copy link

/retest-required
This bot automatically retries required jobs that failed/flaked on approved PRs.
Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@kubevirt-bot kubevirt-bot merged commit 72bb713 into kubevirt:main Jan 30, 2024
34 of 36 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. kind/flake Categorizes issue or PR as related to a flaky test. lgtm Indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. size/L
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants