e2e: fix CPU manager methods to be more flexible to different CPU topology #98373
Conversation
e2e: fix CPU manager methods to be more flexible to different CPU topology

- fix the issue when the test runs on a node with a single CPU
- fix the issue when the CPU topology has only one core per socket; it can be easily reproduced by configuring a VM with multiple NUMA nodes where each socket has only one core

Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
@cynepco3hahue: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the appropriate label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Hi @cynepco3hahue. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with an ok-to-test command. Once the patch is verified, the new status will be reflected by the corresponding label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/test pull-kubernetes-node-kubelet-serial-cpu-manager
/test pull-kubernetes-node-kubelet-serial-topology-manager
@cynepco3hahue: Cannot trigger testing until a trusted user reviews the PR and leaves an ok-to-test comment. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test
@@ -283,7 +283,9 @@ func runGuPodTest(f *framework.Framework) {
		cpu1 = cpuList[1]
	} else if isMultiNUMA() {
		cpuList = cpuset.MustParse(getCoreSiblingList(0)).ToSlice()
-		cpu1 = cpuList[1]
+		if len(cpuList) > 1 {
+			cpu1 = cpuList[1]
+		}
it is inside the isMultiNUMA branch, meaning there is more than 1 CPU, correct? Not following why it may be just 1.
it calls getCoreSiblingList(0) to get all cores placed under the same NUMA node as CPU 0, so if the NUMA node has only one core (unusual, but possible with fake NUMA, for example), it will return a list with only one element.
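The defensive pattern being discussed can be sketched outside the suite. The helpers below are simplified stand-ins, not the actual e2e code: parseCPUList mimics what cpuset.MustParse(...).ToSlice() does for kernel-style sibling lists, and pickTestCPU shows why the len check is needed on a single-core NUMA node:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPUList parses a kernel-style CPU list such as "0,4" or "0-3"
// into a slice of CPU IDs (simplified stand-in for cpuset parsing).
func parseCPUList(s string) []int {
	var cpus []int
	for _, part := range strings.Split(s, ",") {
		if bounds := strings.SplitN(part, "-", 2); len(bounds) == 2 {
			lo, _ := strconv.Atoi(bounds[0])
			hi, _ := strconv.Atoi(bounds[1])
			for c := lo; c <= hi; c++ {
				cpus = append(cpus, c)
			}
		} else {
			c, _ := strconv.Atoi(part)
			cpus = append(cpus, c)
		}
	}
	return cpus
}

// pickTestCPU returns the second CPU of the sibling list when one exists,
// falling back to the default of 1 on single-core NUMA nodes, which is the
// guard the patch adds.
func pickTestCPU(siblingList string) int {
	cpu1 := 1 // default used by the test
	if cpuList := parseCPUList(siblingList); len(cpuList) > 1 {
		cpu1 = cpuList[1]
	}
	return cpu1
}

func main() {
	fmt.Println(pickTestCPU("0,4")) // multi-core node: sibling CPU 4
	fmt.Println(pickTestCPU("0"))   // fake-NUMA single-core node: default 1
}
```

Without the guard, the single-element case would panic with an index-out-of-range error instead of failing the test cleanly.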
@@ -359,7 +364,9 @@ func runMultipleGuNonGuPods(f *framework.Framework, cpuCap int64, cpuAlloc int64
		cpu1 = cpuList[1]
	} else if isMultiNUMA() {
		cpuList = cpuset.MustParse(getCoreSiblingList(0)).ToSlice()
-		cpu1 = cpuList[1]
+		if len(cpuList) > 1 {
+			cpu1 = cpuList[1]
+		}
I barely understand the logic of this test. A comment from common sense: will the default value 1 work in the later call to cpuset.NewCPUSet(cpu1)? Curious why the default value is not 0.
If nobody else comments with knowledge of these tests, I will take a deeper look later.
the test assumes that CPU 0 is a reserved one, and there is some logic in the CPU manager that takes CPUs by NUMA node; for example, if NUMA node 0 has enough CPUs to satisfy the container request it will use CPUs from NUMA node 0, otherwise it will pass to the next NUMA node.
Let me know if you need a better explanation.
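The allocation behavior described above can be illustrated with a simplified sketch. This is only an illustration of the policy as described in the comment (reserved CPU 0, fill one NUMA node before spilling to the next), not the actual CPU manager implementation:

```go
package main

import "fmt"

// allocateCPUs picks CPUs for a request by NUMA node: if a single node can
// satisfy the whole request it is used alone, otherwise CPUs are taken node
// by node. CPU 0 is treated as reserved, matching the test's assumption.
func allocateCPUs(numaNodes [][]int, request int) []int {
	const reserved = 0
	// Strip the reserved CPU from the candidate pool.
	avail := make([][]int, 0, len(numaNodes))
	for _, node := range numaNodes {
		cpus := make([]int, 0, len(node))
		for _, c := range node {
			if c != reserved {
				cpus = append(cpus, c)
			}
		}
		avail = append(avail, cpus)
	}
	// Prefer a single node that can satisfy the whole request.
	for _, node := range avail {
		if len(node) >= request {
			return node[:request]
		}
	}
	// Otherwise spill over from one node to the next.
	var out []int
	for _, node := range avail {
		for _, c := range node {
			if len(out) == request {
				return out
			}
			out = append(out, c)
		}
	}
	return out
}

func main() {
	nodes := [][]int{{0, 1}, {2, 3}}
	// Node 0 has only one free CPU (0 is reserved), so a 2-CPU request
	// is satisfied entirely from node 1.
	fmt.Println(allocateCPUs(nodes, 2)) // [2 3]
	// A 1-CPU request fits on node 0.
	fmt.Println(allocateCPUs(nodes, 1)) // [1]
}
```

This also shows why the test cares which CPU ends up as cpu1: with CPU 0 reserved, the CPUs actually handed to a guaranteed container depend on the per-node sibling layout.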
/retest
the patch per se is correct, but the main factor is that these tests don't really want to run on environments which don't have enough cores to actually guarantee a meaningful environment.
Otherwise the test actually runs, but in environments so constrained that it gives us no signal.
This is the reason why in the topology manager e2e tests we have an explicit check for running on a system with enough resources (and truth be told, this check can probably be made a bit smarter): https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/topology_manager_test.go#L375
I think that in a follow-up PR we should add a similar check here. I can help with that.
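The suggested follow-up pre-check could take roughly this shape. This is only a sketch of the idea, not the actual topology manager check; the function name and the threshold are illustrative (a real version would derive the requirement from what each test case requests and skip via the e2e framework's skipper):

```go
package main

import "fmt"

// enoughCPUsForTest reports whether the node has enough allocatable CPUs to
// make the serial CPU manager tests meaningful. The threshold is illustrative.
func enoughCPUsForTest(allocatableCPUs, required int) bool {
	return allocatableCPUs >= required
}

func main() {
	// Smallest count the guaranteed-pod cases plausibly need
	// besides the reserved CPU 0 (illustrative value).
	const minCPUs = 2
	for _, cpus := range []int{1, 4} {
		if !enoughCPUsForTest(cpus, minCPUs) {
			fmt.Printf("skipping: node has %d allocatable CPUs, need at least %d\n", cpus, minCPUs)
			continue
		}
		fmt.Printf("running with %d allocatable CPUs\n", cpus)
	}
}
```

Skipping early gives a clear "environment too small" signal instead of a test that passes vacuously on a constrained node.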
@@ -313,6 +315,10 @@ func runNonGuPodTest(f *framework.Framework, cpuCap int64) {

	ginkgo.By("checking if the expected cpuset was assigned")
	expAllowedCPUsListRegex = fmt.Sprintf("^0-%d\n$", cpuCap-1)
is the assumption that cpuCap is a single digit number?
yep, it is passed to the method as cpuCap int64
as long as this assumption is OK
/lgtm
I suggest following @fromanirh's suggestion of checking the test host specs at the beginning of the test. Maybe in a follow-up PR.
I think the cpumanager (and to a lesser extent the topology manager) e2e test suite could use some improvements and cleanups, and I'd be happy to help with this effort.
Yes, that would be great!
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: cynepco3hahue, mrunalp. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing /approve in a comment.
What type of PR is this?
/kind bug
/kind failing-test
What this PR does / why we need it:
- fix the issue when the test runs on a node with a single CPU
- fix the issue when the CPU topology has only one core per socket; it can be easily reproduced by configuring a VM with multiple NUMA nodes where each socket has only one core
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>