e2e: fix CPU manager methods to be more flexible to different CPU topology #98373

cynepco3hahue · 2021-01-25T16:09:42Z

What type of PR is this?

/kind bug
/kind failing-test

What this PR does / why we need it:

fix the issue when the test runs on the node with the single CPU
fix the issue when the CPU topology has only one core per socket, it can
be easily reproduced by configuring VM with multi NUMA, but when each socket
has only one core

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

Signed-off-by: Artyom Lukianov alukiano@redhat.com

…ology - fix the issue when the test runs on the node with the single CPU - fix the issue when the CPU topology has only one core per socket, it can be easily reproduced by configuring VM with multi NUMA, but when each socket has only one core Signed-off-by: Artyom Lukianov <alukiano@redhat.com>

k8s-ci-robot · 2021-01-25T16:09:50Z

@cynepco3hahue: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot · 2021-01-25T16:09:51Z

Hi @cynepco3hahue. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

cynepco3hahue · 2021-01-25T16:10:18Z

/test pull-kubernetes-node-kubelet-serial-cpu-manager

cynepco3hahue · 2021-01-25T16:10:24Z

/test pull-kubernetes-node-kubelet-serial-topology-manager

k8s-ci-robot · 2021-01-25T16:10:32Z

@cynepco3hahue: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/test pull-kubernetes-node-kubelet-serial-cpu-manager

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot · 2021-01-25T16:10:37Z

@cynepco3hahue: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/test pull-kubernetes-node-kubelet-serial-topology-manager

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

SergeyKanzhelev · 2021-01-26T17:32:09Z

/ok-to-test

SergeyKanzhelev · 2021-01-26T17:33:07Z

test/e2e_node/cpu_manager_test.go

@@ -283,7 +283,9 @@ func runGuPodTest(f *framework.Framework) {
 		cpu1 = cpuList[1]
 	} else if isMultiNUMA() {
 		cpuList = cpuset.MustParse(getCoreSiblingList(0)).ToSlice()
-		cpu1 = cpuList[1]
+		if len(cpuList) > 1 {


it is inside the isMultiNUMA branch, meaning, there are more than 1 cpu, correct? Not following why it may be just 1.

it calls getCoreSiblingList(0) to get all cores that placed under the same NUMA node as the CPU 0, so if the NUMA node has only one core, that unusual but possible(with fake NUMA for example), it will return the list with only one element.

SergeyKanzhelev · 2021-01-26T17:36:10Z

test/e2e_node/cpu_manager_test.go

@@ -359,7 +364,9 @@ func runMultipleGuNonGuPods(f *framework.Framework, cpuCap int64, cpuAlloc int64
 		cpu1 = cpuList[1]
 	} else if isMultiNUMA() {
 		cpuList = cpuset.MustParse(getCoreSiblingList(0)).ToSlice()
-		cpu1 = cpuList[1]
+		if len(cpuList) > 1 {
+			cpu1 = cpuList[1]


barely understand the logic of this test. Comment from the common sense: is the default value 1 be working in later call to cpuset.NewCPUSet(cpu1)? Curious why default value is not 0.

If nobody else will comment with knowledge of these tests I will take a deeper look later

the test assumes that the CPU 0 is a reserved one, and it some logic under the CPU manager that takes CPUs by NUMA nodes, for example, if NUMA node 0 has enough CPUs to satisfy the container request it will use CPUs from the NUMA node 0, otherwise, it will pass to the next NUMA node
give me know if you need a better explanation

cynepco3hahue · 2021-01-27T10:13:15Z

/retest

ffromani

the patch per se is correct, but the main factor is these tests don't really want to run on environments which don't have enough core to actuallty guarantee a meanongful environment.
Otherwise, the test actually runs, but in environments so constrainted that actually gives us no signal.
This is the reason why in the topology manager e2e tests we have explicit check about running on a system with enough resources (and truth to be told, this check can probably be made a bit smarter) https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/topology_manager_test.go#L375

I think that in a followup PR we should add a similar check here. I can help with that.

SergeyKanzhelev · 2021-02-02T00:31:22Z

test/e2e_node/cpu_manager_test.go

@@ -313,6 +315,10 @@ func runNonGuPodTest(f *framework.Framework, cpuCap int64) {

 	ginkgo.By("checking if the expected cpuset was assigned")
 	expAllowedCPUsListRegex = fmt.Sprintf("^0-%d\n$", cpuCap-1)


is the assumption that cpuCap is a single digit number?

yep, it passed to the method as cpuCap int64

as long as this assumption is OK

SergeyKanzhelev · 2021-02-02T16:30:53Z

/lgtm

I suggest to follow @fromanirh suggestion on checking the test host specs in the beginning of the test. Maybe follow up PR

ffromani · 2021-02-02T16:38:40Z

/lgtm

I suggest to follow @fromanirh suggestion on checking the test host specs in the beginning of the test. Maybe follow up PR

I think the cpumanager (and to a lesser extent the topology manager) e2e testsuite can use some improvements and cleanups and I'd be happy to help with this effort.

cynepco3hahue · 2021-02-02T16:40:51Z

I think the cpumanager (and to a lesser extent the topology manager) e2e testsuite can use some improvements and cleanups and I'd be happy to help with this effort.

Yes, it can be great!

mrunalp · 2021-02-24T18:17:10Z

/approve

k8s-ci-robot · 2021-02-24T18:17:40Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cynepco3hahue, mrunalp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~test/e2e_node/OWNERS~~ [mrunalp]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 25, 2021

k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Jan 25, 2021

k8s-ci-robot requested review from derekwaynecarr and odinuge January 25, 2021 16:10

k8s-ci-robot added area/test sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 25, 2021

SergeyKanzhelev added this to Triage in SIG Node CI/Test Board Jan 25, 2021

SergeyKanzhelev moved this from Triage to PRs - Review in progress in SIG Node CI/Test Board Jan 25, 2021

k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 26, 2021

SergeyKanzhelev reviewed Jan 26, 2021

View reviewed changes

ffromani reviewed Feb 1, 2021

View reviewed changes

SergeyKanzhelev reviewed Feb 2, 2021

View reviewed changes

k8s-ci-robot assigned SergeyKanzhelev Feb 2, 2021

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 2, 2021

SergeyKanzhelev moved this from PRs - Review in progress to PRs - Reviewer lgtm'd in SIG Node CI/Test Board Feb 8, 2021

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 24, 2021

k8s-ci-robot merged commit 6235ad8 into kubernetes:master Feb 24, 2021

SIG Node CI/Test Board automation moved this from PRs - Reviewer lgtm'd to Done Feb 24, 2021

k8s-ci-robot added this to the v1.21 milestone Feb 24, 2021

ffromani mentioned this pull request Mar 10, 2021

REQUEST: New membership for fromanirh kubernetes/org#2560

Closed

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

e2e: fix CPU manager methods to be more flexible to different CPU topology #98373

e2e: fix CPU manager methods to be more flexible to different CPU topology #98373

cynepco3hahue commented Jan 25, 2021

k8s-ci-robot commented Jan 25, 2021

k8s-ci-robot commented Jan 25, 2021

cynepco3hahue commented Jan 25, 2021

cynepco3hahue commented Jan 25, 2021

k8s-ci-robot commented Jan 25, 2021

k8s-ci-robot commented Jan 25, 2021

SergeyKanzhelev commented Jan 26, 2021

SergeyKanzhelev Jan 26, 2021

cynepco3hahue Jan 26, 2021

SergeyKanzhelev Jan 26, 2021

cynepco3hahue Jan 26, 2021

cynepco3hahue commented Jan 27, 2021

ffromani left a comment

SergeyKanzhelev Feb 2, 2021

cynepco3hahue Feb 2, 2021

SergeyKanzhelev Feb 2, 2021

SergeyKanzhelev commented Feb 2, 2021

ffromani commented Feb 2, 2021

cynepco3hahue commented Feb 2, 2021

mrunalp commented Feb 24, 2021

k8s-ci-robot commented Feb 24, 2021

		@@ -313,6 +315,10 @@ func runNonGuPodTest(f *framework.Framework, cpuCap int64) {

		ginkgo.By("checking if the expected cpuset was assigned")
		expAllowedCPUsListRegex = fmt.Sprintf("^0-%d\n$", cpuCap-1)

e2e: fix CPU manager methods to be more flexible to different CPU topology #98373

e2e: fix CPU manager methods to be more flexible to different CPU topology #98373

Conversation

cynepco3hahue commented Jan 25, 2021

k8s-ci-robot commented Jan 25, 2021

k8s-ci-robot commented Jan 25, 2021

cynepco3hahue commented Jan 25, 2021

cynepco3hahue commented Jan 25, 2021

k8s-ci-robot commented Jan 25, 2021

k8s-ci-robot commented Jan 25, 2021

SergeyKanzhelev commented Jan 26, 2021

SergeyKanzhelev Jan 26, 2021

Choose a reason for hiding this comment

cynepco3hahue Jan 26, 2021

Choose a reason for hiding this comment

SergeyKanzhelev Jan 26, 2021

Choose a reason for hiding this comment

cynepco3hahue Jan 26, 2021

Choose a reason for hiding this comment

cynepco3hahue commented Jan 27, 2021

ffromani left a comment

Choose a reason for hiding this comment

SergeyKanzhelev Feb 2, 2021

Choose a reason for hiding this comment

cynepco3hahue Feb 2, 2021

Choose a reason for hiding this comment

SergeyKanzhelev Feb 2, 2021

Choose a reason for hiding this comment

SergeyKanzhelev commented Feb 2, 2021

ffromani commented Feb 2, 2021

cynepco3hahue commented Feb 2, 2021

mrunalp commented Feb 24, 2021

k8s-ci-robot commented Feb 24, 2021