Ensure optimal CPU pinning with dedicated CPUs #6251

rmohr · 2021-08-16T14:28:12Z

What this PR does / why we need it:

The algorithm first creates buckets based on the following CPU
attributes:

- thread chunks in the cpuSet
- cores per NUMA node

Then the algorithm assigns threads to vCPU cores with the following
priority:

1. Try to assign a consecutive chunk of threads from a single host core.
2. If no full set of sibling threads can be assigned, assign threads
   from a single NUMA node.
3. If NUMA passthrough is requested, fail at this step, since we must
   not cross NUMA node boundaries.
4. If not enough threads are available on any NUMA node, try to assign
   threads from different NUMA nodes to form a full vCPU core.
5. Go back to (1) and repeat until all threads are assigned.
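
The priority order above can be illustrated with a minimal Go sketch. It is not the code from this PR: the `Thread` type, `Assign`, and the helper names are hypothetical, and the sketch assumes that core IDs are unique across the whole host.

```go
package main

import "sort"

// Thread is a hypothetical description of one host CPU thread given to the pod.
type Thread struct {
	ID, Core, NUMA int // Core is assumed to be unique across the host
}

// takeFromGroup buckets the free threads by key and returns n threads from
// the first bucket (lowest key) that is large enough, or nil if none is.
func takeFromGroup(free []Thread, n int, key func(Thread) int) []Thread {
	groups := map[int][]Thread{}
	for _, t := range free {
		groups[key(t)] = append(groups[key(t)], t)
	}
	keys := make([]int, 0, len(groups))
	for k := range groups {
		keys = append(keys, k)
	}
	sort.Ints(keys)
	for _, k := range keys {
		if len(groups[k]) >= n {
			return groups[k][:n]
		}
	}
	return nil
}

// remove returns the free threads without the picked ones.
func remove(free, picked []Thread) []Thread {
	gone := map[int]bool{}
	for _, t := range picked {
		gone[t.ID] = true
	}
	rest := make([]Thread, 0, len(free))
	for _, t := range free {
		if !gone[t.ID] {
			rest = append(rest, t)
		}
	}
	return rest
}

// Assign returns, for every vCPU core, the host thread IDs pinned to it.
// It reports false if the request cannot be satisfied.
func Assign(free []Thread, cores, threadsPerCore int, numaPassthrough bool) ([][]int, bool) {
	var result [][]int
	for c := 0; c < cores; c++ {
		// Step 1: sibling threads of a single host core.
		picked := takeFromGroup(free, threadsPerCore, func(t Thread) int { return t.Core })
		if picked == nil {
			// Step 2: threads of a single NUMA node.
			picked = takeFromGroup(free, threadsPerCore, func(t Thread) int { return t.NUMA })
		}
		if picked == nil {
			// Step 3: with NUMA passthrough, node boundaries must not be crossed.
			if numaPassthrough || len(free) < threadsPerCore {
				return nil, false
			}
			// Step 4: fall back to threads from different NUMA nodes.
			picked = free[:threadsPerCore]
		}
		ids := make([]int, 0, len(picked))
		for _, t := range picked {
			ids = append(ids, t.ID)
		}
		result = append(result, ids)
		free = remove(free, picked) // Step 5: repeat with what is left
	}
	return result, true
}
```

For example, with threads 0 and 4 on one core and threads 1 and 2 on two different cores of the same NUMA node, a request for two vCPU cores with two threads each would yield `[0 4]` followed by `[1 2]` in this sketch.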

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #6159
Fixes #4687
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1987329
Replaces #4757

Special notes for your reviewer:

Release note:

Better place vcpu threads on host cpus to form more efficient passthrough architectures

kubevirt-bot · 2021-08-16T14:28:16Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

rmohr · 2021-08-16T14:28:37Z

/cc @vasiliy-ul

rmohr · 2021-08-16T14:43:32Z

/test pull-kubevirt-e2e-k8s-1.20-sig-compute

rmohr · 2021-08-16T14:49:59Z

/test pull-kubevirt-unit-test

rmohr · 2021-08-17T10:19:16Z

Should be ready for review.

kwiesmueller · 2021-08-17T16:33:53Z

pkg/virt-launcher/virtwrap/converter/vcpu/vcpu.go

+		thread := c.smallJunks[0]
+		c.smallJunks = c.smallJunks[1:]
+		return &thread
+	} else if len(c.bigThreadJunks) > 0 {


The else here is not needed.
You could also invert the check and return nil if len(c.bigThreadJunks) < 1 to remove one level of complexity.

I think I'll keep the logical flow in the code as it is.
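
For readers following the suggestion, here is a small Go sketch of the early-return shape being proposed. The `pool` type and its fields are hypothetical stand-ins, not the types from `vcpu.go`, and moving the leftover siblings into the fragmented list is an assumption made only to keep the example complete.

```go
package main

// pool is a hypothetical stand-in for the structure under review.
type pool struct {
	fragmented []int   // threads whose siblings are not all available
	full       [][]int // complete sets of sibling threads, one set per core
}

// popThread prefers fragmented threads and only then breaks up a full core.
// It returns nil when no thread is left.
func (p *pool) popThread() *int {
	if len(p.fragmented) > 0 {
		t := p.fragmented[0]
		p.fragmented = p.fragmented[1:]
		return &t // returning here makes a following "else" unnecessary
	}
	// Inverted check: bail out early instead of nesting the remaining logic.
	if len(p.full) < 1 {
		return nil
	}
	t := p.full[0][0]
	// The remaining siblings no longer form a full core (assumption of this sketch).
	p.fragmented = append(p.fragmented, p.full[0][1:]...)
	p.full = p.full[1:]
	return &t
}
```

Both shapes behave the same; the difference is only in nesting depth, which is why keeping the original flow is also a reasonable choice.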

kwiesmueller · 2021-08-17T16:58:45Z

pkg/virt-launcher/virtwrap/converter/vcpu/vcpu.go

+				// go to the next cell
+				break
+			}
+			if requested == 0 {


Are you checking for negative values somewhere or could requested ever be negative?
If so I'd prefer a < 1 check here.

I added a panic since that must never happen. Should at least block any hot-looping.

You added the panic here: https://github.com/kubevirt/kubevirt/pull/6251/files#diff-770be27cf2d6dd3b86265236daa60480efc09ce100e00ec7c003051a8572dc9bR262
But not on this line, should it be both places?

I think in this case it is pretty clear that we can't hot-loop. We only ever do requested-- in the same loop.

Alright, if we are sure requested can never be 0 from the start.
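
The concern in this thread can be shown with a short Go sketch. It is an illustration only: `take` is a hypothetical helper, and it returns an error where the PR reportedly uses a panic. The point is that a `< 1` check stops the loop for zero and for negative values, while an `== 0` check would never trigger once the counter is already negative.

```go
package main

import "fmt"

// take returns up to `requested` entries from `available`.
// It rejects non-positive requests up front, so the loop below can never
// start with a counter that is already at or below zero.
func take(available []int, requested int) ([]int, error) {
	if requested < 1 {
		return nil, fmt.Errorf("requested must be positive, got %d", requested)
	}
	var picked []int
	for _, t := range available {
		picked = append(picked, t)
		requested--
		// "< 1" also terminates if the counter ever became negative.
		if requested < 1 {
			break
		}
	}
	return picked, nil
}
```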

davidvossel · 2021-08-17T21:51:30Z

pkg/virt-launcher/virtwrap/converter/vcpu/vcpu.go

+	if remaining > 0 {
+		if p.allowCellCrossing {
+			return nil, fmt.Errorf("not enough exclusive threads provided, could not fit %v core(s)", remaining)


What would be a scenario where allowCellCrossing == true but remaining cores > 0? Is this only possible when someone requests a threads-per-core count that the assigned threads can't possibly satisfy for the requested number of cores?

I'm just trying to understand if this is something that can be caught in validation webhook.

This is one of the main issues with all of this. We don't know which CPUs we get until the kubelet is done and everything has started, so up to that point we cannot tell whether the CPU threads can be mapped in a reasonable way. Note that for plain CPU pinning (no NUMA mapping) we should always get a working pinning, although it can be inefficient. NUMA passthrough is more likely to fail: in some cases we can't build a correct mapping from the assigned CPUs, because libvirt cannot express topologies in which CPUs have different numbers of threads.
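
One way to picture the NUMA passthrough failure case is the following Go sketch. It is derived from the steps in the PR description, not from the code: if every vCPU core must stay within one NUMA node, then each node has to contribute a whole number of cores. `fitsWithoutCrossing` and its parameters are hypothetical names.

```go
package main

// fitsWithoutCrossing reports whether the threads assigned on each NUMA node
// can be grouped into vCPU cores of threadsPerCore threads without any core
// spanning two nodes. threadsPerNode maps a NUMA node ID to the number of
// exclusive threads the pod received on that node.
func fitsWithoutCrossing(threadsPerNode map[int]int, threadsPerCore int) bool {
	if threadsPerCore < 1 {
		return false
	}
	for _, n := range threadsPerNode {
		// Leftover threads on a node could only form a core by crossing nodes.
		if n%threadsPerCore != 0 {
			return false
		}
	}
	return true
}
```

With two threads per core, four threads on node 0 and two on node 1 would fit, while three on node 0 and one on node 1 would not, even though the total is the same. Because the split is only known after the kubelet has assigned the CPUs, a check like this cannot run in an admission webhook.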

rmohr · 2021-08-18T08:24:50Z

@kwiesmueller @davidvossel PTAL

kwiesmueller · 2021-08-18T14:14:19Z

Just got some of the small nits remaining from before.
Feel free to resolve if necessary.

/retest

davidvossel

/approve

kubevirt-bot · 2021-08-19T18:47:33Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: davidvossel

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [davidvossel]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

rmohr · 2021-09-06T15:22:57Z

/retest

rmohr · 2021-09-07T07:43:34Z

/retest

vladikr

Overall, it all looks very good to me.
I think we should add the number of threads validation for dedicatedCPUs as well.

I would also suggest renaming the bigThreadChunks and smallChunks to fullCoresList or fragmentedCoresList - or anything similar. I think it would help anyone reading this code to understand the context faster - but it's up to you :)

vladikr · 2021-09-09T02:02:21Z

pkg/virt-api/webhooks/validating-webhook/admitters/vmi-create-admitter.go

@@ -1015,6 +1015,16 @@ func validateNUMA(field *k8sfield.Path, spec *v1.VirtualMachineInstanceSpec, con
 				Field: field.Child("domain", "cpu", "numa", "guestMappingPassthrough").String(),
 			})
 		}
+		if spec.Domain.CPU != nil && spec.Domain.CPU.Threads > 2 {


Can we also add this validation to validateCpuPinning?
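
A rough Go sketch of what applying the same limit to both cases could look like follows. The `CPU` struct here is a simplified, hypothetical stand-in and not the KubeVirt API type, and the thread does not show whether the check was actually added to validateCpuPinning.

```go
package main

import "fmt"

// maxThreadsPerCore mirrors the "> 2" limit from the diff above.
const maxThreadsPerCore = 2

// CPU is a simplified, hypothetical stand-in for the relevant spec fields.
type CPU struct {
	Threads               uint32
	DedicatedCPUPlacement bool
	NUMAPassthrough       bool
}

// validateThreads returns one message per feature that rejects the requested
// number of threads per core. A nil CPU section is considered valid.
func validateThreads(cpu *CPU) []string {
	if cpu == nil || cpu.Threads <= maxThreadsPerCore {
		return nil
	}
	var causes []string
	if cpu.NUMAPassthrough {
		causes = append(causes, fmt.Sprintf(
			"NUMA passthrough supports at most %d threads per core, got %d",
			maxThreadsPerCore, cpu.Threads))
	}
	if cpu.DedicatedCPUPlacement {
		causes = append(causes, fmt.Sprintf(
			"dedicated CPU placement supports at most %d threads per core, got %d",
			maxThreadsPerCore, cpu.Threads))
	}
	return causes
}
```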

vladikr · 2021-09-09T02:02:47Z

pkg/virt-handler/node-labeller/api/capabilities.go

+
+func (b *CPUSiblings) UnmarshalXMLAttr(attr xml.Attr) error {
+	if attr.Value != "" {
+		if list, err := hwutil.ParseCPUSetLine(attr.Value, 100); err == nil {


Can't this easily be more than 100?
I think the limit is much higher, around 8192

100 is just the limit for the sibling threads of a CPU, which is what gets parsed here. For GetPodCPUSet, where we read what was assigned to the pod, the limit is set to 50000.
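
To make the role of the limit concrete, here is a self-contained Go sketch of a cpuset-line parser with a cap on the number of expanded entries. It is not `hwutil.ParseCPUSetLine`: the name `parseCPUSet`, its signature, and its error messages are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPUSet expands a line such as "0-3,8" into individual CPU IDs.
// It refuses to expand more than limit entries, so a line like
// "0-4000000000" cannot trigger an arbitrarily large allocation.
func parseCPUSet(line string, limit int) ([]int, error) {
	line = strings.TrimSpace(line)
	if line == "" {
		return nil, nil
	}
	var cpus []int
	for _, part := range strings.Split(line, ",") {
		bounds := strings.SplitN(strings.TrimSpace(part), "-", 2)
		start, err := strconv.Atoi(bounds[0])
		if err != nil {
			return nil, fmt.Errorf("invalid cpu %q: %v", part, err)
		}
		end := start
		if len(bounds) == 2 {
			if end, err = strconv.Atoi(bounds[1]); err != nil {
				return nil, fmt.Errorf("invalid range %q: %v", part, err)
			}
		}
		if end < start {
			return nil, fmt.Errorf("invalid range %q: end before start", part)
		}
		// Check the size before expanding; end-start cannot overflow here
		// because start is non-negative and end >= start.
		if end-start >= limit-len(cpus) {
			return nil, fmt.Errorf("cpuset %q exceeds the limit of %d entries", line, limit)
		}
		for i := start; i <= end; i++ {
			cpus = append(cpus, i)
		}
	}
	return cpus, nil
}
```

A small limit suits the siblings attribute, which lists only the threads of one core, while a list of all CPUs assigned to a pod needs a much larger one.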

Signed-off-by: Roman Mohr <rmohr@redhat.com>

Siblings could be reported as ranges on some CPUs where more than two threads can exist per core. Deal with that situation. Further add a safety limit to the CPU parsing utility function to avoid arbitrarily sized expansions. Signed-off-by: Roman Mohr <rmohr@redhat.com>

vladikr · 2021-09-09T14:52:37Z

/lgtm

Remove the term `chunk` to avoid any confusions and talk about fragmented and not fragmented threads instead. Signed-off-by: Roman Mohr <rmohr@redhat.com>

rmohr · 2021-09-09T15:06:30Z

I would also suggest renaming the bigThreadChunks and smallChunks to fullCoresList or fragmentedCoresList - or anything similar. I think it would help anyone reading this code to understand the context faster - but it's up to you :)

Done.

rmohr · 2021-09-09T15:06:34Z

/unhold

vladikr · 2021-09-09T15:32:20Z

👍
/lgtm

kubevirt-commenter-bot · 2021-09-09T18:25:47Z

/retest
This bot automatically retries jobs that failed/flaked on approved PRs.
Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

kubevirt-commenter-bot · 2021-09-10T01:28:13Z

/retest
This bot automatically retries jobs that failed/flaked on approved PRs.
Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

rmohr · 2021-09-10T09:00:51Z

/retest

rmohr · 2021-09-10T10:21:59Z

/retest

kubevirt-commenter-bot · 2021-09-10T13:31:17Z

/retest
This bot automatically retries jobs that failed/flaked on approved PRs.
Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

kubevirt-bot · 2021-09-10T16:12:51Z

@rmohr: #6251 failed to apply on top of branch "release-0.44":

Applying: Pass thread information to virt-launcher for pinning decisions
Applying: Add an algorithm to improve CPU pinning
Applying: Make use of the new cpu pinning assignment algorithm
Using index info to reconstruct a base tree...
M	pkg/virt-launcher/virtwrap/converter/converter.go
M	pkg/virt-launcher/virtwrap/converter/converter_test.go
M	pkg/virt-launcher/virtwrap/converter/network.go
M	pkg/virt-launcher/virtwrap/manager.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/virt-launcher/virtwrap/manager.go
Removing pkg/virt-launcher/virtwrap/converter/vcpu_placement.go
Auto-merging pkg/virt-launcher/virtwrap/converter/network.go
Auto-merging pkg/virt-launcher/virtwrap/converter/converter_test.go
CONFLICT (content): Merge conflict in pkg/virt-launcher/virtwrap/converter/converter_test.go
Auto-merging pkg/virt-launcher/virtwrap/converter/converter.go
CONFLICT (content): Merge conflict in pkg/virt-launcher/virtwrap/converter/converter.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0003 Make use of the new cpu pinning assignment algorithm
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherrypick release-0.44
/cherrypick release-0.45

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

kubevirt-bot · 2021-09-10T16:13:33Z

@rmohr: new pull request created: #6384

In response to this:

/cherrypick release-0.45

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Ensure optimal CPU pinning with dedicated CPUs (cherry picked from commit b9ee32a) Signed-off-by: Roman Mohr <rmohr@redhat.com>

kubevirt-bot added the size/XL label Aug 16, 2021

rmohr requested a review from vladikr August 16, 2021 14:28

kubevirt-bot requested review from AlonaKaplan and stu-gott August 16, 2021 14:28

kubevirt-bot requested a review from vasiliy-ul August 16, 2021 14:28

rmohr removed request for stu-gott and AlonaKaplan August 16, 2021 14:28

rmohr force-pushed the cpu-pinning branch from 3bf3381 to 9e5d897 Compare August 17, 2021 10:17

kubevirt-bot added size/XXL and removed size/XL labels Aug 17, 2021

rmohr marked this pull request as ready for review August 17, 2021 10:17

kubevirt-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 17, 2021

kubevirt-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 17, 2021

kwiesmueller reviewed Aug 17, 2021

davidvossel reviewed Aug 17, 2021

rmohr force-pushed the cpu-pinning branch from 9e5d897 to 563b16d Compare August 18, 2021 07:41

kubevirt-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 18, 2021

rmohr force-pushed the cpu-pinning branch from 563b16d to 80803af Compare August 18, 2021 08:21

davidvossel reviewed Aug 19, 2021

vladikr reviewed Sep 9, 2021

rmohr added 2 commits September 9, 2021 14:15

Limit guest CPUs to two threads if NUMA passthrough is selected

8f507ab

Signed-off-by: Roman Mohr <rmohr@redhat.com>

rmohr force-pushed the cpu-pinning branch from 5919afb to 740f6e0 Compare September 9, 2021 14:19

kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Sep 9, 2021

Give methods and variables more appropriate names

c3f8d20

Remove the term `chunk` to avoid any confusions and talk about fragmented and not fragmented threads instead. Signed-off-by: Roman Mohr <rmohr@redhat.com>

kubevirt-bot removed the lgtm Indicates that a PR is ready to be merged. label Sep 9, 2021

kubevirt-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 9, 2021

kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Sep 9, 2021

kubevirt-bot merged commit b9ee32a into kubevirt:main Sep 10, 2021

kubevirt-bot mentioned this pull request Sep 10, 2021

[release-0.45] Ensure optimal CPU pinning with dedicated CPUs #6384

Merged

vasiliy-ul mentioned this pull request Sep 10, 2021

Fix pinning of vCPU threads #4757

Closed

rmohr pushed a commit to rmohr/kubevirt that referenced this pull request Sep 13, 2021

Merge pull request kubevirt#6251 from rmohr/cpu-pinning

469ad0c

Ensure optimal CPU pinning with dedicated CPUs (cherry picked from commit b9ee32a) Signed-off-by: Roman Mohr <rmohr@redhat.com>

rmohr mentioned this pull request Sep 13, 2021

[release-0.44] Ensure optimal CPU pinning with dedicated CPUs #6392

Merged

rmohr pushed a commit to rmohr/kubevirt that referenced this pull request Sep 13, 2021

Merge pull request kubevirt#6251 from rmohr/cpu-pinning

c8e0e36

Ensure optimal CPU pinning with dedicated CPUs (cherry picked from commit b9ee32a) Signed-off-by: Roman Mohr <rmohr@redhat.com>

rmohr mentioned this pull request Sep 15, 2021

[WIP] [Bugfix] Update dedicated CPUs on migration #6200

Closed

omeryahud mentioned this pull request Sep 19, 2021

VMI migration with dedicated CPUs nil pointer dereference #6432

Closed
