Ensure optimal CPU pinning with dedicated CPUs #6251

Merged
merged 6 commits into from
Sep 10, 2021

Conversation

rmohr
Member

@rmohr rmohr commented Aug 16, 2021

What this PR does / why we need it:

The algorithm first creates buckets based on the following cpu
attributes:

  • thread chunks in the cpuSet
  • cores per numa node

Then the algorithm will assign threads to vCPU cores with the following
priority:

  1. Try to assign a consecutive chunk of threads from a single host core.
  2. If no full set of sibling threads could be assigned, assign threads
    from a single numa node.
  3. If numa passthrough is requested, fail on this step, since we must
    not cross numa node boundaries.
  4. If not enough threads are available on any numa node, try to assign
    threads from different numa nodes to form a full vCPU core.
  5. Go back to (1) and repeat until all threads are assigned.
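
As a rough illustration of the priority order above, here is a minimal Go sketch. It is not the code from this PR: `hostThread`, `pool`, `take` and `assign` are hypothetical names, and the tie-breaking (lowest core id first) is an assumption made only to keep the result deterministic.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// hostThread is a hypothetical description of one exclusive host thread.
type hostThread struct{ ID, Core, NUMA int }

// pool tracks the still-unassigned thread ids per host core.
type pool struct{ free map[int][]int }

// take removes n threads from the listed cores, in order. It returns nil and
// leaves the pool untouched if those cores hold fewer than n free threads.
func (p *pool) take(cores []int, n int) []int {
	total := 0
	for _, c := range cores {
		total += len(p.free[c])
	}
	if total < n {
		return nil
	}
	out := make([]int, 0, n)
	for _, c := range cores {
		for len(p.free[c]) > 0 && len(out) < n {
			out = append(out, p.free[c][0])
			p.free[c] = p.free[c][1:]
		}
	}
	return out
}

// assign returns one slice of host thread ids per vCPU core.
func assign(threads []hostThread, vcpuCores, threadsPerCore int, numaPassthrough bool) ([][]int, error) {
	if vcpuCores < 1 || threadsPerCore < 1 {
		return nil, errors.New("cores and threads per core must be positive")
	}
	// Bucket the threads by host core and the host cores by NUMA node.
	p := &pool{free: map[int][]int{}}
	byNUMA := map[int][]int{}
	for _, t := range threads {
		if _, seen := p.free[t.Core]; !seen {
			byNUMA[t.NUMA] = append(byNUMA[t.NUMA], t.Core)
		}
		p.free[t.Core] = append(p.free[t.Core], t.ID)
	}
	var nodes, all []int
	for node, cores := range byNUMA {
		sort.Ints(cores)
		nodes = append(nodes, node)
		all = append(all, cores...)
	}
	sort.Ints(nodes)
	sort.Ints(all)
	for _, ids := range p.free {
		sort.Ints(ids)
	}

	var out [][]int
	for len(out) < vcpuCores {
		var vcpu []int
		// (1) A full set of sibling threads from a single host core.
		for _, c := range all {
			if vcpu = p.take([]int{c}, threadsPerCore); vcpu != nil {
				break
			}
		}
		// (2) Otherwise, threads from a single NUMA node.
		for i := 0; vcpu == nil && i < len(nodes); i++ {
			vcpu = p.take(byNUMA[nodes[i]], threadsPerCore)
		}
		if vcpu == nil {
			// (3) NUMA passthrough must not cross node boundaries.
			if numaPassthrough {
				return nil, fmt.Errorf("could not fit %d core(s) without crossing numa nodes", vcpuCores-len(out))
			}
			// (4) Last resort: combine threads from different NUMA nodes.
			if vcpu = p.take(all, threadsPerCore); vcpu == nil {
				return nil, fmt.Errorf("not enough exclusive threads, could not fit %d core(s)", vcpuCores-len(out))
			}
		}
		out = append(out, vcpu) // (5) repeat until all vCPU cores are placed
	}
	return out, nil
}

func main() {
	threads := []hostThread{
		{ID: 0, Core: 0, NUMA: 0}, {ID: 4, Core: 0, NUMA: 0},
		{ID: 1, Core: 1, NUMA: 0}, {ID: 2, Core: 2, NUMA: 1},
	}
	fmt.Println(assign(threads, 2, 2, false))
	fmt.Println(assign(threads, 2, 2, true))
}
```

With these four threads, the first vCPU core gets the two siblings of host core 0 (threads 0 and 4). The second one can only be formed from threads 1 and 2, which live on different numa nodes, so the second call (with numa passthrough) returns an error instead.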

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #6159
Fixes #4687
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1987329
Replaces #4757

Special notes for your reviewer:

Release note:

Better place vcpu threads on host cpus to form more efficient passthrough architectures

@kubevirt-bot kubevirt-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Aug 16, 2021
@kubevirt-bot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@rmohr
Member Author

rmohr commented Aug 16, 2021

/cc @vasiliy-ul

@rmohr
Member Author

rmohr commented Aug 16, 2021

/test pull-kubevirt-e2e-k8s-1.20-sig-compute

@rmohr
Member Author

rmohr commented Aug 16, 2021

/test pull-kubevirt-unit-test

@rmohr rmohr marked this pull request as ready for review August 17, 2021 10:17
@kubevirt-bot kubevirt-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 17, 2021
@rmohr
Member Author

rmohr commented Aug 17, 2021

Should be ready for review.

@kubevirt-bot kubevirt-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 17, 2021
pkg/virt-launcher/virtwrap/converter/converter.go
thread := c.smallJunks[0]
c.smallJunks = c.smallJunks[1:]
return &thread
} else if len(c.bigThreadJunks) > 0 {

The else here is not needed.
You could also invert the check and return nil if len(c.bigThreadJunks) < 1 to remove one level of complexity.
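
For readers following along, the suggestion could look roughly like the sketch below. Only the first branch comes from the diff above; the `threadPool` type, the `next` method name, the `int` element type and the way a big chunk is split are assumptions made for illustration.

```go
package main

import "fmt"

// threadPool is a hypothetical stand-in for the type under review.
type threadPool struct {
	smallJunks     []int   // leftover threads of partially used cores
	bigThreadJunks [][]int // untouched sets of sibling threads
}

// next hands out one thread, preferring leftovers over untouched cores.
func (c *threadPool) next() *int {
	if len(c.smallJunks) > 0 {
		thread := c.smallJunks[0]
		c.smallJunks = c.smallJunks[1:]
		return &thread
	}
	// Inverted check: return early instead of nesting in an else branch.
	if len(c.bigThreadJunks) < 1 {
		return nil
	}
	junk := c.bigThreadJunks[0]
	c.bigThreadJunks = c.bigThreadJunks[1:]
	if len(junk) == 0 {
		return c.next() // skip empty sets defensively
	}
	// Assumed behaviour: hand out the first sibling, keep the rest as leftovers.
	thread := junk[0]
	c.smallJunks = junk[1:]
	return &thread
}

func main() {
	c := &threadPool{smallJunks: []int{7}, bigThreadJunks: [][]int{{1, 2}}}
	for t := c.next(); t != nil; t = c.next() {
		fmt.Println(*t)
	}
}
```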

Member Author

I think I'll keep the current logical flow in the code.

pkg/virt-launcher/virtwrap/converter/vcpu/vcpu.go
// go to the next cell
break
}
if requested == 0 {

Are you checking for negative values somewhere or could requested ever be negative?
If so I'd prefer a < 1 check here.

Member Author

I added a panic since that must never happen. Should at least block any hot-looping.

Member Author

I think in this case it is pretty clear that we can't hot-loop. We only ever do requested-- in the same loop.

Alright, if we are sure requested can never be 0 from the start.

pkg/virt-launcher/virtwrap/converter/vcpu/vcpu.go
Comment on lines 163 to 173
if remaining > 0 {
if p.allowCellCrossing {
return nil, fmt.Errorf("not enough exclusive threads provided, could not fit %v core(s)", remaining)
Member

What would be a scenario where allowCellCrossing == true but remaining cores > 0? Is this only possible when someone requests threads per core where the number of threads can't possibly match the number of cores?

I'm just trying to understand if this is something that can be caught in validation webhook.

Member Author

This is one of the main issues with all of this. We don't know what we get until the kubelet is done and everything is started, so up to this point we cannot know whether the CPU threads can be mapped in a reasonable way. Note that for CPU pinning alone (so no NUMA mapping) we should always get a working pinning, although it can be inefficient. NUMA passthrough is more likely to fail: for some sets of assigned CPUs we can't create a correct mapping, because libvirt cannot express topologies in which CPUs have different numbers of threads.

@kubevirt-bot kubevirt-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 18, 2021
@rmohr
Member Author

rmohr commented Aug 18, 2021

@kwiesmueller @davidvossel PTAL

@kwiesmueller

Just got some of the small nits remaining from before.
Feel free to resolve if necessary.

/retest

Member

@davidvossel davidvossel left a comment

/approve

@kubevirt-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: davidvossel

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rmohr
Member Author

rmohr commented Sep 6, 2021

/retest

@rmohr
Member Author

rmohr commented Sep 7, 2021

/retest

Member

@vladikr vladikr left a comment

Overall, it all looks very good to me.
I think we should add the number of threads validation for dedicatedCPUs as well.

I would also suggest renaming the bigThreadChunks and smallChunks to fullCoresList or fragmentedCoresList - or anything similar. I think it would make it easier for anyone who will read this code to faster understand the context - but it's up to you :)

@@ -1015,6 +1015,16 @@ func validateNUMA(field *k8sfield.Path, spec *v1.VirtualMachineInstanceSpec, con
Field: field.Child("domain", "cpu", "numa", "guestMappingPassthrough").String(),
})
}
if spec.Domain.CPU != nil && spec.Domain.CPU.Threads > 2 {
Member

Can we also add this validation to validateCpuPinning?

Member Author

Done.
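
For context, the check being discussed might be shared between both validators along the lines of the sketch below. The condition `Threads > 2` is taken from the diff above; the `cpuSpec` and `cause` types, the `validateThreads` helper, the field path string and the message text are hypothetical stand-ins, not the KubeVirt API types or webhook code.

```go
package main

import "fmt"

// cpuSpec is a hypothetical, trimmed-down stand-in for the VMI CPU spec.
type cpuSpec struct {
	Threads               uint32
	DedicatedCPUPlacement bool
}

// cause is a hypothetical stand-in for a validation status cause.
type cause struct {
	Field   string
	Message string
}

// validateThreads rejects more than two threads per core when dedicated CPUs
// are requested. It returns no causes for a nil spec or shared CPUs.
func validateThreads(cpu *cpuSpec) []cause {
	if cpu == nil || !cpu.DedicatedCPUPlacement {
		return nil
	}
	if cpu.Threads > 2 {
		return []cause{{
			Field:   "spec.domain.cpu.threads", // assumed field path
			Message: fmt.Sprintf("%d threads per core requested, at most 2 are supported with dedicated CPUs", cpu.Threads),
		}}
	}
	return nil
}

func main() {
	for _, c := range validateThreads(&cpuSpec{Threads: 4, DedicatedCPUPlacement: true}) {
		fmt.Printf("%s: %s\n", c.Field, c.Message)
	}
}
```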


func (b *CPUSiblings) UnmarshalXMLAttr(attr xml.Attr) error {
if attr.Value != "" {
if list, err := hwutil.ParseCPUSetLine(attr.Value, 100); err == nil {
Member

Can't this easily be more than 100?
I think the limit is much higher, around 8192

Member Author

100 is just the limit for cpu threads. For GetPodCPUSet where we are reading what we assign to the pods it is set to 50000.

Member

ah, I see.
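
To make the role of the limit concrete, here is a small sketch of a cpuset line parser with such a cap. It is not the `hwutil.ParseCPUSetLine` implementation: `parseCPUSet` is a hypothetical function, and treating the limit as the maximum number of expanded CPU ids is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPUSet expands a line such as "0-3,8" into individual CPU ids. It
// refuses to expand more than limit ids, so a bogus range like "0-4000000000"
// cannot trigger an arbitrarily large allocation.
func parseCPUSet(line string, limit int) ([]int, error) {
	var cpus []int
	for _, item := range strings.Split(strings.TrimSpace(line), ",") {
		parts := strings.SplitN(item, "-", 2)
		start, err := strconv.Atoi(strings.TrimSpace(parts[0]))
		if err != nil {
			return nil, fmt.Errorf("invalid cpu %q: %w", item, err)
		}
		end := start
		if len(parts) == 2 {
			if end, err = strconv.Atoi(strings.TrimSpace(parts[1])); err != nil {
				return nil, fmt.Errorf("invalid cpu range %q: %w", item, err)
			}
		}
		if end < start {
			return nil, fmt.Errorf("invalid cpu range %q", item)
		}
		// Check the size of the range before expanding it.
		if end-start >= limit-len(cpus) {
			return nil, fmt.Errorf("cpu set %q expands to more than %d cpus", line, limit)
		}
		for cpu := start; cpu <= end; cpu++ {
			cpus = append(cpus, cpu)
		}
	}
	return cpus, nil
}

func main() {
	fmt.Println(parseCPUSet("0-3,8", 100))
	fmt.Println(parseCPUSet("0-200", 100))
}
```

A small limit is enough for sibling lists, since they only describe the threads of one core, while a pod cpuset can legitimately be much larger.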

Signed-off-by: Roman Mohr <rmohr@redhat.com>
Siblings could be reported as ranges on some CPUs where more than two
threads can exist per core. Deal with that situation.

Further add a safety limit to the CPU parsing utility function to avoid
arbitrary sized expansions.

Signed-off-by: Roman Mohr <rmohr@redhat.com>
@vladikr
Member

vladikr commented Sep 9, 2021

/lgtm

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Sep 9, 2021
Remove the term `chunk` to avoid any confusion and talk about
fragmented and non-fragmented threads instead.

Signed-off-by: Roman Mohr <rmohr@redhat.com>
@kubevirt-bot kubevirt-bot removed the lgtm Indicates that a PR is ready to be merged. label Sep 9, 2021
@rmohr
Member Author

rmohr commented Sep 9, 2021

I would also suggest renaming the bigThreadChunks and smallChunks to fullCoresList or fragmentedCoresList - or anything similar. I think it would make it easier for anyone who will read this code to faster understand the context - but it's up to you :)

Done.

@rmohr
Member Author

rmohr commented Sep 9, 2021

/unhold

@kubevirt-bot kubevirt-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 9, 2021
@vladikr
Member

vladikr commented Sep 9, 2021

👍
/lgtm

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Sep 9, 2021
@kubevirt-commenter-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs.
Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@rmohr
Member Author

rmohr commented Sep 10, 2021

/retest

@rmohr
Member Author

rmohr commented Sep 10, 2021

/retest

@kubevirt-commenter-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs.
Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@kubevirt-bot kubevirt-bot merged commit b9ee32a into kubevirt:main Sep 10, 2021
@kubevirt-bot
Contributor

@rmohr: #6251 failed to apply on top of branch "release-0.44":

Applying: Pass thread information to virt-launcher for pinning decisions
Applying: Add an algorithm to improve CPU pinning
Applying: Make use of the new cpu pinning assignment algorithm
Using index info to reconstruct a base tree...
M	pkg/virt-launcher/virtwrap/converter/converter.go
M	pkg/virt-launcher/virtwrap/converter/converter_test.go
M	pkg/virt-launcher/virtwrap/converter/network.go
M	pkg/virt-launcher/virtwrap/manager.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/virt-launcher/virtwrap/manager.go
Removing pkg/virt-launcher/virtwrap/converter/vcpu_placement.go
Auto-merging pkg/virt-launcher/virtwrap/converter/network.go
Auto-merging pkg/virt-launcher/virtwrap/converter/converter_test.go
CONFLICT (content): Merge conflict in pkg/virt-launcher/virtwrap/converter/converter_test.go
Auto-merging pkg/virt-launcher/virtwrap/converter/converter.go
CONFLICT (content): Merge conflict in pkg/virt-launcher/virtwrap/converter/converter.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0003 Make use of the new cpu pinning assignment algorithm
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherrypick release-0.44
/cherrypick release-0.45

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@kubevirt-bot
Contributor

@rmohr: new pull request created: #6384

In response to this:

/cherrypick release-0.45

rmohr pushed a commit to rmohr/kubevirt that referenced this pull request Sep 13, 2021
Ensure optimal CPU pinning with dedicated CPUs

(cherry picked from commit b9ee32a)
Signed-off-by: Roman Mohr <rmohr@redhat.com>
rmohr pushed a commit to rmohr/kubevirt that referenced this pull request Sep 13, 2021
Ensure optimal CPU pinning with dedicated CPUs

(cherry picked from commit b9ee32a)
Signed-off-by: Roman Mohr <rmohr@redhat.com>
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL
Projects
None yet
Development

Successfully merging this pull request may close these issues.

v0.43 - CPU pinning return wrong CPU topology on VMI
Hyperthreads are pinned incorrectly
7 participants