Ensure optimal CPU pinning with dedicated CPUs #6251
Conversation
Skipping CI for Draft Pull Request.
/cc @vasiliy-ul
/test pull-kubevirt-e2e-k8s-1.20-sig-compute
/test pull-kubevirt-unit-test
Should be ready for review.
	thread := c.smallJunks[0]
	c.smallJunks = c.smallJunks[1:]
	return &thread
} else if len(c.bigThreadJunks) > 0 {
The `else` here is not needed.
You could also invert the check and return nil if `len(c.bigThreadJunks) < 1` to remove one level of complexity, as sketched below.
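A rough sketch of the suggested inversion (the receiver and element types here are hypothetical stand-ins, since the full diff isn't shown; only the guard-clause shape is the point):

```go
// Hypothetical pool type standing in for the real one in the diff.
type threadPool struct {
	smallJunks     []int
	bigThreadJunks []int
}

func (c *threadPool) popBigThread() *int {
	// Inverted check: return early so the happy path stays unindented.
	if len(c.bigThreadJunks) < 1 {
		return nil
	}
	thread := c.bigThreadJunks[0]
	c.bigThreadJunks = c.bigThreadJunks[1:]
	return &thread
}
```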
I think I'll keep the current logical flow in the code.
	// go to the next cell
	break
}
if requested == 0 {
Are you checking for negative values somewhere, or could `requested` ever be negative? If so, I'd prefer a `< 1` check here.
I added a panic since that must never happen. Should at least block any hot-looping.
You added the panic here: https://github.com/kubevirt/kubevirt/pull/6251/files#diff-770be27cf2d6dd3b86265236daa60480efc09ce100e00ec7c003051a8572dc9bR262
But not on this line; should it be in both places?
I think in this case it is pretty clear that we can't hot-loop. We only ever do `requested--` in the same loop.
Alright, if we are sure `requested` can never be 0 from the start.
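For illustration, a sketch of what the extra guard the reviewer asked about could look like at this spot (not the merged code; the PR's actual panic lives in the linked hunk):

```go
if requested < 0 {
	// Must never happen: requested is only ever decremented inside this
	// loop, so a negative value indicates a programming error. Panicking
	// also blocks any accidental hot-looping.
	panic("requested thread count went negative")
}
if requested == 0 {
	// all requested threads were assigned, go to the next cell
	break
}
```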
if remaining > 0 {
	if p.allowCellCrossing {
		return nil, fmt.Errorf("not enough exclusive threads provided, could not fit %v core(s)", remaining)
What would be a scenario where `allowCellCrossing == true` but `remaining > 0`? Is this only possible when someone requests threads per core such that the number of threads can't possibly match the number of cores?
I'm just trying to understand if this is something that can be caught in the validation webhook.
This is one of the main issues with all this. We don't know what we get until the kubelet is done and everything is started; up to that point we cannot tell whether we can map CPU threads in a reasonable way. Note that for plain CPU pinning (so no numa mapping) we should always get a working pinning, but it can be inefficient. For numa passthrough it is more likely to fail, because there are cases where we can't create a correct mapping out of the assigned CPUs: libvirt cannot express topologies where CPUs have different numbers of threads per core. See the illustration below.
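A small self-contained illustration of such a case (the CPU IDs are hypothetical, not taken from the PR):

```go
package main

import "fmt"

// A hypothetical exclusive CPU set handed over by the kubelet, grouped by
// host core (sibling threads). The thread counts per core are uneven, so
// no uniform threads-per-core libvirt topology can represent the set, and
// NUMA passthrough placement can only fail at runtime.
func main() {
	coresBySiblings := map[int][]int{
		0: {0, 8}, // full core: 2 threads
		1: {1},    // fragmented core: 1 thread
	}
	for core, threads := range coresBySiblings {
		fmt.Printf("host core %d provides %d thread(s)\n", core, len(threads))
	}
	// A guest topology of 2 cores x 2 threads cannot be built here without
	// combining threads from different host cores, which passthrough forbids.
}
```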
Just got some of the small nits remaining from before.
/retest
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: davidvossel. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
/retest
/retest
Overall, it all looks very good to me.
I think we should add the number-of-threads validation for dedicatedCPUs as well.
I would also suggest renaming `bigThreadChunks` and `smallChunks` to `fullCoresList` and `fragmentedCoresList`, or anything similar. I think it would make it easier for anyone who will read this code to understand the context faster - but it's up to you :)
@@ -1015,6 +1015,16 @@ func validateNUMA(field *k8sfield.Path, spec *v1.VirtualMachineInstanceSpec, con
		Field: field.Child("domain", "cpu", "numa", "guestMappingPassthrough").String(),
	})
}
if spec.Domain.CPU != nil && spec.Domain.CPU.Threads > 2 {
Can we also add this validation to `validateCpuPinning`?
Done.
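A sketch of what the duplicated check might look like inside `validateCpuPinning` (the cause-appending style is assumed from the hunk above; `causes`, `field`, and `spec` come from the surrounding webhook code, and `metav1` is `k8s.io/apimachinery/pkg/apis/meta/v1` — this is not the exact code from the PR):

```go
// Mirror the validateNUMA restriction: dedicated CPU placement supports
// at most two threads per core.
if spec.Domain.CPU != nil && spec.Domain.CPU.Threads > 2 {
	causes = append(causes, metav1.StatusCause{
		Type:    metav1.CauseTypeFieldValueInvalid,
		Message: fmt.Sprintf("not more than two threads per core are supported, %d requested", spec.Domain.CPU.Threads),
		Field:   field.Child("domain", "cpu", "threads").String(),
	})
}
```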
func (b *CPUSiblings) UnmarshalXMLAttr(attr xml.Attr) error {
	if attr.Value != "" {
		if list, err := hwutil.ParseCPUSetLine(attr.Value, 100); err == nil {
Can't this easily be more than 100? I think the limit is much higher, around 8192.
100 is just the limit for cpu threads. For `GetPodCPUSet`, where we are reading what we assign to the pods, it is set to 50000.
ah, I see.
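As background on the limit discussed here, a self-contained sketch of what such a capped cpuset parser does (illustrative only; the real `hwutil.ParseCPUSetLine` differs in detail):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPUSetLine expands a cpuset string such as "0-1,8-9" into individual
// CPU IDs and enforces a limit so a corrupt or malicious line like
// "0-99999999" cannot expand into millions of entries.
func parseCPUSetLine(line string, limit int) ([]int, error) {
	var cpus []int
	add := func(cpu int) error {
		if len(cpus) >= limit {
			return fmt.Errorf("expansion exceeds limit of %d CPUs", limit)
		}
		cpus = append(cpus, cpu)
		return nil
	}
	for _, part := range strings.Split(line, ",") {
		if lo, hi, isRange := strings.Cut(part, "-"); isRange {
			start, err := strconv.Atoi(lo)
			if err != nil {
				return nil, err
			}
			end, err := strconv.Atoi(hi)
			if err != nil {
				return nil, err
			}
			for i := start; i <= end; i++ {
				if err := add(i); err != nil {
					return nil, err
				}
			}
		} else {
			cpu, err := strconv.Atoi(part)
			if err != nil {
				return nil, err
			}
			if err := add(cpu); err != nil {
				return nil, err
			}
		}
	}
	return cpus, nil
}

func main() {
	list, err := parseCPUSetLine("0-1,8-9", 100)
	fmt.Println(list, err) // [0 1 8 9] <nil>
}
```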
Signed-off-by: Roman Mohr <rmohr@redhat.com>
Siblings could be reported as ranges on some CPUs where more than two threads can exist per core. Deal with that situation. Further, add a safety limit to the CPU parsing utility function to avoid arbitrarily sized expansions. Signed-off-by: Roman Mohr <rmohr@redhat.com>
/lgtm
Remove the term `chunk` to avoid any confusion and talk about fragmented and not fragmented threads instead. Signed-off-by: Roman Mohr <rmohr@redhat.com>
Done.
/unhold
👍
/retest
/retest
/retest
/retest
/retest
@rmohr: #6251 failed to apply on top of branch "release-0.44":
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@rmohr: new pull request created: #6384 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Ensure optimal CPU pinning with dedicated CPUs (cherry picked from commit b9ee32a) Signed-off-by: Roman Mohr <rmohr@redhat.com>
What this PR does / why we need it:
The algorithm first creates buckets based on the following cpu attributes:
- the numa cell they belong to
- the core they belong to (thread siblings)
Then the algorithm will assign threads to vCPU cores with the following priority (a rough sketch follows the list):
- Prefer full host cores which can back a full vCPU core from a single numa node.
- Prefer fragmented threads of partially used host cores which do not cross numa node boundaries.
- If cell crossing is allowed, combine threads from different numa nodes to form a full vCPU core.
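A minimal sketch of that priority order (illustrative only; names like `cell`, `fullCores`, and `fragmentedThreads` are hypothetical stand-ins, not the PR's actual identifiers):

```go
package main

import "fmt"

type cell struct {
	id                int
	fullCores         [][]int // sibling groups that can back a whole vCPU core
	fragmentedThreads []int   // leftover threads of partially used cores
}

// pickCore returns host threads for one vCPU core of `width` threads,
// following the priority described above.
func pickCore(cells []*cell, width int, allowCellCrossing bool) []int {
	// 1. A full host core from a single numa cell.
	for _, c := range cells {
		for i, siblings := range c.fullCores {
			if len(siblings) == width {
				c.fullCores = append(c.fullCores[:i], c.fullCores[i+1:]...)
				return siblings
			}
		}
	}
	// 2. Fragmented threads, staying inside one cell.
	for _, c := range cells {
		if len(c.fragmentedThreads) >= width {
			picked := c.fragmentedThreads[:width]
			c.fragmentedThreads = c.fragmentedThreads[width:]
			return picked
		}
	}
	// 3. Only if cell crossing is allowed: mix threads across cells.
	if allowCellCrossing {
		var picked []int
		for _, c := range cells {
			for len(c.fragmentedThreads) > 0 && len(picked) < width {
				picked = append(picked, c.fragmentedThreads[0])
				c.fragmentedThreads = c.fragmentedThreads[1:]
			}
		}
		if len(picked) == width {
			return picked
		}
	}
	return nil // cannot fit the requested core
}

func main() {
	cells := []*cell{
		{id: 0, fullCores: [][]int{{0, 8}}, fragmentedThreads: []int{2}},
		{id: 1, fragmentedThreads: []int{5}},
	}
	fmt.Println(pickCore(cells, 2, true)) // [0 8]: full core wins
	fmt.Println(pickCore(cells, 2, true)) // [2 5]: crosses cells as last resort
}
```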
Which issue(s) this PR fixes (optional, in `fixes #<issue_number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):
Fixes #6159
Fixes #4687
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1987329
Replaces #4757
Special notes for your reviewer:
Release note: