KEP: Per QoS-Class CPU Affinity Hints #2739
Conversation
Hi @ipuustin. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED. The full list of commands accepted by this bot can be found here; the pull request process is described here.
Needs approval from an approver in each of these files.
vishh
left a comment
At a high level, I'd recommend rethinking design considerations.
- Can nodes be dedicated to AVX workloads?
- Can AVX be disabled in clusters where it is not expected to be consumed? How many users run both AVX and non-AVX workloads today?
- Can CPUs be dynamically re-assigned? - We can then group AVX apps into separate sockets dynamically (when possible)
- Can applications be moved to other nodes if they cannot be locally optimized due to noisy-neighbor problems? - What if we expose an application performance score that could inform the de-scheduler (or re-scheduler) to move the application based on PDB?
#### Variant 2: Workloads targeting AVX, run as Burstable or Guaranteed

Enable the existing CPU manager static policy. Use the existing ability to advertise extended resources to allow a limited number of "AVX cores". Enforce that if a container consumes the "AVX cores" resource:
How will a user choose a limited number?
This is the mechanism for advertising extended resources for a node: https://kubernetes.io/docs/tasks/administer-cluster/extended-resource-node/ . The user can check the node's CPU capacity and then, based on that number and some policy, choose a suitable number of AVX cores. An example policy might say that "half of the cores should be AVX cores", or something similar.
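To illustrate, the "half of the cores" policy mentioned above could be sketched as a small helper. This is a hypothetical sketch for discussion, not part of the KEP; the function name and the halving rule are assumptions.

```go
package main

import "fmt"

// advertisedAVXCores is a hypothetical policy helper (not part of the
// KEP): given a node's CPU capacity, advertise half of the cores as the
// "AVX cores" extended resource, rounding down.
func advertisedAVXCores(nodeCPUCapacity int) int {
	return nodeCPUCapacity / 2
}

func main() {
	for _, capacity := range []int{4, 16, 64} {
		fmt.Printf("node with %d CPUs advertises %d AVX cores\n",
			capacity, advertisedAVXCores(capacity))
	}
}
```

An operator could compute this number from the node capacity reported under `.status.capacity` and patch it in via the extended-resource mechanism linked above.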
Enable the existing CPU manager static policy. Use the existing ability to advertise extended resources to allow a limited number of "AVX cores". Enforce that if a container consumes the "AVX cores" resource:
- AVX Cores = CPU
- QoS class is Guaranteed
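The two quoted rules could be sketched as an admission-style check. This is a simplified illustration, not the KEP's implementation: the `container` type, whole-core units, and the Guaranteed-QoS test (requests equal to limits) are assumptions made for brevity.

```go
package main

import "fmt"

// container holds simplified, whole-core resource values for illustration.
type container struct {
	cpuRequest, cpuLimit int // whole cores
	memRequest, memLimit int // MiB
	avxCores             int // hypothetical "AVX cores" extended resource
}

// validateAVXContainer sketches the two rules quoted from the KEP: a
// container consuming "AVX cores" must request exactly as many AVX cores
// as CPUs, and must be in the Guaranteed QoS class (here approximated as
// requests equal to limits for CPU and memory).
func validateAVXContainer(c container) error {
	if c.avxCores == 0 {
		return nil // not an AVX consumer; no extra constraints
	}
	if c.avxCores != c.cpuRequest {
		return fmt.Errorf("AVX cores (%d) must equal CPU request (%d)",
			c.avxCores, c.cpuRequest)
	}
	if c.cpuRequest != c.cpuLimit || c.memRequest != c.memLimit {
		return fmt.Errorf("container must be in the Guaranteed QoS class")
	}
	return nil
}

func main() {
	ok := container{cpuRequest: 2, cpuLimit: 2, memRequest: 512, memLimit: 512, avxCores: 2}
	bad := container{cpuRequest: 2, cpuLimit: 4, memRequest: 512, memLimit: 512, avxCores: 2}
	fmt.Println(validateAVXContainer(ok))  // nil: both rules satisfied
	fmt.Println(validateAVXContainer(bad)) // error: not Guaranteed QoS
}
```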
What guarantees that a Guaranteed pod consuming AVX will not disrupt another pod that is not consuming AVX?
They still might, but at least the AVX and non-AVX pods wouldn't run on the same core. The current algorithm for choosing cores (the takeByTopology() function) first tries to allocate full cores and full sockets, based on the request size.
Thanks for the comments, @vishh! Regarding the points you made:

/kind kep
Force-pushed from 9546d0a to 630acc4.
REMINDER: KEPs are moving to k/enhancements on November 30. Please attempt to merge this KEP before then to signal consensus. Any questions regarding this move should be directed to that thread and not asked on GitHub.
KEPs have moved to k/enhancements. Any questions regarding this move should be directed to that thread and not asked on GitHub.
@justaugustus: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
This KEP addresses partitioning the CPUs present in the node to help mitigate the side effects caused by SIMD instructions (such as AVX instruction sets). The current version is meant to be a basis for discussion rather than a finalized proposal.
@ConnorDoyle
@kad
@klihub