
Conversation


@swhan91 swhan91 commented Apr 9, 2020

  • Pod Spec extension to describe the topology policy
  • Improve the Topology Manager to allow binding Pods with different topology policies to the same node.
  • Introduce the pod-level-single-numa policy
  • Deprecation plan for the topology policy flag
  • A way to mitigate the issue caused by non-NUMA-aware Pod scheduling

Signed-off-by: sw.han <sw.han@samsung.com>

swhan91 and others added 2 commits April 8, 2020 18:31
* Separate Pod-level topology policy from node-level topology policy.
* Extend Pod spec to describe a Pod-level topology policy.
* Add Node-level `dynamic` policy.
* Add Pod-level `pod-level-single-numa-node` policy.

Signed-off-by: sw.han <sw.han@samsung.com>
Signed-off-by: Byonggon Chun <bg.chun@samsung.com>
@k8s-ci-robot
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 9, 2020
@k8s-ci-robot
Contributor

Hi @swhan91. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory labels Apr 9, 2020
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: swhan91
To complete the pull request process, please assign derekwaynecarr
You can assign the PR to them by writing /assign @derekwaynecarr in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Apr 9, 2020
@k-wiatrzyk
Contributor

I signed it

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Apr 9, 2020
@bg-chun
Member

bg-chun commented Apr 9, 2020

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 9, 2020
Signed-off-by: Byonggon Chun <bg.chun@samsung.com>
@bg-chun
Member

bg-chun commented Apr 9, 2020

@swhan91
Can you rewrite the summary of the PR?
Something like this:

  • Pod Spec extension to describe the topology policy
  • Improve the Topology Manager to allow binding Pods with different topology policies to the same node.
  • Introduce the pod-level-single-numa policy
  • Deprecation plan for the topology policy flag
  • A way to mitigate the issue caused by non-NUMA-aware Pod scheduling

Contributor

@AlexeyPerevalov AlexeyPerevalov left a comment


It looks good to me in general.

1. […] feature gate is disabled.
2. Add a functional Topology Manager that queries hint providers in order to compute a preferred socket mask for each container.
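For illustration, here is a minimal sketch of the mechanism in the quoted excerpt: each hint provider (e.g. CPU Manager, Device Manager) returns a preferred NUMA-node bitmask for a container, and the Topology Manager intersects them to compute the container's preferred affinity. Names and types are assumptions for illustration, not the actual Kubelet code.

```go
// numaMask is a bitmask of NUMA node IDs: bit i set means NUMA node i is acceptable.
// Assumed type for illustration; not the actual Kubelet code.
type numaMask uint64

// mergeHints intersects per-provider preferences. A zero result means no common
// set of NUMA nodes satisfies every provider, which a strict policy would treat
// as grounds for rejecting the container.
func mergeHints(hints []numaMask) numaMask {
    merged := ^numaMask(0) // start from "any NUMA node"
    for _, h := range hints {
        merged &= h
    }
    return merged
}
```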
Contributor


I think the word socket is inappropriate here. I remember it was a mess (socket vs. NUMA node).

Author


It seems clear to change this to NUMA node.


### New Component: Topology Manager

This proposal is focused on a new component in the Kubelet called the Topology Manager.
Contributor


Misprint: "and and".

* The Topology Manager reads the `v1core.PodSpec.Topology.Policy` field to coordinate the assignment of resources for a Pod.

## Practical challenges
A non-NUMA-aware scheduler may bind a pod to a node where its topology requirements cannot be satisfied. The Topology Manager will then reject the pod at admission with a TopologyError, and the pod will end up in a `Terminated` state on that node.
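To make that rejection path concrete, here is a hedged sketch of how admission fails when no placement satisfies the topology policy on the node the scheduler chose. The type, field, and reason names are assumptions for illustration, not the actual Kubelet implementation.

```go
// PodAdmitResult mirrors the kind of result a Kubelet admit handler returns;
// field and reason names are assumed for illustration.
type PodAdmitResult struct {
    Admit   bool
    Reason  string
    Message string
}

// admitPod rejects the pod when no NUMA placement satisfies the topology policy.
// Because the scheduler is not NUMA-aware, this can only happen after the pod
// has already been bound to the node, leaving it terminated there.
func admitPod(podName string, placementFound bool) PodAdmitResult {
    if !placementFound {
        return PodAdmitResult{
            Admit:   false,
            Reason:  "TopologyAffinityError", // assumed reason string
            Message: "resources cannot be allocated with the desired topology for pod " + podName,
        }
    }
    return PodAdmitResult{Admit: true}
}
```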
Contributor


That problem already exists in the current implementation with per-node policies.

Member


Technically, that problem originates from the concept of a Pod and its lifecycle.
In general, Pods remain until a human or a controller process explicitly removes them.
https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-lifetime

The node-level topology policy takes precedence over the pod-level topology policy.
The node-level policies will be deprecated later, because servers achieve better utilization with dynamic policies.
#### Topology Policy in Pod Spec
The value of the `v1core.PodSpec.Topology.Policy` field indicates the desired topology of the Pod's resource assignment.
Contributor


If only the policy is planned right now, why not simplify it in the following way:
`v1core.PodSpec.nodeTopologyPolicy`

Contributor


I think that if we have a plan to extend `Topology` (from my point of view, just `Topology` is too general, since there are already `Topology*` fields in PodSpec), we need to mention it here; in this context it will be clear why such a hierarchy was chosen.

Member


I see what you are talking about.
We don't have a plan to introduce `topology.cpuManagerPolicy` or `topology.cpuSiblings` now,
so introducing `v1core.PodSpec.topologyPolicy` seems the better way.
(As for `nodeTopologyPolicy`, I think it is a little bit weird because we are talking about the topology of a Pod, not a node.)

…o topologyPolicy

The proposed topology field was hierarchical (`v1core.PodSpec.Topology.Policy`).
It has been changed to `v1core.PodSpec.topologyPolicy`
because there is no need for a hierarchy at the moment.

Signed-off-by: sw.han <sw.han@samsung.com>
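For readability, a hypothetical sketch of what the extended API type could look like after this rename. The type name, field placement, and tags are assumptions for illustration; `pod-level-single-numa-node` is the policy proposed in this PR.

```go
// Hypothetical types illustrating the proposed v1core.PodSpec.topologyPolicy
// field; this is not the actual Kubernetes API definition.
type TopologyPolicy string

const (
    // PodLevelSingleNUMANode is the pod-level policy proposed in this KEP.
    PodLevelSingleNUMANode TopologyPolicy = "pod-level-single-numa-node"
)

type PodSpec struct {
    // ...existing PodSpec fields elided...

    // TopologyPolicy indicates the desired topology of the Pod's resource
    // assignment. When unset, the node-level topology policy applies.
    // +optional
    TopologyPolicy *TopologyPolicy `json:"topologyPolicy,omitempty"`
}
```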
1. Using a controller that guarantees to bind a Pod to a node where the desired topology of the Pod's resource assignment can be satisfied.
   1. A new object is required to describe the desired topology; the object could be a new workload type `TopologySet` or a CRD.
2. A topology-aware scheduler that would consider the topology of available resources on nodes.
   1. A custom scheduler could be provisioned to take care of scheduling for high-performance applications. Detection of a node's resource topology could be based on Node Feature Discovery with the use of user-specified features.
Contributor


Regarding using Node Feature Discovery here, a couple of questions: my understanding is that a custom scheduler should take into account not only the node resource topology (e.g. "this node has two NUMA nodes") but also the node resource availability (e.g. "this node has two NUMA nodes, with 4 cores free on NUMA node 0 and 6 cores free on NUMA node 1") at the moment of the scheduling decision, right?

Contributor

@k-wiatrzyk k-wiatrzyk Apr 15, 2020


Right, taking node resource availability into account would be very important during scheduling; this could prevent assigning a pod to a node where no single NUMA node has enough CPUs even though together they do.
Example here (only CPU, but there are also other resources):

  • pod request: 4 exclusive CPUs from one NUMA node
  • NUMA 1: 3 free CPUs, NUMA 2: 2 free CPUs, together: 5 free CPUs

Additionally, the topologyPolicy field specified in the Pod Spec could be used to inform the scheduler whether it should look at per-NUMA resource availability, depending on the selected policy.
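A minimal sketch of the per-NUMA availability check this example implies; the helper name is hypothetical and not part of any existing scheduler plugin.

```go
// fitsSingleNUMANode reports whether any single NUMA node has enough free
// exclusive CPUs for the request; the total free across nodes is not sufficient.
func fitsSingleNUMANode(freeCPUsPerNUMA []int, requestedCPUs int) bool {
    for _, free := range freeCPUsPerNUMA {
        if free >= requestedCPUs {
            return true
        }
    }
    return false
}
```

With the numbers above, `fitsSingleNUMANode([]int{3, 2}, 4)` returns false even though 5 CPUs are free in total, so a NUMA-aware scheduler would skip this node.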

Contributor


Yep, hence my second question: is NFD really a good place to expose this information? Reasons I'm asking:

  1. AFAIU, node-feature labels aren't supposed to change very often, while node allocation surely changes very frequently, so I wonder if this is the right approach to convey this information. Maybe we should move in the direction of the Prometheus node_exporter?
  2. Did you already plan how to extract the current allocation values (e.g. how many cores per NUMA node are actually allocated) and expose them?

Last: while important in the big picture, I'm not sure this KEP PR is the right place to discuss this sub-topic; should we move it to another venue?

Contributor


Here is an issue created to discuss this:
kubernetes/kubernetes#90204

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jul 15, 2020
@k8s-ci-robot
Contributor

@swhan91: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@swhan91 swhan91 closed this Jul 16, 2020
RomanBednar pushed a commit to RomanBednar/enhancements that referenced this pull request Jan 31, 2025