Topology Manager Scope (container | pod) Feature #92967

Merged
merged 18 commits into from
Nov 12, 2020

Conversation

cezaryzukowski
Contributor

@cezaryzukowski cezaryzukowski commented Jul 10, 2020

What type of PR is this?
/kind feature

What this PR does / why we need it:
To enable pod-level resource affinity (details in the update to the KEP).

Special notes for your reviewer:
Add support for pod-level affinity next to the existing container-level affinity. Add the "--topology-manager-scope" flag to the kubelet binary and the "topologyManagerScope" field in the kubelet v1beta1 configuration. The value for both can be either "pod" or "container" (defaults to "container").

Does this PR introduce a user-facing change?:

Introduces a new kubelet flag, --topology-manager-scope=container|pod.
The default scope is "container".
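
As a rough illustration of the flag semantics described in the release note, the Go sketch below parses a `--topology-manager-scope` value, falls back to "container" when the flag is absent, and rejects anything other than the two documented values. The `parseScope` helper, the constant names, and the use of the standard library `flag` package (the kubelet hunk in this PR uses `pflag`) are assumptions made for the example, not code from the PR.

```go
package main

import (
	"flag"
	"fmt"
)

// Scope values named in the PR description. The constant names are illustrative.
const (
	containerScope = "container"
	podScope       = "pod"
)

// parseScope parses kubelet-style arguments and returns the selected scope.
// It is a hypothetical stand-in for the real flag wiring, not a copy of it.
func parseScope(args []string) (string, error) {
	fs := flag.NewFlagSet("kubelet-sketch", flag.ContinueOnError)
	scope := fs.String("topology-manager-scope", containerScope,
		"Scope to which topology hints are applied: 'container' (default) or 'pod'.")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	switch *scope {
	case containerScope, podScope:
		return *scope, nil
	default:
		return "", fmt.Errorf("invalid topology manager scope %q", *scope)
	}
}

func main() {
	scope, err := parseScope([]string{"--topology-manager-scope=pod"})
	fmt.Println(scope, err)
}
```

Validating at parse time keeps an unsupported value from reaching the component that consumes the scope; whether the kubelet validates at this point or later is not shown in this conversation.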

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
[KEP]: kubernetes/enhancements#1752

@k8s-ci-robot k8s-ci-robot added the kind/feature, release-note, size/XL, needs-sig, cncf-cla: yes, and needs-priority labels Jul 10, 2020
@k8s-ci-robot
Contributor

Welcome @cezaryzukowski!

It looks like this is your first PR to kubernetes/kubernetes 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/kubernetes has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Contributor

Hi @cezaryzukowski. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test label Jul 10, 2020
@k8s-ci-robot k8s-ci-robot added the area/kubelet, kind/api-change, and sig/node labels and removed the needs-sig label Jul 10, 2020
@fejta-bot

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@k-wiatrzyk
Contributor

/sig node

@k-wiatrzyk k-wiatrzyk force-pushed the tm-scope branch 2 times, most recently from 97d9b81 to c237f8f Compare July 14, 2020 09:49
@k-wiatrzyk
Contributor

/cc @klueska

@@ -530,6 +530,8 @@ func AddKubeletConfigFlags(mainfs *pflag.FlagSet, c *kubeletconfig.KubeletConfig
fs.Int32Var(&c.PodsPerCore, "pods-per-core", c.PodsPerCore, "Number of Pods per core that can run on this Kubelet. The total number of Pods on this Kubelet cannot exceed max-pods, so max-pods will be used if this calculation results in a larger number of Pods allowed on the Kubelet. A value of 0 disables this limit.")
fs.BoolVar(&c.ProtectKernelDefaults, "protect-kernel-defaults", c.ProtectKernelDefaults, "Default kubelet behaviour for kernel tuning. If set, kubelet errors if any of kernel tunables is different than kubelet defaults.")
fs.StringVar(&c.ReservedSystemCPUs, "reserved-cpus", c.ReservedSystemCPUs, "A comma-separated list of CPUs or CPU ranges that are reserved for system and kubernetes usage. This specific list will supersede cpu counts in --system-reserved and --kube-reserved.")
fs.StringVar(&c.TopologyManagerScope, "topology-manager-scope", c.TopologyManagerScope, "Topology Manager Scope represents the scope of topology hint generation that topology manager requests and hint providers generates. Possible values: 'container', 'pod'.")
Member

@SergeyKanzhelev SergeyKanzhelev Jul 15, 2020

mention default value:

Suggested change
- fs.StringVar(&c.TopologyManagerScope, "topology-manager-scope", c.TopologyManagerScope, "Topology Manager Scope represents the scope of topology hint generation that topology manager requests and hint providers generates. Possible values: 'container', 'pod'.")
+ fs.StringVar(&c.TopologyManagerScope, "topology-manager-scope", c.TopologyManagerScope, "Topology Manager Scope represents the scope of topology hint generation that topology manager requests and hint providers generates. Possible values: 'container' (default), 'pod'.")

Contributor

Done

@@ -530,6 +530,8 @@ func AddKubeletConfigFlags(mainfs *pflag.FlagSet, c *kubeletconfig.KubeletConfig
fs.Int32Var(&c.PodsPerCore, "pods-per-core", c.PodsPerCore, "Number of Pods per core that can run on this Kubelet. The total number of Pods on this Kubelet cannot exceed max-pods, so max-pods will be used if this calculation results in a larger number of Pods allowed on the Kubelet. A value of 0 disables this limit.")
fs.BoolVar(&c.ProtectKernelDefaults, "protect-kernel-defaults", c.ProtectKernelDefaults, "Default kubelet behaviour for kernel tuning. If set, kubelet errors if any of kernel tunables is different than kubelet defaults.")
fs.StringVar(&c.ReservedSystemCPUs, "reserved-cpus", c.ReservedSystemCPUs, "A comma-separated list of CPUs or CPU ranges that are reserved for system and kubernetes usage. This specific list will supersede cpu counts in --system-reserved and --kube-reserved.")
fs.StringVar(&c.TopologyManagerScope, "topology-manager-scope", c.TopologyManagerScope, "Topology Manager Scope represents the scope of topology hint generation that topology manager requests and hint providers generates. Possible values: 'container', 'pod'.")
Member

@SergeyKanzhelev SergeyKanzhelev Jul 15, 2020

this sentence is really hard to read and parse. Try to rephrase. Maybe like this:

Suggested change
- fs.StringVar(&c.TopologyManagerScope, "topology-manager-scope", c.TopologyManagerScope, "Topology Manager Scope represents the scope of topology hint generation that topology manager requests and hint providers generates. Possible values: 'container', 'pod'.")
+ fs.StringVar(&c.TopologyManagerScope, "topology-manager-scope", c.TopologyManagerScope, "Scope to which topology hints applied. Topology Manager collects hints from Hint Providers and applies them to defined scope to ensure the pod admission. Possible values: 'container' (default), 'pod'.")

Contributor

Done

@@ -67,6 +67,12 @@ const (
// SingleNumaNodeTopologyManager Policy iis a mode in which kubelet only allows
Member

unrelated to this PR, but mentioning anyway:

Suggested change
- // SingleNumaNodeTopologyManager Policy iis a mode in which kubelet only allows
+ // SingleNumaNodeTopologyManager Policy is a mode in which kubelet only allows

Contributor

Done

@@ -117,6 +117,9 @@ func ValidateKubeletConfiguration(kc *kubeletconfig.KubeletConfiguration) error
if kc.TopologyManagerPolicy != kubeletconfig.NoneTopologyManagerPolicy && !localFeatureGate.Enabled(features.TopologyManager) {
allErrors = append(allErrors, fmt.Errorf("invalid configuration: TopologyManager %v requires feature gate TopologyManager", kc.TopologyManagerPolicy))
}
if kc.TopologyManagerScope != kubeletconfig.ContainerScopeTopology && !localFeatureGate.Enabled(features.TopologyManager) {
allErrors = append(allErrors, fmt.Errorf("invalid configuration: TopologyScope %v requires feature gate TopologyManager", kc.TopologyManagerScope))
Member

Suggested change
- allErrors = append(allErrors, fmt.Errorf("invalid configuration: TopologyScope %v requires feature gate TopologyManager", kc.TopologyManagerScope))
+ allErrors = append(allErrors, fmt.Errorf("invalid configuration: TopologyManagerScope %v requires feature gate TopologyManager", kc.TopologyManagerScope))

Contributor

Done
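
The check in this hunk can be sketched in isolation as follows. This is a simplified, hypothetical `validateScope` helper rather than the kubelet's `ValidateKubeletConfiguration`: the feature gate is reduced to a boolean parameter, and the second check (rejecting unknown scope names) is an assumption added for illustration that does not appear in the hunk.

```go
package main

import "fmt"

// Scope names as discussed in this review. The constant names are illustrative.
const (
	containerTopologyScope = "container"
	podTopologyScope       = "pod"
)

// validateScope accumulates configuration errors in a slice, in the same
// style as the reviewed hunk. gateEnabled stands in for
// localFeatureGate.Enabled(features.TopologyManager).
func validateScope(scope string, gateEnabled bool) []error {
	var allErrors []error
	// Mirrors the hunk: a non-default scope requires the feature gate.
	if scope != containerTopologyScope && !gateEnabled {
		allErrors = append(allErrors, fmt.Errorf(
			"invalid configuration: TopologyManagerScope %v requires feature gate TopologyManager", scope))
	}
	// Assumption for this sketch only: reject names outside the documented set.
	if scope != containerTopologyScope && scope != podTopologyScope {
		allErrors = append(allErrors, fmt.Errorf(
			"invalid configuration: TopologyManagerScope %v is not a known scope", scope))
	}
	return allErrors
}

func main() {
	for _, err := range validateScope("pod", false) {
		fmt.Println(err)
	}
}
```

Collecting errors rather than returning on the first failure lets a caller report every configuration problem in one pass, which matches the `allErrors` pattern visible in the hunk.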

@@ -221,6 +227,12 @@ type KubeletConfiguration struct {
// TopologyManagerPolicy is the name of the policy to use.
// Policies other than "none" require the TopologyManager feature gate to be enabled.
TopologyManagerPolicy string
// Topology Manager Scope represents the scope of topology hint generation
// that topology manager requests and hint providers generates.
Member

Suggested change
- // that topology manager requests and hint providers generates.
+ // that topology manager requests and hint providers generate.

Contributor

Done

@@ -77,6 +77,11 @@ type Manager interface {
// and is consulted to achieve NUMA aware resource alignment among this
// and other resource controllers.
GetTopologyHints(*v1.Pod, *v1.Container) map[string][]topologymanager.TopologyHint

// GetTopologyHints implements the topologymanager. HintProvider Interface
Member

name in comment should match the name of the method

Member

Suggested change
- // GetTopologyHints implements the topologymanager. HintProvider Interface
+ // GetPodLevelTopologyHints implements the topologymanager.HintProvider Interface

Contributor

Changed to GetPodTopologyHints

@@ -34,4 +34,8 @@ type Policy interface {
// and is consulted to achieve NUMA aware resource alignment among this
// and other resource controllers.
GetTopologyHints(s state.State, pod *v1.Pod, container *v1.Container) map[string][]topologymanager.TopologyHint
// GetTopologyHints implements the topologymanager. HintProvider Interface
Member

@SergeyKanzhelev SergeyKanzhelev Jul 15, 2020

name should match

Suggested change
- // GetTopologyHints implements the topologymanager. HintProvider Interface
+ // GetPodLevelTopologyHints implements the topologymanager.HintProvider Interface

Contributor

Changed to GetPodTopologyHints
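
To make the relationship between the two methods concrete, here is a minimal Go sketch of a hint provider that exposes both a per-container and a per-pod entry point, plus a caller that picks one based on the configured scope. The `Pod`, `Container`, and `TopologyHint` types are stripped-down stand-ins for the real API types, the `GetPodTopologyHints` signature is an assumption (only its name appears in this thread), and `collectHints` and `fakeCPUProvider` are hypothetical helpers.

```go
package main

import "fmt"

// Minimal stand-ins for v1.Pod, v1.Container and topologymanager.TopologyHint.
type Container struct{ Name string }

type Pod struct {
	Name       string
	Containers []Container
}

type TopologyHint struct {
	NUMANodes []int
	Preferred bool
}

// HintProvider carries the two method names discussed in the review.
type HintProvider interface {
	GetTopologyHints(pod *Pod, container *Container) map[string][]TopologyHint
	GetPodTopologyHints(pod *Pod) map[string][]TopologyHint
}

// collectHints returns one hint set per unit of alignment: a single set for
// the whole pod under the "pod" scope, or one set per container otherwise.
func collectHints(scope string, p HintProvider, pod *Pod) []map[string][]TopologyHint {
	if scope == "pod" {
		return []map[string][]TopologyHint{p.GetPodTopologyHints(pod)}
	}
	out := make([]map[string][]TopologyHint, 0, len(pod.Containers))
	for i := range pod.Containers {
		out = append(out, p.GetTopologyHints(pod, &pod.Containers[i]))
	}
	return out
}

// fakeCPUProvider is a hypothetical provider that always suggests NUMA node 0.
type fakeCPUProvider struct{}

func (fakeCPUProvider) GetTopologyHints(pod *Pod, c *Container) map[string][]TopologyHint {
	return map[string][]TopologyHint{"cpu": {{NUMANodes: []int{0}, Preferred: true}}}
}

func (fakeCPUProvider) GetPodTopologyHints(pod *Pod) map[string][]TopologyHint {
	return map[string][]TopologyHint{"cpu": {{NUMANodes: []int{0}, Preferred: true}}}
}

func main() {
	pod := &Pod{Name: "demo", Containers: []Container{{Name: "a"}, {Name: "b"}}}
	fmt.Println(len(collectHints("container", fakeCPUProvider{}, pod)))
	fmt.Println(len(collectHints("pod", fakeCPUProvider{}, pod)))
}
```

Giving the pod-level method its own name, rather than overloading `GetTopologyHints` with a nil container, keeps each call site explicit about which scope it serves.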

@@ -78,6 +78,12 @@ func (m *ManagerImpl) GetTopologyHints(pod *v1.Pod, container *v1.Container) map
return deviceHints
}

// GetPodLevelTopologyHints implements the TopologyManager HintProvider Interface which
Member

Suggested change
- // GetPodLevelTopologyHints implements the TopologyManager HintProvider Interface which
+ // GetPodLevelTopologyHints implements the topologymanager.HintProvider Interface which

Contributor

Done

@@ -37,6 +37,17 @@ const (
// present on a machine and the TopologyManager is enabled, an error will
// be returned and the TopologyManager will not be loaded.
maxAllowableNUMANodes = 8
// containerScopeTopology specifies the TopologyManagerScope per container.
containerScopeTopology = "container"
Member

why is it scopetopology, not topologyscope?

Suggested change
- containerScopeTopology = "container"
+ containerTopologyScope = "container"

I'd also question dropping the Manager from the name. Why change terminology? Is the shorter name better here?

Contributor

Done

swhan91 and others added 12 commits November 12, 2020 12:25
Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
* Add podDevices() func
* Add getPodDeviceRequest() func

Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
* Extract common test cases that will be used for both GetTopologyHints()
  and GetPodTopologyHints()
* Extract machineInfo as it will be used for both functions as well

Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
* Add tests for getPodRequestedCPU()

Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
Pod object is more flexible to use and construct
* Update TestGetTopologyHints() to work according to new test cases
* Update topologyHintTestCase{} to include proper field

Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
* Add additional test cases returned by getPodScopeTestCases()

Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
A suite of e2e tests was created for Topology Manager
to test the pod scope alignment feature.

Co-authored-by: Pawel Rapacz <p.rapacz@partner.samsung.com>
Co-authored-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
Signed-off-by: Cezary Zukowski <c.zukowski@samsung.com>
Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cezaryzukowski, derekwaynecarr, klueska, smarterclayton

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot removed the sig/auth, sig/scheduling, sig/apps, and sig/api-machinery labels Nov 12, 2020
@k-wiatrzyk
Contributor

/test pull-kubernetes-e2e-kind-ipv6

@klueska
Contributor

klueska commented Nov 12, 2020

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Nov 12, 2020
@k8s-ci-robot
Contributor

k8s-ci-robot commented Nov 12, 2020

@cezaryzukowski: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| pull-kubernetes-e2e-gce-storage-snapshot | 80b891f9a734e46dc525393123110fb787f37ac3 | link | /test pull-kubernetes-e2e-gce-storage-snapshot |
| pull-kubernetes-e2e-gce-storage-slow | 80b891f9a734e46dc525393123110fb787f37ac3 | link | /test pull-kubernetes-e2e-gce-storage-slow |
| pull-kubernetes-e2e-azure-disk-windows | 80b891f9a734e46dc525393123110fb787f37ac3 | link | /test pull-kubernetes-e2e-azure-disk-windows |
| pull-kubernetes-e2e-aks-engine-azure | 80b891f9a734e46dc525393123110fb787f37ac3 | link | /test pull-kubernetes-e2e-aks-engine-azure |
| pull-kubernetes-e2e-azure-file | 80b891f9a734e46dc525393123110fb787f37ac3 | link | /test pull-kubernetes-e2e-azure-file |
| pull-kubernetes-e2e-azure-file-windows | 80b891f9a734e46dc525393123110fb787f37ac3 | link | /test pull-kubernetes-e2e-azure-file-windows |
| pull-kubernetes-e2e-gce-csi-serial | 80b891f9a734e46dc525393123110fb787f37ac3 | link | /test pull-kubernetes-e2e-gce-csi-serial |
| pull-kubernetes-e2e-azure-disk | 80b891f9a734e46dc525393123110fb787f37ac3 | link | /test pull-kubernetes-e2e-azure-disk |
| pull-kubernetes-e2e-azure-disk-vmss | 80b891f9a734e46dc525393123110fb787f37ac3 | link | /test pull-kubernetes-e2e-azure-disk-vmss |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k-wiatrzyk
Contributor

/test pull-kubernetes-e2e-gce-ubuntu-containerd

Labels
api-review, approved, area/apiserver, area/code-generation, area/e2e-test-framework, area/kubelet, area/test, cncf-cla: yes, kind/api-change, kind/feature, lgtm, needs-priority, ok-to-test, release-note, sig/instrumentation, sig/node, sig/testing, size/XXL
Projects
Status: API review completed, 1.20