diff --git a/keps/prod-readiness/sig-node/4622.yaml b/keps/prod-readiness/sig-node/4622.yaml new file mode 100644 index 000000000000..da66d8345df8 --- /dev/null +++ b/keps/prod-readiness/sig-node/4622.yaml @@ -0,0 +1,3 @@ +kep-number: 4622 +beta: + approver: "@johnbelamaric" diff --git a/keps/sig-node/4622-topologymanager-max-allowable-numa-nodes/README.md b/keps/sig-node/4622-topologymanager-max-allowable-numa-nodes/README.md new file mode 100644 index 000000000000..db83816fbf32 --- /dev/null +++ b/keps/sig-node/4622-topologymanager-max-allowable-numa-nodes/README.md @@ -0,0 +1,790 @@ + + +# KEP-4622: New TopologyManager Policy which configure the value of maxAllowableNUMANodes + + + + + + +- [Release Signoff Checklist](#release-signoff-checklist) +- [Summary](#summary) +- [Goals](#goals) +- [Non-Goals](#non-goals) +- [Proposal](#proposal) + - [User Stories (Optional)](#user-stories-optional) + - [Story 1 Bytedance Database Performance Optimization](#story-1-bytedance-database-performance-optimization) + - [Story 2](#story-2) + - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional) + - [Risks and Mitigations](#risks-and-mitigations) +- [Design Details](#design-details) + - [Test Plan](#test-plan) + - [Prerequisite testing updates](#prerequisite-testing-updates) + - [Unit tests](#unit-tests) + - [Integration tests](#integration-tests) + - [e2e tests](#e2e-tests) + - [Graduation Criteria](#graduation-criteria) + - [Alpha](#alpha) + - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy) + - [Version Skew Strategy](#version-skew-strategy) +- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire) + - [Feature Enablement and Rollback](#feature-enablement-and-rollback) + - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning) + - [Monitoring Requirements](#monitoring-requirements) + - [Dependencies](#dependencies) + - [Scalability](#scalability) + - 
[Troubleshooting](#troubleshooting) +- [Implementation History](#implementation-history) +- [Drawbacks](#drawbacks) +- [Alternatives](#alternatives) +- [Infrastructure Needed (Optional)](#infrastructure-needed-optional) + + +## Release Signoff Checklist + + + +Items marked with (R) are required *prior to targeting to a milestone / release*. + +- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR) +- [ ] (R) KEP approvers have approved the KEP status as `implementable` +- [ ] (R) Design details are appropriately documented +- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors) + - [ ] e2e Tests for all Beta API Operations (endpoints) + - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) + - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free +- [ ] (R) Graduation criteria is in place + - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) +- [ ] (R) Production readiness review completed +- [ ] (R) Production readiness review approved +- [ ] "Implementation History" section is up-to-date for milestone +- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io] +- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes + + + +[kubernetes.io]: https://kubernetes.io/ +[kubernetes/enhancements]: https://git.k8s.io/enhancements +[kubernetes/kubernetes]: https://git.k8s.io/kubernetes +[kubernetes/website]: https://git.k8s.io/website + +## Summary + + + +In this KEP, we propose a new 
TopologyManager Policy Option called `max-allowable-numa-nodes` to configure the value of maxAllowableNUMANodes in the TopologyManager. The current hard-coded value of 8 was added as a stop-gap 4 years ago. A configurable policy option can improve the topology manager to remove the state explosion that occurs when trying to enumerate the possible NUMA affinities and generate their hints, and should be sufficient to allow users to increase the limit when appropriate. + +What is maxAllowableNUMANodes? maxAllowableNUMANodes specifies the maximum number of NUMA Nodes that the TopologyManager supports on the underlying machine. At present, having more than this number of NUMA Nodes will result in a state explosion when trying to enumerate possible NUMAAffinity masks and generate hints for them. As such, if more NUMA Nodes than this are present on a machine and the TopologyManager is enabled, an error will be returned and the TopologyManager will not be loaded. + +## Goals + + +- Introduce a new TopologyManager Policy Option called `max-allowable-numa-nodes`. +- Improve the topology manager to remove the state explosion. + +## Non-Goals + + + +- This proposal does not aim to modify the existing TopologyManager Policies. It focuses solely on introducing a new policy option for configuring the maximum allowable number of NUMA nodes. +- It does not address other resource allocation or management aspects within Kubernetes. + +## Proposal + + + +We propose to add a new `TopologyManager` policy option called `max-allowable-numa-nodes` to the static TopologyManager policy. It configures the value of maxAllowableNUMANodes in the TopologyManager. The current hard-coded value of 8 was added as a stop-gap 4 years ago. A configurable policy option can improve the topology manager to remove the state explosion that occurs when trying to enumerate the possible NUMA affinities and generate their hints.
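To make the state explosion mentioned above concrete, here is a minimal sketch. It assumes that a candidate NUMA affinity is any non-empty subset of the machine's NUMA nodes, so that `n` nodes give `2^n - 1` candidate masks. The function name `countAffinityMasks` is hypothetical and is not part of the kubelet code.

```go
package main

import "fmt"

// countAffinityMasks returns the number of non-empty subsets of numaNodes
// NUMA nodes, i.e. 2^n - 1. This is an illustration of the growth rate
// only; it is not how the TopologyManager itself is implemented.
// It is valid for 0 <= numaNodes < 64.
func countAffinityMasks(numaNodes int) uint64 {
	return (uint64(1) << uint(numaNodes)) - 1
}

func main() {
	// Compare the current limit of 8 with larger, hypothetical limits.
	for _, n := range []int{2, 4, 8, 16, 32} {
		fmt.Printf("%2d NUMA nodes: %d candidate masks\n", n, countAffinityMasks(n))
	}
}
```

Under this assumption, every additional NUMA node roughly doubles the number of candidate masks, which is why raising the limit should be a deliberate choice by the node administrator.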
+ +### User Stories (Optional) + + + +#### Story 1 + +### Notes/Constraints/Caveats (Optional) + + + +### Risks and Mitigations + + + +The risk associated with implementing this new proposal is minimal. It pertains only to a distinct policy option within the `TopologyManager` and is safeguarded by the option's inherent security measures, in addition to the default deactivation of the `TopologyManagerPolicyBetaOptions` feature gate. + +| Risk | Impact | Mitigation | +| -------------------------------------------------| -------| ---------- | +| Setting a value too small causes the kubelet to crash | High | Do not set it, or set a larger value | + + +## Design Details + + + +Users can configure the value of maxAllowableNUMANodes in the TopologyManager when the kubelet starts up. If the option is not set or is set to 0, the kubelet falls back to the default recommended value of 8. + +```go + case MaxAllowableNUMANodes: + optValue, err := strconv.Atoi(value) + if err != nil { + return opts, fmt.Errorf("bad value for option %q: %w", name, err) + } + opts.MaxAllowableNUMANodes = optValue + ... + + if opts.MaxAllowableNUMANodes == 0 { + opts.MaxAllowableNUMANodes = defaultMaxAllowableNUMANodes + } +``` + +### Test Plan + + + +[x] I/we understand the owners of the involved components may require updates to +existing tests to make this code solid enough prior to committing the changes necessary +to implement this enhancement. + +##### Prerequisite testing updates + + + +##### Unit tests + + + + + +- `k8s.io/kubernetes/pkg/kubelet/cm/topologymanager`: `20240405` - `91.5%` + +##### Integration tests + + + + + +No new integration tests for kubelet are planned. + +##### e2e tests + + + +TBD + +### Graduation Criteria + +#### Beta + +- Feature implemented behind the existing static policy feature flag +- Initial unit tests completed and coverage is improved +- Documentation is improved so that enough guidance and examples can be given to potential users.
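The parsing and default fallback described under Design Details can be sketched as a standalone program. This is only an illustration under stated assumptions: the `policyOptions` type, the `parsePolicyOptions` function, and the map-based input are hypothetical stand-ins for the kubelet's real option handling, and the rejection of negative values is an assumption that the KEP text does not specify.

```go
package main

import (
	"fmt"
	"strconv"
)

// defaultMaxAllowableNUMANodes mirrors the default of 8 named in the KEP.
const defaultMaxAllowableNUMANodes = 8

// optMaxAllowableNUMANodes is the option name proposed in the KEP.
const optMaxAllowableNUMANodes = "max-allowable-numa-nodes"

// policyOptions is a hypothetical container for parsed policy options.
type policyOptions struct {
	MaxAllowableNUMANodes int
}

// parsePolicyOptions is a hypothetical helper that parses raw key/value
// policy options and applies the default when the option is unset or 0.
func parsePolicyOptions(raw map[string]string) (policyOptions, error) {
	opts := policyOptions{}
	for name, value := range raw {
		switch name {
		case optMaxAllowableNUMANodes:
			optValue, err := strconv.Atoi(value)
			if err != nil {
				return opts, fmt.Errorf("bad value for option %q: %w", name, err)
			}
			// Assumption: negative values are rejected; the KEP is silent on this.
			if optValue < 0 {
				return opts, fmt.Errorf("bad value for option %q: must not be negative", name)
			}
			opts.MaxAllowableNUMANodes = optValue
		default:
			return opts, fmt.Errorf("unsupported option %q", name)
		}
	}
	// Unset or 0 falls back to the default, as described in Design Details.
	if opts.MaxAllowableNUMANodes == 0 {
		opts.MaxAllowableNUMANodes = defaultMaxAllowableNUMANodes
	}
	return opts, nil
}

func main() {
	for _, raw := range []map[string]string{
		{},
		{optMaxAllowableNUMANodes: "0"},
		{optMaxAllowableNUMANodes: "16"},
		{optMaxAllowableNUMANodes: "sixteen"},
	} {
		opts, err := parsePolicyOptions(raw)
		fmt.Printf("input=%v value=%d err=%v\n", raw, opts.MaxAllowableNUMANodes, err)
	}
}
```

Treating 0 as "unset" keeps the zero value of the options struct meaningful, so a kubelet started without the option behaves exactly as it does today.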
+ +### Upgrade / Downgrade Strategy + + + +We anticipate no repercussions. The new policy option is voluntary and operates independently from the current selections. + +### Version Skew Strategy + +No changes needed. + + +## Production Readiness Review Questionnaire + + + +### Feature Enablement and Rollback + + + +###### How can this feature be enabled / disabled in a live cluster? + + + +- [x] Feature gate (also fill in values in `kep.yaml`) + - Feature gate name: `TopologyManagerPolicyBetaOptions` + - Components depending on the feature gate: `kubelet` +- [x] Change the kubelet configuration to set a TopologyManager policy of static and a TopologyManager policy option of `max-allowable-numa-nodes` + - Will enabling / disabling the feature require downtime of the control plane? No + - Will enabling / disabling the feature require downtime or reprovisioning of a node? (Do not assume Dynamic Kubelet Config feature is enabled). Yes -- a kubelet restart is required. + +###### Does enabling the feature change any default behavior? + + + +No. + +###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)? + + + +Yes. When it is disabled (i.e. no value is set), the kubelet falls back to the default behavior. + +###### What happens if we reenable the feature if it was previously rolled back? + +If we reactivate the feature after a rollback, the outcome remains unchanged. Current containers will retain their allocations, while newly created containers will be affected. + +###### Are there any tests for feature enablement/disablement? + +This new `TopologyManager` policy option starts at the beta stage. The unit tests will check whether the configured value of max-allowable-numa-nodes is applied as expected and whether the default recommended value is used when it is not configured. + + + +### Rollout, Upgrade and Rollback Planning + + + +###### How can a rollout or rollback fail? Can it impact already running workloads?
+ + + +###### What specific metrics should inform a rollback? + + + + +###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested? + + +We manually test it in our internal environment and it works. It's worth doing automated upgrade/rollback tests in the future. + +###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.? + + +No. + +### Monitoring Requirements + + + +###### How can an operator determine if the feature is in use by workloads? + + + +Examine the kubelet configuration of a node to verify the existence of the feature gate and the utilization of the new policy option. + +###### How can someone using this feature know that it is working for their instance? + + + +- [ ] Events + - Event Reason: +- [ ] API .status + - Condition name: + - Other field: +- [x] Other (treat as last resort) + - Details: Inspect the kubelet configuration of the nodes: check feature gate and usage of the new option. + +###### What are the reasonable SLOs (Service Level Objectives) for the enhancement? + + + +It won't cause any performance regression. So we don't need to define any SLOs for this feature. + + +###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service? + + + +- [ ] Metrics + - Metric name: + - [Optional] Aggregation method: + - Components exposing the metric: +- [ ] Other (treat as last resort) + - Details: + +###### Are there any missing metrics that would be useful to have to improve observability of this feature? + + + +N/A + +### Dependencies + + + +N/A + +###### Does this feature depend on any specific services running in the cluster? + + + +No. It doesn't rely on other Kubernetes components. + +### Scalability + + + +###### Will enabling / using this feature result in any new API calls? + + +No + +###### Will enabling / using this feature result in introducing new API types? 
+ + +No + +###### Will enabling / using this feature result in any new calls to the cloud provider? + + +No + +###### Will enabling / using this feature result in increasing size or count of the existing API objects? + + +No + +###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs? + + +No + +###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components? + + +No + +###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)? + + +No + +### Troubleshooting + + + +###### How does this feature react if the API server and/or etcd is unavailable? +N/A + +###### What are other known failure modes? + +N/A + +###### What steps should be taken if SLOs are not being met to determine the problem? + +## Implementation History + + + +## Drawbacks + +N/A + +## Alternatives + + + +## Infrastructure Needed (Optional) + + \ No newline at end of file diff --git a/keps/sig-node/4622-topologymanager-max-allowable-numa-nodes/kep.yaml b/keps/sig-node/4622-topologymanager-max-allowable-numa-nodes/kep.yaml new file mode 100644 index 000000000000..f2af562e2228 --- /dev/null +++ b/keps/sig-node/4622-topologymanager-max-allowable-numa-nodes/kep.yaml @@ -0,0 +1,39 @@ +title: New TopologyManager Policy which configure the value of maxAllowableNUMANodes +kep-number: 4622 +authors: + - "@cyclinder" +owning-sig: sig-node +participating-sigs: [] +status: implementable +creation-date: "2024-05-08" +reviewers: + - "@klueska" + - "@ffromani" +approvers: + - "@sig-node-tech-leads" +see-also: [] +replaces: [] + +# The target maturity stage in the current dev cycle for this KEP. +stage: beta + +# The most recent milestone for which work toward delivery of this KEP has been +# done. This can be the current (upcoming) milestone, if it is being actively +# worked on. 
+latest-milestone: "v1.31" + +# The milestone at which this feature was, or is targeted to be, at each stage. +milestone: + beta: "v1.31" + stable: "v1.32" + +# The following PRR answers are required at alpha release +# List the feature gate name and the components for which it must be enabled +feature-gates: + - name: "TopologyManagerPolicyBetaOptions" + components: + - kubelet +disable-supported: true + +# The following PRR answers are required at beta release +metrics: []