Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kubelet: Add a flag "MaxAllowableNUMANodes" #124148

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

cyclinder
Copy link
Contributor

add a flag MaxAllowableNUMANodes for kubelet, which can make maxAllowableNUMANodes in the kubelet's TopologyManager configurable.

What type of PR is this?

/kind feature

What this PR does / why we need it:

add a flag MaxAllowableNUMANodes for kubelet, which configures maxAllowableNUMANodes in the kubelet's TopologyManager.

Which issue(s) this PR fixes:

Fixes #124144

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Add a flag "MaxAllowableNUMANodes" to configures maxAllowableNUMANodes in the kubelet's TopologyManager

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 2, 2024
@k8s-ci-robot
Copy link
Contributor

Please note that we're already in Test Freeze for the release-1.30 branch. This means every merged PR will be automatically fast-forwarded via the periodic ci-fast-forward job to the release branch of the upcoming v1.30.0 release.

Fast forwards are scheduled to happen every 6 hours, whereas the most recent run was: Tue Apr 2 08:19:47 UTC 2024.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Apr 2, 2024
@cyclinder
Copy link
Contributor Author

/sig-node

@k8s-ci-robot k8s-ci-robot added area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Apr 2, 2024
@@ -90,6 +90,16 @@ type KubeletFlags struct {
// additional configs to overwrite what is provided by default and in the KubeletConfigFile flag
KubeletDropinConfigDirectory string

// maxAllowableNUMANodes specifies the maximum number of NUMA Nodes that
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be moved to the kubelet config I think. We are aiming to avoid adding parameters to the kubelet CLI. I'd add it both in the staging directory and pkg/kubelet/apis/config.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the review.

We are aiming to avoid adding parameters to the kubelet CLI.

yes, makes sense, I'll add it later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also don't think this should be a top-level flag, but rather a topology-manager policy option. See the following for the PR that added policy options, including the very first option: #112914

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A policy option would indeed let us plug in the existing framework without new API reviews. Perhaps this is the way to go for all the new flags and features for topology manager?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also don't think this should be a top-level flag, but rather a topology-manager policy option. See the following for the PR that added policy options, including the very first option:
#112914.

Interesting, I would have thought that this seems more apt as a top level kubelet flag behind a feature gate given its nature and relevance across all policies and policy options. If the value is not specified and feature gate not enabled, we can default to the old value of 8 and preserve backward compatibility with minimal code changes.

We would have to go through graduation process irrespective of its implementation as a top level flag or a policy option. What is the rationale behind proposing this as a policy option?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what's the conclusion here? We only need a policy option or a feature-gate. I prefer to have a policy option here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still believe it should be a policy option. It scopes it beneath the TopologyManager better than having a top level flag.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on your comments, I've made some updates, Could you retake a review? thanks!

Copy link
Contributor

@kannon92 kannon92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kannon92
Copy link
Contributor

kannon92 commented Apr 2, 2024

/cc @klueska @ffromani

@kannon92
Copy link
Contributor

kannon92 commented Apr 2, 2024

/triage accepted
/priority important-longterm

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Apr 2, 2024
cmd/kubelet/app/options/options.go Outdated Show resolved Hide resolved
// generate hints for them. As such, if more NUMA Nodes than this are
// present on a machine and the TopologyManager is enabled, an error will
// be returned and the TopologyManager will not be loaded.
maxAllowableNUMANodes = 8
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd keep this constant but renamed like DefaultMaxAllowableNUMANodes - if user doesn't specify a new value, this is the recommended default. The user should probably not be enabled to set less (what's the point?) but if they set more, they should IMO get a warning both in the logs and from the docs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That make sense, thanks for the comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but if they set more, they should IMO get a warning both in the logs and from the docs

Can you tell me where this warning or log or docs would be better if they set more? thanks!

@swatisehgal
Copy link
Contributor

/cc

@bart0sh bart0sh added this to Triage in SIG Node PR Triage Apr 3, 2024
@bart0sh bart0sh moved this from Triage to Needs Reviewer in SIG Node PR Triage Apr 3, 2024
Copy link
Contributor

@swatisehgal swatisehgal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would need a KEP for this work so please ensure that this is captured in the Node 1.31 planning document: https://docs.google.com/document/d/1U10J0WwgWXkdYrqWGGvO8iH2HKeerQAlygnqgDgWv4E/edit.

@@ -90,6 +90,16 @@ type KubeletFlags struct {
// additional configs to overwrite what is provided by default and in the KubeletConfigFile flag
KubeletDropinConfigDirectory string

// maxAllowableNUMANodes specifies the maximum number of NUMA Nodes that
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also don't think this should be a top-level flag, but rather a topology-manager policy option. See the following for the PR that added policy options, including the very first option:
#112914.

Interesting, I would have thought that this seems more apt as a top level kubelet flag behind a feature gate given its nature and relevance across all policies and policy options. If the value is not specified and feature gate not enabled, we can default to the old value of 8 and preserve backward compatibility with minimal code changes.

We would have to go through graduation process irrespective of its implementation as a top level flag or a policy option. What is the rationale behind proposing this as a policy option?

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: cyclinder
Once this PR has been reviewed and has the lgtm label, please assign klueska for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

…Options

add a flag MaxAllowableNUMANodes for kubelet, which can make maxAllowableNUMANodes in the kubelet's TopologyManager configurable.

Signed-off-by: cyclidner <kuocyclinder@gmail.com>
@cyclinder
Copy link
Contributor Author

Hi @PiotrProkop @ffromani Could you please take a look?

@PiotrProkop
Copy link
Contributor

The changes looks good to me. I just think we need KEP for this effort(should be pretty straightforward, I can help with the KEP if needed) and when it will be approved we can track this change for k8s 1.31 😄

@ffromani
Copy link
Contributor

The changes looks good to me. I just think we need KEP for this effort(should be pretty straightforward, I can help with the KEP if needed) and when it will be approved we can track this change for k8s 1.31 😄

I agree (this is also something @swatisehgal highlighted e.g. in #124148 (comment)). Either approaches - a new flag or a policy option are fine to me, but either will require some extra process as it stands today

@cyclinder
Copy link
Contributor Author

require some extra process as it stands today

Is the extra process a KEP or something else? I am willing to submit a KEP for this 😄

@ffromani
Copy link
Contributor

ffromani commented Apr 30, 2024

require some extra process as it stands today

Is the extra process a KEP or something else? I am willing to submit a KEP for this 😄

Regarding the extra process I was thinking to the API review for the flag addition. AFAIK the flag addition won't need a Feature Gate nor a KEP, at least not in the general case. If so, it will emerge during the API review I think.
For the policy option, it will require a KEP as the process stands today AFAIK.

EDIT: OTOH the new option can likely start from beta stage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
SIG Node PR Triage
Needs Reviewer
Development

Successfully merging this pull request may close these issues.

Make maxAllowableNUMANodes in the kubelet's TopologyManager configurable
7 participants