Skip to content

Conversation

@eggfoobar
Copy link
Contributor

/hold
Needs PR: openshift/api#1284
EP: openshift/enhancements#1213

Signed-off-by: ehila ehila@redhat.com

- What I did

Add support for generating a machine config file that will configure kubelet for workload partitioning based off of the infrastructure flag. After some discussion with the installer team, MCO seems to be the best place to enforce the CPU Partitioning configuration cluster wide.

- How to verify it
When the Infrastructure status resource is set to cpuPartitioning: AllNodes, we will generate two new MCs for workload partitioning named 01-master-cpu-partitioning and 01-worker-cpu-partitioning that will create a configuration file for kubelet to use on start up.

apiVersion: config.openshift.io/v1
kind: Infrastructure
metadata:
  name: cluster
status:
  cpuPartitioning: AllNodes

- Description for the changelog

Add new generated MCs for workload partitioning based off of infrastructure status.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 30, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 30, 2022

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: eggfoobar
Once this PR has been reviewed and has the lgtm label, please assign jkyros for approval by writing /assign @jkyros in a comment. For more information see:The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@eggfoobar eggfoobar force-pushed the cpu_partitioning_bootstrap branch from 35cc6fa to aa16592 Compare September 30, 2022 07:10
Signed-off-by: ehila <ehila@redhat.com>
@eggfoobar
Copy link
Contributor Author

/cc @yuqi-zhang

Hey Jerry, this is an updated PR from the previously closed one #3335 , the intention is the same but I took your suggestion and made it so MCO takes ownership of the MC for workload partitioning. While we wait for the PR for the API to go through, any thoughts or concerns on the implementation here?

@openshift-ci openshift-ci bot requested a review from yuqi-zhang October 3, 2022 17:30
@eggfoobar
Copy link
Contributor Author

/cc @rphillips @mrunalp

@openshift-ci openshift-ci bot requested review from mrunalp and rphillips October 4, 2022 17:54
Copy link
Contributor

@yuqi-zhang yuqi-zhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this generally looks like an ok direction, although I always hesitate towards adding additional sub-controllers and managed MCs, at least this one doesn't intersect with other kubelet configuration as far as I can tell.

Will review in-depth once you're ready. Maybe someone from the node team should take a look as well since they technically own the kubeletconfigcontroller

}

func cpuPartitionMCName(role string) string {
return fmt.Sprintf("01-%s-cpu-partitioning", role)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, this has a 01 prefix which means it is easily overwriteable by other MCs, which should be fine, since if I understand correctly, /etc/kubernetes/openshift-workload-pinning is the only file this will touch? And nothing else should be touching that file today?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, that file will be overwritten by other MCs later on. The intention is to set that file up originally to configure workload partitioning on the kubelet, then afterwords, when the user configures the PerformanceProfile via NTO, that file will be overwritten with CPUSet information that is defined in the PerformanceProfile. NTO is the one that will generate MCs to control that file afterwords.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 6, 2022

@eggfoobar: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws 964d12c link true /test e2e-aws
ci/prow/okd-scos-images 964d12c link true /test okd-scos-images

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot
Copy link
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 5, 2023
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 5, 2023
@openshift-merge-robot
Copy link
Contributor

@eggfoobar: PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@yuqi-zhang
Copy link
Contributor

@eggfoobar do you have any updates for this PR? Is this something we are still looking to do?

@eggfoobar
Copy link
Contributor Author

Hey @yuqi-zhang, apologies, closing this out now that we've changed the approach in the EP.

@eggfoobar eggfoobar closed this Jan 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants