[KEP-5246] Migrate to systemd's cgroup v1 CPU shares to v2 CPU weight formula #5247
Conversation
Signed-off-by: Itamar Holder <iholder@redhat.com>
force-pushed from 9285303 to 12f5084, then from 12f5084 to 8020ae0
FYI @vladikr

/cc @giuseppe
giuseppe
left a comment
LGTM
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: giuseppe, iholder101. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing `/approve` in a comment.
> - A significant amount of the work would need to land in other layers, mainly OCI runtimes and the CRI.
> - We'll probably need a CRI configuration to ensure coordination between the CRI and the OCI runtime implementations, and to ensure it lands in the same version, as suggested
Should we expand on this in the design section, to spell out what needs to be changed and how the rollout will happen?
Hey @yujuhong!
What I have in mind is (see the sketch after this list):
- Change the implementation in the OCI runtimes.
- Expose a configuration that the CRI can use.
- Ensure the CRI supports the new OCI configuration.
- Test k8s with the new CRI configuration.
- In the same k8s release:
  - Change the k8s code related to this formula, mainly pod-resources resize, etc.
  - Ensure CRIs enable the OCI config.
Does that sound right? WDYT?
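For illustration only, here is a minimal Go sketch of how that coordination might look from the kubelet's side, assuming a hypothetical CRI status field; none of these identifiers exist today:

```go
// Hypothetical sketch of the coordination step above: the CRI implementation
// reports whether its OCI runtime applies the systemd shares->weight formula,
// and the kubelet only switches its own cpu.weight math (e.g. for pod resize)
// once the runtime confirms the matching behavior. All names are invented
// for illustration; no such CRI field exists today.
package main

import "fmt"

// runtimeStatus stands in for a hypothetical extension of the CRI's
// runtime status response.
type runtimeStatus struct {
	// systemdCPUWeightFormula would be set by the CRI once its OCI runtime
	// converts cpu.shares to cpu.weight using systemd's formula.
	systemdCPUWeightFormula bool
}

// kubeletUsesSystemdFormula gates the kubelet-side change on the runtime's
// report, so the two layers can never disagree about the conversion in use.
func kubeletUsesSystemdFormula(s runtimeStatus) bool {
	return s.systemdCPUWeightFormula
}

func main() {
	fmt.Println(kubeletUsesSystemdFormula(runtimeStatus{systemdCPUWeightFormula: true}))
}
```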
> That being said, the formula is entirely an implementation detail that most probably isn't being counted on to produce specific concrete values. In any case, we should ensure that the new formula is well documented and that the change is properly communicated to users.
Will there be a way for users to configure back to the previous behavior, in case of unexpected issues resulting from the change?
I don't think this mechanism will be implemented by the OCI runtimes; the entire mechanism is a workaround for using cgroup v1 settings in a cgroup v2 environment.
If someone wants full control of the cgroup v2 values, then they must use the native cgroup v2 unified map to pass the correct value down the stack.
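For context, the escape hatch mentioned here is the `unified` map in the OCI runtime spec, which passes raw cgroup v2 keys through unmodified. Below is a minimal sketch using the runtime-spec Go types; how a higher layer (CRI, kubelet) would populate this is runtime-dependent and out of scope:

```go
// Minimal sketch: pinning cpu.weight directly via the OCI runtime spec's
// `unified` map, which bypasses any shares->weight conversion entirely.
// LinuxResources and its Unified field are real runtime-spec Go types;
// the surrounding wiring is illustrative only.
package main

import (
	"encoding/json"
	"fmt"

	specs "github.com/opencontainers/runtime-spec/specs-go"
)

func main() {
	res := &specs.LinuxResources{
		// Keys in Unified are written verbatim into the container's cgroup,
		// so the runtime applies cpu.weight=100 as-is.
		Unified: map[string]string{"cpu.weight": "100"},
	}
	out, _ := json.MarshalIndent(res, "", "  ")
	fmt.Println(string(out))
}
```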
I mainly asked because if we simply switch the implementation, it may have an unexpected impact on user workloads. Having an option to preserve the original behavior could be important.
If the change were purely at the OCI/CRI level, I'd say we could delegate this decision to those layers.
However, since there's a little k8s code that's also affected (e.g. pod-level resources), if a user wanted to roll back to the previous behavior we would need such a configuration knob.
That said, I'm not entirely sure it would be valuable, or that it's worth introducing a config that will most likely be deprecated very quickly.
At the end of the day, the only users I can think of who would be negatively affected by this are users who expect concrete CPU weight values that would now change. I think this use case is very rare, if it exists at all. In any case, I'm open to this approach.
/cc
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules: …
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs. This bot triages PRs according to the following rules: …
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
tallclair
left a comment
It doesn't change the discussion, but IMO we should stop converting between CPU shares and weights. Rather, we should use the k8s-native resource type (milli-CPU) and only convert to the cgroup types at the point where the cgroup value is actually needed. In other words, we would only ever convert directly between milli-CPU and shares, or between milli-CPU and weight, but never from shares to weight.
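As a rough sketch of that idea: the shares helper below mirrors the familiar kubelet rounding, while the direct milli-CPU-to-weight function is hypothetical (not existing kubelet code) and simply anchors one CPU (1000m) at the cgroup v2 default weight of 100:

```go
// Sketch of the suggestion above: convert milli-CPU directly to each cgroup
// value instead of chaining shares->weight. milliCPUToShares mirrors the
// well-known kubelet rounding; milliCPUToWeight is hypothetical and scales
// linearly so that 1000m lands on the cgroup v2 default weight of 100.
package main

import "fmt"

const (
	minShares    = 2
	sharesPerCPU = 1024
	milliPerCPU  = 1000
)

func milliCPUToShares(milliCPU int64) uint64 {
	if milliCPU == 0 {
		return minShares
	}
	shares := (milliCPU * sharesPerCPU) / milliPerCPU
	if shares < minShares {
		return minShares
	}
	return uint64(shares)
}

// milliCPUToWeight maps milli-CPU straight into the kernel's valid
// cpu.weight range [1, 10000], never touching shares.
func milliCPUToWeight(milliCPU int64) uint64 {
	w := milliCPU / 10 // 1000m -> 100, 10000m -> 1000, ...
	if w < 1 {
		w = 1
	}
	if w > 10000 {
		w = 10000
	}
	return uint64(w)
}

func main() {
	for _, m := range []int64{100, 500, 1000, 4000} {
		fmt.Printf("%5dm -> shares=%4d weight=%d\n",
			m, milliCPUToShares(m), milliCPUToWeight(m))
	}
}
```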
> [kubernetes/kubernetes]: https://git.k8s.io/kubernetes
> [kubernetes/website]: https://git.k8s.io/website
>
> ## Summary
This section is all background. I would add a ### Background heading over it, and add a summary of the proposed changes here.
> `cpu.weight = (1 + ((cpu.shares - 2) * 9999) / 262142)`
>
> While systemd's formula is:
> `cpu.weight = 1 + ((cpu.shares - 2) * 99) / (1024 - 2)`
This differs from the new opencontainers formula - wouldn't it be better to match that?
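To make the gap concrete, here is a quick comparison of the two integer formulas quoted above (a clamp to the kernel's valid cpu.weight range of [1, 10000] is added for safety; systemd clamps similarly at the extremes):

```go
// Compare the opencontainers and systemd shares->weight formulas at a few
// representative cpu.shares values. Both use integer arithmetic; the clamp
// only matters at the extremes of the range.
package main

import "fmt"

func clamp(w uint64) uint64 {
	if w < 1 {
		return 1
	}
	if w > 10000 {
		return 10000
	}
	return w
}

// opencontainersWeight maps the full shares range [2, 262144] onto [1, 10000].
func opencontainersWeight(shares uint64) uint64 {
	return clamp(1 + ((shares-2)*9999)/262142)
}

// systemdWeight anchors the defaults: 1024 shares maps to weight 100.
func systemdWeight(shares uint64) uint64 {
	return clamp(1 + ((shares-2)*99)/(1024-2))
}

func main() {
	for _, s := range []uint64{2, 256, 1024, 4096, 262144} {
		fmt.Printf("shares=%6d  opencontainers=%5d  systemd=%5d\n",
			s, opencontainersWeight(s), systemdWeight(s))
	}
}
```

Note that the default request of one CPU (1024 shares) lands at weight 39 under the current formula but at the cgroup v2 default of 100 under systemd's, which is the low-priority symptom described in kubernetes#131216.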
The KEP was closed, so we do not need this PR.
/close
@SergeyKanzhelev: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
One-line PR description: Migrate to systemd's cgroup v1 CPU shares to v2 CPU weight formula.
Issue link: Migrate to systemd's cgroup v1 CPU shares to v2 CPU weight formula #5246
Other comments:
See the discussion on kubernetes#131216 (Conversion of cgroup v1 CPU shares to v2 CPU weight causes workloads to have low CPU priority).
KEP in a human-readable format: https://github.com/iholder101/kubernetes-enhancements/blob/kep/systemd_cpu_cgroup_conversion/keps/sig-node/5246-cgroup-cpu-share-to-weight-conversion/README.md