
Conversation

@iholder101
Contributor

@iholder101 iholder101 commented Apr 16, 2025

Signed-off-by: Itamar Holder <iholder@redhat.com>
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory labels Apr 16, 2025
@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Apr 16, 2025
@iholder101 iholder101 force-pushed the kep/systemd_cpu_cgroup_conversion branch from 9285303 to 12f5084 on April 16, 2025 09:32
Signed-off-by: Itamar Holder <iholder@redhat.com>
@iholder101 iholder101 force-pushed the kep/systemd_cpu_cgroup_conversion branch from 12f5084 to 8020ae0 on April 16, 2025 09:37
@iholder101
Contributor Author

FYI @vladikr

@pacoxu
Member

pacoxu commented Apr 27, 2025

/cc @giuseppe
for #2254
/assign @yujuhong @dchen1107 @derekwaynecarr

@k8s-ci-robot k8s-ci-robot requested a review from giuseppe April 27, 2025 03:38
Member

@giuseppe giuseppe left a comment


LGTM

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: giuseppe, iholder101
Once this PR has been reviewed and has the lgtm label, please assign dchen1107, soltysh for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


- A significant amount of the work would need to land in other layers, mainly the OCI runtimes and the CRI.
- We'll probably need a CRI configuration to ensure coordination between the CRI and the OCI runtime implementations,
and to ensure it lands at the same version, as suggested
Contributor

Should we expand this in the design section to understand what needs to be changed and how the rollout will happen?

Contributor Author

Hey @yujuhong!

What I have in mind is:

  1. Change the implementation in the OCI runtimes.
  2. Expose a configuration that the CRI implementations can use.
  3. Ensure the CRI implementations support the new OCI configuration.
  4. Test Kubernetes with the new CRI configuration.
  5. In the same Kubernetes release:
    1. Change the Kubernetes code that depends on this formula, mainly pod-level resources, in-place resize, etc.
    2. Ensure the CRI implementations enable the OCI config.

Does that sound right? WDYT?


That being said, the formula is entirely an implementation detail that users most probably do not count on
to produce specific concrete values. In any case, we should ensure that the new formula is well documented
and that the change is properly communicated to users.
Contributor

Will there be a way for users to configure back to the previous behavior, in case of unexpected issues resulting from the change?

Member

I don't think this mechanism will be implemented by the OCI runtimes; the entire mechanism is a workaround for using cgroup v1 settings in a cgroup v2 environment.

If someone wants full control over the cgroup v2 values, then they must use the native cgroup v2 unified map to pass the correct value down the stack.
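
For reference, here is a minimal sketch (not part of this PR) of what passing an exact value through the cgroup v2 unified map could look like, assuming the Go types from github.com/opencontainers/runtime-spec; the helper name is hypothetical:

```go
package main

import (
	"fmt"

	specs "github.com/opencontainers/runtime-spec/specs-go"
)

// cpuWeightResources pins cpu.weight to an exact value via the cgroup v2
// "unified" map, instead of relying on a shares-to-weight conversion.
func cpuWeightResources(weight uint64) *specs.LinuxResources {
	return &specs.LinuxResources{
		Unified: map[string]string{
			// Written verbatim into the container's cgroup v2 cpu.weight file.
			"cpu.weight": fmt.Sprintf("%d", weight),
		},
	}
}
```

The idea, per the comment above, is that users who need exact weights set them explicitly rather than depending on whatever conversion formula the stack happens to use.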

Contributor

I mainly asked because if we simply switch the implementation, it may have an unexpected impact on user workloads. Having an option to preserve the original behavior can be important.

Contributor Author

If the change were purely at the OCI/CRI level, I'd say we could delegate this decision to those layers.
However, since a little Kubernetes code is also affected (e.g. pod-level resources), if a user wanted to roll back to the previous behavior we would need such a configuration knob.

That said, I'm not entirely sure it would be valuable, or that it's worth introducing a config option that will most likely be deprecated very quickly.

At the end of the day, the only users I can think of who would be negatively affected by this are those who expect concrete CPU weight values that would now change. I think this use case is very rare, if it exists at all. In any case, I'm open to this approach.

@cartermckinnon

/cc

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 2, 2025
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle rotten
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 1, 2025
Member

@tallclair tallclair left a comment


It doesn't change the discussion, but IMO we should stop converting between CPU shares and weights. Rather, we should use the Kubernetes-native resource type (milli-CPU) and only convert to the cgroup types at the point where the cgroup value is actually needed. In other words, we would only ever convert between milli-CPU and shares, or between milli-CPU and weight, but never from shares to weight.
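
For illustration, a rough Go sketch of that direction, with hypothetical helper names; the shares mapping mirrors the existing cgroup v1 behavior, while the milli-CPU-to-weight curve shown is only a placeholder, not a settled choice:

```go
package main

import "fmt"

// Kernel-defined bounds for cgroup v2 cpu.weight.
const (
	minCPUWeight uint64 = 1
	maxCPUWeight uint64 = 10000
)

// milliCPUToShares converts milli-CPU directly to cgroup v1 shares:
// 1000 milli-CPU corresponds to 1024 shares, floored at the kernel minimum of 2.
func milliCPUToShares(milliCPU int64) uint64 {
	if milliCPU <= 0 {
		return 2
	}
	shares := uint64(milliCPU) * 1024 / 1000
	if shares < 2 {
		return 2
	}
	return shares
}

// milliCPUToWeight converts milli-CPU directly to a cgroup v2 weight, without
// computing shares first. The curve (1000 milli-CPU -> weight 100, the systemd
// default for one CPU) is purely illustrative.
func milliCPUToWeight(milliCPU int64) uint64 {
	if milliCPU <= 0 {
		return minCPUWeight
	}
	w := uint64(milliCPU) / 10
	if w < minCPUWeight {
		return minCPUWeight
	}
	if w > maxCPUWeight {
		return maxCPUWeight
	}
	return w
}

func main() {
	for _, m := range []int64{100, 500, 1000, 4000} {
		fmt.Printf("%4dm -> shares=%4d weight=%4d\n", m, milliCPUToShares(m), milliCPUToWeight(m))
	}
}
```

Each cgroup backend would then call only the conversion it needs, and the shares-to-weight step disappears entirely.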


## Summary
Member

This section is all background. I would add a ### Background heading over it, and add a summary of the proposed changes here.

`cpu.weight = (1 + ((cpu.shares - 2) * 9999) / 262142)`

While systemd's formula is:
`cpu.weight = 1 + ((cpu.shares - 2) * 99) / (1024 - 2)`
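
For illustration only, here is a small Go sketch (not from the KEP) that evaluates the two conversions exactly as quoted above; neither clamps the result to the cgroup v2 cpu.weight range of [1, 10000]:

```go
package main

import "fmt"

// ociWeightFromShares is the first conversion quoted above
// (the one Kubernetes currently relies on): 1 + ((shares - 2) * 9999) / 262142.
func ociWeightFromShares(shares uint64) uint64 {
	return 1 + (shares-2)*9999/262142
}

// systemdWeightFromShares is the systemd-style conversion quoted above:
// 1 + ((shares - 2) * 99) / (1024 - 2).
func systemdWeightFromShares(shares uint64) uint64 {
	return 1 + (shares-2)*99/(1024-2)
}

func main() {
	for _, shares := range []uint64{2, 256, 1024, 2048, 262144} {
		fmt.Printf("shares=%6d  current=%5d  systemd-style=%5d\n",
			shares, ociWeightFromShares(shares), systemdWeightFromShares(shares))
	}
}
```

The interesting data point is the default of 1024 shares (one CPU): the first formula maps it to weight 39, while the systemd-style one maps it to 100, which is systemd's default weight.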
Member

This differs from the new opencontainers formula; wouldn't it be better to match that?

@SergeyKanzhelev
Member

The KEP was closed so we do not need this PR

/close

@k8s-ci-robot
Contributor

@SergeyKanzhelev: Closed this PR.

In response to this:

The KEP was closed so we do not need this PR

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
