Support memory qos with cgroups v2 #2570

Open · 4 tasks done
xiaoxubeii opened this issue Mar 14, 2021 · 23 comments
Labels
  • sig/node: Categorizes an issue or PR as relevant to SIG Node.
  • stage/alpha: Denotes an issue tracking an enhancement targeted for Alpha status.
  • tracked/no: Denotes an enhancement issue is NOT actively being tracked by the Release Team.

Comments

@xiaoxubeii (Member) commented Mar 14, 2021

Enhancement Description

@k8s-ci-robot added the needs-sig label Mar 14, 2021
@xiaoxubeii (Member, Author) commented Mar 14, 2021

/sig node

@k8s-ci-robot added the sig/node label and removed the needs-sig label Mar 14, 2021
@xiaoxubeii (Member, Author) commented Mar 14, 2021

/assign @xiaoxubeii

@MadhavJivrajani (Contributor) commented Mar 25, 2021

Hi! This sounds really interesting and I'd love to help out. Please let me know how I can contribute to this!

@ehashman (Member) commented May 4, 2021

/stage alpha
/milestone v1.22

@k8s-ci-robot added the stage/alpha label May 4, 2021
@k8s-ci-robot added this to the v1.22 milestone May 4, 2021
@JamesLaverack added the tracked/yes label May 5, 2021
@xiaoxubeii changed the title from "Support memory qos using cgroups v2" to "Support memory qos with cgroups v2" May 7, 2021
@gracenng (Member) commented May 10, 2021

Hi @xiaoxubeii 👋 1.22 Enhancements Shadow here.

This enhancement is in good shape; here are some minor change requests in light of Enhancement Freeze on Thursday, May 13th:

  • Update kep.yaml file to the latest template
  • In kep.yaml, status is currently provisional instead of implementable
  • Alpha graduation criteria missing
  • KEP not merged to master

Thanks!

@gracenng (Member) commented May 11, 2021

Hi @xiaoxubeii 👋 1.22 Enhancements shadow here.

To help SIGs be aware of their workload, I just wanted to check whether SIG Node will need to do anything for this enhancement and, if so, whether they are OK with it.
Thanks!

@xiaoxubeii (Member, Author) commented May 12, 2021

@gracenng Hey Grace, I have updated the necessary content as follows:

  • update kep.yaml for PRR approval
  • add Alpha graduation criteria

SIG Node approvers @derekwaynecarr and @mrunalp are reviewing it. I am waiting for lgtm/approve so the KEP can be merged as implementable.

@xiaoxubeii (Member, Author) commented May 13, 2021

@gracenng The SIG Node approvers (Derek and Mrunal) have given lgtm/approve. There are a few PRR review requests; I have updated the KEP and am waiting for the next review round. I think we can make it before the freeze date :)

@gracenng (Member) commented May 13, 2021

Hi @xiaoxubeii, it looks like your PRR was approved and the requested changes are all in. I have updated the status of this enhancement to tracked.
Thank you for keeping me updated!

@xiaoxubeii (Member, Author) commented May 13, 2021

Thanks for your help. Also, many thanks for all the valuable review suggestions and help from @derekwaynecarr @mrunalp @bobbypage @giuseppe @odinuge @johnbelamaric @ehashman.
Really appreciate it :)

@ritpanjw commented May 19, 2021

Hello @xiaoxubeii 👋 , 1.22 Docs Shadow here.

This enhancement is marked as Needs Docs for the 1.22 release.
Please follow the steps detailed in the documentation to open a PR against the dev-1.22 branch in the k/website repo. This PR can be just a placeholder at this time and must be created before Fri, July 9, 11:59 PM PDT.
Also, take a look at Documenting for a release to familiarize yourself with the docs requirements for the release.

Thank you!

@xiaoxubeii (Member, Author) commented May 19, 2021

@ritpanjw OK, thanks for the reminder.

@gracenng (Member) commented Jun 23, 2021

Hi @xiaoxubeii 🌞 1.22 enhancements shadow here.

In light of Code Freeze on July 8th, this enhancement's current status is tracked, and we're currently tracking kubernetes/kubernetes#102578 and kubernetes/kubernetes#102970.

Please let me know if there are any other code PRs associated with this enhancement.

Thanks

@xiaoxubeii (Member, Author) commented Jun 24, 2021

@gracenng It is all here, thanks.

@salaxander added the tracked/no label and removed the tracked/yes label Aug 19, 2021
@k8s-triage-robot commented Nov 17, 2021

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Nov 17, 2021
@xiaoxubeii (Member, Author) commented Nov 18, 2021

This KEP has been released.

@pacoxu (Member) commented Oct 12, 2022

This feature is alpha in v1.22. Any plan to make it beta?

BTW, I have some questions about the KEP.

  1. About the memoryThrottlingFactor: this is tricky. The kernel ability is not fully exposed to the user API (memoryThrottlingFactor is a node-level setting; the app owner cannot set it at the pod level). memory.high is like a soft limit and memory.max is a hard limit.
    Currently, there is only resource.limit. No resource.softlimit.
    See @derekwaynecarr's comment on this in KEP-2570 (#2571 (comment)).

Meanwhile, OOM is controlled by the kernel. If the kubelet handles a pod whose memory usage exceeds the limit, it can easily add an OOMKilled event. For a kernel kill, the kubelet cannot get that directly. If memory.high == resource.limit, the kubelet can kill the pod instead of the kernel OOM killer.

Is there a performance issue if the throttle factor is too small? For example, some pods (Java applications, say) will always use ~85% of their memory, so memory.high will take effect continuously. The processes of the cgroup are then throttled and put under heavy reclaim pressure. Is this a risk of this feature?

  2. memory.low vs memory.min: the kernel ability is not fully exposed to the user API here as well.
    memory.low is like a soft limit and memory.request is a hard request.
    See @mrunalp's comment in KEP-2570 (#2571 (review)).
    Currently, there is only resource.request. No resource.softrequest.

@ehashman (Member) commented Oct 13, 2022

This should not have been closed, as the feature is merely alpha. It either needs to continue graduating or should be deprecated. There is more work to do in either case.

@ehashman reopened this Oct 13, 2022
@k8s-triage-robot commented Nov 12, 2022

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Nov 12, 2022
@sftim (Contributor) commented Nov 13, 2022

/remove-lifecycle rotten

What are the barriers to moving this forward to, for example, beta and off by default?

@k8s-ci-robot removed the lifecycle/rotten label Nov 13, 2022
@sftim (Contributor) commented Nov 13, 2022

Should we freeze this issue?

(I'd assume that all alpha and beta features that ship in Kubernetes should have their KEP issues frozen, so that we continue to track the associated work.)

@xiaoxubeii (Member, Author) commented Nov 21, 2022

> This feature is alpha in v1.22. Any plan to make it beta?
>
> BTW, I have some questions about the KEP.
>
> About the memoryThrottlingFactor: this is tricky. The kernel ability is not fully exposed to the user API (memoryThrottlingFactor is a node-level setting; the app owner cannot set it at the pod level). memory.high is like a soft limit and memory.max is a hard limit.
> Currently, there is only resource.limit. No resource.softlimit.
> See @derekwaynecarr's comment on this in KEP-2570 (#2571 (comment)).

Yes. memory.high is more like a soft limit on memory, so in the alpha version we simply set it to (limits.memory or node allocatable memory) * memoryThrottlingFactor, where memoryThrottlingFactor is 0.8 by default at this moment. In other words, we made (limits.memory or node allocatable memory) serve as the resource.softlimit.

> Meanwhile, OOM is controlled by the kernel. If the kubelet handles a pod whose memory usage exceeds the limit, it can easily add an OOMKilled event. For a kernel kill, the kubelet cannot get that directly. If memory.high == resource.limit, the kubelet can kill the pod instead of the kernel OOM killer.

The kubelet will never kill the pod in this case; the kernel does. The kernel kills the container whose memory usage exceeds limits.memory, which is set as memory.max in cgroups v2. There is no additional kubelet implementation in this KEP other than setting the correct cgroups v2 values.

> Is there a performance issue if the throttle factor is too small? For example, some pods (Java applications, say) will always use ~85% of their memory, so memory.high will take effect continuously. The processes of the cgroup are then throttled and put under heavy reclaim pressure. Is this a risk of this feature?

Yes, maybe. The default throttle factor of 0.8 is an experimental value. We will expose it as a kubelet startup parameter when appropriate, maybe at the beta stage.

> memory.low vs memory.min: the kernel ability is not fully exposed to the user API here as well.
> memory.low is like a soft limit and memory.request is a hard request.
> Currently, there is only resource.request. No resource.softrequest.

Actually, memory.low is not yet used in the KEP.
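
For concreteness, here is a minimal Go sketch of the alpha-stage mapping as described in this thread: memory.max carries the hard limit (limits.memory) and memory.high is derived from it via the node-level memoryThrottlingFactor. The helper name and constant are illustrative assumptions, not the actual kubelet implementation.

```go
package main

import "fmt"

// defaultMemoryThrottlingFactor mirrors the alpha default of 0.8 mentioned
// above. Hypothetical constant for illustration only.
const defaultMemoryThrottlingFactor = 0.8

// memoryHigh computes the cgroups v2 memory.high value from the container's
// memory limit in bytes (or node allocatable memory when no limit is set).
func memoryHigh(limitBytes int64, factor float64) int64 {
	return int64(float64(limitBytes) * factor)
}

func main() {
	limit := int64(1 << 30) // limits.memory = 1Gi, written to memory.max
	fmt.Printf("memory.max  = %d\n", limit)
	fmt.Printf("memory.high = %d\n", memoryHigh(limit, defaultMemoryThrottlingFactor))
}
```

With a 1Gi limit this prints memory.max = 1073741824 and memory.high = 858993459, i.e. 80% of the hard limit.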

@pacoxu (Member) commented Dec 2, 2022

Just an idea on the factor.

The current proposal is memory.high = memory.limit * memoryThrottlingFactor.

However, this is a problem when the requested memory is close to the memory limit, because the throttling threshold is too easy to reach. And when users want to set the factor smaller, it may result in no throttling at all, because memory.high < memory.request is not accepted.

  • For instance, if the requested memory is 500Mi, the limit is 1Gi, and the factor is 0.6:
    • Then memory.high would be 600Mi, just 100Mi higher than the requested memory.
  • In another instance, if the requested memory is 800Mi and the limit is 1Gi, then with factor 0.6 no memory.high will be set at all.
  • If the requested memory is not set, memory.high can be assigned 600Mi.

A new proposal would be to base the factor on the requested memory:
memory.high = memory.request + (memory.limit - memory.request) * memoryThrottlingFactor
With the first example above:

  • memory.high would be 500Mi + 300Mi = 800Mi, which leaves reasonable headroom while still throttling.
  • For the second instance, with the requested memory at 800Mi and the limit at 1Gi, factor 0.6 gives memory.high = 920Mi.
  • If the requested memory is not set, memory.high can be set to 600Mi. No change for this case.

Is this a better factor design? I'm not sure whether there are other scenarios for throttling memory.

Limit 1000Mi                current design: memory.high   my proposal: memory.high
request 0Mi,   factor 0.6   600Mi                         600Mi
request 500Mi, factor 0.6   600Mi                         800Mi
request 800Mi, factor 0.6   max                           920Mi
request 1Gi,   factor 0.6   max                           max
request 0Mi,   factor 0.8   800Mi                         800Mi
request 500Mi, factor 0.8   800Mi                         900Mi
request 800Mi, factor 0.8   max                           960Mi
request 500Mi, factor 0.4   max                           700Mi

calculation method: current design = memory.limit * memoryThrottlingFactor; my proposal = memory.request + (memory.limit - memory.request) * memoryThrottlingFactor
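
To sanity-check the comparison, here is a small, self-contained Go sketch that reproduces the table above under both designs. It is illustrative only; the function names and the rule for when memory.high is left as "max" (the computed value not exceeding the request, or reaching the limit) are my assumptions, not kubelet code.

```go
package main

import "fmt"

// currentHigh implements the current design: memory.high = limit * factor.
// It returns "max" (memory.high unset) when the computed value does not
// exceed the request, since memory.high < memory.request is not accepted.
func currentHigh(request, limit int64, factor float64) string {
	high := int64(float64(limit) * factor)
	if high <= request || high >= limit {
		return "max"
	}
	return fmt.Sprintf("%dMi", high)
}

// proposedHigh implements the proposal:
// memory.high = request + (limit - request) * factor.
func proposedHigh(request, limit int64, factor float64) string {
	high := request + int64(float64(limit-request)*factor)
	if high >= limit {
		return "max"
	}
	return fmt.Sprintf("%dMi", high)
}

func main() {
	limit := int64(1000) // Mi
	cases := []struct {
		request int64
		factor  float64
	}{
		{0, 0.6}, {500, 0.6}, {800, 0.6}, {1000, 0.6},
		{0, 0.8}, {500, 0.8}, {800, 0.8}, {500, 0.4},
	}
	for _, c := range cases {
		fmt.Printf("request %4dMi factor %.1f  current=%-6s proposal=%s\n",
			c.request, c.factor,
			currentHigh(c.request, limit, c.factor),
			proposedHigh(c.request, limit, c.factor))
	}
}
```

Its output matches the table row for row, including the two cases where the current design degenerates to "max" while the proposal still produces a throttling threshold between request and limit.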
