Support memory qos with cgroups v2 #2570
Comments
/sig node
/assign @xiaoxubeii
Hi! This sounds really interesting and I'd love to help out. Please let me know how I can contribute to this!
/stage alpha
Hi @xiaoxubeii, this enhancement is in good shape. Some minor change requests in light of Enhancement Freeze on Thursday, May 13th:
Thanks!
Hi @xiaoxubeii, to help SIGs be aware of their workload, I just wanted to check whether SIG-Node will need to do anything for this enhancement and, if so, whether they are OK with it.
@gracenng Hey Grace, I have updated the necessary content as follows:
The sig-node approvers @derekwaynecarr @mrunalp are reviewing it. I am waiting for lgtm/approve and merge as
@gracenng The sig-node approvers (Derek and Mrunal) have given lgtm/approve. There are a few PRR review requests; I have updated and am waiting for the next review round. I think we can make the freeze day :)
Hi @xiaoxubeii, looks like your PRR was approved and the requested changes are all here. I have updated the status of this enhancement to
Thanks for your help. Also thanks very much for the many valuable review suggestions and help from @derekwaynecarr @mrunalp @bobbypage @giuseppe @odinuge @johnbelamaric @ehashman
Hello @xiaoxubeii, this enhancement is marked as Needs Docs for the 1.22 release. Thank you!
@ritpanjw OK, thanks for the reminder.
Hi @xiaoxubeii, in light of Code Freeze on July 8th, this enhancement's current status is Please let me know if there are other code PRs associated with this enhancement. Thanks
@gracenng It is all here, thanks.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
This KEP has been released.
This feature is alpha in v1.22. Any plan to make it beta? BTW, I have some questions about the KEP.
Meanwhile, OOM is controlled by the kernel. If kubelet handles a pod whose memory usage exceeds the limit, it can easily add an OOMKilled event; for a kernel kill, kubelet cannot get that directly. Is there a performance issue if the throttle factor is too small? For example, some pods like Java will always use ~85% of memory and
This should not have been closed, as the feature is merely alpha. It either needs to continue graduating or be deprecated; there is more work to do in either case.
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten
What are the barriers to moving this forward to, for example, beta and off by default?
Should we freeze this issue? (I'd assume that all alpha and beta features that ship in Kubernetes should have their KEP issues frozen, so that we continue to track the associated work)
Yes.
kubelet will never KILL THE POD in this case, yet the kernel does. The kernel will kill the container whose memory usage is over
Actually
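To illustrate the distinction in the reply above, here is a minimal sketch, assuming an illustrative cgroup path and sizes (this is not kubelet's actual code; kubelet passes these values through the container runtime rather than writing files itself): memory.max is the hard cap whose breach triggers a kernel OOM kill, while memory.high is the soft limit that only triggers reclaim/throttling.

```go
// Minimal sketch of the cgroup v2 memory knobs discussed above.
// The cgroup path and sizes are illustrative only.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	cgroupDir := "/sys/fs/cgroup/kubepods/pod-example" // hypothetical pod cgroup

	limit := int64(1 << 30)             // 1 GiB: memory.max, the hard limit
	high := int64(float64(limit) * 0.8) // memory.high, the soft limit (throttle factor 0.8 here)

	// Exceeding memory.high makes the kernel throttle and reclaim from the cgroup;
	// exceeding memory.max makes the kernel OOM-kill a process in the cgroup,
	// which kubelet only observes after the fact.
	write := func(name string, v int64) {
		p := filepath.Join(cgroupDir, name)
		if err := os.WriteFile(p, []byte(fmt.Sprintf("%d", v)), 0o644); err != nil {
			fmt.Fprintf(os.Stderr, "write %s: %v\n", p, err)
		}
	}
	write("memory.high", high)
	write("memory.max", limit)
}
```

Because the OOM kill happens inside the kernel, kubelet only learns about it after the fact, which is the point being made above.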
Just an idea on the factor.
The current proposal is memory.high = memory limit * throttling factor. However, this would be a problem when the requested memory is close to the memory limit, since the throttling threshold is too easy to reach. And when users want to set the factor smaller, it may result in no throttling at all, because memory.high < request would not be accepted.
A new proposal would be to make the factor based on the requested memory.
Is this a better factor design? I'm not sure if there are other scenarios for throttling the memory.
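To make the two options concrete, here is a rough sketch with illustrative numbers (the function names and values are assumptions for illustration, not the kubelet implementation):

```go
// Rough sketch of the two memory.high formulas discussed above; names and
// values are illustrative only.
package main

import "fmt"

const gi = int64(1 << 30)

// Current proposal: memory.high = limit * factor.
func highFromLimit(limit int64, factor float64) int64 {
	return int64(factor * float64(limit))
}

// Alternative proposal: base memory.high on the request so it never falls
// below it: memory.high = request + factor*(limit - request).
func highFromRequest(request, limit int64, factor float64) int64 {
	return request + int64(factor*float64(limit-request))
}

func main() {
	request, limit := 7*gi, 8*gi // request close to the limit
	factor := 0.8

	// Limit-based: ~6.4 GiB, already below the 7 GiB request, so the workload
	// is throttled from the start (the problem described above).
	fmt.Println(highFromLimit(limit, factor))

	// Request-based: ~7.8 GiB, so throttling only starts once usage exceeds
	// the request and approaches the limit.
	fmt.Println(highFromRequest(request, limit, factor))
}
```

With the request-based form, lowering the factor still keeps memory.high at or above the request, which avoids the memory.high < request problem described above.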
How do we pick the magic number memoryThrottlingFactor? I don't think a single value would work well for all workloads. I think it is better if these settings are exposed to the user. IMHO,
I shared a bit more context about this feature in my and @mrunalp's KubeCon talk on cgroup v2. One of the interesting pieces of feedback I received about the feature is that some folks may have applications which are quite latency/performance sensitive and always use memory very close to the limit. In those cases, customers would prefer not to have a soft memory limit set, so their application does not get impacted by kernel memory reclaim when hitting
Yes, some Java applications especially may not need it, so it is necessary to have a per-pod option for this. BTW, as you mentioned, the
Like Bobby said, we need to find a way to opt out of it at the per-pod level, which means users can enable or disable it by setting something like
I tried to summarize things in https://docs.google.com/document/d/1p9awiXhu5f4mWsOqNpCX1W-bLLlICiABKU55XpeOgoA/edit?usp=sharing for discussion.
We discussed it at the SIG Node meeting on Dec 6th: https://docs.google.com/document/d/1Ne57gvidMEWXR70OxxnRkYquAoMpt56o75oZtg-OeBg/edit#bookmark=id.ph8vc9vhlcxy with the summary “Collect feedback in the doc and then open a KEP update with alternatives”; discussion with timestamp: https://youtu.be/t3PcHj62f0c?t=686
This is now discussed in https://docs.google.com/document/d/1r-e_jWL5EllBgxtaNb9qI2eLOKf9wSuuuglTySWmfmQ/edit#heading=h.6nyswlzgrqq0. I opened a draft PR kubernetes/kubernetes#115371 accordingly.
Enhancement Description

- k/enhancements update PR(s): KEP-2570: Support memory qos with cgroups v2 #2571
- k/k update PR(s): Feature: add unified on CRI to support cgroup v2 kubernetes#102578, Feature: Support memory qos with cgroups v2 kubernetes#102970
- k/website update PR(s): Support Memory QoS with cgroups v2 for 1.22 #2570 website#28566