Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support memory qos with cgroups v2 #2570

Open
8 tasks done
xiaoxubeii opened this issue Mar 14, 2021 · 86 comments
Open
8 tasks done

Support memory qos with cgroups v2 #2570

xiaoxubeii opened this issue Mar 14, 2021 · 86 comments
Assignees
Labels
sig/node Categorizes an issue or PR as relevant to SIG Node. stage/alpha Denotes an issue tracking an enhancement targeted for Alpha status

Comments

@xiaoxubeii
Copy link
Member

xiaoxubeii commented Mar 14, 2021

Enhancement Description

@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Mar 14, 2021
@xiaoxubeii
Copy link
Member Author

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 14, 2021
@xiaoxubeii
Copy link
Member Author

/assign @xiaoxubeii

@MadhavJivrajani
Copy link
Contributor

Hi! This sounds really interesting and I'd love to help out, please let me know how I can help out with this!

@ehashman
Copy link
Member

ehashman commented May 4, 2021

/stage alpha
/milestone v1.22

@k8s-ci-robot k8s-ci-robot added the stage/alpha Denotes an issue tracking an enhancement targeted for Alpha status label May 4, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.22 milestone May 4, 2021
@JamesLaverack JamesLaverack added the tracked/yes Denotes an enhancement issue is actively being tracked by the Release Team label May 5, 2021
@xiaoxubeii xiaoxubeii changed the title Support memory qos using cgroups v2 Support memory qos with cgroups v2 May 7, 2021
@gracenng
Copy link
Member

Hi @xiaoxubeii 👋 1.22 Enhancements Shadow here.

This enhancement is in good shape, some minor change requests in light of Enhancement Freeze on Thursday May 13th:

  • Update kep.yaml file to the latest template
  • In kep.yaml, status is currently provisional instead of implementable
  • Alpha graduation criteria missing
  • KEP not merged to master

Thanks!

@gracenng
Copy link
Member

Hi @xiaoxubeii 👋 1.22 Enhancements shadow here.

To help SIG's be aware of their workload, I just wanted to check to see if SIG-Node will need to do anything for this enhancement and if so, are they OK with it?
Thanks!

@xiaoxubeii
Copy link
Member Author

xiaoxubeii commented May 12, 2021

@gracenng Hey grace, I have updated necessary contents as follows:

  • update kep.yaml for prr approval
  • add Alpha graduation criteria

sig-node approvers @derekwaynecarr @mrunalp are reviewing for that. I am waiting for lgtm/approve and merge as implementable.

@xiaoxubeii
Copy link
Member Author

xiaoxubeii commented May 13, 2021

@gracenng sig-node approvers(Derek and Mrunal) have gave lgtm/approve. There are few prr review requests, I have updated and am waiting for next review round. I think we can catch up with the freeze day :)

@gracenng
Copy link
Member

Hi @xiaoxubeii , looks like your PRR was approved and the requested changes are all here. I have updated the status of this enhancement to tracked
Thank you for keeping me updated!

@xiaoxubeii
Copy link
Member Author

Thanks for your help. Also thanks very much to a lot of valuable review suggestions and helps from @derekwaynecarr @mrunalp @bobbypage @giuseppe @odinuge @johnbelamaric @ehashman
Really appreciate that :)

@ritpanjw
Copy link

Hello @xiaoxubeii 👋 , 1.22 Docs Shadow here.

This enhancement is marked as Needs Docs for 1.22 release.
Please follow the steps detailed in the documentation to open a PR against dev-1.22 branch in the k/website repo. This PR can be just a placeholder at this time and must be created before Fri July 9, 11:59 PM PDT.
Also, take a look at Documenting for a release to familiarize yourself with the docs requirement for the release.

Thank you!

@xiaoxubeii
Copy link
Member Author

@ritpanjw OK, thanks for reminding.

@gracenng
Copy link
Member

Hi @xiaoxubeii 🌞 1.22 enhancements shadow here.

In light of Code Freeze on July 8th, this enhancement current status is tracked, and we're currently tracking kubernetes/kubernetes#102578 kubernetes/kubernetes/pull/102970

Please let me know if there is other code PR associated with this enhancement.

Thanks

@xiaoxubeii
Copy link
Member Author

Hi @xiaoxubeii 🌞 1.22 enhancements shadow here.

In light of Code Freeze on July 8th, this enhancement current status is tracked, and we're currently tracking kubernetes/kubernetes#102578 kubernetes/kubernetes/pull/102970

Please let me know if there is other code PR associated with this enhancement.

Thanks

@gracenng It is all here, thanks.

@salaxander salaxander added tracked/no Denotes an enhancement issue is NOT actively being tracked by the Release Team and removed tracked/yes Denotes an enhancement issue is actively being tracked by the Release Team labels Aug 19, 2021
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 17, 2021
@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 29, 2024
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 28, 2024
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot
Copy link
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot closed this as not planned Won't fix, can't repro, duplicate, stale Mar 29, 2024
@pacoxu
Copy link
Member

pacoxu commented Jul 18, 2024

/remove-lifecycle stale
/reopen

@k8s-ci-robot
Copy link
Contributor

@pacoxu: Reopened this issue.

In response to this:

/remove-lifecycle stale
/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot reopened this Jul 18, 2024
@pacoxu
Copy link
Member

pacoxu commented Jul 18, 2024

@ndixita in https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2570-memory-qos#latest-update-stalled, the kep mentioned that we may use PSI here.

#4205 is PSI kep and we may wait for that KEP implementation, IIUC.

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot
Copy link
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot closed this as not planned Won't fix, can't repro, duplicate, stale Aug 17, 2024
@ndixita
Copy link
Contributor

ndixita commented Aug 17, 2024

/reopen

@k8s-ci-robot k8s-ci-robot reopened this Aug 17, 2024
@k8s-ci-robot
Copy link
Contributor

@ndixita: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@haircommander haircommander moved this from Done to Removed in SIG Node 1.32 KEPs planning Aug 20, 2024
@pacoxu
Copy link
Member

pacoxu commented Aug 26, 2024

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Aug 26, 2024
@pacoxu
Copy link
Member

pacoxu commented Sep 14, 2024

Thanks for @ndixita detailed test: https://docs.google.com/document/d/1mY0MTT34P-Eyv5G1t_Pqs4OWyIH-cg9caRKWmqYlSbI/edit?usp=sharing.

🚨🚨 Sometimes the application pod is stuck for throttling memory. This is a worse behavior than OOM kill. So we decided to postpone promoting this feature until we can gracefully handle this issue.

See kubernetes/kubernetes#118699 (comment) as well.

So KEP needs an update.

For the blocker issue, it is a kernel behavior which may need fix or we need something like alibaba cloud memory high watermark which only trigger memory reclaim without throttling.

Some new thoughts on this.

The throttling behavior may not be suitable to all applications. Should we change the default behavior of this to be just disabled? Or I'm afraid this will be something forever alpha. 🤔

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 22, 2024
@haircommander
Copy link
Contributor

I chatted with a colleague who works on the memory subsystem about this, and he is looking into adding a knob to the kernel to tune how aggressive the kernel is with reclaiming memory. This could allow us to reduce the aggression which could let the kernel OOM kill in this case

@haircommander
Copy link
Contributor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sig/node Categorizes an issue or PR as relevant to SIG Node. stage/alpha Denotes an issue tracking an enhancement targeted for Alpha status
Projects
Status: Tracked
Status: Removed from Milestone
Status: Removed
Development

No branches or pull requests