Node memory swap support #2400

Open
30 of 40 tasks
ehashman opened this issue Feb 1, 2021 · 116 comments
Assignees
Labels
lead-opted-in Denotes that an issue has been opted in to a release sig/node Categorizes an issue or PR as relevant to SIG Node. stage/beta Denotes an issue tracking an enhancement targeted for Beta status

Comments

@ehashman
Member

ehashman commented Feb 1, 2021

Enhancement Description

Please keep this description up to date. This will help the Enhancement Team to track the evolution of the enhancement efficiently.

/sig node

@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Feb 1, 2021
@karan
Member

karan commented Feb 9, 2021

@cookieisaac

@ehashman
Member Author

/stage alpha
/milestone v1.22

@k8s-ci-robot k8s-ci-robot added the stage/alpha Denotes an issue tracking an enhancement targeted for Alpha status label Apr 28, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.22 milestone Apr 28, 2021
@JamesLaverack JamesLaverack added the tracked/yes Denotes an enhancement issue is actively being tracked by the Release Team label Apr 29, 2021
@ehashman ehashman self-assigned this May 10, 2021
@jrsapi

jrsapi commented May 11, 2021

Greetings @ehashman!
Enhancement shadow checking in with a few reminders. 1.22 Enhancements Freeze starts at 23:59:59 PST on Thursday, May 13. A few items needing review for this KEP:

Thanks!

@ehashman
Member Author

All of this is covered in #2602

@jrsapi

jrsapi commented May 13, 2021

Greetings @ehashman!
Thanks for the follow-up. After reviewing the KEP and PRR, everything looks on target. The enhancement is marked at risk, but once it is merged we can move its status to tracked. A reminder that the Enhancements Freeze starts tomorrow, 5/13, at 23:59:59 PST.

Thanks!

@ehashman
Member Author

Work breakdown for 1.22

  • Documentation for enabling swap (TBD - @ehashman?)
  • CI environment/test updates (@ike-ma)
    • Build images with swap for 2 Linux distros
    • Add jobs to test-infra that use the images and enable the swap feature flag/kubelet option for node e2e suite
    • See also Test Plan
  • CRI and kubelet changes (@ehashman)
    • Complete API changes per KEP
    • Add e2e tests as appropriate

External to k8s but still need to happen:

  • Containerd update to use new CRI
  • CRI-O update to use new CRI

Once the above CRI updates happen, ensure that CI environment is using latest container runtimes with updated CRI.

@jrsapi

jrsapi commented Jun 24, 2021

Greetings @ehashman ,
Enhancement shadow checking in with a reminder that we are 2 weeks away from code freeze (July 8, 2021). Can you link the k/k PR(s) that are needed to implement this enhancement for the 1.22 milestone?

@jrsapi

jrsapi commented Jul 6, 2021

Greetings @ehashman,
A friendly reminder that code freeze is this Thursday, July 8th and we're tracking the following k/k PR:

Thanks!

@ehashman
Member Author

ehashman commented Jul 7, 2021

PR just merged. Docs placeholder is kubernetes/website#28838

@jrsapi

jrsapi commented Jul 7, 2021

Awesome! Thanks for the update. Moving this to "Tracked".

@ehashman
Member Author

We are good to go for 1.22! Docs complete.

@ehashman
Member Author

/milestone v1.23

@k8s-ci-robot k8s-ci-robot modified the milestones: v1.22, v1.23 Aug 12, 2021
@ehashman
Member Author

/stage beta

@k8s-ci-robot k8s-ci-robot added stage/beta Denotes an issue tracking an enhancement targeted for Beta status and removed stage/alpha Denotes an issue tracking an enhancement targeted for Alpha status labels Aug 12, 2021
@k8s-ci-robot k8s-ci-robot modified the milestones: v1.28, v1.29 Sep 20, 2023
@npolshakova

/label lead-opted-in

@k8s-ci-robot k8s-ci-robot added the lead-opted-in Denotes that an issue has been opted in to a release label Sep 20, 2023
@jan-kantert

Hey @jan-kantert! Thanks for reaching out.
You're right. This is indeed something we need to look into. I guess (?) that all that needs to be done is to subtract the swap space from workingSet and simply ignore swap space for eviction decisions. But as I said, I still need to explore this area.

Would there be a scenario where swap + workingSet > node.status.capacity[memory] ? and would that lead to having negative value for memory.available?

In my understanding this is currently the case. However, in practice that is where eviction kicks in, so the value stays around the eviction threshold. If you look at our nodes, you will see that they are in fact using only around 50% of their available RAM (and a bunch of swap), but memory.available indicates that they are almost out of memory (which they are not). On the plus side, nodes will use the spare RAM as cache, which is better spent than keeping unused stuff in RAM.

We are currently experimenting with --experimental-allocatable-ignore-eviction to disable memory evictions. I will report back once we have gained some experience with that.
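
The arithmetic behind the question above can be sketched in a few lines of Python. This is only an illustration, assuming the eviction signal is computed as memory.available = node.status.capacity[memory] - workingSet and that workingSet may include swapped-out bytes, as the comment suggests. The function names, the node sizes, and the "subtract swap" adjustment are hypothetical and are not taken from kubelet code.

```python
GiB = 1024 ** 3


def memory_available(capacity_bytes: int, working_set_bytes: int) -> int:
    """Eviction signal as discussed above: capacity minus working set.

    Nothing clamps the result, so it goes negative if the working set
    (here assumed to include swapped-out bytes) exceeds capacity.
    """
    return capacity_bytes - working_set_bytes


def memory_available_ignoring_swap(
    capacity_bytes: int, working_set_bytes: int, swap_bytes: int
) -> int:
    """Hypothetical adjustment: subtract swapped-out bytes from the working set first."""
    resident_bytes = max(working_set_bytes - swap_bytes, 0)
    return capacity_bytes - resident_bytes


# Assumed example node: 16 GiB of RAM, 15 GiB working set, 7 GiB of it in swap.
capacity = 16 * GiB
working_set = 15 * GiB
swapped_out = 7 * GiB

print(memory_available(capacity, working_set) // GiB)                             # 1
print(memory_available_ignoring_swap(capacity, working_set, swapped_out) // GiB)  # 8
```

With these assumed numbers the unadjusted signal reports 1 GiB available even though only half of RAM is resident, which matches the behaviour described in the comment.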

@iholder101
Contributor

Hey all.
I've done some cleanup of old issues that are outdated or inactive and whose descriptions I can't update.

The issues can be tracked here:

From now on I plan to keep them organized and up-to-date.
@pacoxu you're welcome to add these under "future" in this issue's description (since unfortunately I can't edit it either).

jsturtevant pushed a commit to jsturtevant/containerd that referenced this issue Sep 21, 2023
OCI runtime spec defines memory.swap as 'limit of memory+Swap usage'
so setting them to equal should disable the swap. Also, this change
should make containerd behaviour same as other runtimes e.g
'cri-dockerd/dockershim' and won't be impacted when user turn on
'NodeSwap' (kubernetes/enhancements#2400) feature.

Signed-off-by: Qasim Sarfraz <qasimsarfraz@microsoft.com>
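
The commit message's reasoning can be sketched as follows. This is a minimal Python illustration, assuming the OCI runtime spec semantics quoted in the commit (the `swap` field is the limit of memory plus swap). The helper names are hypothetical and this is not containerd code.

```python
from typing import Dict

MiB = 1024 ** 2


def oci_memory_resources(
    memory_limit_bytes: int, swap_allowance_bytes: int = 0
) -> Dict[str, int]:
    """Build a `memory` resources fragment in the shape of an OCI runtime config.

    `swap` is the combined memory+swap limit, so an allowance of 0 makes
    `swap` equal to `limit`, which leaves the container no swap to use.
    """
    if memory_limit_bytes <= 0 or swap_allowance_bytes < 0:
        raise ValueError("limit must be positive and swap allowance non-negative")
    return {
        "limit": memory_limit_bytes,
        "swap": memory_limit_bytes + swap_allowance_bytes,
    }


def effective_swap_bytes(memory: Dict[str, int]) -> int:
    """Swap the container may actually use: combined limit minus memory limit."""
    return memory["swap"] - memory["limit"]


no_swap = oci_memory_resources(512 * MiB)
with_swap = oci_memory_resources(512 * MiB, swap_allowance_bytes=256 * MiB)

print(effective_swap_bytes(no_swap) // MiB)    # 0
print(effective_swap_bytes(with_swap) // MiB)  # 256
```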
@rayandas
Member

rayandas commented Sep 26, 2023

Hello @ehashman 👋, v1.29 Enhancements team here.

Just checking in as we approach enhancements freeze on 01:00 UTC, Friday, 6th October, 2023.

This enhancement is targeting stage beta for v1.29 (correct me if otherwise).

Here's where this enhancement currently stands:

  • KEP readme using the latest template has been merged into the k/enhancements repo.
  • KEP status is marked as implementable for latest-milestone: 1.29. KEPs targeting stable will need to be marked as implemented after code PRs are merged and the feature gates are removed.
  • KEP readme has an updated, detailed test plan section filled out
  • KEP readme has up to date graduation criteria
  • KEP has a production readiness review that has been completed and merged into k/enhancements.

For this KEP, we would just need to update the following:

The status of this enhancement is marked as at risk for enhancement freeze. Please keep the issue description up-to-date with appropriate stages as well. Thank you!

@pacoxu
Member

pacoxu commented Sep 28, 2023

@iholder101 do you think this feature needs to be tracked by the release team in v1.29 release cycle?

  • If yes, do you have time to update the KEP status?
  • If no, we can just push forward swap related tasks and update the KEP in next release cycle.

@deads2k
Contributor

deads2k commented Oct 2, 2023

Is this planning to move to stable in 1.29?

@SergeyKanzhelev
Member

Is this planning to move to stable in 1.29?

No. We have critical issues with eviction in the current implementation, and we haven't even enabled it by default. We will try to fix the eviction problem in 1.29 and see if there is demand for more APIs.

@rayandas
Copy link
Member

rayandas commented Oct 4, 2023

Hi @ehashman checking in once more as we approach the 1.29 enhancement freeze deadline on 01:00 UTC, Friday, 6th October, 2023. The status of this enhancement is marked as at risk for enhancement freeze.
It looks like updating the latest milestone in kep.yaml will help this enhancement move to tracked for enhancement freeze. Let me know if I missed anything. Thanks.

@harche
Contributor

harche commented Oct 5, 2023

Hi @ehashman checking in once more as we approach the 1.29 enhancement freeze deadline on 01:00 UTC, Friday, 6th October, 2023. The status of this enhancement is marked as at risk for enhancement freeze. It looks like updating the latest milestone in kep.yaml will help this enhancement move to tracked for enhancement freeze. Let me know if I missed anything. Thanks.

I created #4275 to bump swap to beta2 in 1.29

@rayandas
Member

rayandas commented Oct 5, 2023

As the PR #4275 is merged and latest-milestone has been updated, marking this as Tracked for Enhancements Freeze. 🚀

@drewhagen
Member

Hello @iholder101 @pacoxu @ehashman @SergeyKanzhelev 👋, v1.29 Docs Shadow here.
Does the enhancement work planned for v1.29 require any new docs or modifications to existing docs?
If so, please follow the steps here to open a PR against the dev-1.29 branch in the k/website repo. This PR can be just a placeholder at this time and must be created before Thursday, 19 October 2023.
Also, take a look at Documenting for a release to familiarize yourself with the docs requirements for the release.
Thank you!

@harche
Contributor

harche commented Oct 11, 2023

Hello @iholder101 @pacoxu @ehashman @SergeyKanzhelev 👋, v1.29 Docs Shadow here. Does the enhancement work planned for v1.29 require any new docs or modifications to existing docs? If so, please follow the steps here to open a PR against the dev-1.29 branch in the k/website repo. This PR can be just a placeholder at this time and must be created before Thursday, 19 October 2023. Also, take a look at Documenting for a release to familiarize yourself with the docs requirements for the release. Thank you!

Ack. Thanks.

@drewhagen
Member

drewhagen commented Oct 17, 2023

Hi @harche @iholder101 @pacoxu @ehashman @SergeyKanzhelev! The deadline to open a placeholder PR against k/website for required documentation is Thursday, 19 October. Could you please update me on the status of docs for this enhancement? Thank you!

@harche
Contributor

harche commented Oct 19, 2023

Hi @harche @iholder101 @pacoxu @ehashman @SergeyKanzhelev! The deadline to open a placeholder PR against k/website for required documentation is Thursday, 19 October. Could you please update me on the status of docs for this enhancement? Thank you!

@drewhagen I have created a placeholder PR here - kubernetes/website#43571

@jan-kantert

Hey @jan-kantert! Thanks for reaching out.
You're right. This is indeed something we need to look into. I guess (?) that all that needs to be done is to subtract the swap space from workingSet and simply ignore swap space for eviction decisions. But as I said, I still need to explore this area.

Would there be a scenario where swap + workingSet > node.status.capacity[memory] ? and would that lead to having negative value for memory.available?

In my understanding this is currently the case. However, in practice that is where eviction kicks in, so the value stays around the eviction threshold. If you look at our nodes, you will see that they are in fact using only around 50% of their available RAM (and a bunch of swap), but memory.available indicates that they are almost out of memory (which they are not). On the plus side, nodes will use the spare RAM as cache, which is better spent than keeping unused stuff in RAM.

We are currently experimenting with --experimental-allocatable-ignore-eviction to disable memory evictions. I will report back once we have gained some experience with that.

We ran this setting in production for a while and it seems to perform well. We do not currently use swap as workload memory (i.e. pods reserve actual RAM for their workload instead of a mixture of swap and memory). However, this helps a lot with preventing evictions. Our clusters are almost free of evictions, which is really great. It also allows us to use the vertical pod autoscaler (VPA) for memory in more cases, as pods might temporarily use more memory without fear of instant eviction (which we had previously on tightly packed clusters). So overall this seems to allow us to pack even tighter and raise our average memory usage from 20-30% to 50-60%, which is roughly a 50% cost reduction.

As a downside, we see a lot more instances of random probe failures (similar to kubernetes/kubernetes#89898). This seems to be caused by better packing and higher CPU load. We generally see high load (i.e. 50+) but not necessarily high CPU usage, so this might be connected with swapping memory in and out. I guess that, since kubelet has general issues around probe timeouts/slowness, this is somewhat expected when you pack tighter. As a workaround, we use a mutating webhook to patch all livenessProbe timeouts to at least 30s, since a lot of software (e.g. linkerd) uses the 1s default timeout. Without the workaround we often see pods (such as linkerd with a 1s probe) getting terminated by kubelet. No issues with the workaround so far.
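
The workaround described above could look roughly like this. It is a hedged sketch of only the patch-building step of such a mutating webhook, written in Python. The function name and the example pod are hypothetical. The 30s minimum and the 1s default come from the comment, and the webhook used in the clusters described may differ.

```python
import json
from typing import Any, Dict, List

MIN_TIMEOUT_SECONDS = 30     # minimum mentioned in the comment above
DEFAULT_TIMEOUT_SECONDS = 1  # default when timeoutSeconds is unset


def liveness_timeout_patch(
    pod: Dict[str, Any], minimum: int = MIN_TIMEOUT_SECONDS
) -> List[Dict[str, Any]]:
    """Return JSON Patch operations raising each livenessProbe timeout to `minimum`."""
    ops: List[Dict[str, Any]] = []
    containers = pod.get("spec", {}).get("containers", [])
    for index, container in enumerate(containers):
        probe = container.get("livenessProbe")
        if probe is None:
            continue  # no liveness probe on this container, nothing to patch
        current = probe.get("timeoutSeconds", DEFAULT_TIMEOUT_SECONDS)
        if current < minimum:
            ops.append({
                # JSON Patch "add" sets the member whether or not it already exists.
                "op": "add",
                "path": f"/spec/containers/{index}/livenessProbe/timeoutSeconds",
                "value": minimum,
            })
    return ops


# Hypothetical pod: the first container relies on the 1s default,
# the second already has a generous timeout and is left alone.
example_pod = {
    "spec": {
        "containers": [
            {"name": "proxy", "livenessProbe": {"httpGet": {"path": "/live", "port": 4191}}},
            {"name": "app", "livenessProbe": {"timeoutSeconds": 60}},
        ]
    }
}

print(json.dumps(liveness_timeout_patch(example_pod), indent=2))
```

In a real admission webhook these operations would be base64-encoded into the AdmissionReview response with patchType JSONPatch. That wiring is omitted here.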

@rayandas
Member

Hey again @ehashman @pacoxu @SergeyKanzhelev 👋, 1.29 Enhancements team here,

Just checking in as we approach code freeze at 01:00 UTC Wednesday 1st November 2023.

Here's where this enhancement currently stands:

  • All PRs to the Kubernetes repo that are related to your enhancement are linked in the above issue description (for tracking purposes).

  • All PR/s are ready to be merged (they have approved and lgtm labels applied) by the code freeze deadline. This includes tests.

The status of this KEP is currently at risk for code freeze.

Also, please let me know if there are other PRs in k/k we should be tracking for this KEP.

As always, we are here to help if any questions come up. ✌️ Thanks!

@kcmartin

Hi @ehashman @pacoxu @SergeyKanzhelev ! 👋 from the v1.29 Release Team-Communications! We would like to check if you have any plans to publish a blog for this KEP regarding new features, removals, and deprecations for this release.

If so, you need to open a PR placeholder in the website repository.
The deadline will be on Tuesday 14th November 2023 (after the Docs deadline PR ready for review)

Here's the 1.29 Calendar

@harche
Contributor

harche commented Oct 31, 2023

Hi @ehashman @pacoxu @SergeyKanzhelev ! 👋 from the v1.29 Release Team-Communications! We would like to check if you have any plans to publish a blog for this KEP regarding new features, removals, and deprecations for this release.

If so, you need to open a PR placeholder in the website repository. The deadline will be on Tuesday 14th November 2023 (after the Docs deadline PR ready for review)

Here's the 1.29 Calendar

I am afraid that, due to pending issues, we won't be able to graduate swap to beta2 in 1.29.

@npolshakova

npolshakova commented Nov 1, 2023

Hello @ehashman @pacoxu @SergeyKanzhelev 👋 1.29 Enhancements lead here,

Unfortunately, the implementation (code-related) PR(s) associated with this enhancement are not in a merge-ready state by code freeze, and hence this enhancement is now removed from the 1.29 milestone.

If you still wish to progress this enhancement in 1.29, please file an exception request. Thanks!

/milestone clear
