Support in-place Pod vertical scaling in VPA #4016

noBlubb · 2021-04-15T12:46:07Z

Hey everyone,

as I gather, the VPA currently cannot update pods without recreating them:

> Once restart free ("in-place") update of pod requests is available

(from the README)

and neither can the GKE vertical scaler:

> Due to Kubernetes limitations, the only way to modify the resource requests of a running Pod is to recreate the Pod

(from https://cloud.google.com/kubernetes-engine/docs/concepts/verticalpodautoscaler#vertical_pod_autoscaling_in_auto_mode)

Unfortunately, I could not work out from this what the specific limitation is (other than the mere absence of any such feature?), what the timeline is for this to appear in VPA, or how to contribute to it if possible. Could you please outline what is missing in VPA for this to be implemented?

Best regards,
Raffael

morganchristiansson · 2021-04-28T08:23:07Z

It would be nice to have more details on the status of this feature. I would guess it's a limitation in Kubernetes, or at a lower level like containerd or the kernel?

bskiba · 2021-04-28T08:33:35Z

At this moment this is a Kubernetes limitation (the kernel and container runtimes already support resizing containers). Work is needed in the scheduler, the kubelet, and the core API, so it is a pretty cross-cutting problem. Also, a lot of systems have long assumed that pod sizes are immutable, so those assumptions need to be untangled as well.

There is ongoing work in Kubernetes to provide in-place pod resizes (example: kubernetes/enhancements#1883). Once that work completes, VPA will be able to take advantage of it.

k8s-triage-robot · 2021-07-27T11:25:13Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

k8s-triage-robot · 2021-08-26T12:25:04Z

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

Jeffwan · 2021-08-27T05:33:19Z

/remove-lifecycle rotten

jmo-qap · 2021-09-15T05:30:47Z

kubernetes/kubernetes#102884

k8s-triage-robot · 2022-02-27T10:12:26Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

jbartosik · 2022-02-28T10:10:54Z

/remove-lifecycle rotten

jbartosik · 2022-02-28T10:32:57Z

/remove-lifecycle stale

k8s-triage-robot · 2022-05-29T11:12:16Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

jbartosik · 2022-05-31T13:03:01Z

/remove-lifecycle stale

k8s-triage-robot · 2022-08-29T13:35:22Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

jbartosik · 2022-09-02T11:50:00Z

/remove-lifecycle stale

Support for in-place updates didn't make it into K8s 1.25, but it is aiming for 1.26.

k8s-triage-robot · 2022-12-01T12:25:16Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

voelzmo · 2022-12-02T09:45:09Z

/remove-lifecycle stale
The feature didn't make it into 1.26, but it is now targeted for 1.27 ;)

frivoire · 2023-01-05T11:50:49Z

This issue seems to be a duplicate of #5046.
Shouldn't we close one of those two issues?

jbartosik · 2023-06-21T11:19:10Z

There are some open issues related to the feature: https://github.com/kubernetes/kubernetes/issues?q=is%3Aissue+is%3Aopen+%5BFG%3AInPlacePodVerticalScaling%5D

The most relevant seem to be:

[FG:InPlacePodVerticalScaling] If pod resize request exceeds node allocatable, fail it in admission handler kubernetes#114203 - until this is resolved, VPA needs to decide after some time that an in-place update will not succeed (maybe we need to do that even after this is resolved, since we're not evicting other pods to make space for the one we want to scale up)
[FG:InPlacePodVerticalScaling] Pod Resize - long delay in updating apiPodStatus.Resources kubernetes#112264 - we can't give up on in-place updates too quickly; even successful ones take at least a minute or so in my experience
[FG:InPlacePodVerticalScaling] Add /resize subresource to request pod resource resizing kubernetes#109553 - with this we can patch only the subresource
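
The first two points pull in opposite directions: give up eventually, but not too quickly. A minimal Python sketch of that decision follows. The function name, the return values, and the 10-minute deadline are assumptions made for illustration; none of them come from VPA or from the issues linked above.

```python
from datetime import datetime, timedelta

# Hypothetical deadline. The thread only says that successful resizes can take
# "at least a minute or so" and that VPA must give up "after some time".
GIVE_UP_AFTER = timedelta(minutes=10)


def next_action(requested_at: datetime, now: datetime, resize_applied: bool) -> str:
    """Decide what to do about a pending in-place resize.

    Returns "done" once the resize is visible, "wait" while the deadline has
    not passed, and "fall_back" (for example to eviction) after the deadline.
    """
    if resize_applied:
        return "done"
    if now - requested_at < GIVE_UP_AFTER:
        return "wait"
    return "fall_back"
```

With these assumed values, a resize requested one minute ago yields "wait", and one requested eleven minutes ago that still has not been applied yields "fall_back".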

SergeyKanzhelev · 2023-07-25T22:34:44Z

> I don't think the VPA should look at the ResizePolicy field in PodSpec.containers at all.

The API is currently limited and does not support the notion of "apply changes if possible without restart, and do not apply them otherwise", which may impact PDB. I don't know how autoscaler deals with PDB today, but if there is going to be higher-frequency autoscaling with in-place updates in the hope of a non-disruptive change, this will not work. In other words, we either need a new API to resize ONLY without the restart or treat a resize as a disruption affecting PDB.

voelzmo · 2023-07-26T11:36:05Z

@SergeyKanzhelev thanks for joining the discussion!

> I don't know how [vertical pod] autoscaler deals with PDB today

Today, VPA uses the eviction API, which respects PDB.

> we either need a new API to resize ONLY without the restart or treat a resize as a disruption affecting PDB.

I'm not sure which component the "we" in this sentence refers to, but in general I tend to agree with the need for an API that respects PDB. If the kubelet needs to restart the Pod to apply a resource change, this should count towards PDB. However, I think this shouldn't be a concern that VPA has to deal with. As with eviction, VPA should just use an API that respects PDB, if we consider this relevant for the restart case as well.

Regarding my statement from above

> I don't think the VPA should look at the ResizePolicy field in PodSpec.containers at all.

This is no longer correct, as @jbartosik opted for a more informed approach in the enhancement proposal. Currently, VPA implements some constraints to ensure resource updates don't happen too frequently (for example, by requiring a minimum absolute/relative change for Pods which have been running for less than 12 hours). The proposal contains the idea of changing these constraints if a Container has ResizePolicy: NotRequired.
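
To make the constraint concrete, here is a minimal Python sketch of a minimum-relative-change check. Only the 12-hour figure comes from the comment above. The function name, both threshold values, and the choice to model "changing the constraints" as a lower threshold are assumptions for illustration, not what VPA or the proposal actually does.

```python
from datetime import timedelta

SHORT_LIVED = timedelta(hours=12)   # figure taken from the comment above
MIN_RELATIVE_CHANGE = 0.10          # hypothetical threshold for short-lived Pods
RELAXED_RELATIVE_CHANGE = 0.02      # hypothetical threshold when no restart is required


def should_apply(pod_age: timedelta, current: int, recommended: int,
                 restart_not_required: bool = False) -> bool:
    """Return True if a recommendation is worth applying.

    `current` and `recommended` are requests in the same unit (for example
    CPU millicores). Short-lived Pods need a minimum relative change; the
    threshold is lower when the container's policy says no restart is needed.
    """
    if current <= 0:
        raise ValueError("current request must be positive")
    relative_change = abs(recommended - current) / current
    if relative_change == 0:
        return False
    if pod_age >= SHORT_LIVED:
        return True
    threshold = RELAXED_RELATIVE_CHANGE if restart_not_required else MIN_RELATIVE_CHANGE
    return relative_change >= threshold
```

Under these assumed thresholds, a 5% change on a one-hour-old Pod is skipped by default but applied when `restart_not_required=True`.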

k8s-triage-robot · 2024-01-25T07:17:27Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

jbartosik · 2024-01-25T08:13:46Z

/remove-lifecycle stale
/lifecycle frozen

nikimanoledaki · 2024-09-27T12:36:54Z

Hi folks, could someone share a summary of what is blocking this feature please? +1 that this would be really useful to reduce workload evictions. Thank you!

voelzmo · 2024-09-30T08:46:35Z

I think the summary is: the Kubernetes feature for in-place resource updates is in the alpha stage, and there are still many things to be done before it is promoted to beta. See kubernetes/enhancements#4704 for a summary and the ongoing discussion.
Many things will fundamentally change for beta, e.g. what the API for this feature is (they're talking about introducing a /resize subresource, for example), so I don't think we can start working on this from the VPA side before the feature reaches beta in k/k.
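
For readers unfamiliar with what a resize request carries, here is a minimal Python sketch that builds a patch body changing one container's resource requests. It assumes the request is expressed through the existing `spec.containers[].resources` shape of the Pod spec; the helper name is hypothetical, and whether such a body goes to the Pod itself or to a /resize subresource is exactly the part described above as still in flux.

```python
from typing import Optional


def build_resize_patch(container_name: str,
                       cpu: Optional[str] = None,
                       memory: Optional[str] = None) -> dict:
    """Build a patch body that changes one container's resource requests.

    Hypothetical helper: it only assembles a dictionary and does not talk to
    any API server.
    """
    requests = {}
    if cpu is not None:
        requests["cpu"] = cpu
    if memory is not None:
        requests["memory"] = memory
    if not requests:
        raise ValueError("at least one of cpu or memory must be given")
    return {
        "spec": {
            "containers": [
                {"name": container_name, "resources": {"requests": requests}}
            ]
        }
    }
```

For example, `build_resize_patch("app", cpu="800m")` yields a body that touches only the CPU request of the container named `app`.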

adrianmoisey · 2024-09-30T13:00:26Z

Also note that a work-in-progress PR does exist: #6652

sftim · 2024-10-11T16:23:33Z

Help with the implementation (both for Pod-level resizing, and automatically managing that size) is very welcome.

adrianmoisey · 2024-10-11T17:49:05Z

> Help with the implementation (both for Pod-level resizing, and automatically managing that size) is very welcome.

If someone did want to help, where can they go to get involved?

kennangaibel · 2024-10-31T23:28:29Z

> Help with the implementation (both for Pod-level resizing, and automatically managing that size) is very welcome.

@sftim I would also be interested in helping

SergeyKanzhelev · 2024-11-05T06:16:55Z

> The proposal contains the idea to change these constraints if a Container has ResizePolicy: NotRequired.

To reiterate #4016 (comment): there is no API exposed that would mean "no restart resize". Checking ResizePolicy can only tell you when a container DEFINITELY WILL BE restarted; if the policy says otherwise, the container MAY still be restarted on resize.

If a "no restart resize" API is required, that would be great feedback for the KEP.

We cannot just assume there will be no restart. VPA will need some sort of logic for respecting PDB and detecting restarts, as well as a way for users to block VPA for a specific Pod or Container.
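
One way to detect restarts, sketched in Python under stated assumptions: compare each container's `restartCount` (as reported in the Pod's `status.containerStatuses`) before and after the resize. The function names are hypothetical, and this is an illustration of the idea rather than anything VPA implements.

```python
def restart_counts(container_statuses: list) -> dict:
    """Map container name to restartCount from a containerStatuses-like list."""
    return {s["name"]: s.get("restartCount", 0) for s in container_statuses}


def restarted_since(before_statuses: list, after_statuses: list) -> list:
    """Return the sorted names of containers whose restartCount increased."""
    before = restart_counts(before_statuses)
    after = restart_counts(after_statuses)
    return sorted(name for name, count in after.items()
                  if count > before.get(name, 0))
```

A caller could snapshot the statuses before issuing a resize, compare them afterwards, and treat any name returned here as a disruption to be counted.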

k8s-ci-robot added the lifecycle/stale label Jul 27, 2021

k8s-ci-robot added lifecycle/rotten and removed lifecycle/stale labels Aug 26, 2021

k8s-ci-robot removed the lifecycle/rotten label Aug 27, 2021

jbartosik added the area/vertical-pod-autoscaler label Sep 15, 2021

jbartosik self-assigned this Nov 29, 2021

k8s-ci-robot added the lifecycle/stale label Feb 27, 2022

k8s-ci-robot removed the lifecycle/stale label Feb 28, 2022

k8s-ci-robot added the lifecycle/stale label May 29, 2022

k8s-ci-robot removed the lifecycle/stale label May 31, 2022

k8s-ci-robot added the lifecycle/stale label Aug 29, 2022

k8s-ci-robot removed the lifecycle/stale label Sep 2, 2022

k8s-ci-robot added the lifecycle/stale label Dec 1, 2022

k8s-ci-robot removed the lifecycle/stale label Dec 2, 2022

This was referenced May 9, 2023

VPA: Support vertical scaling of injected containers #5617

Open

AEP for support of in-place updates for VPA #5755

Merged

This was referenced Jun 21, 2023

Apply fixes to in place support VPA AEP #5877

Merged

[FG:InPlacePodVerticalScaling] Pod Resize - long delay in updating apiPodStatus.Resources kubernetes/kubernetes#112264

Closed

jbartosik mentioned this issue Jun 26, 2023

Clarification on restart-free updates with VPA #5885

Closed

jbartosik mentioned this issue Jul 24, 2023

VPA daemonset recommendations per-pod based on node metadata #5928

Open

jbartosik mentioned this issue Sep 4, 2023

VPA recommender is restarting continuously #6049

Closed

jbartosik mentioned this issue Oct 11, 2023

Kubernetes VPA as recommender for resource specification estimation #6123

Closed

wu0407 mentioned this issue Nov 7, 2023

Can VPA change pod requests without recreate (in k8s 1.28 with InPlacePodVerticalScaling feature gate)? #6244

Closed

durban mentioned this issue Dec 6, 2023

Dynamically resize the WSTP if availableProcessors() changes typelevel/cats-effect#3909

Open

k8s-ci-robot added the lifecycle/stale label Jan 25, 2024

k8s-ci-robot added lifecycle/frozen and removed lifecycle/stale labels Jan 25, 2024

jkyros linked a pull request Mar 25, 2024 that will close this issue

VPA: Implement in-place updates support #6652

Draft

nemobis mentioned this issue Apr 4, 2024

[AKS] Autoscale agentpool based on node-level CPU usage metrics #6690

Closed

voelzmo mentioned this issue Jul 24, 2024

Support for inPlaceVerticalScaling - Timelines ?? #6680

Closed

bouaouda-achraf mentioned this issue Aug 4, 2024

[FR] Vertical Pod Autoscaling leveraging InPlacePodVerticalScaling kubernetes/kubernetes#122836

Closed

voelzmo mentioned this issue Oct 1, 2024

KEP-1287: InPlacePodVerticalScaling BETA update kubernetes/enhancements#4704

Merged
