
In-Place Vertical Pod Scaling KEP to implementable, and mini-KEP for CRI extensions #1342

Merged
merged 16 commits into from Jan 28, 2020

Conversation

@vinaykul (Contributor)

One of the review items related to the In-Place Pod Vertical Scaling KEP is to extend/update the CRI API to better support resource updates across different container runtimes, such as Windows.

This mini-KEP outlines the proposed changes to the CRI API to address that review item. It does not block implementation of the Vertical Scaling KEP, but would be good to have within the time-frame of the In-Place Pod Vertical Scaling feature implementation.

CC: @PatrickLang @dashpole @derekwaynecarr @dchen1107 @yujuhong

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Oct 28, 2019
@k8s-ci-robot (Contributor)

Hi @vinaykul. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. labels Oct 28, 2019
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Oct 28, 2019
@vinaykul (Contributor Author)

/assign @derekwaynecarr @mwielgus

@vinaykul (Contributor Author)

/label api-review

@k8s-ci-robot k8s-ci-robot added the api-review Categorizes an issue or PR as actively needing an API review. label Oct 29, 2019
@vinaykul vinaykul mentioned this pull request Oct 29, 2019
26 tasks
@dchen1107 (Member)

cc/ @yliaog on Windows Containers

@dchen1107 (Member)

cc/ @Random-Liu on CRI-containerd for Windows

@liggitt liggitt added this to Unassigned in API Reviews Oct 30, 2019
@vinaykul (Contributor Author)

vinaykul commented Nov 9, 2019

@liggitt I'm planning to attend your Live API review session at the K8s contributor summit in San Diego.

If you have additional time, do you think we can review the primary KEP and perhaps this mini-KEP (if applicable) as part of your session, or some other time during KubeCon if you or another reviewer is available?

CC @dashpole - I hope you are coming there :)

@dashpole (Contributor)

dashpole commented Nov 9, 2019

@vinaykul I added it to my schedule.

@vinaykul (Contributor Author) left a comment:

updated the graduation criteria per sig-node discussion.

@vinaykul (Contributor Author)

Here's the API change code preview - vinaykul/kubernetes#1 , specifically commit-id vinaykul/kubernetes@2a1aedd

@liggitt Please review the API change and the admission controller part. Is this what you had in mind?

CC: @dashpole @thockin @derekwaynecarr @dchen1107

@@ -309,7 +334,7 @@ before applying limit increases.

Pod v1 core API:
* extended model,
* new subresource,
* new admission controller,
* added validation.

Admission Controllers: LimitRanger, ResourceQuota need to support Pod Updates:
Member

The flow described above requires kubelets to update pod spec, which they do not have permission to do today.

That involves changing:

  • the node authorizer to permit kubelets to patch/update the pods resource (not just the pods/status subresource)
  • the NodeRestriction admission plugin to understand what types of updates a kubelet is allowed to make to a pod (we would not want to allow arbitrary label/image updates, for example)

cc @tallclair

Contributor Author

Good point.

From what I can see, the simplest way is to introduce an admitPodUpdate method in the NodeRestriction plugin that verifies that only the ResourcesAllocated field is being touched, and that the node updating the pod owns it. I'll try it out and see if that covers it without leaving any holes.

For authorization, I have modified NodeRules() in plugin/pkg/auth/authorizer/rbac/bootstrappolicy/policy.go to allow nodes to update the pods resource. (Nodes are currently allowed to create and delete pods.)

The above approach is consistent with how pod creates and deletes by node are handled.
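The admitPodUpdate idea described here can be sketched roughly as follows. This is a hypothetical illustration using simplified stand-in types, not the real Kubernetes `core.Pod` API; only the `admitPodUpdate` name comes from the discussion above.

```go
package main

import (
	"fmt"
	"reflect"
)

// Simplified stand-ins for the relevant parts of the Pod API.
type PodSpec struct {
	NodeName           string
	Image              string
	ResourcesAllocated map[string]string
}

type Pod struct {
	Name string
	Spec PodSpec
}

// admitPodUpdate sketches the proposed check: the requesting node must own
// (be running) the pod, and the only spec field that may differ between the
// old and new objects is ResourcesAllocated.
func admitPodUpdate(nodeName string, oldPod, newPod Pod) error {
	if oldPod.Spec.NodeName != nodeName {
		return fmt.Errorf("node %q may only update pods bound to it", nodeName)
	}
	// Neutralize the one field the kubelet is allowed to change, then
	// require everything else to be identical.
	scrubbedOld, scrubbedNew := oldPod, newPod
	scrubbedOld.Spec.ResourcesAllocated = nil
	scrubbedNew.Spec.ResourcesAllocated = nil
	if !reflect.DeepEqual(scrubbedOld, scrubbedNew) {
		return fmt.Errorf("node %q may only update spec.resourcesAllocated", nodeName)
	}
	return nil
}

func main() {
	old := Pod{Name: "web", Spec: PodSpec{NodeName: "node-1", Image: "nginx:1.17",
		ResourcesAllocated: map[string]string{"cpu": "500m"}}}

	resized := old
	resized.Spec.ResourcesAllocated = map[string]string{"cpu": "1"}
	fmt.Println(admitPodUpdate("node-1", old, resized)) // admitted: <nil>

	relabeled := old
	relabeled.Spec.Image = "nginx:1.19"
	fmt.Println(admitPodUpdate("node-1", old, relabeled)) // rejected

	fmt.Println(admitPodUpdate("node-2", old, resized)) // wrong node: rejected
}
```

The scrub-then-compare approach mirrors how NodeRestriction-style checks typically whitelist a field: rather than enumerating every forbidden field, zero out the permitted one and demand deep equality for the rest.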

Member

Since the proposal is scoped to support only cpu and memory resources, is the kubelet only authorized to change those values? I am assuming that we would want the kubelet to report all resources allocated and enforced (not just cpu and memory), but that we would not want to let a user change the pod spec in validation for anything other than cpu and memory. Is that an accurate understanding?

Member

the alternative is that the pod admission plugin sets allocated for all resources other than cpu/memory, but that would make extending this support to other future resource types challenging.

Contributor Author

The Kubelet can continue to report status on all resources; I don't see a need to restrict status changes for the new Resources field. However, for the ResourcesAllocated field, it is best to start by allowing the Node to change only what's actually supported now. As we add support for other resource types, we can simply add to the list of supported resource types in the admission plugin.

And yes, for the user, we definitely want to lock it down to just what's supported: cpu and memory.
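A supported-resource-types check like the one described here could look roughly like this. This is a hypothetical sketch: the names `validateAllocatedChange` and `supportedResizeResources` are illustrative, and plain string maps stand in for the real `ResourceList` type.

```go
package main

import "fmt"

// Hypothetical whitelist of resource names the feature currently supports;
// the real plugin would extend this set as new resource types are added.
var supportedResizeResources = map[string]bool{"cpu": true, "memory": true}

// validateAllocatedChange returns an error if the old and new allocations
// differ for any resource outside the supported set.
func validateAllocatedChange(oldAlloc, newAlloc map[string]string) error {
	// Collect every resource name present in either map, so that both
	// changed values and added/removed entries are checked.
	seen := map[string]bool{}
	for name := range oldAlloc {
		seen[name] = true
	}
	for name := range newAlloc {
		seen[name] = true
	}
	for name := range seen {
		if oldAlloc[name] != newAlloc[name] && !supportedResizeResources[name] {
			return fmt.Errorf("resizing resource %q is not supported", name)
		}
	}
	return nil
}

func main() {
	old := map[string]string{"cpu": "500m", "memory": "128Mi", "ephemeral-storage": "1Gi"}
	ok := map[string]string{"cpu": "1", "memory": "128Mi", "ephemeral-storage": "1Gi"}
	bad := map[string]string{"cpu": "500m", "memory": "128Mi", "ephemeral-storage": "2Gi"}

	fmt.Println(validateAllocatedChange(old, ok))  // <nil>
	fmt.Println(validateAllocatedChange(old, bad)) // rejected: ephemeral-storage
}
```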

@@ -363,13 +388,131 @@ Other components:
could be in use, and approaches such as setting limit near current usage may
be required. This issue needs further investigation.

Member

since this proposes adding a new field to pod spec, we need to consider the following cases:

  • updates by clients unaware of the new field, which preserve it and send the existing value (e.g. dynamic clients using unstructured json requests/responses, or clients using patch)
    • since those clients would not currently be successfully changing resources, there's probably nothing special that needs to be done for these clients
  • updates by clients unaware of the new field, which drop it on update (e.g. old versions of client-go)
    • an update request from such a client would set the new field to nil. The server must not treat that as an attempt by the client to clear the field (and forbid it based on an authorization check, etc.), but must maintain compatibility with existing clients by copying the value from the existing pod

Since the ResourcesAllocated field is in pod spec, and pod spec is also used inside pod templates, are we intending to allow/disallow this field to be set inside workload API types (e.g. daemonset, deployment)? Unless we actively prevent it, values can be set for that field in those types, and we have to think through how to handle updates from old clients for those types as well.
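The compatibility rule for old clients that drop the field could be sketched like this. This is a simplified, hypothetical illustration (`preserveAllocated` is an invented name; the real server operates on full Pod objects during update preparation):

```go
package main

import "fmt"

// Simplified stand-in for the part of PodSpec being discussed.
type PodSpec struct {
	ResourcesAllocated map[string]string
}

// preserveAllocated applies the rule described above: an incoming update
// whose ResourcesAllocated is nil (as sent by a client built before the
// field existed) inherits the value from the stored pod, rather than being
// treated as a request to clear the field.
func preserveAllocated(stored PodSpec, incoming *PodSpec) {
	if incoming.ResourcesAllocated == nil {
		incoming.ResourcesAllocated = stored.ResourcesAllocated
	}
}

func main() {
	stored := PodSpec{ResourcesAllocated: map[string]string{"cpu": "500m"}}

	// An old client round-trips the pod and drops the unknown field.
	fromOldClient := PodSpec{}
	preserveAllocated(stored, &fromOldClient)
	fmt.Println(fromOldClient.ResourcesAllocated["cpu"]) // 500m

	// A client that explicitly sets the field keeps its own value.
	fromNewClient := PodSpec{ResourcesAllocated: map[string]string{"cpu": "1"}}
	preserveAllocated(stored, &fromNewClient)
	fmt.Println(fromNewClient.ResourcesAllocated["cpu"]) // 1
}
```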

Member

For Controllers: propagate Template resources update to running Pod instances, has that been investigated and proven feasible? There are multiple mechanisms controllers use to match up particular child resources with particular generations of the parent resource, and it would be good to know if some (like hashing of the pod template to determine a label for the child resource's selector) are incompatible with in-place update of the pod template without rolling out a new instance of the child.

Contributor Author

Yes. Pardon my ignorance with admission controllers; I only started playing with them a few weeks ago. But I believe I should be able to mutate it with the new PodResourceAllocation controller. I'll look deeper into this. Is there a wiki I can use to experiment with upgrades?

About controllers: we had the propagation working with the Job and Deployment controllers in our old design prototype code. But I'll remove this from the scope of the current KEP. VPA cares about updating running pods, and I don't want to commit to template propagation, as I need to budget for a few surprises while doing a thorough implementation of the admission control changes and handling the upgrade scenario. So we will disallow updating template-nested pods; this can always be added as a subsequent enhancement.

Contributor Author

@liggitt I dug a bit more into updating controller templates. Currently, we cannot update the Resources field for Job controllers, but are allowed to do so for Deployment controllers; it results in Pods being recreated with the new desired resources.

I want to keep the same behavior: if we attempted to disallow it because of this feature, it would be a breaking change.

In 1.19 or another future release, we can perhaps consider propagating the template resource change to running pods (as we had done in our old design PoC). So I'll clarify the KEP to state that current behavior will be maintained for template Pod Resources updates.

@liggitt (Member) commented Jan 27, 2020

In 1.19 or another future release, we can perhaps consider propagating the template resource change to running pods (as we had done in our old design PoC). So I'll clarify the KEP to state that current behavior will be maintained for template Pod Resources updates.

If vertical scaling is only done on individual pod instances, that means a new rollout of a deployment will reset all resource use back to the original levels? Is that acceptable? That seems likely to cause problems if current pods were scaled up in response to load, then a rollout drops capacity back down significantly.

Member

Or is the idea that a separate process would determine the average required resources and propagate that back into the workload template at some interval?

Contributor Author

Or is the idea that a separate process would determine the average required resources and propagate that back into the workload template at some interval?

Yes. Current VPA behavior is to make resource recommendations based on historical measurements and current usage, and optionally apply those recommendations during admission control if the user chooses to allow VPA to control the resources. New recommendations are currently applied by evicting the current pod so that it hits the admission controller.

At this time, we want to keep the current behavior aside from the added ability for VPA to request a pod to be resized without restart.

Member

I think the pod admission mutation makes sense as long as that happens prior to quota evaluation.

btw, i appreciate this additional detail.

Contributor Author

@liggitt I'm able to take care of updates from older client-go versions by setting default values on create, and copying the old object's values on update, by handling it in the admission controller's mutating phase rather than in defaults.go. Doing this in defaults.go would set the values dropped by older client-go to defaults, and thus we would lose data.

I was able to test this by writing a little tool similar to staging/src/k8s.io/client-go/examples/create-update-delete-deployment, but one that calls Pods(ns).Update().

Validation allows the Resources and ResourcesAllocated fields to be mutable only in PodSpec, and the podresourceallocation and noderestriction plugins control what a user can do and what a node can update.

Please review PR vinaykul/kubernetes#1

## Graduation Criteria

TODO
### Alpha
- In-Place Pod Resources Update functionality is implemented,
Member

for which controllers?

@vinaykul (Contributor Author) commented Jan 25, 2020

Just the pod for now. I'll update the KEP and remove controller propagation from the scope.

Review thread on keps/sig-node/20181106-in-place-update-of-pod-resources.md (outdated; resolved)

### Negative Tests
TBD

Member

Given this touches a field involved in pod spec, pod template spec, and workload controllers, we need tests to make sure its introduction does not cause workloads to redeploy on API server upgrade (e.g. kubernetes/kubernetes#78633): tests that look something like what is described in kubernetes/kubernetes#78904, and which are actually run.

…Restriction extension to limit what Node can access in PodSpec
@derekwaynecarr (Member) left a comment:

This is getting really close. I would like to clarify if API server validation enforces that only cpu and memory are allowed to change in the pod spec.

@vinaykul (Contributor Author)

This is getting really close. I would like to clarify if API server validation enforces that only cpu and memory are allowed to change in the pod spec.

@derekwaynecarr Yes. I'll call this out explicitly in the KEP's affected-components section and in the test plan. IIRC someone asked about resizing ephemeral storage, but I have scoped it out of this KEP and listed it as a potential future enhancement.

The same holds for Kubelet authorization as well. During Pod creation, we set the default value of ResourcesAllocated (in the SetDefaults_Pod function) equal to Resources.Requests if it is not set. And if it is set by the user, we validate that it matches Resources.Requests. (At this time we don't support a user requesting a resource allocation different from the desired resources, but @dashpole brought it up; we discussed it and left it as a possible future extension.) The net result is that the Node admits a pod at requested resources == resourcesAllocated, or not at all (the current pod-admit behavior).

And yes, I do have our new plugin ordered before ResourceQuota plugin. I'll call it out explicitly in the KEP.
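The create-time defaulting and validation described here can be sketched as follows. This is a simplified illustration with stand-in types: only SetDefaults_Pod is named in the discussion, and the helper names (`defaultAllocated`, `validateCreate`) are hypothetical.

```go
package main

import (
	"fmt"
	"reflect"
)

// Simplified stand-in for the relevant PodSpec fields.
type PodSpec struct {
	Requests           map[string]string // Resources.Requests, simplified
	ResourcesAllocated map[string]string
}

// defaultAllocated mirrors the described SetDefaults_Pod behavior: on
// create, an unset ResourcesAllocated defaults to Resources.Requests.
func defaultAllocated(spec *PodSpec) {
	if spec.ResourcesAllocated == nil {
		spec.ResourcesAllocated = spec.Requests
	}
}

// validateCreate enforces that a user-supplied ResourcesAllocated matches
// Resources.Requests exactly; requesting a different allocation at create
// time is out of scope for the KEP.
func validateCreate(spec PodSpec) error {
	if !reflect.DeepEqual(spec.ResourcesAllocated, spec.Requests) {
		return fmt.Errorf("resourcesAllocated must equal resources.requests on create")
	}
	return nil
}

func main() {
	spec := PodSpec{Requests: map[string]string{"cpu": "250m"}}
	defaultAllocated(&spec)
	fmt.Println(spec.ResourcesAllocated["cpu"], validateCreate(spec)) // 250m <nil>

	mismatched := PodSpec{
		Requests:           map[string]string{"cpu": "250m"},
		ResourcesAllocated: map[string]string{"cpu": "2"},
	}
	fmt.Println(validateCreate(mismatched)) // rejected
}
```

Together the two helpers give the invariant stated above: after create-time admission, requested resources == resourcesAllocated, or the pod is not admitted at all.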

…ode's granular access details on access to PodSpec
@vinaykul (Contributor Author)

@liggitt @derekwaynecarr Please see if the last two commits resolve the concerns you have. Thanks,

@derekwaynecarr (Member)

@vinaykul thank you for the clarifications. This looks good to proceed to implementation.

will allow @liggitt to ack as well.

/lgtm

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jan 28, 2020
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 28, 2020
@vinaykul (Contributor Author)

@derekwaynecarr There was a silly error in the CRI KEP yaml formatting and I had to make a commit to fix that, and it removed the /lgtm label. Could you please lgtm it again? Thanks and sorry for the extra ask.

@liggitt Can you please review and let me know if your issues have been addressed? Thanks.

@vinaykul (Contributor Author)

/retest

@k8s-ci-robot (Contributor)

@vinaykul: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@derekwaynecarr (Member)

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 28, 2020
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: derekwaynecarr, thockin, vinaykul

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@haroon3rd

Hi @vinaykul, what is the update on In-Place Vertical Pod Scaling?

@vinaykul (Contributor Author)

Hi @vinaykul, what is the update on In-Place Vertical Pod Scaling?

I'm waiting for @derekwaynecarr to review the changes I and @thockin worked out in PR #1883

Once Derek signs off, I plan to start implementation of the new API and design. I'll follow up with him in next week's meeting .. have been busy with other stuff for the past couple of weeks and didn't get to follow up with Derek. I still think we can make it for 1.22
