Commit b19f391: cleanups based on discussion
thockin committed Jul 6, 2020 (1 parent: 91d8810)
Showing 1 changed file with 58 additions and 90 deletions:
keps/sig-node/20181106-in-place-update-of-pod-resources.md

<<[UNRESOLVED]>>
Why is "allocated" spec? The need for special admission control is a strong
smell. Why doesn't this belong in status? If there's a compelling reason,
this KEP should say so.

Alternate proposal:
- Add Pod.Status.ContainerStatuses[i].Resources and .ResourcesAllocated (so
consumers can see changes)
- Store a local checkpoint holding the "allocated" values, to close the gap in
case of a status wipe or ill-timed reboot.
<<[/UNRESOLVED]>>
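
For concreteness, the alternate proposal might translate into something like
the following sketch (all type and field names here are illustrative, not part
of the KEP):

```go
package sketch

import (
	v1 "k8s.io/api/core/v1"
	ktypes "k8s.io/apimachinery/pkg/types"
)

// Sketch: surface both actual and allocated resources in ContainerStatus
// so consumers can observe resizes as they happen.
type containerStatusResources struct {
	Resources          v1.ResourceRequirements // values currently in effect
	ResourcesAllocated v1.ResourceList         // values Kubelet has admitted
}

// Sketch: a checkpoint persisted to the Kubelet's local disk so the
// allocated values survive a status wipe or an ill-timed reboot.
type allocatedCheckpoint struct {
	PodUID    ktypes.UID
	Allocated map[string]v1.ResourceList // keyed by container name
}
```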

#### Container Resize Policy

To provide fine-grained user control, PodSpec.Containers is extended with
ResizePolicy - a list of named subobjects (new object) that supports 'cpu' and
'memory' as names. It supports the following policy values:
* RestartNotRequired - the default value; try to resize the Container without
restarting it, if possible.
* Restart - the container requires a restart to apply new resource values
(e.g. a Java process needs to change its Xmx flag).

By using ResizePolicy, a user can mark Containers as safe (or unsafe) for
in-place resource update. Kubelet uses it to determine the required action.

Note: `RestartNotRequired` does not guarantee that a container won't be
restarted. The runtime may choose to stop the container if it is unable to
apply the new resources without doing so.

The flag is set separately for CPU and memory because of an observation that
CPU can usually be added or removed without much problem, whereas changes to
available memory are more likely to require a restart.
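
To make the shape concrete, the API could look roughly like the following (a
sketch only; the field names `Name` and `Policy` and the exact serialization
are assumptions, subject to API review):

```go
package sketch

// ResizePolicyValue describes whether a container must be restarted to
// apply a resource resize.
type ResizePolicyValue string

const (
	// Resize in place if possible; the runtime may still restart the
	// container if it cannot apply the new resources otherwise.
	RestartNotRequired ResizePolicyValue = "RestartNotRequired"
	// The container must be restarted to apply new resource values.
	Restart ResizePolicyValue = "Restart"
)

// ResizePolicy is one of the named subobjects listed in
// PodSpec.Containers[i].ResizePolicy; Name is 'cpu' or 'memory'.
type ResizePolicy struct {
	Name   string            `json:"name"`
	Policy ResizePolicyValue `json:"policy"`
}
```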

<<[UNRESOLVED]>>
Should something like RuntimeClass be allowed to say "no in-place restarts are
allowed", so that users would get a synchronous error if they set this? Or is
that overkill?
<<[/UNRESOLVED]>>

If more than one resource type is updated at the same time and their policies
differ, any `Restart` policy takes precedence over `RestartNotRequired`.

If a container's RestartPolicy is `Never`, the ResizePolicy must be set to
`RestartNotRequired` to pass validation. That said, any in-place resize may
result in the container being stopped *and not restarted*, if the system
cannot perform the resize in place.

#### CRI Changes

For a newly created Pod, Spec.Containers[i].ResourcesAllocated must match
Spec.Containers[i].Resources.Requests. When Kubelet admits a new Pod, values in
Spec.Containers[i].ResourcesAllocated are used to determine if there is enough
room to admit the Pod. Kubelet does not set Pod's ResourcesAllocated when
admitting a new Pod.
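
A sketch of that admission-time accounting follows (helper names are
illustrative; `ResourcesAllocated` is the field this KEP proposes and does not
exist in `v1.Container` today):

```go
package sketch

import v1 "k8s.io/api/core/v1"

// container mirrors the proposed spec fields. ResourcesAllocated is the
// new field this KEP adds; it is not part of the current API.
type container struct {
	Name               string
	Resources          v1.ResourceRequirements
	ResourcesAllocated v1.ResourceList
}

// canAdmit reports whether a new Pod fits on the Node, counting existing
// Pods at their allocated (not merely desired) sizes.
func canAdmit(allocatable v1.ResourceList, existing [][]container, newPod []container) bool {
	used := v1.ResourceList{}
	for _, pod := range existing {
		for _, c := range pod {
			accumulate(used, c.ResourcesAllocated)
		}
	}
	for _, c := range newPod {
		// For a newly created Pod, ResourcesAllocated must equal
		// Resources.Requests, so either can be summed here.
		accumulate(used, c.ResourcesAllocated)
	}
	return fitsWithin(used, allocatable)
}

// accumulate adds each quantity in src into dst.
func accumulate(dst, src v1.ResourceList) {
	for name, qty := range src {
		total := dst[name]
		total.Add(qty)
		dst[name] = total
	}
}

// fitsWithin reports whether every used quantity fits within allocatable.
func fitsWithin(used, allocatable v1.ResourceList) bool {
	for name, qty := range used {
		alloc, ok := allocatable[name]
		if !ok || qty.Cmp(alloc) > 0 {
			return false
		}
	}
	return true
}
```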

When a Pod resize is requested, Kubelet attempts to update the resources
allocated to the Pod and its Containers. Kubelet first checks if the new
desired resources can fit the Node's allocatable resources by computing the sum
of resources allocated (Pod.Spec.Containers[i].ResourcesAllocated) for all Pods
on the Node, except the Pod being resized. For the Pod being resized, it adds
the new desired resources (i.e. Spec.Containers[i].Resources.Requests) to the
sum.
* If the new desired resources fit, Kubelet accepts the resize, updates
Spec.Containers[i].ResourcesAllocated, and then invokes the
UpdateContainerResources CRI API to update Container resource limits. Once
all Containers are successfully updated, it updates
Pod.Status.ContainerStatuses[i].Resources to reflect new resource values.
* If the new desired resources don't fit, Kubelet does not act on the resize;
no further action is taken at this time, and Kubelet retries the resize at a
later time.
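
Putting those two bullets together, the decision logic might look like this
sketch (function and helper names are illustrative; `UpdateContainerResources`
is the existing CRI call, stubbed here as a function parameter; `container` is
the sketch type from the admission example above):

```go
package sketch

import v1 "k8s.io/api/core/v1"

// updateFn stands in for the CRI UpdateContainerResources call.
type updateFn func(containerName string, resources v1.ResourceRequirements) error

// handlePodResize sketches the accept/defer decision for one Pod. fits is
// the result of the node-level sum described above; update applies the new
// values through the CRI.
func handlePodResize(pod []container, fits bool, update updateFn) (resized bool, err error) {
	if !fits {
		// Defer: leave ResourcesAllocated unchanged; Kubelet retries
		// the resize at a later time.
		return false, nil
	}
	for i := range pod {
		c := &pod[i]
		// Accept: record the newly allocated values...
		c.ResourcesAllocated = c.Resources.Requests
		// ...then apply the new limits through the CRI.
		if err := update(c.Name, c.Resources); err != nil {
			return false, err
		}
	}
	// Once all containers are updated, Kubelet updates
	// Pod.Status.ContainerStatuses[i].Resources to reflect the new values.
	return true, nil
}
```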

<<[UNRESOLVED]>>
As discussed in comments, "no signal" is a poor signal. I advocate for at
least this:

"""
Kubelet will generate Events on the Pod whenever a resize is acted upon or when
a resize is deferred. This will allow humans to know that progress is being
made.
"""

I also advocate for this; it can be deferred, though I bet we need it
eventually:

"""
When a resize is in progress, Kubelet will create a Condition on the pod,
called ResizeInProgress. When a resize has been deferred, Kubelet will create
a condition called ResizeDeferred.
"""
<<[/UNRESOLVED]>>
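
If the advocated signals were adopted, emitting them might look roughly like
this (a sketch; the event reason and condition name come from the quoted
suggestion and are not settled API):

```go
package sketch

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/record"
)

// markResizeDeferred surfaces a deferred resize via an Event and a Pod
// Condition, so both humans and controllers get a real signal.
func markResizeDeferred(recorder record.EventRecorder, pod *v1.Pod) {
	recorder.Event(pod, v1.EventTypeNormal, "ResizeDeferred",
		"node lacks room for the requested resources; resize will be retried")
	pod.Status.Conditions = append(pod.Status.Conditions, v1.PodCondition{
		Type:               v1.PodConditionType("ResizeDeferred"),
		Status:             v1.ConditionTrue,
		LastTransitionTime: metav1.Now(),
	})
}
```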

If multiple Pods need resizing, they are handled sequentially in an order
defined by the Kubelet (e.g. in order of arrival).

The scheduler may, in parallel, assign a new Pod to the Node because it uses
cached Pods to compute the Node's allocatable values. If this race condition
occurs, Kubelet resolves it by rejecting the new Pod if the Node has no room
after the Pod resize.

Note: After a Pod is rejected, the scheduler could try to reschedule the
replacement pod on the same node that just rejected it. This is a general
statement about Kubernetes and is outside the scope of this KEP.

#### Kubelet Restart Tolerance

### Flow Control

The following steps denote a typical flow of an in-place resize operation for a
Pod with ResizePolicy set to RestartNotRequired for all its Containers.

1. Initiating actor updates Pod's Spec.Containers[i].Resources via PATCH verb
(see the sketch after this list).
1. API Server validates the new Resources. (e.g. Limits are not below
- Do nothing and let Kubelet back off and later retry the in-place resize.
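
As an illustration of step 1 above, the initiating actor might issue the PATCH
like this (a sketch using client-go; the pod name, namespace, and container
name are illustrative):

```go
package sketch

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// resizeCPU patches one container's CPU request and limit in place.
func resizeCPU(ctx context.Context, client kubernetes.Interface) error {
	patch := []byte(`{"spec":{"containers":[{"name":"app",` +
		`"resources":{"requests":{"cpu":"1500m"},"limits":{"cpu":"2"}}}]}}`)
	_, err := client.CoreV1().Pods("default").Patch(ctx, "example-pod",
		types.StrategicMergePatchType, patch, metav1.PatchOptions{})
	return err
}
```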

<<[UNRESOLVED]>>
As above: I'm suggesting that there be a real signal indicating "I tried to resize, but I
can't" beyond the lack of update to `allocated`.
<<[/UNRESOLVED]>>


I think, instead, we need to say that resizes (especially but not exclusively
shrinks) can fail. A failed resize could cause container restarts even when
RestartNotRequired is specified.

Can a failed resize cause a full pod teardown (including a new IP assignment,
emptyDir volumes being wiped, etc)? I don't know. It seems consistent, to me,
### Resize Policy Tests

Setup a guaranteed class Pod with two containers (c1 & c2).
1. No resize policy specified, defaults to RestartNotRequired. Verify that CPU and
memory are resized without restarting containers.
1. RestartNotRequired (cpu, memory) policy for c1, Restart (cpu, memory) for c2.
Verify that c1 is resized without restart, c2 is restarted on resize.
1. RestartNotRequired cpu, Restart memory policy for c1. Resize c1 CPU only,
verify container is resized without restart.
1. RestartNotRequired cpu, Restart memory policy for c1. Resize c1 memory only,
verify container is resized with restart.
1. RestartNotRequired cpu, Restart memory policy for c1. Resize c1 CPU & memory,
verify container is resized with restart.

### Backward Compatibility and Negative Tests