Min at zero - design doc #57
Conversation
# Introduction
One of the common requests for Cluster Autoscaler (for example: [1], [2]) is the ability to scale some node groups to zero. This would definitely be a very useful feature but the implementation is somehow problematic in ScaleUP due to couple reasons:
Either actually link examples or remove 'for example' part.
s/somehow/somewhat
Done.
* [P1] There is no live example of what a new node would look like if the currently zero-sized node group was expanded. The node shape is defined as:
Can we use some other letter for numbering problems (maybe I for 'issue')? I think most people interpret P1 as priority.
Done.
* [P3] There is no live example of what DaemonSets would be run on the new node.
In general the above can be summarized as that the full definition of a new node needs to be somehow known before the node is actually created in order to decide whether the creation of a new node from a particular node group makes sense or not. Scale down has no issues with min@0.
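As a rough sketch of the decision described above, checking whether a pending pod would fit on a node that does not yet exist, given a predicted node shape. All types, names, and numbers here are hypothetical, not the actual autoscaler code:

```go
package main

import "fmt"

// Requests is a hypothetical stand-in for a set of resource requests
// (a pod's requests, a template node's allocatable, etc.).
type Requests struct {
	MilliCPU int64
	MemMB    int64
}

// fitsOnTemplate reports whether a pod's requests fit into the free capacity
// of a not-yet-created node, given the node's predicted allocatable resources
// and the resources already claimed by default pods (e.g. DaemonSets).
func fitsOnTemplate(pod, allocatable, defaultPods Requests) bool {
	freeCPU := allocatable.MilliCPU - defaultPods.MilliCPU
	freeMem := allocatable.MemMB - defaultPods.MemMB
	return pod.MilliCPU <= freeCPU && pod.MemMB <= freeMem
}

func main() {
	allocatable := Requests{MilliCPU: 3600, MemMB: 4608}
	daemons := Requests{MilliCPU: 300, MemMB: 512}
	fmt.Println(fitsOnTemplate(Requests{MilliCPU: 2000, MemMB: 2048}, allocatable, daemons)) // true
	fmt.Println(fitsOnTemplate(Requests{MilliCPU: 4000, MemMB: 1024}, allocatable, daemons)) // false
}
```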
Design |
/s/Design/# Design
Done.
Problems P1, P1A, P1B, P1C, P2, P3 needs to be solved. The primary focus is to create a solution for GCE/GKE but the proposed option should be generic enough to allow to expand this feature to other cloud providers if found needed and business-justified.
s/needs/need
```
custom-<cpu_count>-<memory_in_mb>
```
So it also quite easy to get all of capacity information from it.
s/it/it is
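A minimal sketch of extracting capacity from such a custom machine type name. The helper name and error handling are assumptions for illustration, not the actual autoscaler code:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCustomMachineType extracts CPU count and memory (in MB) from a GCE
// custom machine type name of the form "custom-<cpu_count>-<memory_in_mb>".
func parseCustomMachineType(machineType string) (cpus int64, memMB int64, err error) {
	parts := strings.Split(machineType, "-")
	if len(parts) != 3 || parts[0] != "custom" {
		return 0, 0, fmt.Errorf("unexpected machine type format: %q", machineType)
	}
	cpus, err = strconv.ParseInt(parts[1], 10, 64)
	if err != nil {
		return 0, 0, err
	}
	memMB, err = strconv.ParseInt(parts[2], 10, 64)
	if err != nil {
		return 0, 0, err
	}
	return cpus, memMB, nil
}

func main() {
	cpus, memMB, err := parseCustomMachineType("custom-4-5120")
	if err != nil {
		panic(err)
	}
	fmt.Printf("cpus=%d memMB=%d\n", cpus, memMB) // cpus=4 memMB=5120
}
```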
### [P1B] - Node allocatable
In GKE 1.5.6 allocatable for new nodes is equal to capacity. For simplicity we could assume that the new node will have the 90% (or -0.1cpu/-200mb) of capacity. Being wrong or underestimating here is not fatal, most users will probably be OK with this. Once some nodes are present we will have more precise estimates. The worst thing that can happen is that the scale up may not be triggered if the request is exactly at the node capacity - system pods.
That is not true for a non-GKE cluster on GCE. We could at least mention that fact and put some sort of TODO to revisit this.
added.
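The heuristic above (90% of capacity, with a 0.1 CPU / 200 MB floor) could be sketched as follows; how the two bounds combine is an assumption for illustration, not a spec:

```go
package main

import "fmt"

// estimateAllocatable applies the doc's heuristic: reserve the larger of
// 10% of capacity or a fixed floor (0.1 CPU = 100 millicores, 200 MB) for
// system overhead, and treat the remainder as allocatable.
func estimateAllocatable(capacityMilliCPU, capacityMemMB int64) (int64, int64) {
	cpuReserve := capacityMilliCPU / 10
	if cpuReserve < 100 { // at least 0.1 CPU
		cpuReserve = 100
	}
	memReserve := capacityMemMB / 10
	if memReserve < 200 { // at least 200 MB
		memReserve = 200
	}
	return capacityMilliCPU - cpuReserve, capacityMemMB - memReserve
}

func main() {
	cpu, mem := estimateAllocatable(4000, 5120)
	fmt.Printf("allocatableMilliCPU=%d allocatableMemMB=%d\n", cpu, mem) // allocatableMilliCPU=3600 allocatableMemMB=4608
}
```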
# Solution
Given the all information above it should be relatively simple to write a module that given the access to GCP Api and Kubernetes API server. We will expand the NodeGroup interface (https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/cloud_provider.go#L40) with a method EstimateNodeShape, taking no parameters and returning NodeInfo (containing api.Node and all pods running by default on the node) or error if unable to do so.
Update the method name
Done.
only if the current size of the node group is 0 or all of the nodes are unready/broken. Otherwise CA will try to estimate the shape of the node using live examples to avoid repeating any mis-estimation errors.
The EstimateNodeShape will also be run on CA startup to ensure that CA is able to build an example for the node pool should the node group min size was set to 0.
should the node group min size was set to 0 - that is not grammatical
Done.
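A minimal sketch of the proposed interface extension, with stand-in types replacing api.Node and NodeInfo (the real types live in the Kubernetes codebase, and the method name here is the one under review, which a comment above asks to change). The fallback to live nodes when the group is non-empty mirrors the rule described above:

```go
package main

import (
	"errors"
	"fmt"
)

// Node and NodeInfo are simplified stand-ins for api.Node and the
// scheduler's NodeInfo type.
type Node struct {
	Name             string
	CapacityMilliCPU int64
	CapacityMemMB    int64
}

type NodeInfo struct {
	Node       Node
	DaemonPods []string // pods that would run on the node by default
}

// NodeGroup sketches the expanded cloud provider interface from the doc:
// EstimateNodeShape takes no parameters and returns a NodeInfo, or an
// error if the shape cannot be determined.
type NodeGroup interface {
	EstimateNodeShape() (*NodeInfo, error)
}

// fakeNodeGroup is a hypothetical implementation that only builds a
// template when the group is empty; otherwise callers should use a live
// node as the example, per the design above.
type fakeNodeGroup struct {
	size int
}

func (g *fakeNodeGroup) EstimateNodeShape() (*NodeInfo, error) {
	if g.size > 0 {
		return nil, errors.New("group is non-empty: use a live node as the example")
	}
	// Build the template from node-group configuration (machine type,
	// labels, expected DaemonSets); values here are placeholders.
	return &NodeInfo{
		Node:       Node{Name: "template-node", CapacityMilliCPU: 4000, CapacityMemMB: 5120},
		DaemonPods: []string{"fluentd", "kube-proxy"},
	}, nil
}

func main() {
	var ng NodeGroup = &fakeNodeGroup{size: 0}
	info, err := ng.EstimateNodeShape()
	if err != nil {
		panic(err)
	}
	fmt.Println(info.Node.Name, len(info.DaemonPods)) // template-node 2
}
```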
# Design
Problems P1, P1A, P1B, P1C, P2, P3 needs to be solved. The primary focus is to create a solution for GCE/GKE but the proposed option should be generic enough to allow to expand this feature to other cloud providers.
s/P1/1, etc
s/primary focus/primary focus of this document
```
custom-<cpu_count>-<memory_in_mb>
```
So it ia also quite easy to get all of the capacity information from it.
s/ia/is
/lgtm
Ref: #43
cc: @MaciekPytel