Min at zero - design doc #57
Conversation
# Introduction
One of the common requests for Cluster Autoscaler (for example: [1], [2]) is the ability to scale some node groups to zero. This would definitely be a very useful feature but the implementation is somehow problematic in ScaleUP due to couple reasons:
Either actually link examples or remove 'for example' part.
s/somehow/somewhat
Done.
* [P1] There is no live example of what a new node would look like if the currently zero-sized node group was expanded. The node shape is defined as:
Can we use some other letter for numbering problems (maybe I for 'issue')? I think most people interpret P1 as priority.
Done.
* [P3] There is no live example of what DaemonSets would be run on the new node.
In general the above can be summarized as that the full definition of a new node needs to be somehow known before the node is actually created in order to decide whether the creation of a new node from a particular node group makes sense or not. Scale down has no issues with min@0.
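As a rough sketch of the decision described above, checking whether a pending pod would fit on a node that does not yet exist, given a predicted node shape. All types, names, and numbers here are hypothetical, not the actual autoscaler code:

```go
package main

import "fmt"

// Requests is a hypothetical stand-in for a set of resource requests
// (a pod's requests, a template node's allocatable, etc.).
type Requests struct {
	MilliCPU int64
	MemMB    int64
}

// fitsOnTemplate reports whether a pod's requests fit into the free capacity
// of a not-yet-created node, given the node's predicted allocatable resources
// and the resources already claimed by default pods (e.g. DaemonSets).
func fitsOnTemplate(pod, allocatable, defaultPods Requests) bool {
	freeCPU := allocatable.MilliCPU - defaultPods.MilliCPU
	freeMem := allocatable.MemMB - defaultPods.MemMB
	return pod.MilliCPU <= freeCPU && pod.MemMB <= freeMem
}

func main() {
	allocatable := Requests{MilliCPU: 3600, MemMB: 4608}
	daemons := Requests{MilliCPU: 300, MemMB: 512}
	fmt.Println(fitsOnTemplate(Requests{MilliCPU: 2000, MemMB: 2048}, allocatable, daemons)) // true
	fmt.Println(fitsOnTemplate(Requests{MilliCPU: 4000, MemMB: 1024}, allocatable, daemons)) // false
}
```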
Design |
/s/Design/# Design
Done.
Problems P1, P1A, P1B, P1C, P2, P3 needs to be solved. The primary focus is to create a solution for GCE/GKE but the proposed option should be generic enough to allow to expand this feature to other cloud providers if found needed and business-justified.
s/needs/need
```
custom-<cpu_count>-<memory_in_mb>
```
So it also quite easy to get all of capacity information from it.
s/it/it is
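A minimal sketch of extracting capacity from such a custom machine type name. The helper name and error handling are assumptions for illustration, not the actual autoscaler code:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCustomMachineType extracts CPU count and memory (in MB) from a GCE
// custom machine type name of the form "custom-<cpu_count>-<memory_in_mb>".
func parseCustomMachineType(machineType string) (cpus int64, memMB int64, err error) {
	parts := strings.Split(machineType, "-")
	if len(parts) != 3 || parts[0] != "custom" {
		return 0, 0, fmt.Errorf("unexpected machine type format: %q", machineType)
	}
	cpus, err = strconv.ParseInt(parts[1], 10, 64)
	if err != nil {
		return 0, 0, err
	}
	memMB, err = strconv.ParseInt(parts[2], 10, 64)
	if err != nil {
		return 0, 0, err
	}
	return cpus, memMB, nil
}

func main() {
	cpus, memMB, err := parseCustomMachineType("custom-4-5120")
	if err != nil {
		panic(err)
	}
	fmt.Printf("cpus=%d memMB=%d\n", cpus, memMB) // cpus=4 memMB=5120
}
```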
### [P1B] - Node allocatable
In GKE 1.5.6 allocatable for new nodes is equal to capacity. For simplicity we could assume that the new node will have the 90% (or -0.1cpu/-200mb) of capacity. Being wrong or underestimating here is not fatal, most users will probably be OK with this. Once some nodes are present we will have more precise estimates. The worst thing that can happen is that the scale up may not be triggered if the request is exactly at the node capacity - system pods.
That is not true for a non-GKE cluster on GCE. We could at least mention that fact and put some sort of TODO to revisit this.
added.
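The heuristic above (90% of capacity, with a 0.1 CPU / 200 MB floor) could be sketched as follows; how the two bounds combine is an assumption for illustration, not a spec:

```go
package main

import "fmt"

// estimateAllocatable applies the doc's heuristic: reserve the larger of
// 10% of capacity or a fixed floor (0.1 CPU = 100 millicores, 200 MB) for
// system overhead, and treat the remainder as allocatable.
func estimateAllocatable(capacityMilliCPU, capacityMemMB int64) (int64, int64) {
	cpuReserve := capacityMilliCPU / 10
	if cpuReserve < 100 { // at least 0.1 CPU
		cpuReserve = 100
	}
	memReserve := capacityMemMB / 10
	if memReserve < 200 { // at least 200 MB
		memReserve = 200
	}
	return capacityMilliCPU - cpuReserve, capacityMemMB - memReserve
}

func main() {
	cpu, mem := estimateAllocatable(4000, 5120)
	fmt.Printf("allocatableMilliCPU=%d allocatableMemMB=%d\n", cpu, mem) // allocatableMilliCPU=3600 allocatableMemMB=4608
}
```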
# Solution
Given the all information above it should be relatively simple to write a module that given the access to GCP Api and Kubernetes API server. We will expand the NodeGroup interface (https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/cloud_provider.go#L40) with a method EstimateNodeShape, taking no parameters and returning NodeInfo (containing api.Node and all pods running by default on the node) or error if unable to do so.
Update the method name
Done.
only if the current size of the node group is 0 or all of the nodes are unready/broken. Otherwise CA will try to estimate the shape of the node using live examples to avoid repeating any mis-estimation errors.
The EstimateNodeShape will also be run on CA startup to ensure that CA is able to build an example for the node pool should the node group min size was set to 0.
should the node group min size was set to 0 - that is not grammatical
Done.
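A minimal sketch of the proposed interface extension, with stand-in types replacing api.Node and NodeInfo (the real types live in the Kubernetes codebase, and the method name here is the one under review, which a comment above asks to change). The fallback to live nodes when the group is non-empty mirrors the rule described above:

```go
package main

import (
	"errors"
	"fmt"
)

// Node and NodeInfo are simplified stand-ins for api.Node and the
// scheduler's NodeInfo type.
type Node struct {
	Name             string
	CapacityMilliCPU int64
	CapacityMemMB    int64
}

type NodeInfo struct {
	Node       Node
	DaemonPods []string // pods that would run on the node by default
}

// NodeGroup sketches the expanded cloud provider interface from the doc:
// EstimateNodeShape takes no parameters and returns a NodeInfo, or an
// error if the shape cannot be determined.
type NodeGroup interface {
	EstimateNodeShape() (*NodeInfo, error)
}

// fakeNodeGroup is a hypothetical implementation that only builds a
// template when the group is empty; otherwise callers should use a live
// node as the example, per the design above.
type fakeNodeGroup struct {
	size int
}

func (g *fakeNodeGroup) EstimateNodeShape() (*NodeInfo, error) {
	if g.size > 0 {
		return nil, errors.New("group is non-empty: use a live node as the example")
	}
	// Build the template from node-group configuration (machine type,
	// labels, expected DaemonSets); values here are placeholders.
	return &NodeInfo{
		Node:       Node{Name: "template-node", CapacityMilliCPU: 4000, CapacityMemMB: 5120},
		DaemonPods: []string{"fluentd", "kube-proxy"},
	}, nil
}

func main() {
	var ng NodeGroup = &fakeNodeGroup{size: 0}
	info, err := ng.EstimateNodeShape()
	if err != nil {
		panic(err)
	}
	fmt.Println(info.Node.Name, len(info.DaemonPods)) // template-node 2
}
```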
# Design
Problems P1, P1A, P1B, P1C, P2, P3 needs to be solved. The primary focus is to create a solution for GCE/GKE but the proposed option should be generic enough to allow to expand this feature to other cloud providers.
s/P1/1, etc
s/primary focus/primary focus of this document
```
custom-<cpu_count>-<memory_in_mb>
```
So it ia also quite easy to get all of the capacity information from it.
s/ia/is
/lgtm
Ref: #43
cc: @MaciekPytel