58 changes: 9 additions & 49 deletions machine_management/applying-autoscaling.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -2,11 +2,12 @@
[id="applying-autoscaling"]
= Applying autoscaling to an {product-title} cluster
include::_attributes/common-attributes.adoc[]
:context: applying-autoscaling

:context: applying-autoscaling
toc::[]

Applying autoscaling to an {product-title} cluster involves deploying a cluster autoscaler and then deploying machine autoscalers for each machine type in your cluster.
[role="_abstract"]
Apply autoscaling to an {product-title} cluster to automatically adjust the size of the cluster to meet deployment needs. You can deploy a cluster autoscaler and then deploy machine autoscalers for each machine type in your cluster. After you configure the cluster autoscaler, you must configure at least one machine autoscaler.
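
The machine autoscaler half of this configuration can be sketched as follows. This is a minimal, illustrative `MachineAutoscaler` resource and is not taken from the modules in this assembly: the resource name and the compute machine set name `worker-us-east-1a` are hypothetical, and the replica bounds are placeholder values.

[source,yaml]
----
apiVersion: "autoscaling.openshift.io/v1beta1"
kind: "MachineAutoscaler"
metadata:
  name: "worker-us-east-1a" # hypothetical name for this example
  namespace: "openshift-machine-api"
spec:
  minReplicas: 1 # placeholder lower bound for the machine set
  maxReplicas: 12 # placeholder upper bound for the machine set
  scaleTargetRef:
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    name: worker-us-east-1a # hypothetical compute machine set to scale
----

Because each `MachineAutoscaler` resource targets a single compute machine set, a cluster with several machine sets would need one such resource for each machine set that you want to scale.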

[IMPORTANT]
====
Expand All @@ -15,73 +16,32 @@ You can configure the cluster autoscaler only in clusters where the Machine API

include::modules/cluster-autoscaler-about.adoc[leveloffset=+1]

[id="configuring-clusterautoscaler_{context}"]
=== Configuring the cluster autoscaler

First, deploy the cluster autoscaler to manage automatic resource scaling in your {product-title} cluster.

[NOTE]
====
Because the cluster autoscaler is scoped to the entire cluster, you can make only one cluster autoscaler for the cluster.
====

//Cluster autoscaler resource definition
include::modules/cluster-autoscaler-cr.adoc[leveloffset=+3]
include::modules/cluster-autoscaler-cr.adoc[leveloffset=+2]

//Configuring a priority expander for the cluster autoscaler
include::modules/cluster-autoscaler-config-priority-expander.adoc[leveloffset=+3]
include::modules/cluster-autoscaler-config-priority-expander.adoc[leveloffset=+2]

//Labeling GPU machine sets for the cluster autoscaler
include::modules/machineset-label-gpu-autoscaler.adoc[leveloffset=+3]
include::modules/machineset-label-gpu-autoscaler.adoc[leveloffset=+2]

:FeatureName: cluster autoscaler
:FeatureResourceName: ClusterAutoscaler
include::modules/deploying-resource.adoc[leveloffset=+2]

.Next steps
* After you configure the cluster autoscaler, you must xref:../machine_management/applying-autoscaling.adoc#configuring-machineautoscaler_applying-autoscaling[configure at least one machine autoscaler].

include::modules/machine-autoscaler-about.adoc[leveloffset=+1]

[id="configuring-machineautoscaler_{context}"]
=== Configuring machine autoscalers

After you deploy the cluster autoscaler, deploy `MachineAutoscaler` resources that reference the compute machine sets that are used to scale the cluster.

[IMPORTANT]
====
You must deploy at least one `MachineAutoscaler` resource after you deploy the `ClusterAutoscaler` resource.
====

[NOTE]
====
You must configure separate resources for each compute machine set. Remember that compute machine sets are different in each region, so consider whether you want to enable machine scaling in multiple regions. The compute machine set that you scale must have at least one machine in it.
====
include::modules/machine-autoscaler-configuring.adoc[leveloffset=+2]

include::modules/machine-autoscaler-cr.adoc[leveloffset=+3]

:FeatureName: machine autoscaler
:FeatureResourceName: MachineAutoscaler
include::modules/deploying-resource.adoc[leveloffset=+2]

[id="disabling-autoscaling_{context}"]
== Disabling autoscaling

You can disable an individual machine autoscaler in your cluster or disable autoscaling on the cluster entirely.
include::modules/deleting-machine-autoscaler.adoc[leveloffset=+1]

include::modules/deleting-machine-autoscaler.adoc[leveloffset=+2]

[role="_additional-resources"]
.Additional resources
* xref:../machine_management/applying-autoscaling.adoc#deleting-cluster-autoscaler_applying-autoscaling[Disabling the cluster autoscaler]
* xref:../machine_management/applying-autoscaling.adoc#MachineAutoscaler-deploying_applying-autoscaling[Deploying a machine autoscaler]

include::modules/deleting-cluster-autoscaler.adoc[leveloffset=+2]

[role="_additional-resources"]
.Additional resources
* xref:../machine_management/applying-autoscaling.adoc#deleting-machine-autoscaler_applying-autoscaling[Disabling the machine autoscaler]
* xref:../machine_management/applying-autoscaling.adoc#ClusterAutoscaler-deploying_applying-autoscaling[Deploying a cluster autoscaler]
include::modules/deleting-cluster-autoscaler.adoc[leveloffset=+1]

[role="_additional-resources"]
== Additional resources
Expand Down
1 change: 1 addition & 0 deletions modules/cluster-autoscaler-about.adoc
Expand Up @@ -10,6 +10,7 @@
[id="cluster-autoscaler-about_{context}"]
= About the cluster autoscaler

[role="_abstract"]
The cluster autoscaler adjusts the size of an {product-title} cluster to meet its current deployment needs. It uses declarative, Kubernetes-style arguments to provide infrastructure management that does not rely on objects of a specific cloud provider. The cluster autoscaler has a cluster scope, and is not associated with a particular namespace.

The cluster autoscaler increases the size of the cluster when there are pods that fail to schedule on any of the current worker nodes due to insufficient resources or when another node is necessary to meet deployment needs. The cluster autoscaler does not increase the cluster resources beyond the limits that you specify.
Expand Down
20 changes: 7 additions & 13 deletions modules/cluster-autoscaler-config-priority-expander.adoc
Expand Up @@ -6,11 +6,9 @@
[id="cluster-autoscaler-config-priority-expander_{context}"]
= Configuring a priority expander for the cluster autoscaler

When the cluster autoscaler uses the priority expander, it scales up by using the machine set with the highest user-assigned priority.
To use this expander, you must create a config map that defines the priority of your machine sets.

For each specified priority level, you must create regular expressions to identify machine sets that you want to use when prioritizing a machine set for selection.
The regular expressions must match the name of any compute machine set that you want the cluster autoscaler to consider for selection.
[role="_abstract"]
Configure a priority expander to control which machine set expands when the cluster autoscaler increases the size of the cluster.
You can create a priority expander config map by listing priority values and regular expressions that define machine sets.

.Prerequisites

Expand Down Expand Up @@ -53,23 +51,19 @@ For example, use the regular expression pattern `\*fast*` to match any compute m
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander # <1>
  namespace: openshift-machine-api # <2>
  name: cluster-autoscaler-priority-expander
  namespace: openshift-machine-api
data:
  priorities: |- # <3>
  priorities: |-
    10:
      - .*fast.*
      - .*archive.*
    40:
      - .*prod.*
----
<1> You must name config map `cluster-autoscaler-priority-expander`.
<2> You must create the config map in the same namespace as cluster autoscaler pod, which is the `openshift-machine-api` namespace.
<3> Define the priority of your machine sets.
+
Define the priority of your machine sets.
The `priorities` values must be positive integers.
The cluster autoscaler uses higher-value priorities before lower-value priorities.
+
For each priority level, specify the regular expressions that correspond to the machine sets you want to use.
--

Expand Down
134 changes: 87 additions & 47 deletions modules/cluster-autoscaler-cr.adoc
Expand Up @@ -7,6 +7,7 @@
[id="cluster-autoscaler-cr_{context}"]
= Cluster autoscaler resource definition

[role="_abstract"]
This `ClusterAutoscaler` resource definition shows the parameters and sample values for the cluster autoscaler.

[NOTE]
Expand All @@ -21,80 +22,119 @@ kind: "ClusterAutoscaler"
metadata:
  name: "default"
spec:
  podPriorityThreshold: -10 # <1>
  podPriorityThreshold: -10
  resourceLimits:
    maxNodesTotal: 24 # <2>
    maxNodesTotal: 24
    cores:
      min: 8 # <3>
      max: 128 # <4>
      min: 8
      max: 128
    memory:
      min: 4 # <5>
      max: 256 # <6>
      min: 4
      max: 256
    gpus:
    - type: <gpu_type> # <7>
      min: 0 # <8>
      max: 16 # <9>
  logVerbosity: 4 # <10>
  scaleDown: # <11>
    enabled: true # <12>
    delayAfterAdd: 10m # <13>
    delayAfterDelete: 5m # <14>
    delayAfterFailure: 30s # <15>
    unneededTime: 5m # <16>
    utilizationThreshold: "0.4" # <17>
  expanders: ["Random"] # <18>
    - type: <gpu_type>
      min: 0
      max: 16
  logVerbosity: 4
  scaleDown:
    enabled: true
    delayAfterAdd: 10m
    delayAfterDelete: 5m
    delayAfterFailure: 30s
    unneededTime: 5m
    utilizationThreshold: "0.4"
  expanders: ["Random"]
----
<1> Specify the priority that a pod must exceed to cause the cluster autoscaler to deploy additional nodes. Enter a 32-bit integer value. The `podPriorityThreshold` value is compared to the value of the `PriorityClass` that you assign to each pod.
<2> Specify the maximum number of nodes to deploy. This value is the total number of machines that are deployed in your cluster, not just the ones that the autoscaler controls. Ensure that this value is large enough to account for all of your control plane and compute machines and the total number of replicas that you specify in your `MachineAutoscaler` resources.
<3> Specify the minimum number of cores to deploy in the cluster.
<4> Specify the maximum number of cores to deploy in the cluster.
<5> Specify the minimum amount of memory, in GiB, in the cluster.
<6> Specify the maximum amount of memory, in GiB, in the cluster.
<7> Optional: To configure the cluster autoscaler to deploy GPU-enabled nodes, specify a `type` value.

.Cluster autoscaler parameters
[cols="1,3",options="header"]
|===
|Parameter |Description

|`podPriorityThreshold`
|Specify the priority that a pod must exceed to cause the cluster autoscaler to deploy additional nodes. Enter a 32-bit integer value. The `podPriorityThreshold` value is compared to the value of the `PriorityClass` that you assign to each pod.

|`maxNodesTotal`
|Specify the maximum number of nodes to deploy. This value is the total number of machines that are deployed in your cluster, not just the ones that the autoscaler controls. Ensure that this value is large enough to account for all of your control plane and compute machines and the total number of replicas that you specify in your `MachineAutoscaler` resources.

|`cores.min`
|Specify the minimum number of cores to deploy in the cluster.

|`cores.max`
|Specify the maximum number of cores to deploy in the cluster.

|`memory.min`
|Specify the minimum amount of memory, in GiB, in the cluster.

|`memory.max`
|Specify the maximum amount of memory, in GiB, in the cluster.

|`gpus.type`
|Optional: To configure the cluster autoscaler to deploy GPU-enabled nodes, specify a `type` value.
This value must match the value of the `spec.template.spec.metadata.labels[cluster-api/accelerator]` label in the machine set that manages the GPU-enabled nodes of that type.
For example, this value might be `nvidia-t4` to represent Nvidia T4 GPUs, or `nvidia-a10g` for A10G GPUs.
For more information, see "Labeling GPU machine sets for the cluster autoscaler".
<8> Specify the minimum number of GPUs of the specified type to deploy in the cluster.
<9> Specify the maximum number of GPUs of the specified type to deploy in the cluster.
<10> Specify the logging verbosity level between `0` and `10`. The following log level thresholds are provided for guidance:
+
--

|`gpus.min`
|Specify the minimum number of GPUs of the specified type to deploy in the cluster.

|`gpus.max`
|Specify the maximum number of GPUs of the specified type to deploy in the cluster.

|`logVerbosity`
a|Specify the logging verbosity level between `0` and `10`. The following log level thresholds are provided for guidance:

* `1`: (Default) Basic information about changes.
* `4`: Debug-level verbosity for troubleshooting typical issues.
* `9`: Extensive, protocol-level debugging information.
--
+

If you do not specify a value, the default value of `1` is used.
<11> In this section, you can specify the period to wait for each action by using any valid link:https://golang.org/pkg/time/#ParseDuration[ParseDuration] interval, including `ns`, `us`, `ms`, `s`, `m`, and `h`.
<12> Specify whether the cluster autoscaler can remove unnecessary nodes.
<13> Optional: Specify the period to wait before deleting a node after a node has recently been _added_. If you do not specify a value, the default value of `10m` is used.
<14> Optional: Specify the period to wait before deleting a node after a node has recently been _deleted_. If you do not specify a value, the default value of `0s` is used.
<15> Optional: Specify the period to wait before deleting a node after a scale down failure occurred. If you do not specify a value, the default value of `3m` is used.
<16> Optional: Specify a period of time before an unnecessary node is eligible for deletion. If you do not specify a value, the default value of `10m` is used.
<17> Optional: Specify the _node utilization level_. Nodes below this utilization level are eligible for deletion.
+

|`scaleDown`
|In this section, you can specify the period to wait for each action by using any valid link:https://golang.org/pkg/time/#ParseDuration[ParseDuration] interval, including `ns`, `us`, `ms`, `s`, `m`, and `h`.

|`scaleDown.enabled`
|Specify whether the cluster autoscaler can remove unnecessary nodes.

|`scaleDown.delayAfterAdd`
|Optional: Specify the period to wait before deleting a node after a node has recently been _added_. If you do not specify a value, the default value of `10m` is used.

|`scaleDown.delayAfterDelete`
|Optional: Specify the period to wait before deleting a node after a node has recently been _deleted_. If you do not specify a value, the default value of `0s` is used.

|`scaleDown.delayAfterFailure`
|Optional: Specify the period to wait before deleting a node after a scale-down failure occurs. If you do not specify a value, the default value of `3m` is used.

|`scaleDown.unneededTime`
|Optional: Specify a period of time before an unnecessary node is eligible for deletion. If you do not specify a value, the default value of `10m` is used.

|`scaleDown.utilizationThreshold`
a|Optional: Specify the _node utilization level_. Nodes below this utilization level are eligible for deletion.

The node utilization level is the sum of the requested resources divided by the allocated resources for the node, and must be a value greater than `"0"` but less than `"1"`. If you do not specify a value, the cluster autoscaler uses a default value of `"0.5"`, which corresponds to 50% utilization. You must express this value as a string.
<18> Optional: Specify any expanders that you want the cluster autoscaler to use.

|`expanders`
a|Optional: Specify any expanders that you want the cluster autoscaler to use.
The following values are valid:
+
--

* `LeastWaste`: Selects the machine set that minimizes the idle CPU after scaling.
If multiple machine sets would yield the same amount of idle CPU, the selection minimizes unused memory.
* `Priority`: Selects the machine set with the highest user-assigned priority.
To use this expander, you must create a config map that defines the priority of your machine sets.
For more information, see "Configuring a priority expander for the cluster autoscaler".
* `Random`: (Default) Selects the machine set randomly.
--
+

If you do not specify a value, the default value of `Random` is used.
+

You can specify multiple expanders by using the `[LeastWaste, Priority]` format.
The cluster autoscaler applies each expander according to the specified order.
+

In the `[LeastWaste, Priority]` example, the cluster autoscaler first evaluates according to the `LeastWaste` criteria.
If more than one machine set satisfies the `LeastWaste` criteria equally well, the cluster autoscaler then evaluates according to the `Priority` criteria.
If more than one machine set satisfies all of the specified expanders equally well, the cluster autoscaler selects one to use at random.

|===
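
As a hedged illustration of the `scaleDown.utilizationThreshold` and `expanders` parameters described in the table, the following fragment is a sketch rather than a recommended configuration. The node sizes in the comments are made-up values that consider CPU requests only for simplicity, and the fragment assumes that a priority expander config map already exists.

[source,yaml]
----
apiVersion: "autoscaling.openshift.io/v1"
kind: "ClusterAutoscaler"
metadata:
  name: "default"
spec:
  scaleDown:
    enabled: true
    # Illustrative arithmetic: a node with 4 allocated cores and 1 requested
    # core has a utilization level of 1 / 4 = 0.25. Because 0.25 is below
    # "0.4", that node is eligible for deletion.
    utilizationThreshold: "0.4"
  # Expanders are applied in the listed order: LeastWaste first, then
  # Priority to break ties, then a random choice among any remaining ties.
  expanders: ["LeastWaste", "Priority"]
----

Listing `Priority` second means that the priority config map only influences the choice when two or more machine sets are equally good by the `LeastWaste` criteria.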

[NOTE]
====
When performing a scaling operation, the cluster autoscaler remains within the ranges set in the `ClusterAutoscaler` resource definition, such as the minimum and maximum number of cores to deploy or the amount of memory in the cluster. However, the cluster autoscaler does not correct the current values in your cluster to be within those ranges.
Expand Down
10 changes: 6 additions & 4 deletions modules/deleting-cluster-autoscaler.adoc
Expand Up @@ -6,6 +6,7 @@
[id="deleting-cluster-autoscaler_{context}"]
= Disabling the cluster autoscaler

[role="_abstract"]
To disable the cluster autoscaler, you delete the corresponding `ClusterAutoscaler` resource.

[NOTE]
Expand Down Expand Up @@ -33,11 +34,12 @@ default 42m
+
[source,terminal]
----
$ oc get ClusterAutoscaler/default \//<1>
-o yaml> <cluster_autoscaler_backup_name>.yaml //<2>
$ oc get ClusterAutoscaler/default \
-o yaml > <cluster_autoscaler_backup_name>.yaml
----
<1> `default` is the name of the `ClusterAutoscaler` CR.
<2> `<cluster_autoscaler_backup_name>` is the name for the backup of the CR.
where:

<cluster_autoscaler_backup_name>:: Specifies the file name in which to store the backup.

. Delete the `ClusterAutoscaler` CR by running the following command:
+
Expand Down
10 changes: 6 additions & 4 deletions modules/deleting-machine-autoscaler.adoc
Expand Up @@ -6,6 +6,7 @@
[id="deleting-machine-autoscaler_{context}"]
= Disabling a machine autoscaler

[role="_abstract"]
To disable a machine autoscaler, you delete the corresponding `MachineAutoscaler` custom resource (CR).

[NOTE]
Expand Down Expand Up @@ -34,12 +35,13 @@ compute-us-west-1a MachineSet compute-us-west-1a 2 4 37m
+
[source,terminal]
----
$ oc get MachineAutoscaler/<machine_autoscaler_name> \//<1>
$ oc get MachineAutoscaler/<machine_autoscaler_name> \
-n openshift-machine-api \
-o yaml> <machine_autoscaler_name_backup>.yaml //<2>
-o yaml > <machine_autoscaler_name_backup>.yaml
----
<1> `<machine_autoscaler_name>` is the name of the CR that you want to delete.
<2> `<machine_autoscaler_name_backup>` is the name for the backup of the CR.
where:

<machine_autoscaler_name_backup>:: Specifies the file name in which to store the backup.

. Delete the `MachineAutoscaler` CR by running the following command:
+
Expand Down
7 changes: 5 additions & 2 deletions modules/deploying-resource.adoc
Expand Up @@ -12,6 +12,7 @@
[id="{FeatureResourceName}-deploying_{context}"]
= Deploying a {FeatureName}

[role="_abstract"]
To deploy a {FeatureName}, you create an instance of the `{FeatureResourceName}` resource.

.Procedure
Expand All @@ -22,9 +23,11 @@ To deploy a {FeatureName}, you create an instance of the `{FeatureResourceName}`
+
[source,terminal]
----
$ oc create -f <filename>.yaml <1>
$ oc create -f <filename>.yaml
----
<1> `<filename>` is the name of the custom resource file.
where:

<filename>:: Specifies the name of the YAML file you created.

// Undefine attributes, so that any mistakes are easily spotted
:!FeatureName:
Expand Down