Warning! This document might be outdated.

Horizontal Pod Autoscaling


This document briefly describes the design of the horizontal autoscaler for pods. The autoscaler (implemented as a Kubernetes API resource and controller) is responsible for dynamically controlling the number of replicas of some collection (e.g. the pods of a ReplicationController) to meet some objective(s), for example a target per-pod CPU utilization.

This design supersedes


The resource usage of a serving application usually varies over time: sometimes the demand for the application rises, and sometimes it drops. In Kubernetes version 1.0, a user can only manually set the number of serving pods. Our aim is to provide a mechanism for the automatic adjustment of the number of pods based on CPU utilization statistics (a future version will allow autoscaling based on other resources/metrics).

Scale Subresource

In Kubernetes version 1.1, we are introducing the Scale subresource and implementing horizontal autoscaling of pods based on it. The Scale subresource is supported for replication controllers and deployments. It is a virtual resource: it does not correspond to an object stored in etcd. It is only present in the API as an interface that a controller (in this case the HorizontalPodAutoscaler) can use to dynamically scale the number of replicas controlled by some other API object (currently ReplicationController and Deployment) and to learn the current number of replicas. Scale is a subresource of the API object that it serves as the interface for. The Scale subresource is useful because whenever we introduce another type that we want to autoscale, we only need to implement the Scale subresource for it. The wider discussion regarding Scale took place in issue #1629.

The Scale subresource is exposed in the API for a replication controller or deployment under the following paths:



It has the following structure:

// represents a scaling request for a resource.
type Scale struct {

	// defines the behavior of the scale.
	Spec ScaleSpec

	// current status of the scale.
	Status ScaleStatus
}

// describes the attributes of a scale subresource
type ScaleSpec struct {
	// desired number of instances for the scaled object.
	Replicas int `json:"replicas,omitempty"`
}

// represents the current status of a scale subresource.
type ScaleStatus struct {
	// actual number of observed instances of the scaled object.
	Replicas int `json:"replicas"`

	// label query over pods that should match the replicas count.
	Selector map[string]string `json:"selector,omitempty"`
}

Writing to ScaleSpec.Replicas resizes the replication controller/deployment associated with the given Scale subresource. ScaleStatus.Replicas reports how many pods are currently running in the replication controller/deployment, and ScaleStatus.Selector returns the selector for those pods.
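The read/write split described above can be illustrated with a minimal in-memory sketch. The `fakeTarget` type and its `GetScale`/`UpdateScale` methods are hypothetical stand-ins for a replication controller or deployment, not part of the Kubernetes API; only the `Scale`, `ScaleSpec`, and `ScaleStatus` field names come from this document.

```go
package main

import "fmt"

// ScaleSpec, ScaleStatus and Scale mirror the fields shown in this document
// (JSON tags and metadata omitted for brevity).
type ScaleSpec struct {
	Replicas int
}

type ScaleStatus struct {
	Replicas int
	Selector map[string]string
}

type Scale struct {
	Spec   ScaleSpec
	Status ScaleStatus
}

// fakeTarget is a hypothetical in-memory stand-in for a replication
// controller or deployment.
type fakeTarget struct {
	desired  int
	running  int
	selector map[string]string
}

// GetScale builds the virtual Scale view from the target's own state;
// nothing is stored separately for the subresource.
func (t *fakeTarget) GetScale() Scale {
	return Scale{
		Spec:   ScaleSpec{Replicas: t.desired},
		Status: ScaleStatus{Replicas: t.running, Selector: t.selector},
	}
}

// UpdateScale applies only Spec.Replicas; Status is ignored on writes.
func (t *fakeTarget) UpdateScale(s Scale) {
	t.desired = s.Spec.Replicas
}

func main() {
	rc := &fakeTarget{desired: 2, running: 2, selector: map[string]string{"app": "web"}}

	s := rc.GetScale()
	s.Spec.Replicas = 5 // request a resize
	rc.UpdateScale(s)

	after := rc.GetScale()
	fmt.Println("desired:", after.Spec.Replicas, "running:", after.Status.Replicas)
}
```

In this sketch the running count stays at 2 after the write, because nothing simulates the pods actually starting; the status only catches up once the underlying controller has acted.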

HorizontalPodAutoscaler Object

In Kubernetes version 1.1, we are introducing the HorizontalPodAutoscaler object. It is accessible under:


It has the following structure:

// configuration of a horizontal pod autoscaler.
type HorizontalPodAutoscaler struct {

	// behavior of autoscaler.
	Spec HorizontalPodAutoscalerSpec

	// current information about the autoscaler.
	Status HorizontalPodAutoscalerStatus
}

// specification of a horizontal pod autoscaler.
type HorizontalPodAutoscalerSpec struct {
	// reference to Scale subresource; horizontal pod autoscaler will learn the current resource
	// consumption from its status, and will set the desired number of pods by modifying its spec.
	ScaleRef SubresourceReference
	// lower limit for the number of pods that can be set by the autoscaler, default 1.
	MinReplicas *int
	// upper limit for the number of pods that can be set by the autoscaler.
	// It cannot be smaller than MinReplicas.
	MaxReplicas int
	// target average CPU utilization (represented as a percentage of requested CPU) over all the pods;
	// if not specified it defaults to the target CPU utilization at 80% of the requested resources.
	CPUUtilization *CPUTargetUtilization
}

type CPUTargetUtilization struct {
	// fraction of the requested CPU that should be utilized/used,
	// e.g. 70 means that 70% of the requested CPU should be in use.
	TargetPercentage int
}

// current status of a horizontal pod autoscaler
type HorizontalPodAutoscalerStatus struct {
	// most recent generation observed by this autoscaler.
	ObservedGeneration *int64

	// last time the HorizontalPodAutoscaler scaled the number of pods;
	// used by the autoscaler to control how often the number of pods is changed.
	LastScaleTime *unversioned.Time

	// current number of replicas of pods managed by this autoscaler.
	CurrentReplicas int

	// desired number of replicas of pods managed by this autoscaler.
	DesiredReplicas int

	// current average CPU utilization over all pods, represented as a percentage of requested CPU,
	// e.g. 70 means that an average pod is using now 70% of its requested CPU.
	CurrentCPUUtilizationPercentage *int
}

ScaleRef is a reference to the Scale subresource. MinReplicas, MaxReplicas and CPUUtilization define the autoscaler configuration. We are also introducing the HorizontalPodAutoscalerList object to enable listing all autoscalers in a namespace:

// list of horizontal pod autoscaler objects.
type HorizontalPodAutoscalerList struct {

	// list of horizontal pod autoscaler objects.
	Items []HorizontalPodAutoscaler
}
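The defaults and the constraint stated in the HorizontalPodAutoscalerSpec comments (MinReplicas defaults to 1, the CPU target defaults to 80%, and MaxReplicas cannot be smaller than MinReplicas) could be applied roughly as follows. This is only a sketch: `applyDefaults` and `validate` are hypothetical helper names, and the `ScaleRef` field is left out to keep the example self-contained.

```go
package main

import (
	"errors"
	"fmt"
)

type CPUTargetUtilization struct {
	TargetPercentage int
}

// HorizontalPodAutoscalerSpec is a trimmed copy of the struct in this
// document; ScaleRef is omitted here.
type HorizontalPodAutoscalerSpec struct {
	MinReplicas    *int
	MaxReplicas    int
	CPUUtilization *CPUTargetUtilization
}

// applyDefaults fills in the documented defaults for unset optional fields.
func applyDefaults(spec *HorizontalPodAutoscalerSpec) {
	if spec.MinReplicas == nil {
		one := 1
		spec.MinReplicas = &one
	}
	if spec.CPUUtilization == nil {
		spec.CPUUtilization = &CPUTargetUtilization{TargetPercentage: 80}
	}
}

// validate checks the documented constraint MaxReplicas >= MinReplicas.
// It treats an unset MinReplicas as the default of 1.
func validate(spec HorizontalPodAutoscalerSpec) error {
	minReplicas := 1
	if spec.MinReplicas != nil {
		minReplicas = *spec.MinReplicas
	}
	if spec.MaxReplicas < minReplicas {
		return errors.New("MaxReplicas cannot be smaller than MinReplicas")
	}
	return nil
}

func main() {
	spec := HorizontalPodAutoscalerSpec{MaxReplicas: 10}
	applyDefaults(&spec)
	if err := validate(spec); err != nil {
		fmt.Println("invalid spec:", err)
		return
	}
	fmt.Println(*spec.MinReplicas, spec.MaxReplicas, spec.CPUUtilization.TargetPercentage)
}
```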

Autoscaling Algorithm

The autoscaler is implemented as a control loop. It periodically queries the pods described by Status.Selector of the Scale subresource and collects their CPU utilization. Then it compares the arithmetic mean of the pods' CPU utilization with the target defined in the autoscaler's Spec.CPUUtilization, and adjusts the replicas of the Scale if needed to match the target (preserving the condition MinReplicas <= Replicas <= MaxReplicas).
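Two pieces of one loop iteration are easy to show in isolation: computing the arithmetic mean of the observed per-pod utilization, and keeping the chosen replica count inside the configured bounds. The sketch below assumes utilization is already available as percentages; `meanUtilization` and `clampReplicas` are hypothetical helper names, not functions of the actual controller.

```go
package main

import "fmt"

// meanUtilization returns the arithmetic mean of per-pod CPU utilization
// percentages; ok is false when no pods were observed.
func meanUtilization(perPod []float64) (mean float64, ok bool) {
	if len(perPod) == 0 {
		return 0, false
	}
	sum := 0.0
	for _, u := range perPod {
		sum += u
	}
	return sum / float64(len(perPod)), true
}

// clampReplicas enforces MinReplicas <= Replicas <= MaxReplicas.
func clampReplicas(desired, minReplicas, maxReplicas int) int {
	if desired < minReplicas {
		return minReplicas
	}
	if desired > maxReplicas {
		return maxReplicas
	}
	return desired
}

func main() {
	mean, ok := meanUtilization([]float64{60, 80, 100})
	if !ok {
		fmt.Println("no pods observed; skipping this iteration")
		return
	}
	fmt.Println("mean utilization:", mean)
	fmt.Println("clamped replicas:", clampReplicas(12, 1, 10))
}
```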

The period of the autoscaler is controlled by the --horizontal-pod-autoscaler-sync-period flag of the controller manager. The default value is 30 seconds.

CPU utilization is the recent CPU usage of a pod (averaged over the last 1 minute) divided by the CPU requested by the pod. In Kubernetes version 1.1, CPU usage is taken directly from Heapster. In the future, there will be an API on the master for this purpose (see issue #11951).
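As a worked example of this definition, a pod that requested 500 millicores and used 350 millicores on average over the last minute has a utilization of 70%. The helper below is a sketch under the assumption that both values are expressed in millicores; `cpuUtilizationPercent` is a hypothetical name.

```go
package main

import (
	"errors"
	"fmt"
)

// cpuUtilizationPercent returns recent CPU usage as a percentage of the
// CPU requested by the pod. Both arguments are in millicores.
func cpuUtilizationPercent(usageMilli, requestMilli int64) (float64, error) {
	if requestMilli <= 0 {
		// Without a request the ratio is undefined.
		return 0, errors.New("pod has no positive CPU request")
	}
	return 100 * float64(usageMilli) / float64(requestMilli), nil
}

func main() {
	u, err := cpuUtilizationPercent(350, 500)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(u) // 350 / 500 = 70%
}
```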

The target number of pods is calculated from the following formula:

TargetNumOfPods = ceil(sum(CurrentPodsCPUUtilization) / Target)
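A direct transcription of the formula, as a sketch: utilization values and the target are both percentages of requested CPU, and `targetNumOfPods` is a hypothetical function name. For three pods at 90%, 100% and 110% with a 50% target, the sum is 300 and the result is 6 pods.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// targetNumOfPods computes ceil(sum(utilization) / target).
func targetNumOfPods(podUtilization []float64, target float64) (int, error) {
	if target <= 0 {
		return 0, errors.New("target utilization must be positive")
	}
	sum := 0.0
	for _, u := range podUtilization {
		sum += u
	}
	return int(math.Ceil(sum / target)), nil
}

func main() {
	n, err := targetNumOfPods([]float64{90, 100, 110}, 50)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(n) // ceil(300 / 50) = 6
}
```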

Starting and stopping pods may introduce noise to the metric (for instance, starting may temporarily increase CPU). So, after each action, the autoscaler should wait some time for reliable data. Scale-up can only happen if there was no rescaling within the last 3 minutes. Scale-down will wait for 5 minutes from the last rescaling. Moreover, any scaling will only be made if avg(CurrentPodsCPUUtilization) / Target drops below 0.9 or increases above 1.1 (10% tolerance). This approach has two benefits:

  • The autoscaler works in a conservative way. If new user load appears, it is important for us to rapidly increase the number of pods, so that user requests will not be rejected. Lowering the number of pods is not that urgent.

  • The autoscaler avoids thrashing, i.e. it prevents rapid execution of conflicting decisions if the load is not stable.

Relative vs. absolute metrics

We chose values of the target metric to be relative (e.g. 90% of requested CPU resource) rather than absolute (e.g. 0.6 core) for the following reason. With an absolute metric, the user would need to guarantee that the target is lower than the request. Otherwise, overloaded pods may not be able to consume more than the autoscaler's absolute target utilization, thereby preventing the autoscaler from seeing utilization high enough to trigger a scale-up. This is especially troublesome when the user changes the requested resources for a pod, because they would also need to change the autoscaler's utilization threshold. Therefore, we decided to use a relative metric. The user only needs to set it to a value smaller than 100%, and further changes to the requested resources will not invalidate it.
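The argument can be checked with a small calculation. In the sketch below (hypothetical helper names, values in millicores), a relative target of 80% follows the request automatically, while a fixed absolute target of 600m stops being lower than the request once the request is reduced from 1000m to 500m.

```go
package main

import "fmt"

// absoluteTargetMilli converts a relative target (percent of request) into
// the absolute amount of CPU it corresponds to, in millicores.
func absoluteTargetMilli(targetPercent, requestMilli int64) int64 {
	return targetPercent * requestMilli / 100
}

// belowRequest reports whether an absolute target stays lower than the
// pod's request, the condition a user would have to guarantee by hand.
func belowRequest(targetMilli, requestMilli int64) bool {
	return targetMilli < requestMilli
}

func main() {
	const fixedAbsoluteTarget = 600 // millicores, e.g. 0.6 core
	for _, request := range []int64{1000, 500} {
		fmt.Printf("request=%dm relative 80%% -> %dm, absolute %dm below request: %v\n",
			request,
			absoluteTargetMilli(80, request),
			int64(fixedAbsoluteTarget),
			belowRequest(fixedAbsoluteTarget, request))
	}
}
```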

Support in kubectl

To make manipulation of the HorizontalPodAutoscaler object simpler, we added support for creating/updating/deleting/listing HorizontalPodAutoscaler objects to kubectl. In addition, in the future, we are planning to add kubectl support for the following use cases:

  • When creating a replication controller or deployment with kubectl create [-f], it should be possible to specify an additional autoscaler object. (This should work out of the box once creation of autoscalers is supported by kubectl, as we may include multiple objects in the same config file.)
  • [future] When running an image with kubectl run, there should be an additional option to create an autoscaler for it.
  • [future] We will add a new command, kubectl autoscale, that will allow for easy creation of an autoscaler object for an already existing replication controller/deployment.

Next steps

We list here some features that are not supported in Kubernetes version 1.1. However, we want to keep them in mind, as they will most probably be needed in the future. Our design is in general compatible with them.

  • [future] Autoscale pods based on metrics other than CPU (e.g. memory, network traffic, qps). This includes scaling based on a custom/application metric.
  • [future] Autoscale pods based on an aggregate metric. Instead of computing the average of a target metric across pods, the autoscaler will use a single external metric (e.g. a qps metric from a load balancer). The metric will be aggregated while the target will remain per-pod (e.g. when observing 100 qps on the load balancer while the target is 20 qps per pod, the autoscaler will set the number of replicas to 5).
  • [future] Autoscale pods based on multiple metrics. If the target numbers of pods for different metrics are different, choose the largest target number of pods.
  • [future] Scale the number of pods starting from 0. All pods can be turned off, and then turned on when there is demand for them. When a request arrives for a service with no pods, kube-proxy will generate an event for the autoscaler to create a new pod. Discussed in issue #3247.
  • [future] When scaling down, make a more educated decision about which pods to kill. E.g. if two or more pods from the same replication controller are on the same node, kill one of them. Discussed in issue #4301.
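Two of the items above, the aggregate metric and the multiple-metrics rule, reduce to simple arithmetic. The sketch below illustrates them under stated assumptions: these features are not part of Kubernetes 1.1 according to this document, both function names are hypothetical, and rounding up the aggregate result is an assumption carried over from the CPU formula.

```go
package main

import (
	"fmt"
	"math"
)

// replicasForAggregate divides one externally observed total (e.g. qps at a
// load balancer) by the per-pod target and rounds up.
// perPodTarget is assumed to be positive.
func replicasForAggregate(total, perPodTarget float64) int {
	return int(math.Ceil(total / perPodTarget))
}

// largestTarget returns the largest of the per-metric target pod counts,
// or 0 when no candidates are given.
func largestTarget(candidates ...int) int {
	largest := 0
	for _, c := range candidates {
		if c > largest {
			largest = c
		}
	}
	return largest
}

func main() {
	fromQPS := replicasForAggregate(100, 20) // 100 qps total, 20 qps per pod
	fromCPU := 3                             // e.g. a target computed from CPU utilization
	fmt.Println(fromQPS, largestTarget(fromQPS, fromCPU))
}
```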