
Increase Instance Utility

joesondow edited this page Nov 29, 2012 · 2 revisions


The intent of auto scaling is to improve instance utility and service availability. Rather than provisioning for peak demand, AWS can dynamically adjust the size of an auto scaling group (ASG) based on demand. The goal of this documentation is to explain dynamic auto scaling and provide a basic configuration for it.

Below is an example of an ASG statically provisioned to meet peak demand. Throughout the day and week, regardless of demand, the size is fixed. Note that the Min, Desired, and Max instance counts all have the same value.

Let's assume load average is a strong signal for instance utility. From the graphs below, assuming at least dual-core instance types, the ASG is underutilized. For example, an instance with 2 cores, to be fully utilized, should have a load average in the range of 2-4.
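The rule of thumb above can be expressed as a small check. This is an illustrative sketch, not part of Asgard; the band boundaries (load between 1x and 2x the core count counts as fully utilized) are the assumed thresholds from the example above.

```python
def utilization_band(load_avg, cores):
    """Classify an instance's utilization from its load average.

    Illustrative thresholds: for a 2-core instance, a load average
    of 2-4 (1x to 2x the core count) is treated as fully utilized.
    """
    per_core = load_avg / cores
    if per_core < 1.0:
        return "under-utilized"
    if per_core <= 2.0:
        return "fully utilized"
    return "saturated"

print(utilization_band(0.8, 2))  # under-utilized: load well below core count
print(utilization_band(3.0, 2))  # fully utilized: inside the 2-4 band
print(utilization_band(5.0, 2))  # saturated: load exceeds twice the core count
```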

Limiting resource

In general, to increase utilization, the first step is to identify the limiting system or application resource. This is usually accomplished with a load (squeeze) test. Once the limiting resource is identified, the goal is to size the ASG with enough capacity, plus some additional headroom, to meet demand with respect to that resource. With dynamic auto scaling, throughout the period (minute, hour, day, week), the ASG constantly adjusts its size according to the resource. For example, the graph below shows the same ASG using load average to auto scale.

Note that the load average stays between 1 and 3, and the duration of time near a load average of 3 is much longer. This configuration has much higher instance utility. In an optimal configuration, there would be low variance throughout the day. The above example used m1.large instances (2 cores), scaling down by 10% when load average dropped below 1 and scaling up by 10% when it exceeded 3. The setup also had the min set to 9 instances, which is the reason for the large variance during a portion of the day. To fill that gap (variance) with more aggressive scaling down, set the min to a lower number.
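One evaluation step of the policy just described can be sketched as follows. This is not Asgard's actual implementation, just a simulation of the stated rules: +10% above a load average of 3, -10% below 1, clamped to the ASG min and max (the max of 100 is an assumption; the example only stated a min of 9).

```python
import math

def next_desired(desired, load_avg, low=1.0, high=3.0, pct=10,
                 min_size=9, max_size=100):
    """One scaling step: grow or shrink the desired capacity by pct
    percent based on load average, clamped to the ASG min/max."""
    if load_avg > high:
        desired += math.ceil(desired * pct / 100)   # scale up by 10%
    elif load_avg < low:
        desired -= math.ceil(desired * pct / 100)   # scale down by 10%
    return max(min_size, min(max_size, desired))

print(next_desired(20, 3.5))  # load above 3: 20 -> 22
print(next_desired(10, 0.5))  # wants to shrink, but the min of 9 holds: 10 -> 9
print(next_desired(20, 2.0))  # load inside the 1-3 band: unchanged
```

Lowering `min_size` here is exactly the "set the min to a lower number" suggestion above: it lets the scale-down branch keep shrinking the group instead of being clamped.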

Application metrics for scaling

To further improve instance utility, consider auto scaling on an application metric. The previous example used system load average, a measure of CPU queue length. The problem with scaling on a single measurement is the assumption that there is a single limiting resource. A measurement that takes multiple limiting resources into account is probably more accurate. For example, requests per second (RPS) can be used to measure the average number of requests each instance can serve without impacting quality of service (QoS). The challenge is to find a strong correlation between RPS and QoS. Below are two graphs showing a strong correlation between RPS and QoS. The first graph is RPS per instance, followed by aggregate system queuing, a proxy for QoS.

[Graph: RPS per instance and aggregate queue size]
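Before trusting RPS as a scaling signal, it is worth quantifying how strongly it tracks the QoS proxy. A minimal sketch, using made-up sample data (the real numbers would come from the graphs above):

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical samples: per-instance RPS vs. aggregate queue size.
rps_per_instance = [5, 10, 15, 20, 25, 30, 35]
queue_size = [0, 0, 1, 2, 10, 40, 90]

r = pearson(rps_per_instance, queue_size)
print(f"Pearson r = {r:.2f}")  # close to 1 indicates a strong correlation
```

A high coefficient supports using RPS as the scaling metric; a weak one suggests the limiting resource lies elsewhere.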

We can use this information to configure a much more specific scaling policy. From the graphs, queuing increases when an instance exceeds 25 RPS. To avoid queuing, and improve QoS, we define an auto scaling policy as follows: if RPS exceeds 20, scale up by 10%; if RPS falls below 10, scale down by 10%. The scale-up threshold of 20 leaves headroom below the 25 RPS point where queuing begins. Below are the graphs of total RPS, RPS per instance, and total queuing with dynamic auto scaling.

Queuing still occurs, but it is mitigated by the auto scaling policy. Note that in the morning, queuing occurs more frequently as the farm scales up to meet initial demand.
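The RPS policy above can be sketched as a simulation. This is illustrative only, not Asgard's implementation; the demand curve is hypothetical, and the thresholds (+10% above 20 RPS per instance, -10% below 10) are the ones stated above.

```python
import math

def step(instances, total_rps, up=20, down=10, pct=10, min_size=1):
    """One evaluation period of the RPS policy: compare per-instance RPS
    against the up/down thresholds and adjust the group by pct percent."""
    rps_per_instance = total_rps / instances
    if rps_per_instance > up:
        instances += math.ceil(instances * pct / 100)
    elif rps_per_instance < down:
        instances = max(min_size, instances - math.ceil(instances * pct / 100))
    return instances

# Hypothetical total RPS over successive evaluation periods.
demand = [100, 300, 600, 900, 600, 200]
size = 10
for total_rps in demand:
    size = step(size, total_rps)
    print(f"{total_rps:4d} total RPS -> {size} instances")
```

Because the group grows only 10% per period, a sharp morning ramp takes several periods to absorb, which matches the initial-demand queuing noted above.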