
Welcome to the ottoscalr wiki!

Ottoscalr

Kubernetes offers the capability to autoscale workloads horizontally via the Horizontal Pod Autoscaler (HPA). With horizontal autoscaling, workloads can scale down when there is less load and scale out when there's increased demand for compute. Working alongside the Cluster Autoscaler in public clouds, this leads to direct cost savings by reducing the cluster size. In private data centers, it frees up compute resources which can be used by preemptible workloads via Kubernetes scheduling priorities. While HPA is offered out of the box for workload owners to integrate, tuning the right autoscaling configuration for every workload can be time consuming and error prone. Ottoscalr aims to be a drop-in framework that continuously fine-tunes the autoscaling policy for every workload in the cluster.

Features

Ottoscalr offers a set of features to ensure that workloads are configured with the right HPA configuration, one that delivers optimal compute savings while keeping them ready to handle surges in compute demand based on historical usage trends.

  • Autoscalers: Ottoscalr doesn't autoscale the workloads by itself, but works by creating/managing the HPAs and KEDA ScaledObjects which influence how and when workloads are scaled (a sketch of such a resource follows this list).
  • Controllers: Ottoscalr is made up of a bunch of controllers that perform a variety of tasks in ensuring that the workloads are configured with the right HPA policy at all times. It's primarily made up of two controller groups -- for generation of recommendations and for HPA enforcement.
  • Workloads: Support for stateless workload kinds: Deployments and Argo Rollouts (the latter is optional and can be toggled during deployment).
  • Pluggable Recommenders: Ottoscalr provides an extensible framework for pluggable recommenders which will generate recommendations of autoscaler configurations which are then enforced on the workload.
  • Graded Policies: Since there's no one-size-fits-all autoscaling policy for a workload, Ottoscalr works with a set of graded policies and takes a workload through these policies until the ideal policy recommended by the recommender is applied.
  • Integration with PromQL-compliant metric sources: Works with any PromQL-compliant metrics source for gathering historical workload resource utilization metrics.
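As noted in the Autoscalers item above, Ottoscalr manages standard autoscaling resources rather than scaling workloads directly. Below is a minimal sketch of a KEDA ScaledObject of the kind it manages; the workload name, replica bounds and CPU threshold are illustrative placeholders, not values Ottoscalr itself prescribes.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-service          # hypothetical workload name
  namespace: my-namespace
spec:
  scaleTargetRef:
    name: my-service        # the Deployment (or Rollout) being scaled
  minReplicaCount: 2        # illustrative lower bound
  maxReplicaCount: 10       # illustrative upper bound
  triggers:
    - type: cpu
      metricType: Utilization
      metadata:
        value: "60"         # illustrative CPU utilization threshold (%)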

Concepts

Autoscaling cycle lag (ACL)

We define "Autoscaling cycle lag" (ACL) as the time taken for a workload to respond to and absorb an autoscaling event, such as an increase in CPU utilization beyond the threshold.

[Figure: autoscaling cycle lag (ACL) diagram]

The above diagram depicts the typical execution of an autoscaling cycle. As we can see, there is a cycle that needs to complete before the new pods are ready and able to serve traffic. The overall time taken for this cycle is called the Autoscaling Cycle Lag (ACL). Some of the components of the ACL are cluster specific (metrics server poll interval, HPA controller poll interval, controller SLOs, scheduling latency etc.) and some are workload specific (pod bootstrap latency, image size). The lower the ACL, the sooner the workload can scale out/down in response to autoscaling events such as resource utilization spikes.

ACL is an important factor that is considered when generating policy recommendations for workloads.
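To make this concrete, here is a purely illustrative breakdown with assumed values; the actual components and numbers vary per cluster and per workload:

  metrics server poll interval     ~30s   (cluster specific)
  HPA controller poll interval     ~15s   (cluster specific)
  scheduling latency                ~5s   (cluster specific)
  image pull + pod bootstrap       ~60s   (workload specific)
  -------------------------------------------------------
  ACL                             ~110s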

Workload slack

The difference between the workload resource limits and the actual resource usage.
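For example (with illustrative numbers), a workload whose pods have a 2-core CPU limit but use only 0.5 cores on average carries 1.5 cores of slack per pod.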

Autoscaling savings

The savings obtained by autoscaling a workload based on a resource utilization metric. At any point in time, savings = (compute resources consumed by the workload without autoscaling enabled) - (compute resources consumed by the workload while being autoscaled).
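As an illustrative example, if a workload would be statically provisioned at 50 replicas without autoscaling but averages 30 replicas while autoscaled, the savings at that point amount to 20 replicas' worth of compute.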

The objective of Ottoscalr is to increase the autoscaling savings for a workload while minimizing the slack. The following diagram visualizes the HPA simulation, savings and slack for a sample workload.

[Figure: HPA recommendation simulation showing savings and slack for a sample workload]

Policy Recommendations

Ottoscalr offers a range of capabilities. In its simplest form (without HPA enforcement enabled), it recommends an HPA configuration (max and min replicas, utilization threshold etc.) based on the CPU utilization history of the workloads (the lookback period is configurable). These recommendations are captured as PolicyRecommendation custom resources. A PolicyRecommendation is namespace scoped and created with the same name as the workload it corresponds to. The following command lists the policy recommendations in a namespace:

kubectl get policyrecommendations.ottoscaler.io -n <namespace>
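To inspect the recommendation generated for a particular workload (the resource shares the workload's name, as noted above), the standard kubectl options apply; for example:

kubectl get policyrecommendations.ottoscaler.io <workload-name> -n <namespace> -o yaml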

Policy recommendations are refreshed based on multiple triggers to account for new data, changes to the application profile, breaches to application SLOs etc. The following triggers refresh the policy recommendation of a workload:

  • Periodic trigger: Ensures that the recommendations are refreshed after a time interval (configured as a platform-wide parameter).
  • Workload updates: Refreshes after changes to any of the workload's fields. This ensures that a new deployment of the workload, which can impact the pod bootstrap time and thus the ACL, is taken into account.
  • Breaches: Refreshes when the application's redline utilization threshold (configurable as a platform-wide parameter) is breached.

Recommendation framework

Ottoscalr is also a framework for recommenders that can be plugged in and customized to suit new requirements or adopt ML based approaches. We'll go over the details of the framework in the Recommenders page.

HPA configuration

HPA configuration is the spec of the HPA/ScaledObject resource which influences how and when a workload is scaled. It has multiple fields, the key ones being the min and max replica counts and the resource utilization threshold. Ottoscalr derives and recommends the HPA config after analyzing historical resource consumption data.
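For reference, these fields map directly onto a standard autoscaling/v2 HorizontalPodAutoscaler. The sketch below uses illustrative names and values rather than anything Ottoscalr would necessarily recommend:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-service          # hypothetical workload name
  namespace: my-namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-service
  minReplicas: 2            # illustrative min replica count
  maxReplicas: 10           # illustrative max replica count
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60   # illustrative utilization threshold (%)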

HPA Enforcement

As mentioned in the "Policy Recommendations" section, a policy recommendation is just that -- a recommendation, or an ideal HPA configuration based on the workload's history and characteristics. For Ottoscalr to act on it, an HPA still needs to be created from the recommendation. We call this HPA enforcement. The policy recommendation can either be enforced as is, or the workload can be taken through a series of HPA policies ranging from the safest/most conservative (a no-op policy where no autoscaling is enabled) to the riskiest (the ideal policy as recommended). In Ottoscalr, this is done with the help of graded policies.

Graded Policies

Graded policies are a set of policies, configured by the cluster administrator, that take workloads through a set of incremental HPA configurations (ranked by risk index) before landing on the target HPA configuration. This means that the HPA configuration of a workload may move back and forth between these policies and the target policy recommendation. Ottoscalr's policy state machine ensures that, depending on various factors contributing to the performance of an HPA configuration over a workload, an appropriate policy is chosen and applied. Each policy has an associated risk index, a numerical value used to compare and order policies by risk. The following state machine diagram illustrates how a workload moves through different policies, where Pi < Pi+1 in terms of risk index; an illustrative progression is sketched after the diagram.

[Figure: policy state machine showing transitions between graded policies and the recommended policy]
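As a purely hypothetical illustration (the actual policy set is whatever the cluster administrator configures), a graded policy set ordered by risk index might look like:

  P0  no-op          no autoscaling; the replica count stays fixed
  P1  conservative   scales out early, e.g. at 40% CPU utilization
  P2  moderate       e.g. scales out at 55% CPU utilization
  P3  target         the ideal configuration from the PolicyRecommendation

Aging (described below) advances a workload from Pi towards Pi+1, while a breach downgrades it back towards P0.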

Aging

Aging in Ottoscalr ensures that the recommended policy is refreshed at periodic intervals to account for recent utilization data. It also ensures that the currently enforced policy is advanced closer towards the target recommended policy.

Breaches

The breach detection mechanism in Ottoscalr ensures that breaches of the application SLOs are detected (currently based on a redline utilization threshold) and that the enforced policy is downgraded to a more conservative one to let the breach subside. If necessary, the enforced policy is downgraded all the way to a no-op (where no autoscaling happens).

HPA Enforcer

The HPA enforcer ensures that the PolicyRecommendations generated for a workload are translated into HPA resources and applied. There are also controls in place to whitelist/blacklist workloads from being onboarded to the HPA enforcer, enabling a fine-grained onboarding plan rather than onboarding all workloads in a single go. The details of these configurable parameters can be found in the installation and configuration guide.