## Scaling

- Scaling can be done in two different ways:
  - **Vertical Scaling**.
  - **Horizontal Scaling**.

- **Vertical Scaling** implies adding more CPU, Memory and/or Storage to a physical/virtual machine or a Pod (the Pod must be restarted).
  - **Scaling up**: adding more CPU, Memory and/or Storage.
  - **Scaling down**: removing CPU, Memory and/or Storage.

- **Horizontal Scaling** implies adding more physical/virtual machines or Pods.
  - **Scaling out**: adding more physical/virtual machines or Pods.
  - **Scaling in**: removing physical/virtual machines or Pods.

Virtual Machines and Pods can be scaled **manually** or **automatically**.

## Scaling Nodes Vertically

- Assume we have one **Node** running some **Pods**, each with one **Container**.

  <img src="../notebook_images/v_node_scaling_before.png" alt="Before Scaling" width="280" height="260" style="margin-bottom:2em">

- We can **scale up a Node** by **increasing** its **CPU and/or Memory**.

  <img src="../notebook_images/v_node_scaling_up.png" alt="Scaling Up" width="700" height="300" style="margin-bottom:2em">

- We can **scale down a Node** by **decreasing** its **CPU and/or Memory**.

  <img src="../notebook_images/v_node_scaling_down.png" alt="Scaling Down" width="700" height="300" style="margin-bottom:2em">


- The **Cluster Autoscaler** does **not resize individual Nodes**; it adds and removes Nodes (horizontal scaling, covered in the next section).
  - Changing a Node's CPU and/or Memory is typically done through the cloud provider or hypervisor, outside of Kubernetes.
  - We won’t use the **Cluster Autoscaler** in this course.
  - Instructions for installing and using the **Cluster Autoscaler** can be found here:
    - https://github.com/kubernetes/autoscaler
    - https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler

## Scaling Nodes Horizontally

- Assume we have two **Nodes**, each running a number of **Pods**.

  <img src="../notebook_images/h_node_scaling_before.png" alt="Before Scaling" width="450" height="250" style="margin-bottom:2em">

- We can **scale out Nodes** by **adding more Nodes** to the cluster.

  <img src="../notebook_images/h_node_scaling_out.png" alt="Scaling Out" width="700" height="250" style="margin-bottom:2em">

- We can **scale in Nodes** by **removing Nodes** from the cluster.

  <img src="../notebook_images/h_node_scaling_in.png" alt="Scaling In" width="220" height="250" style="margin-bottom:2em">


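- The **Cluster Autoscaler** can **automatically scale in/out Nodes**.
  - The **Cluster Autoscaler** adds Nodes when Pods cannot be scheduled because of insufficient resources, and removes Nodes that are underutilized based on the **resource requests** of their Pods (it does not rely on the **Metrics Server**).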
  - We won’t use the **Cluster Autoscaler** in this course.
  - Instructions for installing and using the **Cluster Autoscaler** can be found here:
    - https://github.com/kubernetes/autoscaler
    - https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler

## Scaling Pods Vertically

- Assume we have some **Pods**, each with one **Container**, running on one **Node**.
  - The **node** has 100 millicores (`cpu: 100m`) and 128 mebibytes (`memory: 128Mi`).
  - The Pod's **container** has 10 millicores (`cpu: 10m`) and 10 mebibytes (`memory: 10Mi`).
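  - CPU is measured in millicores (`1000m` = 1 CPU core) and memory in mebibytes (`1Mi` = 2^20 bytes); the `cpu` and `memory` given to containers must fit within the capacity of the Node they run on.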

  <img src="../notebook_images/v_pod_scaling_before.png" alt="Before Scaling" width="280" height="260" style="margin-bottom:2em">

- We can **scale up** a **Container** (**Pod**) by **increasing its CPU and/or Memory**.
  - Here we are adding more CPU and memory to the right-hand Pod's container (`cpu: 15m` and `memory: 15Mi`).

  <img src="../notebook_images/v_pod_scaling_up.png" alt="Scaling Up" width="700" height="300" style="margin-bottom:2em">

- We can **scale down** a **Container** (**Pod**) by **decreasing its CPU and/or Memory**.
  - Here we are removing some CPU and memory from the right-hand Pod's container (`cpu: 5m` and `memory: 5Mi`).

  <img src="../notebook_images/v_pod_scaling_down.png" alt="Scaling Down" width="700" height="300" style="margin-bottom:2em">
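The quantities used in the figures above can be checked with a few lines of Python. This is a minimal sketch, not how the Kubernetes scheduler works internally: the helper names are hypothetical, only the `m`, `Ki`, `Mi` and `Gi` suffixes are handled, and the list of containers (one at `10m`/`10Mi`, one scaled up to `15m`/`15Mi`) is an assumed example.

```python
def parse_cpu_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as '100m' or '0.5' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    # A plain number means whole CPU cores (1 core = 1000 millicores)
    return int(round(float(quantity) * 1000))


def parse_memory_mebibytes(quantity: str) -> float:
    """Convert a memory quantity such as '128Mi' to mebibytes (Ki, Mi, Gi only)."""
    factors = {"Ki": 1 / 1024, "Mi": 1, "Gi": 1024}
    for suffix, factor in factors.items():
        if quantity.endswith(suffix):
            return float(quantity[:-2]) * factor
    raise ValueError(f"Unsupported memory quantity: {quantity}")


def fits_on_node(node_cpu: str, node_memory: str, containers) -> bool:
    """Check that the summed container resources fit within the node's capacity."""
    total_cpu = sum(parse_cpu_millicores(cpu) for cpu, _ in containers)
    total_memory = sum(parse_memory_mebibytes(mem) for _, mem in containers)
    return (total_cpu <= parse_cpu_millicores(node_cpu)
            and total_memory <= parse_memory_mebibytes(node_memory))


# Node from the example above: 100m CPU and 128Mi memory
print(fits_on_node("100m", "128Mi", [("10m", "10Mi"), ("15m", "15Mi")]))  # True
```

A real cluster compares Pod requests against the Node's allocatable resources, which are somewhat lower than its total capacity.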

- The **Vertical Pod Autoscaler (VPA)** can **automatically scale up/down Pods**.
  - The **VPA** uses the K8s **Metrics Server** to gather **Pod utilization statistics**.
  - The **VPA** is currently in beta, and we won’t use it in this course.
  - Instructions for installing and using the **VPA** can be found here:
    - https://github.com/kubernetes/autoscaler
    - https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler

## Scaling Pods Horizontally

- Assume we have three **Nodes**, each running a number of **Pods**.

  <img src="../notebook_images/h_pod_scaling_before.png" alt="Before Scaling" width="700" height="250" style="margin-bottom:2em">

- We can **scale out Pods** by **adding more Pods**.

  <img src="../notebook_images/h_pod_scaling_out.png" alt="Scaling Out" width="700" height="250" style="margin-bottom:2em">

- We can **scale in Pods** by **removing Pods**.

  <img src="../notebook_images/h_pod_scaling_in.png" alt="Scaling In" width="700" height="250" style="margin-bottom:2em">

## Horizontal Pod Autoscaler (HPA)

- The **Horizontal Pod Autoscaler (HPA)** can **automatically scale out/in Pods**.
  - The **HPA** uses the K8s **Metrics Server** to gather **Pod utilization statistics**.
  - By default, the **HPA** checks the metrics **every 15 seconds** (the controller's sync period).

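- **Pods** must have <span style="color:#DDDD00;font-weight:bold">requests</span> defined for the monitored **resources**, because utilization is calculated as a percentage of the request; defining <span style="color:#DDDD00;font-weight:bold">limits</span> as well is good practice.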

- The **HPA autoscales** according to:
  - The <span style="color:#6688FF;font-weight:bold">minReplicas</span> and <span style="color:#6688FF;font-weight:bold">maxReplicas</span> defined.
  - The <span style="color:#7030A0;font-weight:bold">specific target being monitored (a type of resource, e.g. CPU or memory) and the condition set for it</span>.

- The **HPA** is connected to a **target** via its **scaleTargetRef** map.
  - The properties below must match the resource that is monitored and scaled:
    - The <span style="color:#00C800;font-weight:bold">apiVersion</span>.
    - The <span style="color:#00C800;font-weight:bold">kind</span>.
    - The <span style="color:#00C800;font-weight:bold">name</span>.

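- The **HPA** has a cooldown/delay (stabilization window) to prevent rapid back-and-forth scaling (thrashing):
  - Before changing the replica count, the **HPA takes its recent recommendations into account** instead of reacting to every fluctuation.
  - By default, the **delay on scale in events is 5 minutes**.
  - In current Kubernetes versions (`autoscaling/v2`) there is **no delay on scale out events by default**; older versions used a 3-minute upscale delay. Both can be tuned through the HPA's `behavior` field.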

<img src="../notebook_images/hpa_yaml.png" alt="HPA YAML" width="1000" height="550" style="margin-bottom:2em">
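The Kubernetes documentation describes the core HPA calculation as `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, skipped when the ratio is within a tolerance of 1.0 (0.1 by default) and bounded by `minReplicas` and `maxReplicas`. The function below is a simplified sketch of that rule under those assumptions; the function name is hypothetical, and the real controller also accounts for Pods that are not ready, missing metrics and the stabilization window.

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas, max_replicas, tolerance=0.1):
    """Simplified HPA rule: scale proportionally to current/target, then clamp."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Never go below minReplicas or above maxReplicas
    return max(min_replicas, min(max_replicas, desired))


# 2 replicas at 4% average CPU utilization with a 2% target
print(desired_replicas(2, 4, 2, min_replicas=2, max_replicas=10))  # 4
```

With the same target, 4 replicas at 1% utilization would give `ceil(4 * 1 / 2) = 2`, which is why the replica count drops again once the load disappears.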

## `kubectl` commands for the HPA


| Command                                                                           | Description                         |
| :-------------------------------------------------------------------------------- | :---------------------------------- |
| `kubectl autoscale deployment [DeploymentName] --cpu-percent=50 --min=3 --max=10` | Create HPA (CPU 50%, min 3, max 10) |
| `kubectl apply -f [HPAYAMLFile]`                                                  | Apply HPA (create/update HPA)       |
| `kubectl get hpa [HPAName]`                                                       | Get HPA Status                      |
| `kubectl delete -f [HPAYAMLFile]`                                                 | Delete HPA                          |
| `kubectl delete hpa [HPAName]`                                                    | Delete HPA                          |
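When these commands are driven from Python rather than typed into a notebook cell with `!`, it helps to build the argument list explicitly. The sketch below assembles the `kubectl autoscale` command from the table; the function name is hypothetical, and the `--name` flag of `kubectl autoscale` is used to name the HPA. It only builds and prints the command, so it is safe to run without a cluster.

```python
import shlex


def autoscale_command(deployment, name, cpu_percent, min_replicas, max_replicas):
    """Build the argument list for `kubectl autoscale deployment ...`."""
    return [
        "kubectl", "autoscale", "deployment", deployment,
        f"--name={name}",
        f"--cpu-percent={cpu_percent}",
        f"--min={min_replicas}",
        f"--max={max_replicas}",
    ]


cmd = autoscale_command("nginx-deployment", "hpa-cpu", 50, 3, 10)
print(shlex.join(cmd))
```

To execute the command, the list can be passed to `subprocess.run(cmd, check=True)` on a machine where `kubectl` is configured for the cluster.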

## Ensure a Kubernetes cluster is running

- Use any Kubernetes cluster.
  - In this example, a Minikube cluster with 3 nodes is used: `minikube start --nodes 3`.

## Enable the `metrics-server` Minikube addon (if using Minikube)

- The Horizontal Pod Autoscaler (HPA) uses the metrics collected by the `metrics-server`.

In [1]:
!minikube addons enable metrics-server
!minikube addons list

💡  metrics-server is an addon maintained by Kubernetes. For any concerns contact minikube on GitHub.
You can view the list of minikube maintainers at: https://github.com/kubernetes/minikube/blob/master/OWNERS
    ▪ Using image registry.k8s.io/metrics-server/metrics-server:v0.6.4
🌟  The 'metrics-server' addon is enabled
|-----------------------------|----------|--------------|--------------------------------|
|         ADDON NAME          | PROFILE  |    STATUS    |           MAINTAINER           |
|-----------------------------|----------|--------------|--------------------------------|
| ambassador                  | minikube | disabled     | 3rd party (Ambassador)         |
| auto-pause                  | minikube | disabled     | minikube                       |
| cloud-spanner               | minikube | disabled     | Google                         |
| csi-hostpath-driver         | minikube | disabled     | Kubernetes                     |
| dashboard                   | minikube |

## Install the Metrics Server (if not using Minikube)

- To check whether the Metrics Server is installed in your cluster, look for a Pod whose name starts with `metrics-server` in the `kube-system` namespace:
  - `kubectl get po -n kube-system`
- To install the Metrics Server:
  - Download the YAML file `components.yaml` from `https://github.com/kubernetes-sigs/metrics-server/releases`.
  - Edit the `components.yaml` file and add the argument `- --kubelet-insecure-tls` to the container `args` in the `Deployment` section.
    - This makes the Metrics Server skip verification of the kubelets' serving certificates, which is needed on clusters where those certificates are self-signed. It is meant for test clusters, not production.
  - Install the Metrics Server with `kubectl apply -f components.yaml`.

```yaml
# components.yaml (excerpt)
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  strategy:
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --kubelet-insecure-tls # <-------------------------------------------- Add this line
        - --metric-resolution=15s
        image: k8s.gcr.io/metrics-server/metrics-server:v0.5.0
```

In [None]:
!kubectl apply -f manifests/components.yaml

## Create a Deployment called `nginx-deployment`

- The Deployment definition is in the YAML file `manifests/nginx-deployment.yaml`.
  - The Deployment's name is `nginx-deployment`.
  - It has `replicas` set to `1`.
  - Its Pod template uses the Docker image `nginx:alpine`.
  - The name of its container is `nginx`.
  - Its container listens on `containerPort` `80`.
  - Its `resources` are defined as below:

    ```yaml
    resources:           # The container's resources are defined below

      requests:          # The minimum resources the container requests:
        cpu: 200m        # A minimum of 200 millicores (CPU)
        memory: 126Mi    # A minimum of 126 mebibytes (RAM)

      limits:            # The maximum resources the container may burst to if needed:
        cpu: 500m        # A maximum of 500 millicores (CPU)
        memory: 256Mi    # A maximum of 256 mebibytes (RAM)
    ```

In [1]:
!kubectl apply -f manifests/nginx-deployment.yaml

deployment.apps/nginx-deployment created


## Create a Service called `nginx-service`

- The Service definition is in the YAML file `manifests/nginx-service.yaml`.
  - The Service's name is `nginx-service`.
  - It listens on `port` `80`.
  - It forwards traffic to `targetPort` `80`.
  - Its selector selects the Pods created from the Deployment's Pod template.

In [2]:
!kubectl apply -f manifests/nginx-service.yaml

service/nginx-service created


## List Deployments, ReplicaSets, Pods and Services

- We see that the Deployment, with its associated ReplicaSet and Pod, and the Service have been created.
  - Notice that there is only 1 pod.

In [3]:
#!kubectl get deployments -o wide
#!kubectl get replicasets -o wide
#!kubectl get pods -o wide
#!kubectl get services -o wide
!kubectl get all -o wide

NAME                                    READY   STATUS    RESTARTS   AGE   IP            NODE           NOMINATED NODE   READINESS GATES
pod/nginx-deployment-86c598b675-2smfc   1/1     Running   0          16s   10.244.1.12   minikube-m02   <none>           <none>

NAME                    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE    SELECTOR
service/kubernetes      ClusterIP   10.96.0.1        <none>        443/TCP   107m   <none>
service/nginx-service   ClusterIP   10.109.215.212   <none>        80/TCP    12s    app=nginx

NAME                               READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS   IMAGES         SELECTOR
deployment.apps/nginx-deployment   1/1     1            1           16s   nginx        nginx:alpine   app=nginx

NAME                                          DESIRED   CURRENT   READY   AGE   CONTAINERS   IMAGES         SELECTOR
replicaset.apps/nginx-deployment-86c598b675   1         1         1       16s   nginx        nginx:alpine   app=ng

## Create HPA with autoscaling limits for Deployment `nginx-deployment`

- The Horizontal Pod Autoscaler (HPA) is defined in the file `manifests/hpa-cpu.yaml`.
  - The equivalent imperative command to the declarative YAML file is:

    ```bash
    kubectl autoscale deployment nginx-deployment --name hpa-cpu --cpu-percent=2 --min=2 --max=10
    ```
  - The HPA's name is `hpa-cpu`.
  - It sets `minReplicas` to `2`.
  - It sets `maxReplicas` to `10`.
  - Its `scaleTargetRef` targets the Deployment `nginx-deployment`.
  - It monitors `metrics` for the `cpu` `resource` `type`.
    - Its `target` is of `type` `Utilization` with `averageUtilization` set to `2`.
  - This means the HPA will autoscale the `nginx-deployment` Deployment's Pod `replicas` based on:
    - The average CPU utilization across all Pod `replicas`, measured as a percentage of the CPU request, compared to the threshold value `2%`.
      - If average CPU utilization goes above `2%`, the HPA will add more Pods (up to `maxReplicas`).
      - If average CPU utilization goes below `2%`, the HPA will remove Pods (down to `minReplicas`).

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-cpu
spec:

  minReplicas: 2                 # When autoscaling in, don't go below 2 Pods
  maxReplicas: 10                # When autoscaling out, don't go over 10 Pods

  scaleTargetRef:                # The settings below determine what resource to monitor and autoscale
    apiVersion: apps/v1            # This matches the "apiVersion" set in the Deployment's YAML definition
    kind: Deployment               # This matches the "kind" set in the Deployment's YAML definition
    name: nginx-deployment         # This matches the "name" set in the Deployment's YAML definition

  metrics:                       # The settings below determine what metrics to monitor (it's a list, only one item in this case)
  - type: Resource                 # The type of metric to monitor is a "Resource"
    resource:                      # The properties for the "resource" are specified below
      name: cpu                      # The resource monitored is "cpu"
      target:                        # The target for the monitored resource is specified below
        type: Utilization              # The target type is "Utilization"
        averageUtilization: 2          # The threshold for the target is an "averageUtilization" of 2 percent (2%)
```
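For a `Utilization` target, the percentage is relative to the container's CPU request, not to the Node's CPU. The sketch below illustrates what that means for the `200m` request used by `nginx-deployment`; the function names and the per-Pod usage figures (`8m` each) are hypothetical, and the real controller may round the percentage differently.

```python
def usage_at_threshold(request_millicores, threshold_percent):
    """CPU usage per Pod (in millicores) that corresponds to the threshold."""
    return request_millicores * threshold_percent / 100


def average_utilization_percent(usages_millicores, request_millicores):
    """Average CPU usage across Pods as a percentage of the per-Pod request."""
    total_usage = sum(usages_millicores)
    total_request = request_millicores * len(usages_millicores)
    return 100 * total_usage / total_request


# With a 200m request, a 2% threshold corresponds to 4m of CPU per Pod
print(usage_at_threshold(200, 2))

# Two Pods each using 8m (assumed figures) average 4% utilization
print(average_utilization_percent([8, 8], 200))
```

Such a low threshold is convenient for a demo because even a light request loop against Nginx is enough to exceed it.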

In [4]:
!kubectl apply -f manifests/hpa-cpu.yaml

horizontalpodautoscaler.autoscaling/hpa-cpu created


## List Horizontal Pod Autoscalers (HPAs)

- We see that:
  - The `hpa-cpu` HPA references the `nginx-deployment` Deployment.
  - Its `TARGETS` column shows the current value (`0%`) and the threshold value (`2%`) of the target metric.
  - Its `MINPODS` is set to `2`.
  - Its `MAXPODS` is set to `10`.
  - The current number of Pod `REPLICAS` it has autoscaled to is `2`.

You can also run the command `kubectl get hpa -o wide --watch` in a separate terminal to see the HPA's values changing.

In [6]:
!kubectl get hpa -o wide

NAME      REFERENCE                     TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
hpa-cpu   Deployment/nginx-deployment   0%/2%     2         10        2          79s


## Deploy a Pod called `busybox-pod`

- The Pod's definition is in the YAML file `manifests/busybox-pod.yaml`.
  - Its name is `busybox-pod`.
  - We will open a shell in this Pod, and use it to apply a load on the `nginx` containers.
    - This will increase the average CPU utilization of the Pod replicas, causing the HPA to autoscale the Pod replicas.

In [7]:
!kubectl apply -f manifests/busybox-pod.yaml

pod/busybox-pod created


## Connect to the BusyBox Pod's container and increase load on Nginx

- Run these commands in a terminal (they won't work from a notebook cell, because they need an interactive session).

```bash

# Open an interactive session to the BusyBox Pod's container
kubectl exec busybox-pod -it -- /bin/sh

# --- Output ---
# / #
# / #
# --------------

# Increase load on the Nginx web server Pods
while true; do wget --server-response http://nginx-service 2>&1 | awk '/^  HTTP/{printf $3 " "}'; done

# --- Output ---
# 
# --------------

```

## List Horizontal Pod Autoscalers (HPAs)

- We see that:
  - The current target value under `TARGETS`:
    - First increases to `4%`, above the threshold target value of `2%`.
    - Then decreases to `2%` as the HPA scales out more Pods.
    - The threshold target value is still `2%`.
  - The HPA has autoscaled the number of Pod `REPLICAS` from `2` to `4`.
    - `MINPODS` and `MAXPODS` are still `2` and `10` respectively.

You can also run the command `kubectl get hpa -o wide --watch` in a separate terminal to see the HPA's values changing.

In [9]:
!kubectl get hpa -o wide

NAME      REFERENCE                     TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
hpa-cpu   Deployment/nginx-deployment   4%/2%     2         10        2          3m20s


In [10]:
!kubectl get hpa -o wide

NAME      REFERENCE                     TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
hpa-cpu   Deployment/nginx-deployment   4%/2%     2         10        4          3m32s


In [11]:
!kubectl get hpa -o wide

NAME      REFERENCE                     TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
hpa-cpu   Deployment/nginx-deployment   3%/2%     2         10        4          4m17s


In [12]:
!kubectl get hpa -o wide

NAME      REFERENCE                     TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
hpa-cpu   Deployment/nginx-deployment   2%/2%     2         10        4          5m16s
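Rows like the ones above can be turned into numbers for logging or plotting. The parser below is a small sketch that assumes the exact seven-column layout shown in these outputs, with `TARGETS` in the form `current%/target%`; the function name is hypothetical, and other kubectl versions or an `<unknown>` metric value would need extra handling.

```python
def parse_hpa_row(line):
    """Parse one data row of `kubectl get hpa` into a dictionary."""
    name, reference, targets, min_pods, max_pods, replicas, age = line.split()
    # TARGETS looks like "4%/2%": current value, then threshold
    current, target = (int(part.rstrip("%")) for part in targets.split("/"))
    return {
        "name": name,
        "reference": reference,
        "current_percent": current,
        "target_percent": target,
        "min_pods": int(min_pods),
        "max_pods": int(max_pods),
        "replicas": int(replicas),
        "age": age,
    }


row = parse_hpa_row(
    "hpa-cpu   Deployment/nginx-deployment   4%/2%     2         10        2          3m20s"
)
print(row["current_percent"] > row["target_percent"])  # True: above the threshold
```

Comparing `replicas` between two consecutive rows shows when the HPA acted, for example the change from `2` to `4` between the first two outputs above.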


## Stop the endless loop in the terminal running BusyBox

```bash
# Press Ctrl + C in the terminal with the interactive shell to BusyBox to terminate the while loop
```

## List Horizontal Pod Autoscalers (HPAs)

- We see that:
  - The current target value under `TARGETS`:
    - First decreases to `1%`, under the threshold target value of `2%`.
    - Continues to decrease to `0%` as the load on the Nginx web server replicas reduces.
    - The threshold target value is still `2%`.
  - The HPA has autoscaled the number of Pod `REPLICAS` from `4` to `2`.
    - `MINPODS` and `MAXPODS` are still `2` and `10` respectively.

In [14]:
!kubectl get hpa

NAME      REFERENCE                     TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
hpa-cpu   Deployment/nginx-deployment   2%/2%     2         10        4          5m38s


In [15]:
!kubectl get hpa

NAME      REFERENCE                     TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
hpa-cpu   Deployment/nginx-deployment   1%/2%     2         10        4          6m17s


In [16]:
!kubectl get hpa

NAME      REFERENCE                     TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
hpa-cpu   Deployment/nginx-deployment   1%/2%     2         10        2          6m32s


In [17]:
!kubectl get hpa

NAME      REFERENCE                     TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
hpa-cpu   Deployment/nginx-deployment   0%/2%     2         10        2          7m17s


## Delete the Horizontal Pod Autoscaler (HPA)

**Note**

- When you delete an HPA, the Deployment it was autoscaling keeps its current number of replicas.
  - The number of replicas won't revert to the `replicas` setting in the Deployment's YAML file.
- The desired number of replicas has to be set manually after deleting the HPA.

In [18]:
!kubectl delete hpa hpa-cpu

horizontalpodautoscaler.autoscaling "hpa-cpu" deleted


## Delete the BusyBox Pod, Service and Deployment

In [19]:
!kubectl delete -f manifests/busybox-pod.yaml --grace-period=0 --force
!kubectl delete -f manifests/nginx-service.yaml
!kubectl delete -f manifests/nginx-deployment.yaml

pod "busybox-pod" force deleted
service "nginx-service" deleted
deployment.apps "nginx-deployment" deleted


## Experiment with other HPA YAML files

- There are two additional YAML files for the HPA that you can experiment with:
  - `manifests/hpa-memory.yaml` that sets a memory (RAM) target instead of a CPU target.
  - `manifests/hpa-cpu-memory.yaml` that includes both a CPU and a memory target.
- Repeat this notebook, but use these YAML files instead of `manifests/hpa-cpu.yaml`.
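When an HPA lists several metrics, as `manifests/hpa-cpu-memory.yaml` does, the Kubernetes documentation states that a replica count is proposed for each metric and the largest proposal is used. The sketch below illustrates that rule; the function name and the utilization figures are hypothetical (they are not taken from the manifest), and tolerance and stabilization are left out for brevity.

```python
import math


def desired_replicas_multi(current_replicas, metrics, min_replicas, max_replicas):
    """Propose a replica count per metric and keep the largest one.

    `metrics` maps a metric name to a (current_value, target_value) pair.
    """
    proposals = {
        name: math.ceil(current_replicas * current / target)
        for name, (current, target) in metrics.items()
    }
    desired = max(proposals.values())
    # Keep the result within the configured bounds
    return max(min_replicas, min(max_replicas, desired))


# Assumed values: CPU is above its target, memory is below its target
metrics = {"cpu": (75, 50), "memory": (40, 50)}
print(desired_replicas_multi(4, metrics, min_replicas=2, max_replicas=10))  # 6
```

Here the CPU metric proposes 6 replicas and the memory metric proposes 4, so the Deployment would scale out to 6 even though memory usage is under its target.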

## Disable the `metrics-server` Minikube addon (if using Minikube)

In [20]:
!minikube addons disable metrics-server
!minikube addons list

🌑  "The 'metrics-server' addon is disabled
|-----------------------------|----------|--------------|--------------------------------|
|         ADDON NAME          | PROFILE  |    STATUS    |           MAINTAINER           |
|-----------------------------|----------|--------------|--------------------------------|
| ambassador                  | minikube | disabled     | 3rd party (Ambassador)         |
| auto-pause                  | minikube | disabled     | minikube                       |
| cloud-spanner               | minikube | disabled     | Google                         |
| csi-hostpath-driver         | minikube | disabled     | Kubernetes                     |
| dashboard                   | minikube | disabled     | Kubernetes                     |
| default-storageclass        | minikube | disabled     | Kubernetes                     |
| efk                         | minikube | disabled     | 3rd party (Elastic)            |
| freshpod                    | minikube | disa

## Uninstall the Metrics Server (if not using Minikube)

- To uninstall the Metrics Server, run the command `kubectl delete -f components.yaml`.

In [None]:
!kubectl delete -f manifests/components.yaml

## Delete the Cluster

- Delete your specific cluster.
  - This example uses Minikube, so the command is:

    ```bash
    minikube stop
    minikube delete
    ```