# Setup Azure Kubernetes Infrastructure
In this notebook we will set up:
- An AKS cluster with
  - a **GPU-enabled Spot VM node pool** for running elastic training
  - a **CPU VM node pool** for running the rendezvous server (the training control plane)
- An Azure Storage account for hosting training data and model training checkpoints
- Kubernetes components:
  - Torch Elastic Operator
  - ETCD server for the training control plane
  - Azure Blob CSI driver to mount Blob storage into containers as persistent volumes


## Define Variables
Set the variables required for the project.

In [55]:
subscription_id = "f869415f-5cff-46a3-b728-20659d14d62d"           # fill in
resource_group = "elastic-lab"           # fill in
region = "eastus2"                    # fill in

storage_account_name = "trainingdataen"        # fill in
storage_container_name = "workerdata"             

aks_name = "elasticaks"    # feel free to replace or use this default
aks_spot_nodepool = "spotgpu"       # feel free to replace or use this default
aks_cpu_nodepool = "cpuworkers"     # feel free to replace or use this default
aks_gpu_sku = "Standard_NC12"       # feel free to replace or use this default 

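Azure rejects some invalid names only when the resource is created, which can be several minutes into a run, so a quick pre-flight check can save time. This is a minimal sketch; the naming rules it encodes (storage accounts: 3 to 24 lowercase letters or digits; AKS Linux node pools: a lowercase letter followed by lowercase letters or digits, at most 12 characters) are our reading of the Azure documentation and should be rechecked there. `check_names` is a hypothetical helper, not part of any Azure SDK.

```python
import re

# Assumed Azure naming rules; verify them against the current Azure docs.
STORAGE_ACCOUNT_RE = re.compile(r"[a-z0-9]{3,24}")      # lowercase letters/digits, 3-24 chars
LINUX_NODEPOOL_RE = re.compile(r"[a-z][a-z0-9]{0,11}")  # starts with a letter, max 12 chars


def check_names(storage_account, nodepools):
    """Return a list of problems; an empty list means every name passed."""
    problems = []
    if not STORAGE_ACCOUNT_RE.fullmatch(storage_account):
        problems.append(f"invalid storage account name: {storage_account!r}")
    for pool in nodepools:
        if not LINUX_NODEPOOL_RE.fullmatch(pool):
            problems.append(f"invalid node pool name: {pool!r}")
    return problems


print(check_names("trainingdataen", ["spotgpu", "cpuworkers"]))  # prints []
```

In the notebook you would call it with the variables above, for example `check_names(storage_account_name, [aks_spot_nodepool, aks_cpu_nodepool])`. Storage account names must also be globally unique, which only Azure can confirm (for example with `az storage account check-name --name <name>`).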
## Azure account login
If you are not already logged in to an Azure account, the command below starts a login and opens a browser window where you can choose your account. If no web browser is available, or the browser fails to open, use the device code flow with `az login --use-device-code`, or log in from a WSL command prompt and then return to the notebook.

In [None]:
%%bash
az login -o table


In [None]:
!az account set --subscription "$subscription_id"

In [None]:
!az account show

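Before creating anything, it can help to confirm programmatically that the active subscription is the one set in `subscription_id`. The sketch below parses the JSON printed by `az account show`, which reports the subscription GUID in its `id` field. The helper name and the sample JSON are hypothetical.

```python
import json


def is_expected_subscription(account_show_json, expected_id):
    """True if the JSON printed by `az account show` refers to expected_id."""
    account = json.loads(account_show_json)
    # Compare case-insensitively, since GUID casing can vary between tools.
    return account.get("id", "").lower() == expected_id.lower()


# Hypothetical, trimmed-down output used only for illustration.
sample = '{"id": "00000000-0000-0000-0000-000000000000", "name": "my-sub", "isDefault": true}'
print(is_expected_subscription(sample, "00000000-0000-0000-0000-000000000000"))  # True
```

In the notebook: `out = !az account show -o json`, then `is_expected_subscription("\n".join(out), subscription_id)`.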
## Create Resource Group
Azure uses resource groups to organize the components you deploy. Grouping them makes the resources easier to find, and it also lets you delete all of them at once by deleting the group.

In [10]:
!az group create -l {region} -n {resource_group}

{
  "id": "/subscriptions/f869415f-5cff-46a3-b728-20659d14d62d/resourceGroups/elastic-lab",
  "location": "eastus2",
  "managedBy": null,
  "name": "elastic-lab",
  "properties": {
    "provisioningState": "Succeeded"
  },
  "tags": null,
  "type": "Microsoft.Resources/resourceGroups"
}

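Because everything in this lab lives in one resource group, cleaning up is a single command. The sketch below only builds the `az group delete` argument list (`--yes` skips the confirmation prompt, `--no-wait` returns before the deletion completes); the helper name is hypothetical, and nothing is deleted unless you pass the list to `subprocess.run` yourself when you are done with the lab.

```python
import shlex


def group_delete_command(resource_group):
    """Argument list that deletes the resource group and everything in it."""
    return ["az", "group", "delete", "--name", resource_group, "--yes", "--no-wait"]


cmd = group_delete_command("elastic-lab")
print(shlex.join(cmd))  # az group delete --name elastic-lab --yes --no-wait
```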
## Create AKS Cluster and NodePools
Below, we create the AKS cluster in the resource group created earlier, with a single system node to save time (in production, follow best practices and use more system nodes). This step can take 5 minutes or more.


In [22]:
%%time
!az aks create --resource-group {resource_group} \
    --name {aks_name} \
    --node-vm-size Standard_D2s_v3 \
    --node-count 1 \
    --location {region}  \
    --kubernetes-version 1.18.17 \
    --generate-ssh-keys

The behavior of this command has been altered by the following extension: aks-preview
{
  "aadProfile": null,
  "addonProfiles": {
    "KubeDashboard": {
      "config": null,
      "enabled": false,
      "identity": null
    }
  },
  "agentPoolProfiles": [
    {
      "availabilityZones": null,
      "count": 1,
      "enableAutoScaling": null,
      "enableEncryptionAtHost": false,
      "enableFips": false,
      "enableNodePublicIp": false,
      "gpuInstanceProfile": null,
      "kubeletConfig": null,
      "kubeletDiskType": "OS",
      "linuxOsConfig": null,
      "maxCount": null,
      "maxPods": 110,
      "minCount": null,
      "mode": "System",
      "name": "nodepool1",
      "nodeImageVersion": "AKSUbuntu-1804gen2-2021.05.01",
      "nodeLabels": {},
      "nodePublicIpPrefixId": null,
      "nodeTaints": null,
      "orchestratorVersion": "1.18.17",
      "osDiskSizeGb": 128,
      "osDiskType": "Managed",
      "osSku": "Ubuntu",
      "osType": "Linux",
    

## Connect to AKS Cluster
To configure `kubectl` to connect to the Kubernetes cluster, run the following command:

In [24]:
!az aks get-credentials --resource-group {resource_group} --name {aks_name}

The behavior of this command has been altered by the following extension: aks-preview
Merged "elasticaks" as current context in /home/lenisha/.kube/config

Let's verify the connection by listing the nodes:

In [25]:
!kubectl get nodes

NAME                                STATUS   ROLES   AGE     VERSION
aks-nodepool1-40607851-vmss000000   Ready    agent   2m24s   v1.18.17

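For more than a handful of nodes it is easier to check readiness programmatically. A sketch, assuming you capture `kubectl get nodes -o json`: each node reports a `Ready` condition under `status.conditions`, and the node is ready when that condition's status is `"True"`. The helper name and sample are hypothetical.

```python
import json


def node_readiness(nodes_json):
    """Map each node name to True if its "Ready" condition has status "True"."""
    readiness = {}
    for node in json.loads(nodes_json)["items"]:
        conditions = node.get("status", {}).get("conditions", [])
        readiness[node["metadata"]["name"]] = any(
            c.get("type") == "Ready" and c.get("status") == "True" for c in conditions
        )
    return readiness


# Trimmed, hypothetical example of `kubectl get nodes -o json` output.
sample = json.dumps({"items": [{
    "metadata": {"name": "aks-nodepool1-40607851-vmss000000"},
    "status": {"conditions": [{"type": "Ready", "status": "True"}]},
}]})
print(node_readiness(sample))  # {'aks-nodepool1-40607851-vmss000000': True}
```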

Taint the system node with the `CriticalAddonsOnly` taint so that only system workloads are scheduled on it:

In [47]:
!kubectl taint nodes -l agentpool=nodepool1 CriticalAddonsOnly=true:NoSchedule --overwrite


node/aks-nodepool1-40607851-vmss000000 modified

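A `NoSchedule` taint keeps a pod off a node unless the pod carries a matching toleration. The sketch below is a simplified model of the Kubernetes matching rules (same key, same effect or an empty effect, and either `operator: Exists` or equal values). It illustrates why ordinary workloads stop landing on the system node while add-ons that tolerate `CriticalAddonsOnly` still can. It ignores `tolerationSeconds` and `PreferNoSchedule`, and the function names are hypothetical.

```python
def tolerates(toleration, taint):
    """Simplified check of whether one toleration matches one taint."""
    operator = toleration.get("operator", "Equal")  # Kubernetes defaults to Equal
    # An empty toleration effect matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    # An empty key is only valid with Exists, and then matches every key.
    if not toleration.get("key"):
        return operator == "Exists"
    if toleration["key"] != taint["key"]:
        return False
    return operator == "Exists" or toleration.get("value", "") == taint.get("value", "")


def can_schedule(tolerations, node_taints):
    """A pod fits only if every NoSchedule/NoExecute taint is tolerated."""
    hard = [t for t in node_taints if t["effect"] in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, taint) for tol in tolerations) for taint in hard)


system_taints = [{"key": "CriticalAddonsOnly", "value": "true", "effect": "NoSchedule"}]
print(can_schedule([], system_taints))  # False: a plain workload pod
print(can_schedule([{"key": "CriticalAddonsOnly", "operator": "Exists"}], system_taints))  # True
```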

## Create GPU enabled and CPU Node Pools
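To create the GPU-enabled node pool, we use the fully configured AKS image that contains the NVIDIA device plugin for Kubernetes; see [Use the AKS specialized GPU image (preview)](https://docs.microsoft.com/en-us/azure/aks/gpu-cluster#use-the-aks-specialized-gpu-image-preview). Because this image is a preview feature, first register the `GPUDedicatedVHDPreview` feature, re-register the `Microsoft.ContainerService` provider, and make sure the `aks-preview` CLI extension is installed. Creating node pools can take five minutes or more.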

In [26]:
%%time
!az feature register --name GPUDedicatedVHDPreview --namespace Microsoft.ContainerService
!az feature list -o table --query "[?contains(name, 'Microsoft.ContainerService/GPUDedicatedVHDPreview')].{Name:name,State:properties.state}"
!az provider register --namespace Microsoft.ContainerService
!az extension add --name aks-preview


Once the feature 'GPUDedicatedVHDPreview' is registered, invoking 'az provider register -n Microsoft.ContainerService' is required to get the change propagated
{
  "id": "/subscriptions/f869415f-5cff-46a3-b728-20659d14d62d/providers/Microsoft.Features/providers/Microsoft.ContainerService/features/GPUDedicatedVHDPreview",
  "name": "Microsoft.ContainerService/GPUDedicatedVHDPreview",
  "properties": {
    "state": "Registered"
  },
  "type": "Microsoft.Features/providers/features"
}
Name                                               State
-------------------------------------------------  ----------
Microsoft.ContainerService/GPUDedicatedVHDPreview  Registered
Extension 'aks-preview' is already installed.
CPU times: user 347 ms, sys: 221 ms, total: 569 ms
Wall time: 8.1 s

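Feature registration is asynchronous: `az feature register` can return while the state is still `Registering`, and, as the CLI message above notes, the provider must be re-registered once the feature is `Registered`. A minimal polling sketch follows. `az_feature_state` shells out to `az feature show`; both helper names, the 15-minute timeout and the 30-second interval are our assumptions. `sleep` is injectable so the loop can be tested without waiting.

```python
import subprocess
import time


def az_feature_state(name="GPUDedicatedVHDPreview", namespace="Microsoft.ContainerService"):
    """Return the registration state reported by `az feature show`."""
    result = subprocess.run(
        ["az", "feature", "show", "--namespace", namespace, "--name", name,
         "--query", "properties.state", "-o", "tsv"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def wait_for_state(fetch_state, target="Registered", timeout_s=900, poll_s=30, sleep=time.sleep):
    """Poll fetch_state() until it returns target; raise TimeoutError after timeout_s."""
    waited = 0
    while True:
        state = fetch_state()
        if state == target:
            return state
        if waited >= timeout_s:
            raise TimeoutError(f"state still {state!r} after {waited}s")
        sleep(poll_s)
        waited += poll_s
```

Usage in the notebook: `wait_for_state(az_feature_state)`, then run `!az provider register --namespace Microsoft.ContainerService`.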

In [49]:
%%time
!az aks nodepool add \
    --resource-group {resource_group} \
    --cluster-name {aks_name} \
    --name {aks_spot_nodepool} \
    --priority Spot \
    --eviction-policy Delete \
    --spot-max-price -1 \
    --enable-cluster-autoscaler \
    --min-count 1 \
    --max-count 3 \
    --node-vm-size {aks_gpu_sku} \
    --aks-custom-headers UseGPUDedicatedVHD=true,usegen2vm=true

The behavior of this command has been altered by the following extension: aks-preview
{
  "agentPoolType": "VirtualMachineScaleSets",
  "availabilityZones": null,
  "count": 3,
  "enableAutoScaling": true,
  "enableEncryptionAtHost": false,
  "enableFips": false,
  "enableNodePublicIp": false,
  "gpuInstanceProfile": null,
  "id": "/subscriptions/f869415f-5cff-46a3-b728-20659d14d62d/resourcegroups/elastic-lab/providers/Microsoft.ContainerService/managedClusters/elasticaks/agentPools/spotgpu",
  "kubeletConfig": null,
  "kubeletDiskType": "OS",
  "linuxOsConfig": null,
  "maxCount": 3,
  "maxPods": 110,
  "minCount": 1,
  "mode": "User",
  "name": "spotgpu",
  "nodeImageVersion": "AKSUbuntu-1804gpu-2021.05.01",
  "nodeLabels": {
    "kubernetes.azure.com/scalesetpriority": "spot"
  },
  "nodePublicIpPrefixId": null,
  "nodeTaints": [
    "kubernetes.azure.com/scalesetpriority=spot:NoSchedule"
  ],
  "orchestratorVersion": "1.18.17",
  "osDiskSizeGb": 128,
  "osDiskType": "Manag

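The output above shows that AKS labels and taints Spot nodes with `kubernetes.azure.com/scalesetpriority=spot`, so a training pod has to tolerate that taint to be scheduled there. The sketch below builds a pod spec fragment as a Python dict: the toleration only *permits* Spot nodes, so a `nodeSelector` on the same label is added to *require* them, and the GPU is requested through the `nvidia.com/gpu` resource exposed by the NVIDIA device plugin. The container name and image are placeholders, and how the elastic job spec embeds this template is not shown here.

```python
import json


def spot_gpu_pod_spec(image, gpus=1):
    """Pod spec fragment targeting the tainted Spot GPU node pool."""
    return {
        # Matches the taint kubernetes.azure.com/scalesetpriority=spot:NoSchedule.
        "tolerations": [{
            "key": "kubernetes.azure.com/scalesetpriority",
            "operator": "Equal",
            "value": "spot",
            "effect": "NoSchedule",
        }],
        # Require Spot nodes instead of merely allowing them.
        "nodeSelector": {"kubernetes.azure.com/scalesetpriority": "spot"},
        "containers": [{
            "name": "trainer",  # placeholder name
            "image": image,
            "resources": {"limits": {"nvidia.com/gpu": gpus}},
        }],
    }


spec = spot_gpu_pod_spec("<your-training-image>")  # placeholder image
print(json.dumps(spec, indent=2))
```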
## Verify GPU is available on Kubernetes Node
Now use the `kubectl describe node` command to confirm that the GPUs are schedulable. Under the `Capacity` section, the GPU count should be listed as `nvidia.com/gpu: 2` (a `Standard_NC12` VM has two GPUs).

In [62]:
!kubectl describe node -l kubernetes.azure.com/scalesetpriority=spot

Name:               aks-spotgpu-40607851-vmss000001
Roles:              agent
Labels:             accelerator=nvidia
                    agentpool=spotgpu
                    beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=Standard_NC12
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=eastus2
                    failure-domain.beta.kubernetes.io/zone=0
                    kubernetes.azure.com/cluster=MC_elastic-lab_elasticaks_eastus2
                    kubernetes.azure.com/node-image-version=AKSUbuntu-1804gpu-2021.05.01
                    kubernetes.azure.com/role=agent
                    kubernetes.azure.com/scalesetpriority=spot
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=aks-spotgpu-40607851-vmss000001
                    kubernetes.io/os=linux
                    kubernetes.io/role=agent
                    node-role.kubernetes.io/agent=
    

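To check GPU capacity on all Spot nodes at once instead of reading the `describe` output, one option is to parse `kubectl get nodes -o json`, where the device plugin's resource appears under `status.capacity`. A sketch; the helper and the sample are hypothetical.

```python
import json


def gpu_capacity(nodes_json):
    """Map node name -> GPU count reported under status.capacity (0 if absent)."""
    counts = {}
    for node in json.loads(nodes_json)["items"]:
        capacity = node.get("status", {}).get("capacity", {})
        # Resource quantities are serialized as strings, e.g. "2".
        counts[node["metadata"]["name"]] = int(capacity.get("nvidia.com/gpu", "0"))
    return counts


# Trimmed, hypothetical sample shaped like `kubectl get nodes -o json` output.
sample = json.dumps({"items": [
    {"metadata": {"name": "aks-spotgpu-40607851-vmss000001"},
     "status": {"capacity": {"nvidia.com/gpu": "2"}}},
]})
print(gpu_capacity(sample))  # {'aks-spotgpu-40607851-vmss000001': 2}
```

In the notebook: `out = !kubectl get nodes -l kubernetes.azure.com/scalesetpriority=spot -o json`, then `gpu_capacity("\n".join(out))`.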
## Create CPU NodePool for running ETCD

In [50]:
%%time 
!az aks nodepool add \
    --resource-group {resource_group} \
    --cluster-name {aks_name} \
    --name {aks_cpu_nodepool} \
    --enable-cluster-autoscaler \
    --min-count 1 \
    --max-count 3 \
    --node-vm-size Standard_D2s_v3 

The behavior of this command has been altered by the following extension: aks-preview
{
  "agentPoolType": "VirtualMachineScaleSets",
  "availabilityZones": null,
  "count": 3,
  "enableAutoScaling": true,
  "enableEncryptionAtHost": false,
  "enableFips": false,
  "enableNodePublicIp": false,
  "gpuInstanceProfile": null,
  "id": "/subscriptions/f869415f-5cff-46a3-b728-20659d14d62d/resourcegroups/elastic-lab/providers/Microsoft.ContainerService/managedClusters/elasticaks/agentPools/cpuworkers",
  "kubeletConfig": null,
  "kubeletDiskType": "OS",
  "linuxOsConfig": null,
  "maxCount": 3,
  "maxPods": 110,
  "minCount": 1,
  "mode": "User",
  "name": "cpuworkers",
  "nodeImageVersion": "AKSUbuntu-1804gen2-2021.05.01",
  "nodeLabels": null,
  "nodePublicIpPrefixId": null,
  "nodeTaints": null,
  "orchestratorVersion": "1.18.17",
  "osDiskSizeGb": 128,
  "osDiskType": "Managed",
  "osSku": "Ubuntu",
  "osType": "Linux",
  "podSubnetId": null,
  "powerState": {
    "code": "Runnin

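The ETCD server should run on this CPU pool rather than on Spot nodes that may be evicted. AKS labels every node with its pool name (visible as `agentpool=spotgpu` in the node description above), so a `nodeSelector` on `agentpool` is one way to pin a pod to a pool. The sketch below adds that selector to a pod spec dict; the helper name and the etcd container spec are placeholders.

```python
import copy


def pin_to_pool(pod_spec, pool_name):
    """Return a copy of pod_spec restricted to nodes of one AKS node pool."""
    pinned = copy.deepcopy(pod_spec)  # leave the caller's dict untouched
    pinned.setdefault("nodeSelector", {})["agentpool"] = pool_name
    return pinned


etcd_spec = {"containers": [{"name": "etcd", "image": "<etcd-image>"}]}  # placeholder
pinned = pin_to_pool(etcd_spec, "cpuworkers")
print(pinned["nodeSelector"])  # {'agentpool': 'cpuworkers'}
```

Since this CPU pool carries no taints, other pods may still land on it; the selector only guarantees that the etcd pod stays off the Spot pool.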
## Verify Taints on the Kubernetes nodes
Verify that the system node pool and the Spot node pool have the taints `CriticalAddonsOnly` and `kubernetes.azure.com/scalesetpriority`, respectively:


In [51]:
!kubectl get nodes -o json | jq '.items[].spec.taints'

null
null
null
[
  {
    "effect": "NoSchedule",
    "key": "CriticalAddonsOnly",
    "value": "true"
  }
]
[
  {
    "effect": "NoSchedule",
    "key": "kubernetes.azure.com/scalesetpriority",
    "value": "spot"
  }
]
[
  {
    "effect": "NoSchedule",
    "key": "kubernetes.azure.com/scalesetpriority",
    "value": "spot"
  }
]
[
  {
    "effect": "NoSchedule",
    "ke

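The `jq` output above lists taints without node names, so it is hard to tell which `null` belongs to which pool. Here is a sketch of the same query in Python that keys the result by node name and formats each taint in the `key=value:effect` form shown in the node pool output. The helper and the sample node names are hypothetical.

```python
import json


def taints_by_node(nodes_json):
    """Map node name -> list of taints formatted as key=value:effect."""
    result = {}
    for node in json.loads(nodes_json)["items"]:
        taints = node.get("spec", {}).get("taints") or []  # missing means no taints
        result[node["metadata"]["name"]] = [
            f'{t["key"]}={t.get("value", "")}:{t["effect"]}' for t in taints
        ]
    return result


# Trimmed, hypothetical sample shaped like `kubectl get nodes -o json` output.
sample = json.dumps({"items": [
    {"metadata": {"name": "aks-cpuworkers-0"}, "spec": {}},
    {"metadata": {"name": "aks-spotgpu-0"}, "spec": {"taints": [
        {"key": "kubernetes.azure.com/scalesetpriority", "value": "spot", "effect": "NoSchedule"}]}},
]})
for name, taints in taints_by_node(sample).items():
    print(name, taints)
```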
# Create Storage Account for training data 
In this section, we create an Azure Storage account with a Blob container that is used throughout the tutorial to store input images and save training checkpoints. Use the Azure CLI (`az`) to create the account:

In [56]:
%%time
!az storage account create -n {storage_account_name} -g {resource_group} --query 'provisioningState'


"Succeeded"
[K[0mCPU times: user 1.06 s, sys: 456 ms, total: 1.51 s
Wall time: 23.6 s


Retrieve the keys of the storage account you just created; they are needed to bind the Kubernetes persistent volume. The `--query '[0].value'` part of the command selects the value of the first key in the returned list.

In [57]:
key = !az storage account keys list --account-name {storage_account_name} -g {resource_group} --query '[0].value'


IPython stores the stdout of the command above as a list of strings with a single element. Take that element and strip the opening and closing quotation marks:

In [58]:
storage_account_key = str(key[0][1:-1]) # this is used to strip opening and closing quotation marks

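Slicing off the first and last characters works, but it silently corrupts the key if the output ever changes shape. Because the CLI prints its result as JSON, `json.loads` is a sturdier way to unquote it. A sketch; the helper name is hypothetical, and it assumes the captured lines contain only the CLI's JSON output. Alternatively, `-o tsv` makes the CLI print the bare value without quotes.

```python
import json


def key_from_az_output(lines):
    """Decode the JSON string that `az ... --query '[0].value'` prints."""
    value = json.loads("\n".join(lines))
    if not isinstance(value, str):
        raise ValueError(f"expected a JSON string, got {type(value).__name__}")
    return value


print(key_from_az_output(['"abc+/def=="']))  # prints abc+/def==
```

In the notebook this would be `storage_account_key = key_from_az_output(key)`.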
In [59]:
# create storage container

!az storage container create \
    --account-name {storage_account_name} \
    --account-key {storage_account_key} \
    --name {storage_container_name}

{
  "created": true
}

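Container names follow their own rules, separate from the storage account's. As we understand the Azure Blob naming rules, a container name must be 3 to 63 characters long, use only lowercase letters, digits and hyphens, start with a letter or digit, and every hyphen must be followed by a letter or digit (so no double or trailing hyphens). The regex below encodes those assumed rules; check the Azure documentation before relying on it. The helper name is hypothetical.

```python
import re

# Assumed Blob container naming rules: 3-63 chars, lowercase letters, digits and
# hyphens, starting with a letter or digit, every hyphen followed by a letter or digit.
CONTAINER_RE = re.compile(r"[a-z0-9](?:[a-z0-9]|-(?=[a-z0-9])){2,62}")


def is_valid_container_name(name):
    """True if name satisfies the assumed container naming rules."""
    return CONTAINER_RE.fullmatch(name) is not None


for candidate in ["workerdata", "worker-data", "Worker", "a--b", "ab"]:
    print(candidate, is_valid_container_name(candidate))
```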
# Install Kubernetes Blob CSI Driver 
The [Azure Blob Storage CSI driver for Kubernetes](https://github.com/kubernetes-sigs/blob-csi-driver) allows Kubernetes to access Azure Storage. We deploy it with the Helm 3 package manager, as described in the [chart documentation](https://github.com/kubernetes-sigs/blob-csi-driver/tree/master/charts).

In [66]:
!helm repo add blob-csi-driver https://raw.githubusercontent.com/kubernetes-sigs/blob-csi-driver/master/charts
!helm install blob-csi-driver blob-csi-driver/blob-csi-driver --namespace kube-system --version v1.1.0



"blob-csi-driver" already exists with the same configuration, skipping
Error: cannot re-use a name that is still in use

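The `Error: cannot re-use a name that is still in use` above appears because the cell was re-run after the `blob-csi-driver` release had already been installed. `helm upgrade --install` installs a release when it is missing and upgrades it otherwise, so the step can be re-run safely. The sketch builds that command; the helper name is hypothetical, and the repository, chart, namespace and version are the ones used above.

```python
import shlex


def helm_upgrade_install(release, chart, namespace, version):
    """Argument list for an idempotent Helm install (install if missing, else upgrade)."""
    return ["helm", "upgrade", "--install", release, chart,
            "--namespace", namespace, "--version", version]


cmd = helm_upgrade_install("blob-csi-driver", "blob-csi-driver/blob-csi-driver",
                           "kube-system", "v1.1.0")
print(shlex.join(cmd))
```

Run it with `subprocess.run(cmd, check=True)` instead of the `helm install` line when re-executing the cell.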

In [71]:
!kubectl -n kube-system get pods -l "app.kubernetes.io/instance=blob-csi-driver"

NAME                                   READY   STATUS    RESTARTS   AGE
csi-blob-controller-56956c6dbd-bhnhx   4/4     Running   0          6m7s
csi-blob-controller-56956c6dbd-bj8s2   4/4     Running   0          6m7s
csi-blob-node-4ff9l                    3/3     Running   0          6m7s
csi-blob-node-5vsp7                    3/3     Running   0          6m7s
csi-blob-node-94k5j                    3/3     Running   0          4m21s
csi-blob-node-xhvwp                    3/3     Running   0          6m7s


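Reading the `READY` column by eye is fine for six pods; a small sketch can flag any pod that is not fully ready or not `Running`. It parses the default table output shown above, assuming the first three columns are NAME, READY and STATUS; `kubectl get pods -o json` would be sturdier for anything beyond a quick check. The helper is hypothetical.

```python
def pods_not_ready(get_pods_output):
    """Names of pods whose READY column is not n/n or whose STATUS is not Running."""
    not_ready = []
    for line in get_pods_output.strip().splitlines()[1:]:  # skip the header row
        name, ready, status = line.split()[:3]
        running, total = ready.split("/")
        if running != total or status != "Running":
            not_ready.append(name)
    return not_ready


sample = """\
NAME                                   READY   STATUS    RESTARTS   AGE
csi-blob-controller-56956c6dbd-bhnhx   4/4     Running   0          6m7s
csi-blob-node-4ff9l                    3/3     Running   0          6m7s
"""
print(pods_not_ready(sample))  # prints []
```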
## Create Persistent Volume for Azure Blob
For more details on creating a `PersistentVolume` with the CSI driver, refer to https://github.com/kubernetes-sigs/blob-csi-driver/blob/master/deploy/example/e2e_usage.md

In [76]:
!kubectl create namespace elastic-job
# Create secret to access storage account
!kubectl create secret generic azure-blobsecret --from-literal azurestorageaccountname={storage_account_name} --from-literal azurestorageaccountkey="{storage_account_key}" --type=Opaque -n elastic-job



Error from server (AlreadyExists): namespaces "elastic-job" already exists
Error from server (AlreadyExists): secrets "azure-blobsecret" already exists

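The `AlreadyExists` errors above come from re-running the cell. One way to make this step repeatable is to describe the secret declaratively and apply it with `kubectl apply -f`, which accepts JSON. The sketch builds an `Opaque` Secret equivalent to the `--from-literal` call above; values under `data` must be base64-encoded. The helper name is hypothetical, and the rendered manifest contains the storage key, so keep it out of source control.

```python
import base64
import json


def blob_secret_manifest(account_name, account_key, namespace="elastic-job"):
    """Opaque Secret holding the storage account name and key for the Blob CSI driver."""
    def b64(value):
        # Secret .data values are base64-encoded strings.
        return base64.b64encode(value.encode("utf-8")).decode("ascii")

    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": "azure-blobsecret", "namespace": namespace},
        "type": "Opaque",
        "data": {
            "azurestorageaccountname": b64(account_name),
            "azurestorageaccountkey": b64(account_key),
        },
    }


manifest = blob_secret_manifest("trainingdataen", "not-a-real-key")
print(json.dumps(manifest["metadata"]))
```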

The PersistentVolume YAML definition is in `kube/azure-blobfuse-pv.yaml`; its fields point to the secret created above and to the container we created in the storage account:
```yaml
  csi:
    driver: blob.csi.azure.com
    readOnly: false
    volumeHandle: trainingdata  # make sure this volumeid is unique in the cluster
    volumeAttributes:
      containerName: workerdata # Modify if changed in Notebook
    nodeStageSecretRef:
      name: azure-blobsecret
      namespace: elastic-job
```

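Only the `csi` block of the manifest is shown above. For reference, the sketch below builds a complete, statically provisioned PV and a PVC bound to it, wrapped in a Kubernetes `List` that `kubectl apply -f` accepts as JSON. The names `pv-blob` and `pvc-blob` match the output below, and the CSI fields match the fragment above; the capacity (`10Gi`), the `ReadWriteMany` access mode, the `Retain` reclaim policy and the PVC namespace are our assumptions, not necessarily what `kube/azure-blobfuse-pv.yaml` contains.

```python
import json


def blob_pv_and_pvc(container="workerdata", volume_handle="trainingdata",
                    namespace="elastic-job", size="10Gi"):
    """Statically provisioned Blob PV plus a PVC bound to it by name."""
    pv = {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": "pv-blob"},
        "spec": {
            "capacity": {"storage": size},              # assumed size
            "accessModes": ["ReadWriteMany"],           # assumed: shared by all workers
            "persistentVolumeReclaimPolicy": "Retain",  # assumed: keep blob data on PV delete
            "csi": {
                "driver": "blob.csi.azure.com",
                "readOnly": False,
                "volumeHandle": volume_handle,          # must be unique in the cluster
                "volumeAttributes": {"containerName": container},
                "nodeStageSecretRef": {"name": "azure-blobsecret", "namespace": namespace},
            },
        },
    }
    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": "pvc-blob", "namespace": namespace},
        "spec": {
            "accessModes": ["ReadWriteMany"],
            "resources": {"requests": {"storage": size}},
            "volumeName": "pv-blob",   # bind to the PV above explicitly
            "storageClassName": "",    # empty class: no dynamic provisioning
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [pv, pvc]}


print(json.dumps(blob_pv_and_pvc(), indent=2))
```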
In [78]:
# Create the PersistentVolume and PersistentVolumeClaim for container mounts
!kubectl apply -f kube/azure-blobfuse-pv.yaml

persistentvolume/pv-blob created
persistentvolumeclaim/pvc-blob created


Now that all the Kubernetes preparation steps are done, the next step adjusts the training script so that it can run in an elastic, fault-tolerant way: [Step 2 Distributed Training Script](/Step2-DistributedTraining.md)