### Deploy Web App on Azure Kubernetes Service (AKS)
In this notebook, we will set up an Azure Kubernetes Service (AKS) cluster. We will then take the Docker image we created earlier, which contains our app, and deploy it to the AKS cluster. Finally, we will check that everything is working by sending an image to the service and getting it scored.
    
The process is split into the following steps:
* [Define our resource names](#section1)
* [Login to Azure](#section2)
* [Create resource group and create AKS](#section3)
* [Connect to AKS](#section4)
* [Deploy our app](#section5)
* [Tear it all down](#section6)

This guide is designed to be run on Linux and requires that the Azure CLI is installed.

In [1]:
import os
import json
from testing_utilities import write_json_to_file
%load_ext dotenv

<a id='section1'></a>
## Setup
Below are the name definitions for the resources needed to set up AKS.

In [2]:
%%writefile --append .env
# This cell is tagged `parameters`
# Please modify the values below as you see fit

# If you have multiple subscriptions select the subscription you want to use 
selected_subscription = "YOUR_SUBSCRIPTION"

# Resource group, name and location for AKS cluster.
resource_group = "RESOURCE_GROUP" 
aks_name = "AKS_CLUSTER_NAME"
location = "eastus"

In [4]:
%%writefile --append .env

# This cell is tagged `parameters`
# Please modify the values below as you see fit

# If you have multiple subscriptions select the subscription you want to use 
selected_subscription = "Team Danielle Internal"

# Resource group, name and location for AKS cluster.
resource_group = "fbaksrg" 
aks_name = "fbAKSClustergpu"
location = "eastus"


Appending to .env
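
The cells above append `key = "value"` lines to `.env`, and `%dotenv` later loads them into the environment. For readers who want to inspect those values outside Jupyter, here is a minimal sketch of a parser for that line format. `parse_env_lines` is a hypothetical helper written for illustration; it is not part of `python-dotenv` and handles only the simple lines shown in this notebook.

```python
def parse_env_lines(text):
    """Parse simple `key = "value"` lines, skipping blanks and comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


sample = '''
# Resource group, name and location for AKS cluster.
resource_group = "RESOURCE_GROUP"
aks_name = "AKS_CLUSTER_NAME"
location = "eastus"
'''
settings = parse_env_lines(sample)
print(settings["location"])
```

In this sketch, a key that appears more than once takes the value of its last occurrence, which matters here because the parameters are appended to `.env` rather than overwritten.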


In [5]:
%dotenv
image_name = os.getenv('docker_login') + os.getenv('image_repo') # Set image_name to caia/tfresnet-gpu 
                                                                # if you want to skip creating your own container.
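
If either `docker_login` or `image_repo` is missing from `.env`, `os.getenv` returns `None` and the concatenation above fails with a `TypeError` that does not name the missing key. A small guard can make that failure easier to diagnose. This is a sketch only: `build_image_name` is a hypothetical helper, and the example values assume that `image_repo` carries the leading slash, since the original cell joins the two strings without a separator.

```python
import os


def build_image_name(env=os.environ):
    """Join docker_login and image_repo, failing early if either is unset."""
    missing = [key for key in ("docker_login", "image_repo") if not env.get(key)]
    if missing:
        raise KeyError("Missing settings in .env: " + ", ".join(missing))
    return env["docker_login"] + env["image_repo"]


# Example values only; they reproduce the image named in the comment above.
print(build_image_name({"docker_login": "caia", "image_repo": "/tfresnet-gpu"}))
```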

<a id='section2'></a>
## Azure account login
The command below initiates a login to your Azure account. It displays a URL to open in your browser, where you enter a one-off code and log into your Azure account.

In [9]:
!az login -o table

In [6]:
!az account set --subscription "$selected_subscription"

In [8]:
!az account show

In [5]:
!az provider register -n Microsoft.ContainerService

Registering is still on-going. You can monitor using 'az provider show -n Microsoft.ContainerService'


In [None]:
!az provider show -n Microsoft.ContainerService
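
Registration is complete once the `registrationState` field in the JSON output of `az provider show` reads `Registered`. The sketch below checks that field from captured output. `is_provider_registered` is a hypothetical helper, and the sample payloads are trimmed for illustration; real output contains many more fields.

```python
import json


def is_provider_registered(provider_json):
    """Return True when the provider's registrationState is 'Registered'."""
    provider = json.loads(provider_json)
    return provider.get("registrationState") == "Registered"


# Trimmed, illustrative payloads rather than real CLI output.
pending = '{"namespace": "Microsoft.ContainerService", "registrationState": "Registering"}'
done = '{"namespace": "Microsoft.ContainerService", "registrationState": "Registered"}'
print(is_provider_registered(pending), is_provider_registered(done))
```

In the notebook, the JSON text could be captured with `provider_json = !az provider show -n Microsoft.ContainerService -o json` and joined with `''.join(provider_json)`, the same pattern used for the pod listing later on.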

<a id='section3'></a>
## Create resource group and create AKS

### Create resource group
Azure encourages the use of resource groups to organise all the Azure components you deploy. That way it is easier to find them, and we can also delete a number of resources simply by deleting the group.

In [9]:
!az group create --name $resource_group --location $location

Below, we create the AKS cluster in the resource group we created earlier. This can take up to 15 minutes.

In [10]:
!az aks create --resource-group $resource_group --name $aks_name --node-count 1 --generate-ssh-keys -s Standard_NC6

### Install kubectl CLI

To connect to the Kubernetes cluster, we will use kubectl, the Kubernetes command-line client. To install, run the following:

In [11]:
!sudo az aks install-cli

Downloading client to /usr/local/bin/kubectl from https://storage.googleapis.com/kubernetes-release/release/v1.11.1/bin/linux/amd64/kubectl
Please ensure that /usr/local/bin is in your search PATH, so the `kubectl` command can be found.


<a id='section4'></a>
## Connect to AKS cluster

To configure kubectl to connect to the Kubernetes cluster, run the following command:

In [12]:
!az aks get-credentials --resource-group $resource_group --name $aks_name

Merged "fbAKSClustergpu" as current context in /home/fboylu/.kube/config


Let's verify the connection by listing the nodes.

In [13]:
!kubectl get nodes

NAME                       STATUS    ROLES     AGE       VERSION
aks-nodepool1-28016997-0   Ready     agent     59d       v1.9.6
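
The same check can be done programmatically from `kubectl get nodes -o json`, where each node reports a condition of type `Ready`. The following is a minimal sketch; `ready_nodes` is a hypothetical helper, and the sample dictionary is a trimmed stand-in for real output.

```python
def ready_nodes(node_list):
    """Return names of nodes whose 'Ready' condition has status 'True'."""
    names = []
    for node in node_list.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        if any(c.get("type") == "Ready" and c.get("status") == "True" for c in conditions):
            names.append(node["metadata"]["name"])
    return names


# Trimmed, illustrative node list using the node name shown above.
sample_nodes = {
    "items": [
        {
            "metadata": {"name": "aks-nodepool1-28016997-0"},
            "status": {"conditions": [{"type": "Ready", "status": "True"}]},
        }
    ]
}
print(ready_nodes(sample_nodes))
```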


Let's check the pods on our cluster.

In [16]:
!kubectl get pods --all-namespaces

NAMESPACE     NAME                                    READY     STATUS    RESTARTS   AGE
kube-system   azureproxy-79c5db744-r5ggd              1/1       Running   2          59d
kube-system   heapster-55f855b47-4m7xr                2/2       Running   0          59d
kube-system   kube-dns-v20-7c556f89c5-4z4z6           3/3       Running   0          59d
kube-system   kube-dns-v20-7c556f89c5-mp5fh           3/3       Running   0          59d
kube-system   kube-proxy-k8t2c                        1/1       Running   0          59d
kube-system   kube-svc-redirect-z6ppp                 1/1       Running   8          59d
kube-system   kubernetes-dashboard-546f987686-8krxm   1/1       Running   2          59d
kube-system   tunnelfront-695bcbdc68-t4l8t            1/1       Running   28         59d


<a id='section5'></a>
## Deploy application

Below we define the Kubernetes manifests for our deployment and for the load balancer service that exposes it. Note that we have to specify a volume mount for the NVIDIA drivers, which are located on the node.


In [17]:
app_template = {
  "apiVersion": "apps/v1beta1",
  "kind": "Deployment",
  "metadata": {
      "name": "azure-dl"
  },
  "spec":{
      "replicas":1,
      "template":{
          "metadata":{
              "labels":{
                  "app":"azure-dl"
              }
          },
          "spec":{
              "containers":[
                  {
                      "name": "azure-dl",
                      "image": image_name,
                      "env":[
                          {
                              "name": "LD_LIBRARY_PATH",
                              "value": "$LD_LIBRARY_PATH:/usr/local/nvidia/lib64:/opt/conda/envs/py3.6/lib"
                          }
                      ],
                      "ports":[
                          {
                              "containerPort":80,
                              "name":"model"
                          }
                      ],
                      "volumeMounts":[
                          {
                            "mountPath": "/usr/local/nvidia",
                            "name": "nvidia"
                          }
                      ],
                      "resources":{
                           "requests":{
                               "alpha.kubernetes.io/nvidia-gpu": 1
                           },
                           "limits":{
                               "alpha.kubernetes.io/nvidia-gpu": 1
                           }
                       }  
                  }
              ],
              "volumes":[
                  {
                      "name": "nvidia",
                      "hostPath":{
                          "path":"/usr/local/nvidia"
                      },
                  },
              ]
          }
      }
  }
}

service_temp = {
  "apiVersion": "v1",
  "kind": "Service",
  "metadata": {
      "name": "azure-dl"
  },
  "spec":{
      "type": "LoadBalancer",
      "ports":[
          {
              "port":80
          }
      ],
      "selector":{
            "app":"azure-dl"
      }
   }
}
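
The `apps/v1beta1` API version used above was removed in Kubernetes 1.16. On newer clusters a Deployment is declared under `apps/v1`, which requires an explicit `spec.selector`. The sketch below shows one way to adapt a template like `app_template`. `to_apps_v1` is a hypothetical helper that is not part of the original notebook, and it covers only the API version; other fields, such as the GPU resource name, may also need updating on a newer cluster.

```python
import copy


def to_apps_v1(deployment):
    """Return a copy of a Deployment dict converted to apps/v1 with a selector."""
    converted = copy.deepcopy(deployment)
    converted["apiVersion"] = "apps/v1"
    labels = converted["spec"]["template"]["metadata"]["labels"]
    # apps/v1 requires the selector to match the pod template labels.
    converted["spec"].setdefault("selector", {"matchLabels": dict(labels)})
    return converted


# Trimmed stand-in for app_template; containers omitted for brevity.
legacy = {
    "apiVersion": "apps/v1beta1",
    "kind": "Deployment",
    "metadata": {"name": "azure-dl"},
    "spec": {
        "replicas": 1,
        "template": {"metadata": {"labels": {"app": "azure-dl"}}, "spec": {"containers": []}},
    },
}
print(to_apps_v1(legacy)["spec"]["selector"])
```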

In [18]:
write_json_to_file(app_template, 'az-dl.json') # We write the deployment template to the json file

In [19]:
write_json_to_file(service_temp, 'az-dl.json', mode='a') # We append the load balancer service template to the json file
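
`write_json_to_file` comes from `testing_utilities`, which is not shown in this notebook. Judging by the manifest printed below (four-space indentation, alphabetically sorted keys), it might look roughly like the following sketch. The name `write_json_to_file_sketch` and its body are assumptions, not the actual implementation.

```python
import json
import os
import tempfile


def write_json_to_file_sketch(json_dict, filename, mode="w"):
    """Write a dict as indented JSON; mode='a' appends a second document."""
    with open(filename, mode) as outfile:
        json.dump(json_dict, outfile, indent=4, sort_keys=True)
        outfile.write("\n\n")


# Write to a temporary directory so the real az-dl.json is left untouched.
manifest_path = os.path.join(tempfile.mkdtemp(), "az-dl.json")
write_json_to_file_sketch({"kind": "Deployment"}, manifest_path)
write_json_to_file_sketch({"kind": "Service"}, manifest_path, mode="a")

with open(manifest_path) as infile:
    manifest_text = infile.read()
print(manifest_text)
```

Appending produces a file with two JSON documents one after the other rather than a single JSON value, so `json.load` would reject it; `kubectl create -f` reads it as a stream of objects.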

Let's check the manifest created.

In [20]:
!cat az-dl.json

{
    "apiVersion": "apps/v1beta1",
    "kind": "Deployment",
    "metadata": {
        "name": "azure-dl"
    },
    "spec": {
        "replicas": 1,
        "template": {
            "metadata": {
                "labels": {
                    "app": "azure-dl"
                }
            },
            "spec": {
                "containers": [
                    {
                        "env": [
                            {
                                "name": "LD_LIBRARY_PATH",
                                "value": "$LD_LIBRARY_PATH:/usr/local/nvidia/lib64:/opt/conda/envs/py3.6/lib"
                            }
                        ],
                        "image": "caia/tfresnet-gpu",
                        "name": "azure-dl",
                        "ports": [
                            {
                                "containerPort": 80,
                                "name": "model"
                            }
             

Next, we will use the `kubectl create` command to deploy our application.

In [21]:
!kubectl create -f az-dl.json

deployment.apps/azure-dl created
service/azure-dl created


Let's check if the pod is deployed.

In [25]:
!kubectl get pods --all-namespaces

NAMESPACE     NAME                                    READY     STATUS    RESTARTS   AGE
default       azure-dl-c6b866d47-mn8pr                1/1       Running   0          10m
kube-system   azureproxy-79c5db744-r5ggd              1/1       Running   2          59d
kube-system   heapster-55f855b47-4m7xr                2/2       Running   0          59d
kube-system   kube-dns-v20-7c556f89c5-4z4z6           3/3       Running   0          59d
kube-system   kube-dns-v20-7c556f89c5-mp5fh           3/3       Running   0          59d
kube-system   kube-proxy-k8t2c                        1/1       Running   0          59d
kube-system   kube-svc-redirect-z6ppp                 1/1       Running   8          59d
kube-system   kubernetes-dashboard-546f987686-8krxm   1/1       Running   2          59d
kube-system   tunnelfront-695bcbdc68-t4l8t            1/1       Running   28         59d


If anything goes wrong, you can use the commands below to observe the events on the node and review the pod logs.

In [26]:
!kubectl get events

LAST SEEN   FIRST SEEN   COUNT     NAME                                        KIND         SUBOBJECT                   TYPE      REASON                  SOURCE                              MESSAGE
12m         12m          1         azure-dl-b47cf8cdd-4gg6v.15485effd8145de4   Pod          spec.containers{azure-dl}   Normal    Killing                 kubelet, aks-nodepool1-28016997-0   Killing container with id docker://azure-dl:Need to kill Pod
10m         10m          1         azure-dl-c6b866d47-mn8pr.15485f14253c78d6   Pod                                      Normal    Scheduled               default-scheduler                   Successfully assigned azure-dl-c6b866d47-mn8pr to aks-nodepool1-28016997-0
10m         10m          1         azure-dl-c6b866d47-mn8pr.15485f1432f1e899   Pod                                      Normal    SuccessfulMountVolume   kubelet, aks-nodepool1-28016997-0   MountVolume.SetUp succeeded for volume "nvidia" 
10m         10m          1         azure-dl

In [27]:
pod_json = !kubectl get pods -o json
pod_dict = json.loads(''.join(pod_json))
!kubectl logs {pod_dict['items'][0]['metadata']['name']}

2018-08-06 18:33:16,788 CRIT Supervisor running as root (no user in config file)
2018-08-06 18:33:16,790 INFO supervisord started with pid 1
2018-08-06 18:33:17,792 INFO spawned: 'program_exit' with pid 9
2018-08-06 18:33:17,794 INFO spawned: 'nginx' with pid 10
2018-08-06 18:33:17,796 INFO spawned: 'gunicorn' with pid 11
2018-08-06 18:33:18,828 INFO success: program_exit entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-08-06 18:33:22.752422: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-08-06 18:33:22.947363: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1392] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: ddde:00:00.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2018-08-06 18:33:22.947410: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
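
The cell above takes the first item returned by `kubectl get pods -o json`, which works while our app is the only pod in the default namespace. A sketch of a more explicit selection, filtering on the `app` label set in the deployment template, is shown below. `pods_with_label` is a hypothetical helper and the sample dictionary is trimmed for illustration.

```python
def pods_with_label(pod_list, key, value):
    """Return names of pods whose metadata.labels[key] equals value."""
    names = []
    for pod in pod_list.get("items", []):
        labels = pod.get("metadata", {}).get("labels", {})
        if labels.get(key) == value:
            names.append(pod["metadata"]["name"])
    return names


# Trimmed, illustrative pod list; the second pod name is made up.
sample_pods = {
    "items": [
        {"metadata": {"name": "some-other-pod", "labels": {"app": "other"}}},
        {"metadata": {"name": "azure-dl-c6b866d47-mn8pr", "labels": {"app": "azure-dl"}}},
    ]
}
print(pods_with_label(sample_pods, "app", "azure-dl"))
```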

It can take a few minutes for the service to populate the EXTERNAL-IP field. This will be the IP you use to call the service. You can also specify an IP to use; please see the AKS documentation for further details.

In [28]:
!kubectl get service azure-dl

NAME       TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)        AGE
azure-dl   LoadBalancer   10.0.153.149   40.121.110.33   80:32087/TCP   11m
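
While the address is being allocated, the EXTERNAL-IP column shows `<pending>`. In the JSON form of the service (`kubectl get service azure-dl -o json`), the address appears under `status.loadBalancer.ingress` once it is assigned. The sketch below reads it from a parsed service object. `external_ip` is a hypothetical helper, and the sample dictionaries are trimmed stand-ins for real output.

```python
def external_ip(service):
    """Return the load balancer IP, or None while it is still pending."""
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    for entry in ingress:
        if entry.get("ip"):
            return entry["ip"]
    return None


# Trimmed, illustrative service objects: one pending, one with the IP shown above.
pending_service = {"status": {"loadBalancer": {}}}
ready_service = {"status": {"loadBalancer": {"ingress": [{"ip": "40.121.110.33"}]}}}
print(external_ip(pending_service), external_ip(ready_service))
```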


Next, we will [test our web application](05_TestWebApp.ipynb) deployed on AKS. 