# 4. Neural Style Transfer on AKS

Now that the AKS cluster is up, we need to deploy our __flask app__ and __scoring app__ onto it.

To do so, we'll do the following:
1. Build our __flask app__ and __scoring app__ images and push them to Docker Hub
2. Create a JSON deployment file for each of these apps (these files need the proper configuration for the pods to use blobfuse to access our blob storage container). We should end up creating: `flask_app_deployment.json` and `scoring_app_deployment.json`
3. Use `kubectl` to make these deployments to our AKS cluster
4. Expose the __flask app__ REST endpoint so that it can be accessed externally

### Kubernetes Deployment
In this notebook, we will deploy our __flask app__ and __scoring app__ on the Kubernetes cluster. Since the __flask app__ does not require heavy computation, we will run a single replica of it and give the remaining nodes to the __scoring app__, which performs the parallel computation.
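The split described above can be sketched as a small helper. This is only an illustration: `plan_replicas` is a hypothetical name, and it mirrors the `NODE_COUNT - 1` replica count used for the scoring app later in this notebook. Note that replica counts alone do not pin pods to specific nodes; the Kubernetes scheduler still decides placement.

```python
def plan_replicas(node_count: int) -> dict:
    """Split the cluster between the two deployments.

    One replica is reserved for the flask app; the scoring app gets
    one replica for every remaining node.
    """
    if node_count < 2:
        raise ValueError(
            "need at least 2 nodes: one for flask-app and one or more for scoring-app"
        )
    return {"flask-app": 1, "scoring-app": node_count - 1}


print(plan_replicas(5))  # {'flask-app': 1, 'scoring-app': 4}
```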

---

### Import packages and load .env

In [1]:
from dotenv import set_key, get_key, find_dotenv, load_dotenv
from pathlib import Path
import subprocess
import json
import os

In [2]:
env_path = find_dotenv(raise_error_if_not_found=True)
load_dotenv(env_path)

True
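Many cells below read settings from the `.env` file, so a missing key only surfaces later as a confusing error. A minimal sketch of an early check, assuming the key names used in this notebook; `missing_settings` and `REQUIRED_KEYS` are hypothetical names, and the key list is not exhaustive.

```python
from typing import Iterable, List, Mapping, Optional

# Keys that later cells read with get_key(); extend as needed.
REQUIRED_KEYS = ("DOCKER_LOGIN", "SCORING_IMAGE", "FLASK_IMAGE", "NODE_COUNT", "MOUNT_DIR")


def missing_settings(
    settings: Mapping[str, Optional[str]],
    required: Iterable[str] = REQUIRED_KEYS,
) -> List[str]:
    """Return the required keys that are absent or empty in `settings`."""
    return [key for key in required if not settings.get(key)]


# In the notebook, after load_dotenv(env_path), one could run:
#   import os
#   missing = missing_settings(os.environ)
#   assert not missing, "missing .env keys: {}".format(missing)
```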

### Build Scoring App Docker Image

In [3]:
%%writefile scoring_app/requirements.txt
azure==4.0.0
torch==0.4.1
torchvision==0.2.1

Overwriting scoring_app/requirements.txt


In [4]:
%%writefile scoring_app/Dockerfile

FROM nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04

RUN echo "deb http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64 /" > /etc/apt/sources.list.d/nvidia-ml.list

RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential \
        ca-certificates \
        cmake \
        curl \
        git \
        nginx \
        supervisor \
        wget && \
        rm -rf /var/lib/apt/lists/*

ENV PYTHON_VERSION=3.6
RUN curl -o ~/miniconda.sh -O  https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh  && \
    chmod +x ~/miniconda.sh && \
    ~/miniconda.sh -b -p /opt/conda && \
    rm ~/miniconda.sh && \
    /opt/conda/bin/conda create -y --name py$PYTHON_VERSION python=$PYTHON_VERSION && \
    /opt/conda/bin/conda clean -ya
ENV PATH /opt/conda/envs/py$PYTHON_VERSION/bin:$PATH
ENV LD_LIBRARY_PATH /opt/conda/envs/py$PYTHON_VERSION/lib:/usr/local/cuda/lib64/:$LD_LIBRARY_PATH
ENV PYTHONPATH /code/:$PYTHONPATH

RUN mkdir /app
WORKDIR /app
ADD process_images_from_queue.py /app
ADD style_transfer.py /app
ADD main.py /app
ADD util.py /app
ADD requirements.txt /app
ADD azure.py /app

RUN pip install --no-cache-dir -r requirements.txt

CMD ["python", "main.py"]

Overwriting scoring_app/Dockerfile


In [5]:
!sudo docker build -t {get_key(env_path, "SCORING_IMAGE")} scoring_app

Sending build context to Docker daemon  35.84kB
Step 1/17 : FROM nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04
 ---> 7e8410ba243b
Step 2/17 : RUN echo "deb http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64 /" > /etc/apt/sources.list.d/nvidia-ml.list
 ---> Using cache
 ---> 2aee8c10151f
Step 3/17 : RUN apt-get update && apt-get install -y --no-install-recommends         build-essential         ca-certificates         cmake         curl         git         nginx         supervisor         wget &&         rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> c69b13e821b7
Step 4/17 : ENV PYTHON_VERSION=3.6
 ---> Using cache
 ---> 9c3490d15c2d
Step 5/17 : RUN curl -o ~/miniconda.sh -O  https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh  &&     chmod +x ~/miniconda.sh &&     ~/miniconda.sh -b -p /opt/conda &&     rm ~/miniconda.sh &&     /opt/conda/bin/conda create -y --name py$PYTHON_VERSION python=$PYTHON_VERSION &&     /opt/conda/bin/conda c

Tag and push docker image

In [6]:
!sudo docker login --username {get_key(env_path, "DOCKER_LOGIN")} --password '<your-docker-hub-password>'

https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
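Passing the password with `--password` leaves it in the saved notebook and in the shell history. A minimal sketch of an alternative that feeds the password to `docker login --password-stdin` through `subprocess`; the `docker_login` helper and the `DOCKER_PASSWORD` key mentioned in the comment are assumptions, not part of the original setup.

```python
import subprocess
from typing import Callable, List


def docker_login(username: str, password: str, run: Callable = subprocess.run):
    """Log in to Docker Hub without putting the password on the command line."""
    command: List[str] = ["docker", "login", "--username", username, "--password-stdin"]
    # The password travels over stdin, so it never appears in the command itself.
    return run(command, input=password.encode(), check=True)


# In the notebook (DOCKER_PASSWORD is a hypothetical .env key; prepend "sudo"
# to the command if your Docker daemon requires it, as the cells here do):
#   docker_login(get_key(env_path, "DOCKER_LOGIN"), get_key(env_path, "DOCKER_PASSWORD"))
```

The `run` parameter exists only so the helper can be exercised without a Docker daemon.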


In [7]:
repo = "{}/{}".format(get_key(env_path, "DOCKER_LOGIN"), get_key(env_path, "SCORING_IMAGE"))

In [8]:
!sudo docker tag {get_key(env_path, "SCORING_IMAGE")} {repo}

In [9]:
!sudo docker push {repo}

The push refers to repository [docker.io/pjh177787/oxford_scoring_app]

34c46d98: Preparing
112a59d2: Preparing
b59fe06d: Preparing
93a511e9: Preparing
34dc443b: Preparing
711df71e: Preparing
0c6970e5: Preparing
e82df0c2: Preparing
89b11fc2: Preparing
b7e6614f: Preparing
c5838665: Preparing
7ea5d26b: Preparing
7acf624e: Preparing
a0fe4fdd: Preparing
1c46eb92: Preparing
6e800c43: Preparing
f22d44f3: Preparing
6f329a25: Preparing
7de5faec: Preparing
a27b0484: Layer already exists
latest: digest: sha256:1ee0eb0937c29ab45041fb6b6e3e6fba3f9441b1172410fa8d27c409455bfd8f size: 4507


### Build Flask App Docker Image

Create our Dockerfile and save it to the directory, `flask_app/`.

In [10]:
%%writefile flask_app/Dockerfile

FROM continuumio/miniconda3

RUN mkdir /app
WORKDIR /app
ADD add_images_to_queue.py /app
ADD preprocess.py /app
ADD postprocess.py /app
ADD util.py /app
ADD main.py /app
ADD azure.py /app

RUN conda install -c conda-forge -y ffmpeg
RUN pip install azure
RUN pip install flask

CMD ["python", "main.py"]

Overwriting flask_app/Dockerfile


Build the Docker image

In [11]:
!sudo docker build -t {get_key(env_path, "FLASK_IMAGE")} flask_app

Sending build context to Docker daemon  24.06kB
Step 1/12 : FROM continuumio/miniconda3
 ---> 6b5cf97566c3
Step 2/12 : RUN mkdir /app
 ---> Using cache
 ---> 3e41fbdb3278
Step 3/12 : WORKDIR /app
 ---> Using cache
 ---> 032cc5cbe3da
Step 4/12 : ADD add_images_to_queue.py /app
 ---> Using cache
 ---> a7aad3e80c4e
Step 5/12 : ADD preprocess.py /app
 ---> Using cache
 ---> 88ec5fb7fcf2
Step 6/12 : ADD postprocess.py /app
 ---> Using cache
 ---> 796d0a2feade
Step 7/12 : ADD util.py /app
 ---> Using cache
 ---> 842a56dcdc61
Step 8/12 : ADD main.py /app
 ---> Using cache
 ---> c01786b74db0
Step 9/12 : RUN conda install -c conda-forge -y ffmpeg
 ---> Using cache
 ---> 44813df7c13c
Step 10/12 : RUN pip install azure
 ---> Using cache
 ---> 3f610cf3cffd
Step 11/12 : RUN pip install flask
 ---> Using cache
 ---> 1391f94ad253
Step 12/12 : CMD ["python", "main.py"]
 ---> Using cache
 ---> 8fd611bb40dc
Successfully built 8fd611bb40dc
Successfully tagged oxford_flask_app:latest


Tag and push.

In [12]:
repo = "{}/{}".format(get_key(env_path, "DOCKER_LOGIN"), get_key(env_path, "FLASK_IMAGE"))

In [13]:
!sudo docker tag {get_key(env_path, "FLASK_IMAGE")} {repo}

In [14]:
!sudo docker push {repo}

The push refers to repository [docker.io/pjh177787/oxford_flask_app]

46fb0425: Preparing
dbdc848b: Preparing
671269e9: Preparing
2767c2e6: Preparing
a5e2b056: Preparing
fe3428e9: Preparing
55a10f32: Preparing
55a10f32: Waiting
66549fba: Preparing
c65c8dc4: Preparing
2f5d7ee9: Preparing
2f5d7ee9: Waiting
1ff9ade6: Preparing
1ff9ade6: Layer already exists
latest: digest: sha256:665578e4e83eb869241d779113116ebadc84e0446c8b76bfaf9710db561d479a size: 3249


### Create our Flask App and Scoring App deployments on AKS

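We need to deploy both our flask app and scoring app Docker images to the AKS cluster. Both deployments share the same volume mounts, resource requests, volumes, and environment variables, so we'll set these up first. In this run the GPU and blobfuse entries are commented out; uncomment them to request a GPU and to mount the blob storage container: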

In [15]:
volume_mounts = [
#     {"name": "nvidia", "mountPath": "/usr/local/nvidia"},
#     {"name": "blob", "mountPath": get_key(env_path, "MOUNT_DIR")},
]

resources = {
#     "requests": {"alpha.kubernetes.io/nvidia-gpu": 1},
#     "limits": {"alpha.kubernetes.io/nvidia-gpu": 1},
}

volumes = [
#     {"name": "nvidia", "hostPath": {"path": "/usr/local/nvidia"}},
#     {
#         "name": "blob",
#         "flexVolume": {
#             "driver": "azure/blobfuse",
#             "readOnly": False,
#             "secretRef": {"name": "blobfusecreds"},
#             "options": {
#                 "container": get_key(env_path, "STORAGE_CONTAINER_NAME"),
#                 "tmppath": "/tmp/blobfuse",
#                 "mountoptions": "--file-cache-timeout-in-seconds=120 --use-https=true",
#             },
#         },
#     },
]

env = [
    {
        "name": "MOUNT_DIR", 
        "value": get_key(env_path, "MOUNT_DIR")
    },
    {
        "name": "LB_LIBRARY_PATH",
        "value": "$LD_LIBRARY_PATH:/usr/local/nvidia/lib64:/opt/conda/envs/py3.6/lib",
    },
    {
        "name": "DP_DISABLE_HEALTHCHECKS", 
        "value": "xids"
    },
    {
        "name": "STORAGE_MODEL_DIR",
        "value": get_key(env_path, "STORAGE_MODEL_DIR")
    },
    {
        "name": "SUBSCRIPTION_ID",
        "value": get_key(env_path, "SUBSCRIPTION_ID")
    },
    {
        "name": "RESOURCE_GROUP",
        "value": get_key(env_path, "RESOURCE_GROUP")
    },
    {
        "name": "REGION",
        "value": get_key(env_path, "REGION")
    },
    {
        "name": "SB_SHARED_ACCESS_KEY_NAME",
        "value": get_key(env_path, "SB_SHARED_ACCESS_KEY_NAME")
    },
    {
        "name": "SB_SHARED_ACCESS_KEY_VALUE",
        "value": get_key(env_path, "SB_SHARED_ACCESS_KEY_VALUE")
    },
    {
        "name": "SB_NAMESPACE",
        "value": get_key(env_path, "SB_NAMESPACE")
    },
    {
        "name": "SB_QUEUE", 
        "value": get_key(env_path, "SB_QUEUE")
    },
]
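The commented-out blobfuse entries above have to stay in sync: the mount name must match the volume name. A minimal sketch that builds both from the same values, assuming the blobfuse FlexVolume driver is installed on the nodes and a `blobfusecreds` secret exists, as the commented configuration implies. `blobfuse_storage` is a hypothetical helper.

```python
def blobfuse_storage(mount_dir, container, secret_name="blobfusecreds"):
    """Return (volume_mounts, volumes) for mounting a blob container via blobfuse."""
    volume_name = "blob"  # shared by the mount and the volume so they always match
    volume_mounts = [{"name": volume_name, "mountPath": mount_dir}]
    volumes = [
        {
            "name": volume_name,
            "flexVolume": {
                "driver": "azure/blobfuse",
                "readOnly": False,
                "secretRef": {"name": secret_name},
                "options": {
                    "container": container,
                    "tmppath": "/tmp/blobfuse",
                    "mountoptions": "--file-cache-timeout-in-seconds=120 --use-https=true",
                },
            },
        }
    ]
    return volume_mounts, volumes


# In the notebook:
#   volume_mounts, volumes = blobfuse_storage(
#       get_key(env_path, "MOUNT_DIR"), get_key(env_path, "STORAGE_CONTAINER_NAME")
#   )
```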

Define the scoring app deployment and save it to a `scoring_app_deployment.json` file using the variables set above.

In [16]:
scoring_app_deployment_json = {
    "apiVersion": "apps/v1beta1",
    "kind": "Deployment",
    "metadata": {
        "name": "scoring-app", 
        "labels": {
            "purpose": "dequeue_messages_and_apply_style_transfer"
        }
    },
    "spec": {
        "replicas": int(get_key(env_path, "NODE_COUNT")) - 1,
        "template": {
            "metadata": {
                "labels": {
                    "app": "scoring-app"
                }
            },
            "spec": {
                "containers": [
                    {
                        "name": "scoring-app",
                        "image": "{}/{}:latest".format(get_key(env_path, "DOCKER_LOGIN"), get_key(env_path, "SCORING_IMAGE")),
                        "volumeMounts": volume_mounts,
                        "resources": resources,
                        "ports": [{
                            "containerPort": 433
                        }],
                        "env": env,
                    }
                ],
                "volumes": volumes
            },
        },
    },
}

with open("scoring_app_deployment.json", "w") as outfile:
    json.dump(scoring_app_deployment_json, outfile, indent=4, sort_keys=True)
    outfile.write('\n\n')

Using the `scoring_app_deployment.json` we created, create our deployment on AKS. This can take a few minutes...

In [17]:
!kubectl delete -f scoring_app_deployment.json

deployment.apps "scoring-app" deleted


In [18]:
!kubectl create -f scoring_app_deployment.json

deployment.apps/scoring-app created


Define the flask app deployment and save it to a `flask_app_deployment.json` file using the variables set above.

In [19]:
flask_app_deployment_json = {
    "apiVersion": "apps/v1beta1",
    "kind": "Deployment",
    "metadata": {
        "name": "flask-app", 
        "labels": {
            "purpose": "pre_and_post_processing_and_queue_images"
        }
    },
    "spec": {
        "replicas": 1,
        "template": {
            "metadata": {
                "labels": {
                    "app": "flask-app"
                }
            },
            "spec": {
                "containers": [
                    {
                        "name": "flask-app",
                        "image": "{}/{}:latest".format(get_key(env_path, "DOCKER_LOGIN"), get_key(env_path, "FLASK_IMAGE")),
                        "volumeMounts": volume_mounts,
                        "resources": resources,
                        "ports": [{
                            "containerPort": 8080
                        }],
                        "env": env,
                    }
                ],
                "volumes": volumes
            },
        },
    },
}

with open("flask_app_deployment.json", "w") as outfile:
    json.dump(flask_app_deployment_json, outfile, indent=4, sort_keys=True)
    outfile.write('\n\n')
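The two deployment definitions above differ only in name, image, replica count, port, and purpose label. A minimal sketch of a shared factory follows; `make_deployment` is a hypothetical helper. One deliberate difference from the cells above, labeled here as an assumption: it targets `apps/v1`, which requires `spec.selector` to match the pod template labels, because `apps/v1beta1` was removed in Kubernetes 1.16. On the cluster version used in this notebook the original `apps/v1beta1` definitions work as shown.

```python
import json


def make_deployment(name, image, replicas, container_port, purpose,
                    env=(), volume_mounts=(), volumes=(), resources=None):
    """Build a Deployment manifest as a plain dict (apps/v1 is an assumption)."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": {"purpose": purpose}},
        "spec": {
            "replicas": replicas,
            # apps/v1 requires a selector that matches the template labels.
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": container_port}],
                            "env": list(env),
                            "volumeMounts": list(volume_mounts),
                            "resources": resources or {},
                        }
                    ],
                    "volumes": list(volumes),
                },
            },
        },
    }


def write_deployment(deployment, path):
    """Save the manifest in the same JSON layout used by the cells above."""
    with open(path, "w") as outfile:
        json.dump(deployment, outfile, indent=4, sort_keys=True)
        outfile.write("\n")
```

With this helper, the flask app definition would reduce to a single call such as `make_deployment("flask-app", image, 1, 8080, "pre_and_post_processing_and_queue_images", env=env)`.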

Using the `flask_app_deployment.json` we created, create our flask app deployment on AKS. This can take a few minutes...

In [20]:
!kubectl delete -f flask_app_deployment.json

deployment.apps "flask-app" deleted


In [21]:
!kubectl create -f flask_app_deployment.json

deployment.apps/flask-app created


These deployments may take a few minutes. You can inspect the state of the pods by running the command: `kubectl get pods`. When the deployment is done, the results may look as follows:
```bash
NAME                           READY   STATUS              RESTARTS   AGE
flask-app-6db66c97ff-x8rq4     1/1     Running             0          78s
scoring-app-846dd6bc79-5nm5b   1/1     Running             0          73s
scoring-app-846dd6bc79-6qc6k   1/1     Running             0          73s
scoring-app-846dd6bc79-8gtsv   1/1     Running             0          73s
scoring-app-846dd6bc79-hjsfc   1/1     Running             0          73s
```
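A pod can report the phase `Running` while its container is stuck in a waiting state such as `CrashLoopBackOff`, so it helps to look at container states as well. A minimal sketch that summarizes the JSON printed by `kubectl get pods -o json`; `pod_states` is a hypothetical helper and reads only the fields shown.

```python
import json


def pod_states(pods_json: str) -> dict:
    """Map each pod name to its phase, or to the reason a container is waiting."""
    states = {}
    for pod in json.loads(pods_json).get("items", []):
        status = pod.get("status", {})
        state = status.get("phase", "Unknown")
        for container in status.get("containerStatuses", []):
            waiting = container.get("state", {}).get("waiting")
            if waiting:
                # A waiting reason (e.g. CrashLoopBackOff) is more telling than the phase.
                state = waiting.get("reason", state)
        states[pod["metadata"]["name"]] = state
    return states


# In the notebook:
#   import subprocess
#   raw = subprocess.check_output(["kubectl", "get", "pods", "-o", "json"]).decode()
#   pod_states(raw)
```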

In [49]:
!kubectl exec scoring-app-7fc7d4cb9d-f7gct -- ls /opt/conda/envs/py3.6/lib/python3.6/site-packages/

error: unable to upgrade connection: container not found ("scoring-app")


In [40]:
!kubectl exec flask-app-66c4887ff8-wgztf -- ls /

app
bin
boot
dev
etc
home
lib
lib64
media
mnt
opt
proc
root
run
sbin
srv
sys
tmp
usr
var


In [29]:
!kubectl logs scoring-app-7fc7d4cb9d-f7gct

2019-07-15 09:15:16,150 [root:process_images_from_queue.py:34] DEBUG - Start listening to queue 'oxfordqueue' on service bus...
2019-07-15 09:15:16,150 [root:process_images_from_queue.py:39] DEBUG - Peek queue...
Traceback (most recent call last):
  File "/opt/conda/envs/py3.6/lib/python3.6/site-packages/urllib3/connection.py", line 160, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw)
  File "/opt/conda/envs/py3.6/lib/python3.6/site-packages/urllib3/util/connection.py", line 57, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/opt/conda/envs/py3.6/lib/python3.6/socket.py", line 745, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/py3.6/lib/python3.6/site-packages/urllib3/

In [27]:
!kubectl describe pods

Name:           flask-app-66c4887ff8-wgztf
Namespace:      default
Priority:       0
Node:           aks-nodepool1-80042525-2/10.240.0.6
Start Time:     Mon, 15 Jul 2019 09:10:55 +0000
Labels:         app=flask-app
                pod-template-hash=66c4887ff8
Annotations:    <none>
Status:         Running
IP:             10.244.4.8
Controlled By:  ReplicaSet/flask-app-66c4887ff8
Containers:
  flask-app:
    Container ID:   docker://e4d37554f27a8d3647f2a0cccff0225b2c0a5d9745961e5c9ec69e5815717b0a
    Image:          pjh177787/oxford_flask_app:latest
    Image ID:       docker-pullable://pjh177787/oxford_flask_app@sha256:665578e4e83eb869241d779113116ebadc84e0446c8b76bfaf9710db561d479a
    Port:           8080/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Mon, 15 Jul 2019 09:11:01 +0000
    Ready:          True
    Restart Count:  0
    Environment:
      MOUNT_DIR:                     /data
      LB_LIBRARY_PATH:               $LD_LIBRARY_PATH:/usr/local/n



Name:           scoring-app-7fc7d4cb9d-mm449
Namespace:      default
Priority:       0
Node:           aks-nodepool1-80042525-3/10.240.0.5
Start Time:     Mon, 15 Jul 2019 09:10:54 +0000
Labels:         app=scoring-app
                pod-template-hash=7fc7d4cb9d
Annotations:    <none>
Status:         Running
IP:             10.244.1.5
Controlled By:  ReplicaSet/scoring-app-7fc7d4cb9d
Containers:
  scoring-app:
    Container ID:   docker://dc5399d8fecf9d6e3a059bb1e318cd50b4aae44984247b7b756f4bbbd0b7d368
    Image:          pjh177787/oxford_scoring_app:latest
    Image ID:       docker-pullable://pjh177787/oxford_scoring_app@sha256:1ee0eb0937c29ab45041fb6b6e3e6fba3f9441b1172410fa8d27c409455bfd8f
    Port:           433/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Mon, 15 Jul 2019 09:12:49 +0000
      Finished:   

Expose the flask-app deployment in the Kubernetes cluster. This will open a public endpoint.

In [24]:
!kubectl expose deployment flask-app --type="LoadBalancer"

service/flask-app exposed


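Run `watch kubectl get services` in a terminal and wait until the external IP of the `flask-app` service changes from `<pending>` to an actual address. It can take some time.

NOTE: If the following cell is run before the external IP has been assigned, the command returns nothing and indexing the result raises an `IndexError`, as in the output shown here.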

In [25]:
external_ip = !kubectl get services -o=jsonpath={.items[*].status.loadBalancer.ingress[0].ip}
external_ip = external_ip[0]

IndexError: list index out of range
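The `IndexError` above occurs because the load balancer has no address yet. A minimal sketch that polls until one appears, assuming the JSON printed by `kubectl get service flask-app -o json`; the helper names, the timeout, and the polling interval are illustrative choices.

```python
import json
import time
from typing import Callable, Optional


def extract_external_ip(service_json: str) -> Optional[str]:
    """Return the load balancer IP from a Service's JSON, or None while pending."""
    service = json.loads(service_json)
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    return ingress[0].get("ip") if ingress else None


def wait_for_external_ip(fetch: Callable[[], str], timeout_s: float = 600,
                         interval_s: float = 10, sleep: Callable = time.sleep) -> str:
    """Call `fetch` repeatedly until the service has an external IP."""
    deadline = time.monotonic() + timeout_s
    while True:
        ip = extract_external_ip(fetch())
        if ip:
            return ip
        if time.monotonic() >= deadline:
            raise TimeoutError("external IP not assigned within {} s".format(timeout_s))
        sleep(interval_s)


# In the notebook:
#   import subprocess
#   fetch = lambda: subprocess.check_output(
#       ["kubectl", "get", "service", "flask-app", "-o", "json"]).decode()
#   external_ip = wait_for_external_ip(fetch)
```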

Since we'll use the `external_ip` later on, save it to the `.env` file.

In [None]:
set_key(env_path, "AKS_EXTERNAL_IP", external_ip)

### Test that the deployment works end-to-end

Set the name of the new test video.

In [None]:
new_video_name = "aks_test_orangutan.mp4"

Make a copy of the old `orangutan.mp4` video, named `<new_video_name>`.

In [None]:
!cp data/orangutan.mp4 data/{new_video_name}

Use `curl` to hit the endpoint of the kubernetes cluster we just deployed.

In [None]:
!curl {external_ip}":8080/process?video_name="{new_video_name}
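The request above is a plain HTTP GET against port 8080 with the video name as a query parameter. A minimal sketch that builds the same URL in Python and URL-encodes the video name, which matters if the file name contains spaces or other special characters. `process_url` is a hypothetical helper and the IP address in the example is a documentation placeholder.

```python
from urllib.parse import urlencode


def process_url(external_ip: str, video_name: str, port: int = 8080) -> str:
    """Build the flask app's /process URL for a given video."""
    query = urlencode({"video_name": video_name})
    return "http://{}:{}/process?{}".format(external_ip, port, query)


print(process_url("203.0.113.10", "aks_test_orangutan.mp4"))
# http://203.0.113.10:8080/process?video_name=aks_test_orangutan.mp4
```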

Inspect your kubernetes cluster to see that the process is running. You can use the commands below to do so. Alternatively, you can also inspect the blob storage container to see that the images are being created.

When the video completes, you can play the video file directly from your mounted blob container:

In [None]:
%%HTML
<video width="320" height="240" controls>
  <source src="data/aks_test_orangutan/aks_test_orangutan_processed.mp4" type="video/mp4">
</video>

### Basic Kubectl usage
You can use kubectl to perform basic monitoring. Use the following commands:
```bash
# monitor pods
!kubectl get pods

# print logs from a pod (<pod-name> can be found when calling 'get pods')
!kubectl logs <pod-name>

# check all services running on the cluster
!kubectl get services

# delete a service
!kubectl delete services <service-name>

# delete a deployment
!kubectl delete -f scoring_app_deployment.json
!kubectl delete -f flask_app_deployment.json
```

### Monitor in kubernetes dashboard
You can use the Kubernetes dashboard to monitor the cluster using the following commands:

```bash
# use the kube_dashboard_access.yaml to create a deployment
!kubectl create -f kube_dashboard_access.yaml

# use this command to browse
!az aks browse -n {get_key(env_path, "AKS_CLUSTER")} -g {get_key(env_path, "RESOURCE_GROUP")}
```

If you're not able to access the dashboard, follow the instructions [here](https://blog.tekspace.io/kubernetes-dashboard-remote-access/).

### Additional commands for AKS

Scale your AKS cluster:

```bash 
!az aks scale \
    --name {get_key(env_path, "AKS_CLUSTER")} \
    --resource-group {get_key(env_path, "RESOURCE_GROUP")} \
    --node-count 10
```

Scale your deployment:
```bash
!kubectl scale deployment.apps/scoring-app --replicas=10
```

---

Continue to the next [notebook](/notebooks/05_deploy_logic_app.ipynb).