# Install Kubeflow on Azure Kubernetes Service (AKS)

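In this notebook, we provision an AKS cluster with GPU support and attach blobfuse storage for the model servables, in preparation for installing the Kubeflow TensorFlow Serving component in the next notebook.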

In [75]:
from dotenv import set_key, get_key, find_dotenv
from pathlib import Path

## Setup

Let's first define the names and configurations of the resources that will be provisioned on Azure.

In [43]:
subscription_id = 'your-subscription-id'
resource_group = 'your-resource-group' # e.g. 'kuberg'
location = 'your-cluster-region' # e.g. 'eastus'
agent_size = 'your-agent-size' # e.g. 'Standard_NC6'
aks_name = 'your-aks-name' # e.g. 'kubeaks'
agent_count = 1 # number of VMs to provision in the cluster; pick any number
storage_account = 'your_storage_account' # e.g. 'kubest'
storage_container = 'your_storage_container' # e.g. 'blobfuse'
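
Azure restricts both storage names: a storage account name must be 3 to 24 characters of lowercase letters and digits only, and a blob container name must be 3 to 63 characters of lowercase letters, digits, and single hyphens, starting and ending with a letter or digit. The following is a minimal sketch for checking the values before any resources are created. The helper names are hypothetical and not part of any Azure SDK, and special containers such as `$root` are deliberately not handled.

```python
import re

# Storage account: 3-24 characters, lowercase letters and digits only.
_ACCOUNT_RE = re.compile(r"[a-z0-9]{3,24}")
# Container: alphanumeric start and end, hyphens allowed only one at a time.
_CONTAINER_RE = re.compile(r"[a-z0-9](?:-?[a-z0-9])+")


def is_valid_storage_account_name(name: str) -> bool:
    """Check a name against the storage account naming rules."""
    return _ACCOUNT_RE.fullmatch(name) is not None


def is_valid_container_name(name: str) -> bool:
    """Check a name against the blob container naming rules."""
    return 3 <= len(name) <= 63 and _CONTAINER_RE.fullmatch(name) is not None
```

Note that the placeholder values `'your_storage_account'` and `'your_storage_container'` fail these checks because they contain underscores, whereas the example values `'kubest'` and `'blobfuse'` pass.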

Create and initialize a dotenv file for storing parameters used in multiple notebooks.

In [76]:
env_path = find_dotenv()
if env_path == "":
    Path(".env").touch()
    env_path = find_dotenv()

In [77]:
set_key(env_path, 'subscription_id', subscription_id) 
set_key(env_path, 'resource_group', resource_group)
set_key(env_path, 'storage_account', storage_account)
set_key(env_path, 'storage_container', storage_container)

(True, 'storage_container', 'blobfuse')

## Create resource group and AKS cluster

In [3]:
!az account set -s {subscription_id}

In [71]:
!az group create --name {resource_group} --location {location}

In [53]:
!az aks create --node-vm-size {agent_size} --resource-group {resource_group} --name {aks_name} --node-count {agent_count} --kubernetes-version 1.11.6  --generate-ssh-keys --query 'provisioningState'

Install kubectl to connect to the Kubernetes cluster.

In [24]:
!sudo az aks install-cli

Downloading client to "/usr/local/bin/kubectl" from "https://storage.googleapis.com/kubernetes-release/release/v1.13.3/bin/linux/amd64/kubectl"
Please ensure that /usr/local/bin is in your search PATH, so the `kubectl` command can be found.


Now, let's connect to the AKS cluster and list its nodes.

In [25]:
!az aks get-credentials --resource-group {resource_group} --name {aks_name}

Merged "fboylukubeaks" as current context in /home/fboylu/.kube/config

In [4]:
!kubectl get nodes

NAME                       STATUS   ROLES   AGE   VERSION
aks-nodepool1-38912874-0   Ready    agent   3h    v1.11.6


Let's check the first node.

In [41]:
node_names = !kubectl get nodes -o name
!kubectl describe node {node_names[0].split('/')[-1]}
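
`kubectl get nodes -o name` returns each node as `node/<name>`, so the prefix has to be removed before the name is passed to `kubectl describe node`. A small sketch of doing this safely is shown below; `node_name_from_resource` is a hypothetical helper. It avoids `str.strip('node/')`, which treats its argument as a set of characters to remove from both ends rather than as a prefix, and can therefore eat into names that begin or end with `n`, `o`, `d`, or `e`.

```python
def node_name_from_resource(resource: str) -> str:
    """Return the bare node name from a 'node/<name>' string."""
    prefix = "node/"
    # Remove the prefix only when it is actually present.
    return resource[len(prefix):] if resource.startswith(prefix) else resource


print(node_name_from_resource("node/aks-nodepool1-38912874-0"))
# aks-nodepool1-38912874-0
```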

Deploy the following DaemonSet to enable GPU support in Kubernetes.

In [41]:
!kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.11/nvidia-device-plugin.yml

daemonset.extensions/nvidia-device-plugin-daemonset created
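
Once the device plugin pod is running, each GPU node should advertise the `nvidia.com/gpu` resource under `status.allocatable`. As a rough sketch, the JSON printed by `kubectl get nodes -o json` can be inspected to confirm this. `allocatable_gpus` is a hypothetical helper, and the snippet assumes the command output has already been captured as a single string.

```python
import json

GPU_RESOURCE = "nvidia.com/gpu"


def allocatable_gpus(nodes_json: str) -> dict:
    """Map node name -> allocatable GPU count from `kubectl get nodes -o json`."""
    nodes = json.loads(nodes_json)["items"]
    return {
        node["metadata"]["name"]: int(
            node["status"].get("allocatable", {}).get(GPU_RESOURCE, 0)
        )
        for node in nodes
    }


# In a notebook cell, the output could be captured first, for example:
#   out = !kubectl get nodes -o json
#   allocatable_gpus("\n".join(out))
```

A node that reports 0 here either has no GPU or has not yet registered it with the plugin.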


In [6]:
!kubectl get pods --all-namespaces

NAMESPACE     NAME                                    READY   STATUS    RESTARTS   AGE
kube-system   heapster-5d6f9b846c-49cx7               2/2     Running   0          3h
kube-system   kube-dns-autoscaler-746998ccf6-zfr9d    1/1     Running   0          3h
kube-system   kube-dns-v20-7c7d7d4c66-4pj7b           4/4     Running   0          3h
kube-system   kube-dns-v20-7c7d7d4c66-npvps           4/4     Running   0          3h
kube-system   kube-proxy-dq2g7                        1/1     Running   0          3h
kube-system   kube-svc-redirect-mbn57                 2/2     Running   0          3h
kube-system   kubernetes-dashboard-67bdc65878-62vr5   1/1     Running   1          3h
kube-system   metrics-server-5cbc77f79f-6fqzv         1/1     Running   1          3h
kube-system   nvidia-device-plugin-daemonset-6z2p2    1/1     Running   0          2h
kube-system   tunnelfront-9d6ff8797-7frcr             1/1     Running   0          3h
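
Rather than reading the table by eye, the same check can be sketched in code against the JSON form of the listing (`kubectl get pods --all-namespaces -o json`). `pods_not_running` is a hypothetical helper. It looks only at each pod's `status.phase`, so it does not verify that every container in a pod is ready.

```python
import json


def pods_not_running(pods_json: str) -> list:
    """Return 'namespace/name' for pods whose phase is not Running or Succeeded."""
    healthy = {"Running", "Succeeded"}
    return [
        "{}/{}".format(pod["metadata"]["namespace"], pod["metadata"]["name"])
        for pod in json.loads(pods_json)["items"]
        if pod.get("status", {}).get("phase") not in healthy
    ]
```

An empty list means every pod, including `nvidia-device-plugin-daemonset`, has reached the `Running` (or `Succeeded`) phase.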


## Attach blobfuse on AKS

We will use [blobfuse](https://docs.microsoft.com/en-us/azure/storage/blobs/storage-how-to-mount-container-linux), through the [blobfuse volume driver for Kubernetes](https://github.com/Azure/kubernetes-volume-drivers/tree/master/flexvolume/blobfuse), to store the model servables that the Kubeflow TensorFlow Serving component will serve. The driver requires a storage account and a container created in the same region as the Kubernetes cluster.

### Create storage account and copy model servable

Let's first create that storage account.

In [54]:
!az storage account create -n {storage_account} -g {resource_group} --query 'provisioningState'

Let's get the first storage account key.

In [51]:
key = !az storage account keys list --account-name {storage_account} -g {resource_group} --query '[0].value'
storage_account_key = str(key[0][1:-1]) # clean up key
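
With the default JSON output, `az` prints the key as a JSON string, which is why the surrounding double quotes have to be removed. As an alternative to slicing, the line can be decoded with `json.loads`, which also fails loudly if the output is not the expected quoted string. The sketch below uses a hypothetical helper name and assumes, as the cell above does, that the key is on the first output line.

```python
import json


def parse_cli_string(lines: list) -> str:
    """Decode the JSON string on the first output line of an `az ... --query` call."""
    value = json.loads(lines[0])
    if not isinstance(value, str):
        raise ValueError("expected a JSON string, got {!r}".format(value))
    return value
```

Another option is to append `--output tsv` to the `az` command, which prints the value without quotes so that no cleanup is needed.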

Create the container to be used by the blobfuse driver.

In [52]:
!az storage container create --account-name {storage_account} --account-key {storage_account_key} --name {storage_container}

{
  "created": true
}

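Now, we can upload the model servables to the container. This step requires that you have azcopy installed. The command below uses the older AzCopy syntax (`--source`, `--destination`, `--dest-key`); AzCopy v10 uses `azcopy copy` instead.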

In [65]:
destination = 'https://{}.blob.core.windows.net/{}'.format(storage_account, storage_container)
destination

'https://fboylukubest.blob.core.windows.net/blobfuse'

In [67]:
!azcopy --source models --destination {destination} --dest-key {storage_account_key} --recursive

Finished: 3 file(s), 188.201 MB; Average Speed:11.71 MB/s.
Finished 3 of total 3 file(s).
[2019/02/08 16:49:42] Transfer s

### Install blobfuse driver on AKS

We will deploy the following DaemonSet to enable blobfuse on every node of the cluster.

In [68]:
! kubectl create -f https://raw.githubusercontent.com/Azure/kubernetes-volume-drivers/master/flexvolume/blobfuse/deployment/blobfuse-flexvol-installer-1.9.yaml

namespace/flex created
daemonset.apps/blobfuse-flexvol-installer created


In [69]:
!kubectl get pods --all-namespaces

NAMESPACE     NAME                                    READY   STATUS    RESTARTS   AGE
flex          blobfuse-flexvol-installer-bgqq2        1/1     Running   0          49s
kube-system   heapster-5d6f9b846c-49cx7               2/2     Running   0          1d
kube-system   kube-dns-autoscaler-746998ccf6-zfr9d    1/1     Running   0          1d
kube-system   kube-dns-v20-7c7d7d4c66-4pj7b           4/4     Running   0          1d
kube-system   kube-dns-v20-7c7d7d4c66-npvps           4/4     Running   0          1d
kube-system   kube-proxy-dq2g7                        1/1     Running   0          1d
kube-system   kube-svc-redirect-mbn57                 2/2     Running   0          1d
kube-system   kubernetes-dashboard-67bdc65878-62vr5   1/1     Running   1          1d
kube-system   metrics-server-5cbc77f79f-6fqzv         1/1     Running   1          1d
kube-system   nvidia-device-plugin-daemonset-6z2p2    1/1     Running   0          1d
kube-system   tunnelfront-9d6ff8797-7frcr

In [70]:
!kubectl describe daemonset blobfuse-flexvol-installer --namespace=flex

Name:           blobfuse-flexvol-installer
Selector:       name=blobfuse
Node-Selector:  beta.kubernetes.io/os=linux
Labels:         k8s-app=blobfuse
Annotations:    deprecated.daemonset.template.generation: 1
Desired Number of Nodes Scheduled: 1
Current Number of Nodes Scheduled: 1
Number of Nodes Scheduled with Up-to-date Pods: 1
Number of Nodes Scheduled with Available Pods: 1
Number of Nodes Misscheduled: 0
Pods Status:  1 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  name=blobfuse
  Containers:
   blobfuse-flexvol-installer:
    Image:        mcr.microsoft.com/k8s/flexvolume/blobfuse-flexvolume
    Port:         <none>
    Host Port:    <none>
    Environment:  <none>
    Mounts:
      /etc/kubernetes/volumeplugins/ from volplugins (rw)
      /var/log/ from varlog (rw)
  Volumes:
   varlog:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log/
    HostPathType:  
   volplugins:
    Type:          H

Now, we can move on to [installing Kubeflow and serving the model on AKS with the Kubeflow TensorFlow Serving component](03_ServeWithKubeflow.ipynb).