# 6 - Deploy a Model with GPU Support

In this notebook we will deploy our existing model as a webservice with GPU support. Because GPU support is needed, we will use [Azure Kubernetes Service (AKS)](https://azure.microsoft.com/en-us/services/kubernetes-service/) instead of [Azure Container Instances (ACI)](https://azure.microsoft.com/en-us/services/container-instances/), which we used previously.

## Important Note on Cost

This notebook will spin up a GPU cluster for AKS.  This cluster will incur charges from the moment you start it.  Because of this, **make sure you delete the service and cluster when you are finished**.  The steps to do that are at the end of this notebook.

## Preparation

We first need to import some modules before we can launch our GPU compute capabilities:

In [None]:
from azureml.core import Workspace
from azureml.exceptions import ComputeTargetException
from azureml.core.compute import ComputeTarget, AksCompute

Next, we need to get a reference to our workspace:

In [None]:
# Connect to the workspace
ws = Workspace.from_config()
print("Azure ML Workspace")
print(f'Name: {ws.name}')
print(f'Location: {ws.location}')
print(f'Resource Group: {ws.resource_group}')

## GPU Compute Cluster

The next step is to create a new compute cluster with GPU capabilities for use with Azure Kubernetes Service (AKS).  **In most cases, this step will take 10-25 minutes to complete**.

### Quotas

You may receive an error related to quotas when trying to launch this cluster.  If this happens, you can follow these steps to request an increase:

1. From the Azure Portal, navigate to your subscription.
1. Select the `Usage + Quotas` option from the menu.
1. Select the `Request Increase` button.
1. In the form, fill in the fields with your information. Select `Compute-VM (cores-vCPUs) subscription limit increases` for the `Quota Type`. Select `Next`.
1. In the next view, fill out the information and then press the `Provide Details` option at the top of the form.
1. In this modal, fill out the details to match the following image, adjusting them for your location and use case.
1. Submit your request.

<img src="quota.png" alt="Request Quota Increase" width="600"/>

In [None]:
# Create a name for our new cluster
gpu_cluster_name = 'gpu-cluster'

# Verify that cluster does not exist already
try:
    gpu_cluster = AksCompute(workspace=ws, 
                             name=gpu_cluster_name)
    print('Cluster already exists.')
except ComputeTargetException:
    compute_config = AksCompute.provisioning_configuration(vm_size="Standard_NC6_Promo")
    gpu_cluster = AksCompute.create(ws, 
                                    name=gpu_cluster_name,
                                    provisioning_configuration=compute_config)

gpu_cluster.wait_for_completion(show_output=True)

## Scoring File

Next, we need to create the scoring file that will be leveraged by the webservice.  Just as before, we have to implement both the `init` and the `run` methods:

In [None]:
%%writefile score_gpu.py
import json
import numpy as np
import os
from keras.models import load_model

# Initialize the model
def init():
    global model
    model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'mnist.h5')
    model = load_model(model_path)

# Run inference against data that is passed in
def run(raw_data):
    data = np.array(json.loads(raw_data)['data'])
    # make prediction
    results = model.predict(data)
    output = []
    for result in results:
        output.append(construct_output(result))
    return output

# Utility function to construct output data per item passed in
def construct_output(result):
    result_index = np.argmax(result)
    result_value = result[result_index]
    output = { 'value': str(result_index) }
    output['certainty'] = result_value.item()
    possibilities = {}
    for i, val in enumerate(result): 
        possibilities[i] = val.item() 
    output['possibilities'] = possibilities    
    return output
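
Before deploying, it can help to exercise the request and response shapes locally. The following is a minimal sketch, not the deployed scoring file: `FakeModel` is a hypothetical stub standing in for the Keras model, and `construct_output` and `run` are re-implemented here on plain Python lists (with the model passed in as an argument) so that the example runs without numpy, Keras, or a GPU.

```python
import json


def construct_output(probabilities):
    """Pure-Python stand-in for the numpy-based helper in score_gpu.py."""
    best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return {
        "value": str(best_index),
        "certainty": probabilities[best_index],
        "possibilities": {i: p for i, p in enumerate(probabilities)},
    }


class FakeModel:
    """Hypothetical stub: returns the same 3-class distribution for every input."""

    def predict(self, batch):
        return [[0.05, 0.9, 0.05] for _ in batch]


def run(raw_data, model):
    # Same flow as the scoring file: decode JSON, predict, build one dict per item
    data = json.loads(raw_data)["data"]
    return [construct_output(result) for result in model.predict(data)]


# One all-zero 28x28 "image", wrapped in the {"data": [...]} envelope
payload = json.dumps({"data": [[[0.0] * 28 for _ in range(28)]]})
results = run(payload, FakeModel())
print(results[0]["value"], results[0]["certainty"])

# Simulate what a client sees after the result has been serialized to JSON
round_tripped = json.loads(json.dumps(results))
print(list(round_tripped[0]["possibilities"]))
```

Note that the integer keys of `possibilities` become strings once a result is serialized to JSON, so a client reading a JSON response would look up `"1"` rather than `1`.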

## Environment

Next, we need to configure the Environment for our webservice.  In this case, we want to be sure to include the GPU version of TensorFlow by adding the conda package `tensorflow-gpu`.

In [None]:
from azureml.core.conda_dependencies import CondaDependencies 

myenv = CondaDependencies()
myenv.add_conda_package("tensorflow-gpu")
myenv.add_conda_package("keras")

with open("gpuenv.yml","w") as f:
    f.write(myenv.serialize_to_string())
    
# Review environment file
with open("gpuenv.yml","r") as f:
    print(f.read())

## Deployment & Inference Configuration

Next, we need to create our deployment configuration:

In [None]:
from azureml.core.webservice import AksWebservice

gpu_aks_config = AksWebservice.deploy_configuration(autoscale_enabled=False,
                                                    num_replicas=3,
                                                    cpu_cores=2,
                                                    memory_gb=4)

Then we can configure our `InferenceConfig` for the webservice:

In [None]:
from azureml.core.model import InferenceConfig

inference_config = InferenceConfig(runtime="python",
                                   entry_script="score_gpu.py",
                                   conda_file="gpuenv.yml",
                                   enable_gpu=True)

## Deploy our Model

The final step in the process is to deploy our model using the configuration that we have put in place:

In [None]:
from azureml.core.model import Model

aks_service_name = 'keras-mnist-gpu-svc'
# Get our model
model = Model(ws, "keras-mnist")
# Deploy our model
aks_service = Model.deploy(ws,
                           models=[model],
                           inference_config=inference_config,
                           deployment_config=gpu_aks_config,
                           deployment_target=gpu_cluster,
                           name=aks_service_name)

aks_service.wait_for_deployment(show_output=True)
print(aks_service.state)

Next, we can get the URL for the webservice endpoint:

In [None]:
print(f'Scoring URL: {aks_service.scoring_uri}')

## Validate our Deployment

Next, we want to validate our deployment using both the SDK and HTTP calls.

### Load Data

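First, we need to load the MNIST data from the local `data` folder so that we can submit samples to the service. This cell assumes that the data files have already been downloaded into that folder and that a `load_data` helper is already defined in the current session; neither is set up in this notebook: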

In [None]:
%matplotlib inline
import numpy as np
import os
import matplotlib.pyplot as plt

# DATA FOLDER
# Make sure we have the data folder created locally
data_folder = os.path.join(os.getcwd(), 'data')
os.makedirs(data_folder, exist_ok=True)

# LOAD DATA
training_images = load_data(os.path.join(data_folder, "train-images-idx3-ubyte.gz"), False) / 255.0
training_images = np.reshape(training_images, (-1, 28,28)).astype('float32')
test_images = load_data(os.path.join(data_folder, "t10k-images-idx3-ubyte.gz"), False) / 255.0
test_images = np.reshape(test_images, (-1, 28,28)).astype('float32')

training_labels = load_data(os.path.join(data_folder, "train-labels-idx1-ubyte.gz"), True).reshape(-1)
test_labels = load_data(os.path.join(data_folder, "t10k-labels-idx1-ubyte.gz"), True).reshape(-1)

print(f'Training Image: {training_images.shape}')
print(f'Training Labels: {training_labels.shape}')
print(f'Test Images: {test_images.shape}')
print(f'Test Labels: {test_labels.shape}')

### Test with SDK

Next, we can validate our service using the SDK:

In [None]:
import json

# Get a sample index
sample_indices = np.random.permutation(test_images.shape[0])[0:1]

# Structure input data
test_samples = json.dumps({"data": test_images[sample_indices].tolist()})
print("JSON Input: " + test_samples)
test_samples = bytes(test_samples, encoding='utf8')

# Execute the predictions
results = aks_service.run(input_data=test_samples)

# Utility function to display the result
def display_result(image, result):
    fig = plt.figure(figsize=(12, 5))
    grid = plt.GridSpec(1, 3, wspace=0.4, hspace=0.3)
    plt.subplot(grid[0, 0])
    plt.imshow(image, cmap='gray_r')
    print("\n\n")
    print(f'Predicted Value: {result["value"]}')
    print(f'Certainty: {str(result["certainty"])}')
    print(f'Raw Result: {str(result)}')
 
# Show the results
for i, val in enumerate(results):
    display_result(test_images[sample_indices[i]], val)
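
A single sample only shows that the service responds. To get a rough sense of whether the deployed model predicts correctly, one option is to score several samples and compare the returned `value` fields with the true labels. The sketch below assumes the result format produced by the scoring file (a list of dicts with a string `value`); the `accuracy` helper and the sample data are hypothetical and only illustrate the comparison.

```python
def accuracy(results, labels):
    """Fraction of predictions whose 'value' matches the true label."""
    if len(results) != len(labels):
        raise ValueError("results and labels must have the same length")
    if not results:
        return 0.0
    correct = sum(1 for r, y in zip(results, labels) if int(r["value"]) == int(y))
    return correct / len(results)


# Hypothetical service output for three samples, and their true labels
sample_results = [{"value": "7"}, {"value": "2"}, {"value": "1"}]
sample_labels = [7, 2, 4]
print(f"Accuracy: {accuracy(sample_results, sample_labels):.2f}")
```

With the real service, `results` would come from `aks_service.run(...)` on a larger slice of `test_images`, and the labels from `test_labels[sample_indices]`.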

### Validate over HTTP

Finally, we can validate our deployment using an HTTP call.  Our first step in this process is to get the API key for the AKS service and add that to the headers for our call:

In [None]:
# Get API Key and Construct Headers for call
api_key = aks_service.get_keys()[0]
headers = {'Content-Type': 'application/json',
           'Authorization': ('Bearer ' + api_key)}

With that in place, we can now perform a `POST` request against our inference endpoint:

In [None]:
import requests

# Get a sample index
sample_indices = np.random.permutation(test_images.shape[0])[0:1]
    
# Structure input data
test_samples = json.dumps({"data": test_images[sample_indices].tolist()})
print("JSON Input: " + test_samples)
test_samples = bytes(test_samples, encoding='utf8')

# Perform a POST request
resp = requests.post(aks_service.scoring_uri, test_samples, headers=headers)

print("POST to url", aks_service.scoring_uri)

# Read the returned JSON data
results = json.loads(resp.text)

# Show the results
for i, val in enumerate(results):
    display_result(test_images[sample_indices[i]], val)
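
The cell above assumes the request succeeds and that the body is a JSON list. A slightly more defensive variant checks the status code before decoding. This is a sketch only: `parse_scoring_response` is a hypothetical helper, and the 401 status is used purely as an illustration, not as a statement about which errors the service returns.

```python
import json


def parse_scoring_response(status_code, body):
    """Return the decoded predictions, or raise with the response body attached."""
    if status_code != 200:
        raise RuntimeError(f"Scoring failed with HTTP {status_code}: {body}")
    results = json.loads(body)
    if not isinstance(results, list):
        raise ValueError(f"Expected a list of predictions, got {type(results).__name__}")
    return results


# A well-formed response body in the format produced by the scoring file
ok_body = json.dumps([{"value": "3", "certainty": 0.98}])
print(parse_scoring_response(200, ok_body))

# A failed call surfaces the status code and body instead of a JSON decode error
try:
    parse_scoring_response(401, "Unauthorized")
except RuntimeError as err:
    print(err)
```

In the notebook this would be called as `parse_scoring_response(resp.status_code, resp.text)` in place of the bare `json.loads(resp.text)`.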

## Delete Service

Given the cost of running this cluster, you will want to delete both the service and the cluster:

In [None]:
aks_service.delete()
gpu_cluster.delete()
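
If the first `delete()` call raises, the second line never runs and the cluster keeps incurring charges. One way to guard against that is to attempt every deletion and report failures afterwards. The sketch below uses a hypothetical `delete_all` helper and a `FakeResource` stub in place of the real service and cluster objects; it only assumes that each object exposes a `delete()` method, as the two objects above do. Whether a cluster can be deleted while a service is still attached to it is not addressed here.

```python
def delete_all(resources):
    """Call .delete() on each (name, resource) pair in order; return the failures."""
    failures = []
    for name, resource in resources:
        try:
            resource.delete()
            print(f"Deleted {name}")
        except Exception as err:  # keep going so later resources are still attempted
            failures.append((name, err))
            print(f"Could not delete {name}: {err}")
    return failures


class FakeResource:
    """Hypothetical stand-in for a webservice or compute target."""

    def __init__(self, fail=False):
        self.fail = fail
        self.deleted = False

    def delete(self):
        if self.fail:
            raise RuntimeError("simulated failure")
        self.deleted = True


# Simulate the service deletion failing; the cluster deletion is still attempted
service, cluster = FakeResource(fail=True), FakeResource()
failures = delete_all([("service", service), ("cluster", cluster)])
```

In the notebook the call would be `delete_all([("service", aks_service), ("cluster", gpu_cluster)])`, followed by a check in the Azure Portal for anything listed in the returned failures.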