# Develop Model Driver

In this notebook, we will develop the API that will call our model. This module initializes the model, transforms the input into the appropriate format, and defines the scoring method that produces the predictions. The API expects the input to be in JSON format. Once a request is received, the API converts the JSON-encoded request body into an image. There are two main functions in the API. The first function loads the model and returns a scoring function. The second function processes the images and uses the first function to score them.
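
As a rough illustration of the request format, the sketch below wraps image bytes in the JSON shape that the driver's ```run``` function reads (an ```input``` object mapping a key to a base64 string) and then reverses the encoding. The helper names ```encode_image_payload``` and ```decode_image_payload``` are hypothetical and are not part of the notebook, and the bytes are a placeholder rather than a real image.

```python
import base64
import json


def encode_image_payload(image_bytes, key="image"):
    """Wrap raw image bytes in a JSON body shaped like {"input": {key: base64}}."""
    b64_string = base64.b64encode(image_bytes).decode("utf-8")
    return json.dumps({"input": {key: b64_string}})


def decode_image_payload(raw_data):
    """Reverse the encoding: JSON string -> {key: raw image bytes}."""
    images = json.loads(raw_data)["input"]
    return {key: base64.b64decode(value) for key, value in images.items()}


# Placeholder bytes standing in for the contents of a real image file.
fake_image = b"\x89PNG placeholder bytes"
payload = encode_image_payload(fake_image)
decoded = decode_image_payload(payload)
print(decoded["image"] == fake_image)  # True: the round trip is lossless
```

In the driver itself, the decoded bytes are additionally opened with PIL and converted to RGB before scoring.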

In [16]:
import logging
from testing_utilities import img_url_to_json
from pprint import pprint
from azureml.core.webservice import Webservice, AksWebservice
from azureml.core import Workspace

In [17]:
logging.basicConfig(level=logging.DEBUG) # We are setting logging to debug so we can see log messages from our driver
# Coincidentally, setting the log level to DEBUG also lets us see what the azureml libraries are doing,
# which can be a handy way to diagnose issues

We use the writefile magic to write the contents of the cell below to driver.py, which contains the driver methods. It is important that the file has two functions, ```init``` and ```run```. These two functions define the contract with the Flask web application. For another example, see https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-deploy-models-with-aml
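
The shape of that contract can be sketched without any model at all. In the minimal example below, ```init``` prepares state once and ```run``` handles one JSON-encoded request body at a time. The stub scorer (which just returns the length of each input string) and the name ```_scoring_func``` are assumptions made for illustration and stand in for the real model loading done in driver.py.

```python
import json

_scoring_func = None  # populated once by init()


def init():
    """Prepare expensive state once, before any request is served."""
    global _scoring_func

    def _stub_score(value):
        # Hypothetical stand-in for a real model: returns the input length.
        return len(value)

    _scoring_func = _stub_score


def run(raw_data):
    """Score one JSON-encoded request body using the state set up by init()."""
    inputs = json.loads(raw_data)["input"]
    return {key: _scoring_func(value) for key, value in inputs.items()}


init()
result = run(json.dumps({"input": {"a": "xyz", "b": "hello"}}))
print(result)  # {'a': 3, 'b': 5}
```

Keeping the loading in ```init``` means the per-request cost of ```run``` is limited to decoding and scoring.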

In [18]:
%%writefile driver.py 
import base64
import json
import logging
import os
import timeit as t
from io import BytesIO

import PIL
import numpy as np
import torch
import torch.nn as nn
import torchvision
from PIL import Image
from torchvision import models, transforms
import sys
from azureml.core.model import Model
from glob import glob

logging.basicConfig(level=logging.DEBUG, stream=sys.stdout) # TODO: remove


_LABEL_FILE = os.getenv("LABEL_FILE", "synset.txt")
_MODEL_NAME = os.getenv("MODEL_NAME", "pytorch_resnet152")
_NUMBER_RESULTS = 3


def _create_label_lookup(label_path):
    with open(label_path, "r") as f:
        label_list = [l.rstrip() for l in f]

    def _label_lookup(*label_locks):
        return [label_list[l] for l in label_locks]

    return _label_lookup


def _load_model():
    logger = logging.getLogger("model_driver")
    # Load the model
    model_path = Model.get_model_path(_MODEL_NAME)
    logger.debug('Loading {model_path}'.format(model_path=model_path))
    
    # ResNet 152
    model = models.ResNet(models.resnet.Bottleneck, [3, 8, 36, 3])
    model.load_state_dict(torch.load(model_path))
    
    model = model.cuda()
    softmax = nn.Softmax(dim=1).cuda()
    model = model.eval()

    preprocess_input = transforms.Compose(
        [
            torchvision.transforms.Resize((224, 224), interpolation=PIL.Image.BICUBIC),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
        ]
    )

    def predict_for(image):
        image = preprocess_input(image)
        with torch.no_grad():
            image = image.unsqueeze(0)
            image_gpu = image.type(torch.float).cuda()
            outputs = model(image_gpu)
            pred_proba = softmax(outputs)
        return pred_proba.cpu().numpy().squeeze()

    return predict_for


def _base64img_to_pil_image(base64_img_string):
    # Strip the "b'...'" wrapper left behind when a bytes object was passed through str()
    if base64_img_string.startswith("b'"):
        base64_img_string = base64_img_string[2:-1]
    base64Img = base64_img_string.encode("utf-8")

    # Preprocess the input data
    decoded_img = base64.b64decode(base64Img)
    img_buffer = BytesIO(decoded_img)

    # Load image with PIL (RGB)
    pil_img = Image.open(img_buffer).convert("RGB")
    return pil_img


def create_scoring_func(label_path=_LABEL_FILE):
    logger = logging.getLogger("model_driver")

    start = t.default_timer()
    labels_for = _create_label_lookup(label_path)
    predict_for = _load_model()
    end = t.default_timer()

    loadTimeMsg = "Model loading time: {0} ms".format(round((end - start) * 1000, 2))
    logger.info(loadTimeMsg)

    def _call_model(image, number_results=_NUMBER_RESULTS):
        pred_proba = predict_for(image).squeeze()
        # Class indices sorted by descending probability, truncated to the top results
        selected_results = np.flip(np.argsort(pred_proba), 0)[:number_results]
        labels = labels_for(*selected_results)
        return list(zip(labels, pred_proba[selected_results].astype(np.float64)))

    return _call_model


def get_model_api():
    logger = logging.getLogger("model_driver")
    scoring_func = create_scoring_func()

    def _process_and_score(images_dict, number_results=_NUMBER_RESULTS):
        start = t.default_timer()

        results = {}
        for key, base64_img_string in images_dict.items():
            rgb_image = _base64img_to_pil_image(base64_img_string)
            results[key] = scoring_func(rgb_image, number_results=number_results)

        end = t.default_timer()

        logger.debug("Predictions: {0}".format(results))
        logger.info("Predictions took {0} ms".format(round((end - start) * 1000, 2)))
        return (results, "Computed in {0} ms".format(round((end - start) * 1000, 2)))

    return _process_and_score

def version():
    return torch.__version__

def init():
    """ Initialise the model and scoring function
    """
    global process_and_score
    process_and_score = get_model_api()

def run(raw_data):
    """ Make a prediction based on the data passed in using the preloaded model
    """
    return process_and_score(json.loads(raw_data)['input'])

Overwriting driver.py
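
The driver relies on a factory pattern: expensive setup happens once, and the returned inner function is cheap to call per image. The sketch below shows that pattern in isolation, together with the top-k selection. It is a pure-Python approximation that uses ```sorted``` in place of the driver's ```np.argsort``` and ```np.flip```, and the names ```make_scorer``` and ```fake_predict```, the labels, and the probabilities are all made up for illustration.

```python
def make_scorer(labels, predict_for):
    """Bind labels and a predictor once; return a per-call scoring function."""

    def _call_model(item, number_results=3):
        probabilities = predict_for(item)
        # Indices ordered by descending probability, truncated to the top k.
        ranked = sorted(
            range(len(probabilities)),
            key=lambda i: probabilities[i],
            reverse=True,
        )
        top = ranked[:number_results]
        return [(labels[i], probabilities[i]) for i in top]

    return _call_model


def fake_predict(_item):
    # Hypothetical fixed distribution standing in for the softmax output.
    return [0.05, 0.70, 0.10, 0.15]


score = make_scorer(["cat", "lynx", "dog", "fox"], fake_predict)
top_two = score("any input", number_results=2)
print(top_two)  # [('lynx', 0.7), ('fox', 0.15)]
```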


Let's test the module.

We run the file driver.py, which brings everything it defines into the context of the notebook.

In [19]:
%run driver.py

In [20]:
ws = Workspace.from_config()
ws.get_details()

DEBUG:azureml.core.authentication:AzureCliAuthentication acquired lock in 9.5367431640625e-07 s.
DEBUG:cli.azure.cli.core:Current cloud config:
AzureCloud
DEBUG:adal-python:b83addd9-fcf9-449e-b19f-4c06e208dd2b - Authority:Performing instance discovery: ...
DEBUG:adal-python:b83addd9-fcf9-449e-b19f-4c06e208dd2b - Authority:Performing static instance discovery
DEBUG:adal-python:b83addd9-fcf9-449e-b19f-4c06e208dd2b - Authority:Authority validated via static instance discovery
INFO:adal-python:b83addd9-fcf9-449e-b19f-4c06e208dd2b - TokenRequest:Getting token from cache with refresh if necessary.
DEBUG:adal-python:b83addd9-fcf9-449e-b19f-4c06e208dd2b - CacheDriver:finding with query keys: {'_clientId': '...', 'userId': '...'}
DEBUG:adal-python:b83addd9-fcf9-449e-b19f-4c06e208dd2b - CacheDriver:Looking for potential cache entries: {'_clientId': '...', 'userId': '...'}
DEBUG:adal-python:b83addd9-fcf9-449e-b19f-4c06e208dd2b - CacheDriver:Found 4 potential entries.
DEBUG:adal-python:b83addd9-fc

Found the config file in: /home/mat/repos/AKSDeploymentTutorial_AML/Pytorch/aml_config/config.json


DEBUG:urllib3.connectionpool:https://login.microsoftonline.com:443 "POST /72f988bf-86f1-41af-91ab-2d7cd011db47/oauth2/token HTTP/1.1" 200 3220
INFO:adal-python:b83addd9-fcf9-449e-b19f-4c06e208dd2b - OAuth2Client:Get Token Server returned this correlation_id: b83addd9-fcf9-449e-b19f-4c06e208dd2b
DEBUG:adal-python:b83addd9-fcf9-449e-b19f-4c06e208dd2b - CacheDriver:Created new cache entry from refresh response.
DEBUG:adal-python:b83addd9-fcf9-449e-b19f-4c06e208dd2b - CacheDriver:Removing entry.
DEBUG:adal-python:b83addd9-fcf9-449e-b19f-4c06e208dd2b - CacheDriver:Adding entry AccessTokenId: b'+QYziA35GkK014g6IrsR0J7n+jwivS2OWUV3HS+ls+c=', RefreshTokenId: b'vpGnkn1WetKuY4sIyctLSF3X9NgpAqUPUbHgYUxt5v8='
DEBUG:adal-python:b83addd9-fcf9-449e-b19f-4c06e208dd2b - CacheDriver:Updating 3 cached refresh tokens
DEBUG:adal-python:b83addd9-fcf9-449e-b19f-4c06e208dd2b - CacheDriver:Remove many: 3
DEBUG:adal-python:b83addd9-fcf9-449e-b19f-4c06e208dd2b - CacheDriver:Add many: 3
INFO:adal-python:b83addd9-

{'id': '/subscriptions/edf507a2-6235-46c5-b560-fd463ba2e771/resourceGroups/amlakspytorchrg/providers/Microsoft.MachineLearningServices/workspaces/pytorch_workspace',
 'name': 'pytorch_workspace',
 'location': 'eastus2',
 'type': 'Microsoft.MachineLearningServices/workspaces',
 'description': '',
 'friendlyName': 'pytorch_workspace',
 'containerRegistry': '/subscriptions/edf507a2-6235-46c5-b560-fd463ba2e771/resourcegroups/amlakspytorchrg/providers/microsoft.containerregistry/registries/pytorchwacrrmwuvfch',
 'keyVault': '/subscriptions/edf507a2-6235-46c5-b560-fd463ba2e771/resourcegroups/amlakspytorchrg/providers/microsoft.keyvault/vaults/pytorchwkeyvaultgzktbsnx',
 'applicationInsights': '/subscriptions/edf507a2-6235-46c5-b560-fd463ba2e771/resourcegroups/amlakspytorchrg/providers/microsoft.insights/components/pytorchwinsightslwbyjuau',
 'batchaiWorkspace': '/subscriptions/edf507a2-6235-46c5-b560-fd463ba2e771/resourcegroups/amlakspytorchrg/providers/microsoft.batchai/workspaces/pytorchwb

When we later package up our driver and model into a Docker container the model will be downloaded to a specific location in the container. In order to simulate that here we have to download the model locally to the expected location. This is done with the command below.

In [29]:
model_path = Model.get_model_path(_MODEL_NAME, _workspace=ws)

DEBUG:azureml.core.model:RunEnvironmentException: Failed to load a submitted run, if outside of an execution context, use project.start_run to initialize an azureml.core.Run.
DEBUG:azureml.core.authentication:InteractiveLoginAuthentication acquired lock in 1.1920928955078125e-06 s.
DEBUG:azureml.core.authentication:AzureCliAuthentication acquired lock in 9.5367431640625e-07 s.
DEBUG:cli.azure.cli.core:Current cloud config:
AzureCloud
DEBUG:adal-python:4b8e7336-e762-488b-8acc-ff5974ad3964 - Authority:Performing instance discovery: ...
DEBUG:adal-python:4b8e7336-e762-488b-8acc-ff5974ad3964 - Authority:Performing static instance discovery
DEBUG:adal-python:4b8e7336-e762-488b-8acc-ff5974ad3964 - Authority:Authority validated via static instance discovery
INFO:adal-python:4b8e7336-e762-488b-8acc-ff5974ad3964 - TokenRequest:Getting token from cache with refresh if necessary.
DEBUG:adal-python:4b8e7336-e762-488b-8acc-ff5974ad3964 - CacheDriver:finding with query keys: {'_clientId': '...', 'us

DEBUG:azureml.core.model:Artifact has prefix id LocalUpload/181105T230039-32e663a7/pytorch_resnet152.tar.gz
DEBUG:azureml.ArtifactsClient:Fetching files for prefix in LocalUpload, 181105T230039-32e663a7, pytorch_resnet152.tar.gz
DEBUG:azureml._restclient.clientbase.WorkerPool.CreateFutureFunc: _execute_with_base_arguments:[START]
DEBUG:msrest.service_client:Accept header absent and forced to application/json
DEBUG:azureml._restclient.clientbase.WorkerPool.CreateFutureFunc: _execute_with_base_arguments:[STOP]
DEBUG:msrest.universal_http.requests:Configuring retry: max_retries=3, backoff_factor=0.8, max_backoff=90
DEBUG:azureml.ArtifactsClient.list_sas_by_prefix:Using basic handler - no exception handling
DEBUG:azureml.core.authentication:InteractiveLoginAuthentication acquired lock in 9.5367431640625e-07 s.
DEBUG:azureml.ArtifactsClient.list_sas_by_prefix.WaitingTask:[START]
DEBUG:azureml.core.authentication:AzureCliAuthentication acquired lock in 4.76837158203125e-07 s.
DEBUG:cli.azure

You will notice we have a similar command in our driver script except for the ```_workspace``` argument. We don't need the workspace argument in the driver since the model will be placed in the correct location when the Docker image is created.

We will use the same lynx image we used earlier to check that our driver works as expected.

In [30]:
IMAGEURL = "https://upload.wikimedia.org/wikipedia/commons/thumb/6/68/Lynx_lynx_poing.jpg/220px-Lynx_lynx_poing.jpg"

First we initialise the model. Don't worry about the _Failed to load a submitted run_ message below. It appears simply because we are testing outside of an AML context, which is where the driver will run once we put everything in a container.

In [31]:
init()

DEBUG:azureml.core.model:RunEnvironmentException: Failed to load a submitted run, if outside of an execution context, use project.start_run to initialize an azureml.core.Run.
DEBUG:azureml.core.model:version is None. Latest version is 2
DEBUG:azureml.core.model:Found model path at azureml-models/pytorch_resnet152/2/resnet152-b121ed2d.pth
DEBUG:model_driver:Loading azureml-models/pytorch_resnet152/2/resnet152-b121ed2d.pth
INFO:model_driver:Model loading time: 1721.96 ms


In [32]:
predict_for = get_model_api()

DEBUG:azureml.core.model:RunEnvironmentException: Failed to load a submitted run, if outside of an execution context, use project.start_run to initialize an azureml.core.Run.
DEBUG:azureml.core.model:version is None. Latest version is 2
DEBUG:azureml.core.model:Found model path at azureml-models/pytorch_resnet152/2/resnet152-b121ed2d.pth
DEBUG:model_driver:Loading azureml-models/pytorch_resnet152/2/resnet152-b121ed2d.pth
INFO:model_driver:Model loading time: 1767.0 ms


In [33]:
jsonimg = img_url_to_json(IMAGEURL)

In [34]:
resp = run(jsonimg)

DEBUG:PIL.PngImagePlugin:STREAM b'IHDR' 16 13
DEBUG:PIL.PngImagePlugin:STREAM b'iCCP' 41 292
DEBUG:PIL.PngImagePlugin:iCCP profile name b'ICC Profile'
DEBUG:PIL.PngImagePlugin:Compression method 0
DEBUG:PIL.PngImagePlugin:STREAM b'IDAT' 345 65536
DEBUG:model_driver:Predictions: {'image': [('n02127052 lynx, catamount', 0.9965722560882568), ('n02128757 snow leopard, ounce, Panthera uncia', 0.0013256857637315989), ('n02128385 leopard, Panthera pardus', 0.0009192737634293735)]}
INFO:model_driver:Predictions took 82.41 ms


In [35]:
pprint(resp[0])

{'image': [('n02127052 lynx, catamount', 0.9965722560882568),
           ('n02128757 snow leopard, ounce, Panthera uncia',
            0.0013256857637315989),
           ('n02128385 leopard, Panthera pardus', 0.0009192737634293735)]}
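
Each prediction pairs a label string (a synset ID followed by the class names) with a probability, and the list is ordered best first. As a small sketch of how a caller might consume this, the example below splits the top entry into its parts. The ```results``` dictionary is copied from the output above, and the helper ```top_prediction``` is hypothetical rather than part of the driver.

```python
results = {
    "image": [
        ("n02127052 lynx, catamount", 0.9965722560882568),
        ("n02128757 snow leopard, ounce, Panthera uncia", 0.0013256857637315989),
        ("n02128385 leopard, Panthera pardus", 0.0009192737634293735),
    ]
}


def top_prediction(predictions):
    """Split the best entry into (synset_id, label, probability)."""
    entry, probability = predictions[0]  # the list is sorted, best first
    synset_id, label = entry.split(" ", 1)  # ID is the first space-separated token
    return synset_id, label, probability


synset_id, label, probability = top_prediction(results["image"])
print(synset_id, "|", label, "|", round(probability, 4))
# n02127052 | lynx, catamount | 0.9966
```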


Next, we can move on to [building our Docker image](02_BuildImage.ipynb).