# Quickstart - MLOps for Cognitive Services
This quickstart uses a model trained with Custom Vision as an example. The template implements steps **(1)~(5)** of the architecture below.

Author:     Jixin Jia (Gin)   
Version:    1.0     
Date:       2022.03.15

![image-alt-text](mlops-custom-vision.png)

## Check Prerequisites
Upgrade the Azure ML Python SDK to version 1.39 or later if it is not already installed.

In [1]:
try:
    from azureml.core import Workspace, Dataset, Datastore, Environment, Experiment, Run, Model, ScriptRunConfig, VERSION
    from azureml.core.webservice import AciWebservice
    from azureml.core.conda_dependencies import CondaDependencies
    print('Current Azure ML Python SDK version:', VERSION)

    # compare (major, minor) as integers; a float comparison would rank 1.4 above 1.39
    if tuple(int(p) for p in VERSION.split('.')[:2]) < (1, 39):
        print('Upgrading SDK to 1.39...Restart kernel once done')
        %pip install --upgrade azureml-core
    else:
        print('Done!')

except Exception as e:
    print(e.args)

Current Azure ML Python SDK version: 1.39.0
Done!
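
Comparing versions as floats can misorder releases, because `1.4` parses as a larger number than `1.39` even though 1.4 is the older minor release. A minimal sketch of a tuple-based comparison follows; the helper name `version_tuple` is illustrative and not part of the Azure ML SDK.

```python
def version_tuple(version: str, parts: int = 2) -> tuple:
    """Return the leading numeric components of a dotted version string."""
    return tuple(int(p) for p in version.split('.')[:parts])

# minimum (major, minor) required by this template
MIN_VERSION = (1, 39)

for v in ('1.4.0', '1.39.0', '1.40.0'):
    status = 'ok' if version_tuple(v) >= MIN_VERSION else 'needs upgrade'
    print(v, '->', status)
```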


## Runtime parameters
Adjust the values below to match your own workspace and resources.

In [2]:
experiment_name = 'mlops-custom-vision'
datastore_name = 'jixjiastoragegbb_public'
dataset_name = 'mlops-custom-vision'
comute_cluster_name = 'prod-ds4v2-x4'
model_name = 'custom_vision_model'
model_description = 'Custom Vision model for detecting vehicle license plate'
inference_env_name = 'tf2.3-inference-opencv4.5'

# These two paths should come from a notification queue service once the Cognitive Service
# (i.e. our Custom Vision trained model) has finished training and exported the model to the designated Azure Blob Storage.
# For simplicity, this template omits the notification mechanism
# and assumes the latest model artifacts have already arrived at the following blob paths.
model_path = 'models/saved_model.pb'
label_path = 'models/labels.txt'
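
The notification step omitted above could, for instance, deliver a small JSON message naming the exported blobs. The sketch below assumes a hypothetical message layout with `model_blob` and `label_blob` keys; the real queue payload depends on how the export notification is implemented.

```python
import json

def paths_from_notification(message: str) -> tuple:
    """Extract (model_path, label_path) from a hypothetical JSON notification."""
    payload = json.loads(message)
    try:
        return payload['model_blob'], payload['label_blob']
    except KeyError as missing:
        raise ValueError(f'notification is missing field {missing}') from None

# hypothetical message, shaped to match the paths used in this template
example_message = json.dumps({'model_blob': 'models/saved_model.pb',
                              'label_blob': 'models/labels.txt'})
model_path, label_path = paths_from_notification(example_message)
print(model_path, label_path)
```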

## Initiate a Workspace and Experiment
Centrally track the model, environment, webservice container image and Auto-Test validation results.

In [3]:
import azureml.core
from azureml.core import Workspace, Experiment

# specify the workspace using the current active config
ws = Workspace.from_config()

# setup Experiment for tracking
exp = Experiment(workspace=ws, name=experiment_name)
exp

Name,Workspace,Report Page,Docs Page
mlops-custom-vision,ws-jixjia-azureml,Link to Azure Machine Learning studio,Link to Documentation


## Secure Access to Model using Datastore & Dataset
Retrieve the test dataset (images) and the trained model (Custom Vision) via a registered Dataset for two main reasons:

1. Secure access by abstracting away connection strings and credentials   
2. Version control and dataset tracking   

**NOTE**   
Set up the Datastore and Dataset first, using a SAS token with an appropriate level of permissions.

In [43]:
from azureml.core import Datastore
from azureml.core import Dataset

# get a named datastore from the current workspace
datastore = Datastore.get(ws, datastore_name=datastore_name)

# list all registered datastores in the current workspace when no name is given
# (the loop variable is named `ds` so it does not overwrite `datastore` above)
if not datastore_name:
    for name, ds in ws.datastores.items():
        print(name, ds.datastore_type)


# get the latest version of the registered dataset
dataset = Dataset.get_by_name(workspace=ws, name=dataset_name)

print('Datastore:\n', datastore, '\n')
print('Dataset:\n', dataset)

# retrieve latest trained model, label and test dataset (images) to compute instance 
dataset.download(target_path='.', overwrite=True)

Datastore:
 {
  "name": "jixjiastoragegbb_public",
  "container_name": "public",
  "account_name": "jixjiastoragegbb",
  "protocol": "https",
  "endpoint": "core.windows.net"
} 

Dataset:
 FileDataset
{
  "source": [
    "('jixjiastoragegbb_public', 'mlops-custom-vision/')"
  ],
  "definition": [
    "GetDatastoreFiles"
  ],
  "registration": {
    "id": "cd253466-de24-4583-b1c4-8dbc018adaf9",
    "name": "mlops-custom-vision",
    "version": 1,
    "description": "Vehicle license plate detection model and test dataset on jixjiastoragegbb public container",
    "workspace": "Workspace.create(name='ws-jixjia-azureml', subscription_id='d7d72c6d-f9bf-48e3-b11e-6b6c9196e6bc', resource_group='rg-azureml')"
  }
}


['/mnt/batch/tasks/shared/LS_root/mounts/clusters/dev-cpu-ds4v2/code/Users/jixinjia/mlops-custom-vision/dataset/224636.43.jpg',
 '/mnt/batch/tasks/shared/LS_root/mounts/clusters/dev-cpu-ds4v2/code/Users/jixinjia/mlops-custom-vision/dataset/224658.77.jpg',
 '/mnt/batch/tasks/shared/LS_root/mounts/clusters/dev-cpu-ds4v2/code/Users/jixinjia/mlops-custom-vision/dataset/224700.41.jpg',
 '/mnt/batch/tasks/shared/LS_root/mounts/clusters/dev-cpu-ds4v2/code/Users/jixinjia/mlops-custom-vision/dataset/224859.19.jpg',
 '/mnt/batch/tasks/shared/LS_root/mounts/clusters/dev-cpu-ds4v2/code/Users/jixinjia/mlops-custom-vision/dataset/IMG_0201.JPG',
 '/mnt/batch/tasks/shared/LS_root/mounts/clusters/dev-cpu-ds4v2/code/Users/jixinjia/mlops-custom-vision/dataset/IMG_0209.JPG',
 '/mnt/batch/tasks/shared/LS_root/mounts/clusters/dev-cpu-ds4v2/code/Users/jixinjia/mlops-custom-vision/dataset/IMG_0224.JPG',
 '/mnt/batch/tasks/shared/LS_root/mounts/clusters/dev-cpu-ds4v2/code/Users/jixinjia/mlops-custom-vision/dat

## Attach Compute Target
The Compute Target runs the Auto-Test, which validates whether the current model performs better than the previous one on a defined dataset and metrics.    
Skip this part if Auto-Test is not required.

In [4]:
from azureml.core.compute import AmlCompute
from azureml.core.compute import ComputeTarget
import os

# check if specified compute cluster exists
compute_name = os.environ.get("AML_COMPUTE_CLUSTER_NAME", comute_cluster_name)

if compute_name in ws.compute_targets:
    compute_target = ws.compute_targets[compute_name]
    if compute_target and type(compute_target) is AmlCompute:
        print("Set compute target to the specified cluster: " + compute_name)

else:
    print("Creating new compute target...")

    # define specification of the cluster to create
    # (environment variables are strings, so cast the node counts to int)
    compute_min_nodes = int(os.environ.get("AML_COMPUTE_CLUSTER_MIN_NODES", 0))
    compute_max_nodes = int(os.environ.get("AML_COMPUTE_CLUSTER_MAX_NODES", 4))
    vm_size = os.environ.get("AML_COMPUTE_CLUSTER_SKU", "STANDARD_NC6")
    provisioning_config = AmlCompute.provisioning_configuration(vm_size = vm_size,
                                                                min_nodes = compute_min_nodes, 
                                                                max_nodes = compute_max_nodes)

    # create the cluster
    compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)

    # can poll for a minimum number of nodes and for a specific timeout. 
    # if no min node count is provided it will use the scale settings for the cluster
    compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)

    # For a more detailed view of current AmlCompute status, use get_status()
    print(compute_target.get_status().serialize())

Set compute target to the specified cluster: prod-ds4v2-x4
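
Values read from `os.environ` are always strings, so node counts taken from the environment need converting before they are passed to the provisioning configuration. A small sketch, where `env_int` is an illustrative helper rather than an SDK function, and `overrides` stands in for the real environment:

```python
import os
from typing import Mapping

def env_int(name: str, default: int, environ: Mapping[str, str] = os.environ) -> int:
    """Read an integer setting from the environment, falling back to a default."""
    raw = environ.get(name)
    if raw is None or raw.strip() == '':
        return default
    return int(raw)

# stand-in mapping so the example does not modify the real environment
overrides = {'AML_COMPUTE_CLUSTER_MAX_NODES': '8'}
print(env_int('AML_COMPUTE_CLUSTER_MIN_NODES', 0, overrides))
print(env_int('AML_COMPUTE_CLUSTER_MAX_NODES', 4, overrides))
```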


## Build Environment (CI)
In this CI process we will:    
1. Package runtime and dependencies into a container image    
2. Register it on Azure ML and push container image to ACR

The resulting environment is used by both Auto-Test and Deployment.

### Conda YAML definition

In [44]:
%%writefile inference_conda.yml

dependencies:
- python=3.8.1
- pip:
  - azureml-dataset-runtime[pandas,fuse]
  - azureml-defaults
  - opencv-python-headless==4.5.4.60
  - tensorflow==2.3.1
  - numpy==1.18.5
  - pandas==1.1.4
  - imutils==0.5.3
  - Pillow==8.0.1
  - requests==2.24.0
  - matplotlib==3.2.2
  - urllib3==1.25.11

Writing inference_conda.yml


### Register the Environment for reuse so that we avoid re-building it every time 

In [24]:
from azureml.core import Environment

# create the inference environment from the conda specification
inference_env = Environment.from_conda_specification(name=inference_env_name, file_path='./inference_conda.yml')

# register the environment in the workspace so it can be reused
inference_env.register(workspace=ws)

{
    "databricks": {
        "eggLibraries": [],
        "jarLibraries": [],
        "mavenLibraries": [],
        "pypiLibraries": [],
        "rcranLibraries": []
    },
    "docker": {
        "arguments": [],
        "baseDockerfile": null,
        "baseImage": "mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:20220208.v1",
        "baseImageRegistry": {
            "address": null,
            "password": null,
            "registryIdentity": null,
            "username": null
        },
        "enabled": false,
        "platform": {
            "architecture": "amd64",
            "os": "Linux"
        },
        "sharedVolumes": true,
        "shmSize": null
    },
    "environmentVariables": {
        "EXAMPLE_ENV_VAR": "EXAMPLE_VALUE"
    },
    "inferencingStackVersion": null,
    "name": "tf2.3-inference-opencv4.5",
    "python": {
        "baseCondaEnvironment": null,
        "condaDependencies": {
            "dependencies": [
                "python=3.8.1",
          

## Auto-Test

### Prepare the test environment

In [41]:
import os
import shutil

# package all test artifacts into a `test` folder
script_folder = os.path.join(os.getcwd(), 'test')
os.makedirs(script_folder, exist_ok=True)

# copy utils and dependent modules to script_folder
for src in ('utils', 'models'):
    dst = os.path.join(script_folder, src)
    shutil.copytree(src, dst, dirs_exist_ok=True)
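
Note that `dirs_exist_ok` is only available in `shutil.copytree` from Python 3.8 onwards. The sketch below reproduces the copy in a temporary directory to show that re-running the cell is safe; the folder layout and the `LP_CONFIDENCE` value written to the placeholder file are made up for illustration.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as root:
    src_dir = os.path.join(root, 'utils')
    dst_dir = os.path.join(root, 'test', 'utils')
    os.makedirs(src_dir)

    # placeholder module standing in for the real utils package
    with open(os.path.join(src_dir, 'config.py'), 'w') as f:
        f.write('LP_CONFIDENCE = 0.5\n')

    # copying twice succeeds because dirs_exist_ok=True tolerates the existing target
    shutil.copytree(src_dir, dst_dir, dirs_exist_ok=True)
    shutil.copytree(src_dir, dst_dir, dirs_exist_ok=True)
    copied = sorted(os.listdir(dst_dir))

print(copied)
```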

### Prepare `auto-test.py` and add Azure ML Run tracking

In [127]:
%%writefile $script_folder/auto-test.py

'''
Author:     Jixin Jia (Gin)
Date:       2022.03.15
Version:    1.0
Purpose:    This script evaluates model performance by running it against a defined set of test images,
            and tracks the results in an Azure ML Experiment Run
'''

import cv2
import os
import numpy as np
import base64
import json
from urllib.parse import urlparse
import argparse

# App utilities
from utils import config
from utils import vehicle_plate as vp
from utils import video_utilities as vu

# experiment tracking with azureml/mlflow
from azureml.core.run import Run
run = Run.get_context()

# get FUSE mount dataset
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--data_folder", type=str, default="datasets", help="path to test dataset on FUSE mount")
args = ap.parse_args()

# debug
dataset_path = os.path.join(args.data_folder,'dataset')
print('[GIN] dataset path on remote compute:', dataset_path)

# load trained model (.pb)
try:
    print('Loading vehicle plate detection model...', end='')
    model_path = config.MODEL_PATH
    label_path = config.LABEL_PATH
    
    model = vp.Model(model_path)

    with open(label_path) as f:
        labels = [l.strip() for l in f.readlines()]
    
    print('Done')

except Exception as e:
    # without a model the test cannot proceed, so report the error and re-raise
    print(e.args)
    raise

# create an outputs folder for tracking artifacts
os.makedirs('outputs', exist_ok=True)

# inference on test dataset
image_count = 0
results = []

for f in os.listdir(dataset_path):

    if f.lower().endswith(('jpg','png')):
        
        # output name
        output_name = f'auto-test_{f}'
        output_path = os.path.join('outputs',output_name)

        # read image
        image = cv2.imread(os.path.join(dataset_path, f))

        # detect vehicle plates
        preds = model.predict(image)

        # inspect preds
        for pred in zip(*preds):
            lpBBOX, prob = pred[:2]
            
            if prob > config.LP_CONFIDENCE:
                # get safe bbox
                startX, startY, endX, endY = model.translate_bbox(image, lpBBOX)

                # pack into output list
                results.append({'bbox': (startX, startY, endX, endY), 'prob': prob})

                # draw
                image = vu.draw_bbox_with_label(image, f'{prob:.3f}', (startX, startY, endX, endY))

        # log image
        cv2.imwrite(output_path, image, [int(cv2.IMWRITE_JPEG_QUALITY), config.JPEG_QUALITY])
        run.log_image(name=output_name, path=output_path, plot=None, description='auto-test')
        
        image_count += 1

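# simple evaluation logic, EXAMPLE only
total_predicted = len(results)
# guard against empty results or an empty dataset to avoid NaN and ZeroDivisionError
avg_prob = float(np.mean([i['prob'] for i in results])) if results else 0.0
test_acc = round(total_predicted / image_count, 2) if image_count else 0.0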

# log metrics
run.log('total_predicted', total_predicted)
run.log('avg_prob', avg_prob)
run.log('test_acc', test_acc)

Overwriting /mnt/batch/tasks/shared/LS_root/mounts/clusters/dev-cpu-ds4v2/code/Users/jixinjia/mlops-custom-vision/test/auto-test.py
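
Because `total_predicted` counts every detection, dividing it by the number of images can exceed 1 when an image contains several plates. One alternative, sketched here with made-up numbers, is to count the images that produced at least one detection. The function name and the per-image input format are assumptions for illustration, not part of `auto-test.py`.

```python
def detection_rate(detections_per_image: dict) -> float:
    """Share of images with at least one detection above the threshold."""
    if not detections_per_image:
        return 0.0
    hits = sum(1 for count in detections_per_image.values() if count > 0)
    return round(hits / len(detections_per_image), 2)

# illustrative counts only, not real test results
counts = {'a.jpg': 2, 'b.jpg': 0, 'c.jpg': 1, 'd.jpg': 1}
print(detection_rate(counts))
```

Unlike the ratio of detections to images, this value always stays between 0 and 1.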


### FUSE mount test dataset onto Compute Cluster

In [44]:
# mount the dataset onto compute target 
input_data = dataset.as_mount()

# show dataset's path (TOP 5)
dataset.to_path()[:5]

['/dataset/224636.43.jpg',
 '/dataset/224658.77.jpg',
 '/dataset/224700.41.jpg',
 '/dataset/224859.19.jpg',
 '/dataset/IMG_0201.JPG']

### Submit test to Compute Cluster

In [45]:
# if azureml.widgets not installed
# %pip install azureml-widgets

from azureml.core import ScriptRunConfig
from azureml.widgets import RunDetails

# auto-test job config
src = ScriptRunConfig(source_directory = script_folder,
                      script = 'auto-test.py',
                      arguments = ['--data_folder', input_data],
                      compute_target = compute_target,
                      environment = inference_env)

# submit job and start tracking
run = exp.submit(config=src)

# monitor run
RunDetails(run).show()

_UserRunWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', '…

### Eyeball the auto-test results

![image-alt-text](auto-test-result-for-eyeballing.jpg)

## Track model performance
![image-alt-text](track-test-performance.jpg)

## Register Model
If the test has passed and shows **better performance** on the pre-defined metrics than its predecessors, we move the new model to production (i.e. a gated release).    
Unfortunately, Azure ML does not currently support a **"human-in-the-loop" approval process**.
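
The gating rule described above can be sketched as a simple metric comparison. The metric names follow those logged by the auto-test (`avg_prob`, `test_acc`); the function `should_promote`, the requirement that every metric improves, and the sample values are assumptions for illustration.

```python
def should_promote(new_metrics: dict, previous_metrics: dict,
                   keys=('avg_prob', 'test_acc')) -> bool:
    """Return True only if the new model beats the previous one on every key."""
    return all(new_metrics[k] > previous_metrics[k] for k in keys)

# illustrative values only, not real test results
previous = {'avg_prob': 0.81, 'test_acc': 0.90}
candidate = {'avg_prob': 0.86, 'test_acc': 0.95}
print(should_promote(candidate, previous))
```

In practice the metrics of an earlier run could be read back with `run.get_metrics()` before deciding whether to register the new model.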

In [13]:
from azureml.core.model import Model
from azureml.core.resource_configuration import ResourceConfiguration

# register the latest Custom Vision model for version tracking (auto increment)
model_dir = os.path.dirname(model_path)

model = Model.register(workspace = ws,
                       model_name = model_name,
                       model_path = model_dir,
                       model_framework = Model.Framework.TENSORFLOW,
                       tags = {'source': "custom_vision", 'type': "object_detection"},
                       description = model_description,
                       resource_configuration = ResourceConfiguration(cpu=1, memory_in_gb=2),
                       )

print('Registered:', model.name, model.version)

Registering model custom_vision_model
Registered: custom_vision_model 9


## Deploy (CD)
In this CD process we will:    
1. Create and register packaged environment (docker pull)   
2. Set up IaC for provisioning the host service   
3. Build a WSGI webservice based on Flask/Gunicorn/Nginx   

### Create a `deploy` folder to package all serving components    
This folder contains source code for running the model as an API webservice

In [19]:
import os
import shutil

# create a `deploy` folder consolidating all inference artifacts
deploy_folder = os.path.join(os.getcwd(), 'deploy')
os.makedirs(deploy_folder, exist_ok=True)

# copy utils and dependent modules to deploy_folder
src = 'utils'
dst = os.path.join(deploy_folder, src)
shutil.copytree(src, dst, dirs_exist_ok=True)

'/mnt/batch/tasks/shared/LS_root/mounts/clusters/dev-cpu-ds4v2/code/Users/jixinjia/mlops-custom-vision/deploy/utils'

### Write Webservice/API function into `deploy`   
This scoring function handles incoming inference data (i.e. an image blob), runs it against our model and produces a JSON response

In [46]:
%%writefile $deploy_folder/score.py

'''
Author:     Jixin Jia (Gin)
Date:       2022.03.15
Version:    1.0
Purpose:    Create this scoring function to serve vehicle license plate detection as an API webservice
'''

import cv2
import os
import numpy as np
import base64
import json
from urllib.parse import urlparse

# load model class and utilities
from utils import config
from utils import vehicle_plate as vp
from utils import video_utilities as vu

def init():
    global model

    # AZUREML_MODEL_DIR is an environment variable created during deployment to ACI/AKS (./azureml-models/$MODEL_NAME/$VERSION)
    model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), config.MODEL_PATH)
    label_path = config.LABEL_PATH

    # load serialized trained model (.pb)
    try:
        print('Loading vehicle plate detection model...')
        model = vp.Model(model_path)

        print('[DEBUG] loaded model ->', model)
        print('Done!')

    except Exception as e:
        # report the failure and re-raise so it surfaces here rather than later in run()
        print('[DEBUG]', e.args)
        raise


def run(raw_data):

    # parse json payload
    data = json.loads(raw_data)

    try:
        # check if data contains a URL
        parsed = urlparse(data['data'])

        if all([parsed.scheme, parsed.netloc]):
            # retrieve image via the supplied URL
            image = vu.download_image(data['data'])
        
        # else decode base64 image string
        else:
            image = base64.b64decode(data['data'])
            img_array = np.asarray(bytearray(image), dtype=np.uint8)
            image = cv2.imdecode(img_array, -1)
    
    except Exception as e:
        return {'message':'Unable to decode supplied image. Supply a valid URL or a base64 encoded image string.'}


    # detect vehicle plates
    preds = model.predict(image)

    # inspect predictions
    results = []
    for pred in zip(*preds):
        lpBBOX, prob = pred[:2]

        # convert 0d tensor to numpy
        prob = prob.numpy()
        
        if prob > config.LP_CONFIDENCE:
            # get safe bbox
            startX, startY, endX, endY = model.translate_bbox(image, lpBBOX)

            # pack into output list
            results.append({'startX':startX, 'startY':startY, 'endX':endX, 'endY':endY, 'prob': float(prob)})
    
    return results

Overwriting /mnt/batch/tasks/shared/LS_root/mounts/clusters/dev-cpu-ds4v2/code/Users/jixinjia/mlops-custom-vision/deploy/score.py
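
The URL-or-base64 branch in `run()` can be tried locally without deploying. This sketch isolates just that check; `looks_like_url` is an illustrative name, and the sample strings are placeholders.

```python
import base64
from urllib.parse import urlparse

def looks_like_url(value: str) -> bool:
    """Mirror the scheme-and-netloc test used in score.py."""
    parsed = urlparse(value)
    return all([parsed.scheme, parsed.netloc])

# placeholder bytes, not a real image
encoded = base64.b64encode(b'not-a-real-image').decode('ascii')
print(looks_like_url('https://example.com/plate.jpg'))
print(looks_like_url(encoded))
```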


### Release to Production (ACI or AKS)

In [39]:
%%time
import uuid
from azureml.core.webservice import Webservice
from azureml.core.model import InferenceConfig
from azureml.core.environment import Environment
from azureml.core import Workspace
from azureml.core.model import Model
from azureml.core.webservice import AciWebservice

# container host
aci_config = AciWebservice.deploy_configuration(cpu_cores = 1, 
                                               memory_gb = 2, 
                                               tags = {'model':model_name, 'framework':inference_env_name}, 
                                               description = model_description)

# inference config
inference_config = InferenceConfig(entry_script = "score.py", 
                                   source_directory = deploy_folder, 
                                   environment = inference_env,
                                   enable_gpu = False)

# container host name (using model version as suffix)
service_name = f'mlops-custom-vision-v{model.version}'

service = Model.deploy(workspace = ws, 
                       name = service_name, 
                       models = [model], 
                       inference_config = inference_config, 
                       deployment_config = aci_config,
                       overwrite = True)

service.wait_for_deployment(show_output=True)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2022-03-16 01:43:21+00:00 Creating Container Registry if not exists.
2022-03-16 01:43:21+00:00 Registering the environment.
2022-03-16 01:43:22+00:00 Use the existing image.
2022-03-16 01:43:22+00:00 Generating deployment configuration.
2022-03-16 01:43:23+00:00 Submitting deployment to compute.
2022-03-16 01:43:26+00:00 Checking the status of deployment mlops-custom-vision-v9..
2022-03-16 01:46:59+00:00 Checking the status of inference endpoint mlops-custom-vision-v9.
Succeeded
ACI service creation operation finished, operation "Succeeded"
CPU times: user 412 ms, sys: 63.8 ms, total: 476 ms
Wall time: 3min 41s


### Get container host service endpoint (URL)
As long as the model version and environment remain the same, this endpoint **WILL NOT** change

In [40]:
print(service.scoring_uri)

http://32ae3bd2-1ffb-43db-9ae7-d30cf996323e.japaneast.azurecontainer.io/score
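
To call the endpoint, the client sends a JSON body whose `data` field holds either an image URL or a base64-encoded image, as expected by `run()` in `score.py`. A minimal client sketch using only the standard library follows; the function names are illustrative, and the request itself is not invoked here because it needs a live endpoint.

```python
import base64
import json
import urllib.request

def build_payload(image_bytes: bytes) -> bytes:
    """Wrap raw image bytes in the {'data': <base64>} JSON body."""
    encoded = base64.b64encode(image_bytes).decode('ascii')
    return json.dumps({'data': encoded}).encode('utf-8')

def score(scoring_uri: str, image_bytes: bytes, timeout: float = 30.0):
    """POST the payload to the scoring endpoint and return the parsed response."""
    request = urllib.request.Request(
        scoring_uri,
        data=build_payload(image_bytes),
        headers={'Content-Type': 'application/json'},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())

# round-trip check of the payload with placeholder bytes, no network involved
body = json.loads(build_payload(b'\xff\xd8fake-jpeg-bytes'))
print(base64.b64decode(body['data']))
```

With a deployed service, `score(service.scoring_uri, open('some_image.jpg', 'rb').read())` would send a local image, where `some_image.jpg` is a placeholder file name.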
