Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

# Deploy Model to AKS

Estimated time to complete: 30-60 minutes

Once you've tested the model and are satisfied with the results, deploy it as a web service hosted in an [Azure Kubernetes Service (AKS) cluster](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-deploy-azure-kubernetes-service).

To deploy the model to AKS, we need the following:

1. A scoring script to show how to use the model
2. An environment file to show what packages need to be installed
3. Deployment configuration for the ML service deployed on AKS
4. The model you trained before


### Prerequisites
You need to complete the steps in the 01_Prerequisites and 02_Train_Model notebooks. Note that some of the code is duplicated between the ACI and AKS deployment notebooks. This ensures that the entire code for an AKS or ACI deployment is in one notebook.

## 1.0 Connect to workspace

Initialize a Workspace object from the existing workspace you created in the Prerequisites step.

In [None]:
from azureml.core import Workspace

try:
    ws = Workspace.from_config()
    print(ws.name, ws.location, ws.resource_group, sep='\t')
    print('Library configuration succeeded')
except Exception as e:
    # Report the underlying cause instead of hiding it
    print('Workspace not found:', e)
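
`Workspace.from_config()` reads the workspace details from a `config.json` file. As a minimal sketch, the following checks that such a file carries the three expected keys before you call `from_config()`. The helper name `missing_workspace_keys` is hypothetical and not part of the SDK, and the values are placeholders.

```python
import json

# Keys a workspace config.json is expected to contain.
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def missing_workspace_keys(config_text):
    """Return the required keys that are absent or empty in a config.json string."""
    config = json.loads(config_text)
    return [key for key in REQUIRED_KEYS if not config.get(key)]


# Placeholder values only; the workspace name is deliberately left empty.
sample = json.dumps({
    "subscription_id": "<subscription-id>",
    "resource_group": "<resource-group>",
    "workspace_name": "",
})
print(missing_workspace_keys(sample))
```

An empty list means all three keys are present and non-empty.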

## 2.0 Create scoring script
Create the scoring script, called score.py, which the web service uses to run the model on incoming requests.

The scoring script must include two functions:

The `init()` function, which typically loads the model into a global object. This function runs only once, when the Docker container starts.

The `run(input_data)` function, which uses the model to predict a value based on the input data. Inputs and outputs of `run` typically use JSON for serialization and deserialization, but other formats are supported.

Input data from KM will be in the following JSON format:

```json
{
	"raw_data": {
		"text": "The Bing News Search API makes it easy to integrate Bing's cognitive news searching capabilities into your applications. If your Cosmos DB account is used by other Azure services like Azure Cognitive Search , or is accessed from Stream analytics or Power BI , you allow access by selecting Accept connections from within global Azure datacenters . "
	}
}
```


Output data from the ML service should be in the following format:
```json
{
	"tags": {
		"products": [
			"Bing News Search",
			"Cosmos DB",
			"Cognitive Search",
			"Stream analytics"
		],
		"features": []
	}
}
```
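
To make the `init()` / `run()` contract and the two JSON shapes concrete, here is a minimal sketch that swaps the BERT model for a stub. The stub `_stub_model` and its tiny vocabulary are assumptions for illustration only; the real scoring script follows in the next cell.

```python
import json

model = None  # populated once by init()


def _stub_model(tokens):
    """Stand-in for the real model: keeps tokens found in a fixed vocabulary."""
    return [t for t in tokens if t in {"Cosmos", "Bing"}]


def init():
    """Runs once when the container starts; loads the model into a global."""
    global model
    model = _stub_model


def run(raw_data):
    """Runs per request; receives the 'raw_data' object and returns the tags."""
    try:
        tokens = raw_data["text"].split()
        return {"tags": {"products": model(tokens), "features": []}}
    except Exception as exc:
        return {"error": str(exc)}


init()
request_body = json.loads('{"raw_data": {"text": "Cosmos DB is a database"}}')
response = run(request_body["raw_data"])
print(json.dumps(response))
```

The stub only shows the data flow: the request's `raw_data` object goes in, and a dictionary with a `tags` key comes out.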

In [None]:
import json
with open('labelfile.txt', 'r') as fp:
    labelmap = json.load(fp)
labelmap

In [None]:
score_text= '''

import json
import argparse
import os
import random
import sys
from tempfile import TemporaryDirectory
from azureml.core import Dataset, Run
import pandas as pd
import torch

# Inference schema for schema discovery
from inference_schema.schema_decorators import input_schema, output_schema
from inference_schema.parameter_types.numpy_parameter_type import NumpyParameterType
from inference_schema.parameter_types.standard_py_parameter_type import StandardPythonParameterType

from utils_nlp.common.pytorch_utils import dataloader_from_dataset
from utils_nlp.common.timer import Timer
from utils_nlp.dataset.ner_utils import preprocess_conll
from utils_nlp.models.transformers.named_entity_recognition import (
    TokenClassificationProcessor, TokenClassifier)

from azureml.core.model import Model

model_name = "bert-base-cased"
DO_LOWER_CASE = False
TRAILING_PIECE_TAG = "X"
DEVICE = "cuda"
test_fraction = 0.2
train_file = "ner_dataset2"
max_len = 256
CACHE_DIR = "./temp"
label_map=  %s
BATCH_SIZE = 5
device = "cpu"


def init():
    global model
        
    # load the pretrained model
    model = TokenClassifier(model_name=model_name, num_labels=len(label_map), cache_dir=CACHE_DIR )
    # Load the fine tuned weights
    model_path = Model.get_model_path('bertkm_ner')
    # apply the fine tuned weights to pretrained model
    model.model.load_state_dict(torch.load(model_path, map_location=device))

# Inference schema for schema discovery
standard_sample_input = {'text': 'a sample input record containing some text' }
standard_sample_output = {'tags': {'products': ['Cognitive Search', 'Cosmos DB'], "features": ['indexer']}}

@input_schema('raw_data', StandardPythonParameterType(standard_sample_input))
@output_schema(StandardPythonParameterType(standard_sample_output))
def run(raw_data):
    input_txt = ""
    try:
        input_txt = raw_data["text"]
        tag_list = []
        processor = TokenClassificationProcessor(model_name=model_name, to_lower=DO_LOWER_CASE, cache_dir=CACHE_DIR)


        product=False
        feature=False
        product_temp=None 
        feature_temp=None

        input_tokens = input_txt.split() 

        sample_dataset = processor.preprocess_for_bert(
            text=[input_tokens],
            max_len=max_len,
            labels=None,
            label_map=label_map,
            trailing_piece_tag=TRAILING_PIECE_TAG,
        )
        sample_dataloader = dataloader_from_dataset(
            sample_dataset, batch_size=BATCH_SIZE, num_gpus=None, shuffle=False, distributed=False
        )
        #for AKS deployment remove the Verbose flag
        preds = model.predict(
                test_dataloader=sample_dataloader,
                num_gpus=None,
                verbose=False
            )
        tags_predicted = model.get_predicted_token_labels(
            predictions=preds,
            label_map=label_map,
            dataset=sample_dataset
        )
        
        tags = {"products": [],"features": []}
        loc = 0
        product_temp=""
        feature_temp=""    
        for i in input_tokens:
            if(loc<256 and loc < len(tags_predicted[0])):
                if tags_predicted[0][loc] == 'B-Product':
                    product = True
                    product_temp=i
                elif tags_predicted[0][loc] == 'I-Product':                
                    product_temp += " " +i
                elif tags_predicted[0][loc] == 'B-Feature':
                    feature = True
                    feature_temp=i
                elif tags_predicted[0][loc] == 'I-Feature':
                    feature_temp += " " +i            
                else:
                    if(product==True):
                        tags["products"].append(product_temp)
                        product=False
                    elif(feature==True):
                        tags["features"].append(feature_temp)
                        feature=False                    
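                loc = loc+1

        # Flush an entity that runs through the last token
        if product:
            tags["products"].append(product_temp)
        if feature:
            tags["features"].append(feature_temp)

        output = {"tags": tags}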

        return(output)
    except Exception as e:
        result = str(e)
        # return error message back to the client
        return json.dumps({"error": result})
    
'''%str(labelmap)


You can debug the scoring script locally. Refer to the notebook 04_Debug_Score_Script.ipynb.
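
The step in `run()` that turns per-token `B-`/`I-` labels into entity strings can also be exercised on its own, without loading the model. The following is a standalone sketch, not the code deployed in score.py; `collect_entities` is a hypothetical helper, and the sample tags are hand-written rather than model output. It closes an entity when a new `B-` tag begins and when the input ends.

```python
def collect_entities(tokens, tags):
    """Group B-/I- tagged tokens into entity strings, keyed by entity type."""
    entities = {"products": [], "features": []}
    key_for = {"Product": "products", "Feature": "features"}
    current_type, current_words = None, []

    def flush():
        # Store the entity collected so far, if any.
        if current_type is not None:
            entities[key_for[current_type]].append(" ".join(current_words))

    for token, tag in zip(tokens, tags):
        prefix, _, entity_type = tag.partition("-")
        if prefix == "B" and entity_type in key_for:
            flush()
            current_type, current_words = entity_type, [token]
        elif prefix == "I" and entity_type == current_type:
            current_words.append(token)
        else:
            flush()
            current_type, current_words = None, []
    flush()  # an entity may run through the last token
    return entities


tokens = "Use Azure Cognitive Search with Cosmos DB".split()
tags = ["O", "O", "B-Product", "I-Product", "O", "B-Product", "I-Product"]
print(collect_entities(tokens, tags))
```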

In [None]:
with open("score.py", "w") as stream:
    stream.write(score_text)

## 3.0 Deploy Model to AKS
Next we will go through the steps to deploy the model in AKS.

### 3.1 Create a custom environment
Specify the model's runtime environment by creating an Environment object and providing the CondaDependencies needed by your model.

In [None]:
PIP_PACKAGES = ["azureml-defaults", "azureml-monitoring", "seqeval[gpu]", "torch==1.4", "tqdm==4.31.1", "transformers==2.8.0", "nltk==3.5", "azureml-sdk==1.3.0", "inference-schema"]
CONDA_PACKAGES = ["numpy", "scikit-learn", "pandas"]
utils_nlp_file="./nlp-recipes-utils/utils_nlp-2.0.0-py3-none-any.whl"
PYTHON_VERSION = "3.6.8"
USE_GPU = True

In [None]:
# conda env setup
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core import ScriptRunConfig
from azureml.core.environment import Environment, DEFAULT_GPU_IMAGE

myenv = Environment(name="myenv")

conda_dependencies = CondaDependencies.create(
    conda_packages=CONDA_PACKAGES,
    pip_packages=PIP_PACKAGES,
    python_version=PYTHON_VERSION,
)

nlp_repo_whl = Environment.add_private_pip_wheel(
    workspace=ws,
    file_path=utils_nlp_file,
    exist_ok=True,
)
# Packages can also be added using the approach described at https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-environments#add-packages-to-an-environment

conda_dependencies.add_pip_package(nlp_repo_whl)

# Adds dependencies to PythonSection of myenv
myenv.python.conda_dependencies=conda_dependencies
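
For orientation, the dependencies above correspond to a conda environment specification. The sketch below builds a dictionary in the general shape of a conda environment file. It is an illustration, not the exact output of `CondaDependencies`; the helper `conda_spec` is hypothetical, and the package list is shortened.

```python
import json


def conda_spec(name, python_version, conda_packages, pip_packages):
    """Build a dict in the general shape of a conda environment file."""
    dependencies = [f"python={python_version}", *conda_packages, "pip"]
    # pip-installed packages are nested under a 'pip' entry
    dependencies.append({"pip": list(pip_packages)})
    return {"name": name, "dependencies": dependencies}


spec = conda_spec(
    name="myenv",
    python_version="3.6.8",
    conda_packages=["numpy", "scikit-learn", "pandas"],
    pip_packages=["azureml-defaults", "torch==1.4", "transformers==2.8.0"],
)
print(json.dumps(spec, indent=2))
```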

### 3.2 Create the InferenceConfig
Create the inference config that will be used when deploying the model.

In [None]:
from azureml.core.model import InferenceConfig
inf_config = InferenceConfig(entry_script='score.py', environment=myenv)

### 3.3 Provision AKS Cluster
This is a one-time setup. You can reuse the cluster for multiple deployments after it has been created. If you delete the cluster or the resource group that contains it, you have to recreate it.
Here we create an SSL-enabled AKS cluster that uses a certificate from Microsoft. To enable SSL with your own certificate, follow the steps described [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-secure-web-service#enable).

For more details about creating a new AKS cluster, refer to this [link](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-deploy-azure-kubernetes-service#create-a-new-aks-cluster).

This takes about 20 minutes.

In [None]:
from azureml.core.compute import AksCompute, ComputeTarget
# Config used to create a new AKS cluster and enable SSL
prov_config = AksCompute.provisioning_configuration()

# Leaf domain label generates a name using the formula
#  "<leaf-domain-label>######.<azure-region>.cloudapp.azure.net"
#  where "######" is a random series of characters
prov_config.enable_ssl(leaf_domain_label = "contoso")

aks_name = 'kmaml-aks' 

#Use existing clusters
#aks_target = ComputeTarget(ws, 'kmaml-aks')

# Create the cluster   
aks_target = ComputeTarget.create(workspace = ws, 
                                  name = aks_name, 
                                   provisioning_configuration = prov_config)

In [None]:
# Wait for the create process to complete
aks_target.wait_for_completion(show_output = True)

### 3.4 Create configuration file
Create a deployment configuration and specify the number of CPU cores and gigabytes of RAM needed. While it depends on your model, the default of 1 core and 1 gigabyte of RAM is usually sufficient for many models; the configuration below requests 1 core and 2 gigabytes. If you need more later, you have to recreate the image and redeploy the service.

It takes 5-10 minutes to deploy the service.

In [None]:
%%time
from azureml.core.model import Model
from azureml.core.webservice import AksWebservice


# If deploying to a cluster configured for dev/test, ensure that it was created with enough
# cores and memory to handle this deployment configuration. Note that memory is also used by
# things such as dependencies and AML components.

aks_config = AksWebservice.deploy_configuration(autoscale_enabled=True, 
                                                       autoscale_min_replicas=1, 
                                                       autoscale_max_replicas=3, 
                                                       autoscale_refresh_seconds=10, 
                                                       autoscale_target_utilization=70,
                                                       auth_enabled=True, 
                                                       cpu_cores=1, memory_gb=2, 
                                                       scoring_timeout_ms=5000, 
                                                       replica_max_concurrent_requests=2, 
                                                       max_request_wait_time=5000)
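
The autoscale settings bound how many requests the service can absorb at once. The Azure Machine Learning guidance for AKS deployments suggests estimating the replica count from the target request rate, the time per request, the target utilization, and the number of concurrent requests each replica accepts. A minimal sketch of that arithmetic follows. The traffic figures are illustrative assumptions, not measurements of this service, and `replicas_needed` is a hypothetical helper.

```python
from math import ceil


def replicas_needed(target_rps, request_seconds, target_utilization,
                    max_requests_per_replica):
    """Estimate the replica count for a given load."""
    concurrent_requests = target_rps * request_seconds / target_utilization
    return ceil(concurrent_requests / max_requests_per_replica)


needed = replicas_needed(
    target_rps=2,                # assumed: 2 requests per second
    request_seconds=1.5,         # assumed: 1.5 s per scoring call
    target_utilization=0.7,      # 70 in the deployment configuration, as a fraction
    max_requests_per_replica=2,  # replica_max_concurrent_requests
)
print(needed)
```

If the estimate exceeds `autoscale_max_replicas`, requests queue until `max_request_wait_time` expires, so the upper bound is worth revisiting once real traffic numbers are known.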

### 3.5 Deploy to AKS

In [None]:
service_name = 'kmaml-aks'
model=Model(ws, 'bertkm_ner')

aks_service = Model.deploy(workspace=ws,
                           name=service_name,
                           models=[model],
                           inference_config=inf_config,
                           deployment_config=aks_config,
                           deployment_target=aks_target,
                           overwrite = True)

aks_service.wait_for_deployment(show_output = True)
print(aks_service.state)
print(aks_service.get_logs())

In [None]:
primary, secondary = aks_service.get_keys()
print(primary)
print(aks_service.state)

Get the scoring web service's HTTP endpoint, which accepts REST client calls. We will test the ML web service using this REST API endpoint.



In [None]:
print(aks_service.scoring_uri)

## 4.0 Test deployed service in AKS
We test the web service by passing data to it. The SDK's `run()` method retrieves the API keys behind the scenes to make sure that the call is authenticated; the REST call below instead passes the key explicitly in the `Authorization` header.

Note that in the following example the model identifies Bing News Search as a product, even though this entity was not part of the training data. That is the power of the BERT model.

In [None]:
import requests
import json

import time


# send a sample record to score
input_data = """{"raw_data": {"text": "The Bing News Search API makes it easy to integrate Bing's cognitive news searching capabilities into your applications. If your Cosmos DB account is used by other Azure services like Azure Cognitive Search , or is accessed from Stream analytics or Power BI , you allow access by selecting Accept connections from within global Azure datacenters . "}}"""


# for AKS deployment that is key auth enabled, you'd need to get the service key in the header as well
primary, secondary = aks_service.get_keys()

# Set the content type
headers = {'Content-Type': 'application/json'}
# If authentication is enabled, set the authorization header
headers['Authorization'] = f'Bearer {primary}'

# Make the request and display the response
t0 = time.ctime(time.time())
print("t0:", t0)

resp = requests.post(aks_service.scoring_uri, input_data, headers=headers)
print(resp.text)
t1 = time.ctime(time.time())
print("t1:", t1)
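
Building the request body with `json.dumps` avoids hand-escaping quotes in the input text, and parsing the response gives direct access to the tags. The helpers below are a hypothetical sketch with no network call. It assumes the success and error payload shapes produced by score.py above, including that the error path may arrive as a JSON-encoded string.

```python
import json


def build_request(text, key=None):
    """Return (body, headers) for a POST to the scoring URI."""
    body = json.dumps({"raw_data": {"text": text}})
    headers = {"Content-Type": "application/json"}
    if key:
        headers["Authorization"] = f"Bearer {key}"
    return body, headers


def extract_products(response_text):
    """Return the product list, or raise if the service reported an error."""
    payload = json.loads(response_text)
    if isinstance(payload, str):  # assumed: error payload is doubly encoded
        payload = json.loads(payload)
    if "error" in payload:
        raise RuntimeError(payload["error"])
    return payload["tags"]["products"]


body, headers = build_request("Bing's API", key="<primary-key>")
print(body)
print(extract_products('{"tags": {"products": ["Cosmos DB"], "features": []}}'))
```

With these helpers, the call above becomes `requests.post(aks_service.scoring_uri, data=body, headers=headers)` followed by `extract_products(resp.text)`.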

## Next Step

Now that we have deployed the ML model as a service in AKS, we need to integrate it with Cognitive Search.