Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

# Inferencing with TensorFlow 2.0 and ONNX on Azure Machine Learning

## Introduction

This tutorial shows how to convert a TensorFlow 2.0 BERT model to ONNX and deploy it on Azure Machine Learning. BERT stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.

For more information about BERT, please read this [paper](https://arxiv.org/pdf/1810.04805.pdf).

## Overview of Workshop

This workshop shows how to download a TF 2.0 BERT model, convert it to ONNX, and deploy the model as a Webservice, step by step:

 1. Initialize your workspace
 2. Download a previously saved model (saved on Azure Machine Learning)
 3. Test the downloaded model
 4. Convert TF 2.0 model to ONNX
 5. Register ONNX model
 6. Create a scoring script
 7. Define an Azure Environment
 8. Deploy Model as Webservice
 9. Test Deployment
 10. Clean up Webservice

## Prerequisites
* Understand the **architecture and terms**(add link) introduced by Azure Machine Learning
* If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the **configuration notebook**(add link) to:
    * install the AML SDK
    * create a workspace and its configuration file (config.json)
* For the local scoring test, you also need TensorFlow and Keras installed in the current Jupyter kernel.

* Run through [1_Bert_StackOverflow_Training](1_Bert_StackOverflow_Training.ipynb) Notebook first to register your model

In [None]:
# Check core SDK version number
import azureml.core

print("SDK version:", azureml.core.VERSION)

## Initialize Workspace

Initialize a [Workspace](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. Workspace.from_config() creates a workspace object from the details stored in config.json.

In [None]:
from azureml.core import Workspace

ws = Workspace.from_config()
print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep = '\n')
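
For reference, `config.json` is a small JSON file holding the workspace details. The sketch below writes and re-reads one with placeholder values to show the three keys it carries (`subscription_id`, `resource_group`, `workspace_name`). The helper name `load_workspace_config` and all values are illustrative assumptions; this is not part of the SDK.

```python
import json
import os
import tempfile

# Keys that a workspace config.json is expected to contain.
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")

def load_workspace_config(path):
    """Read a config.json file and check that the expected keys are present."""
    with open(path) as f:
        config = json.load(f)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError("config.json is missing: " + ", ".join(missing))
    return config

# Placeholder values for illustration only.
example = {
    "subscription_id": "00000000-0000-0000-0000-000000000000",
    "resource_group": "my-resource-group",
    "workspace_name": "my-workspace",
}

# Write the example to a temporary folder, then read it back.
config_path = os.path.join(tempfile.mkdtemp(), "config.json")
with open(config_path, "w") as f:
    json.dump(example, f, indent=4)

print(load_workspace_config(config_path)["workspace_name"])
```

Running a check like this before calling `Workspace.from_config()` can make a missing or incomplete file easier to diagnose.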

## Download the saved model
You can download the BERT model created in the first notebook, [1_Bert_StackOverflow_Training](1_Bert_StackOverflow_Training.ipynb). The model is saved into two files, ``model.json`` and ``model.h5``. Azure ML automatically uploaded these files and associated them with the registered model, so we can use the model object to download them.

In [None]:
from azureml.core.model import Model

# Look up the registered model by name and download its files to the current directory
model = ws.models['bert-stackoverflow-v2-local']
model.download(target_dir='.', exist_ok=False)

## Predict on the test set
Let's check the version of the local Keras. Make sure it matches with the version number printed out in the training script. Otherwise you might not be able to load the model properly.

In [None]:
import keras
import tensorflow as tf

print("Keras version:", keras.__version__)
print("Tensorflow version:", tf.__version__)

#### Load the BERT model
Load the downloaded BERT model

In [None]:
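from transformers import BertTokenizer, TFBertForSequenceClassification

# Tags the model was trained to predict; needed here to set num_labels
labels = ['c#', '.net', 'java', 'asp.net', 'c++', 'javascript', 'php', 'python', 'sql', 'sql-server']

model_dir = "./exports"
loaded_model = TFBertForSequenceClassification.from_pretrained(model_dir, num_labels=len(labels), force_download=True)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
print("Model loaded from disk.")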

Feed a test sentence to the BERT model and time the prediction.

In [None]:
%%time
import json 

# Input test sentences
raw_data = json.dumps({
    'text': 'I need help with importing a module with tensorflow 2.0'
})

text = json.loads(raw_data)['text']
inputs = tokenizer.encode_plus(text, add_special_tokens=True, return_tensors='tf')
predictions = loaded_model.predict(inputs)

#Map labels to the predictions
results = zip(labels,predictions[0])
for prediction in results:
    print(prediction)

As you can see, based on the sample sentence, the model predicts the probability of each Stack Overflow tag for that sentence.
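
If the model returns raw scores (logits) rather than probabilities, they can be mapped to per-tag probabilities before ranking. The sketch below assumes one logit per tag and treats the tags as independent (multilabel), so it applies a sigmoid to each score. The function name `logits_to_tag_probabilities` and the example logits are made up for illustration.

```python
import math

labels = ['c#', '.net', 'java', 'asp.net', 'c++', 'javascript', 'php', 'python', 'sql', 'sql-server']

def logits_to_tag_probabilities(logits, tag_names):
    """Apply a sigmoid to each logit and return (tag, probability) pairs, highest first."""
    probabilities = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    return sorted(zip(tag_names, probabilities), key=lambda pair: pair[1], reverse=True)

# Hypothetical logits for one sentence, one value per tag.
example_logits = [-3.0, -2.5, -1.0, -4.0, -2.0, -1.5, -3.5, 2.0, -2.2, -3.1]

# Print the three most likely tags.
for tag, probability in logits_to_tag_probabilities(example_logits, labels)[:3]:
    print(f"{tag}: {probability:.3f}")
```

A sigmoid is used instead of a softmax because a post can carry several tags at once; if the model already outputs probabilities, this step is unnecessary.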

## Convert To ONNX

Let's convert our TF 2.0 model to ONNX. ONNX is an open format to represent deep learning models. With ONNX, AI developers can more easily move models between state-of-the-art tools and choose the combination that is best for them. ONNX is developed and supported by a community of partners.

Read more about ONNX [here](https://onnx.ai/)

In [None]:
%%time
from azureml.core import Model
import numpy as np
#import keras2onnx
import tensorflow as tf
from transformers import TFBertForSequenceClassification
#import onnxruntime
labels = ['c#', '.net', 'java', 'asp.net', 'c++', 'javascript', 'php', 'python', 'sql', 'sql-server'] # should use blob storage location

# loaded_model = TFBertForSequenceClassification.from_pretrained(os.path.join('./exports/'), num_labels=len(labels), force_download=True)

# # convert to onnx model
# onnx_model = keras2onnx.convert_keras(loaded_model, model.name)
# onnx_model

## Register ONNX Model
Register an existing trained model, and add a description and tags.

In [None]:
#Register the model
from azureml.core.model import Model
model_onnx = Model.register(model_path = "bert-stackoverflow-onnx.onnx", # this points to a local file
                       model_name = "bert-stackoverflow-onnx", # this is the name the model is registered as
                       tags={"model": "BERT",  "method" : "tensorflow"},
                       description='BERT multilabel model for tagging stackoverflow posts v2.',
                       workspace = ws)

print(model_onnx.name, model_onnx.description, model_onnx.version)

## Deploy as web service

Now we are ready to deploy the model as a web service running in Azure Container Instance [ACI](https://azure.microsoft.com/en-us/services/container-instances/). Azure Machine Learning accomplishes this by constructing a Docker image with the scoring logic and model baked in.

Once you've tested the model and are satisfied with the results, deploy the model as a web service hosted in ACI. 

To build the correct environment for ACI, provide the following:
* A scoring script to show how to use the model
* An environment file to show what packages need to be installed
* A configuration file to build the ACI
* The model you trained before

### Create score.py

First, we will create a scoring script that will be invoked by the web service call.

* Note that the scoring script must have two required functions, init() and run(input_data).
    * In init() function, you typically load the model into a global object. This function is executed only once when the Docker container is started.
    * In run(input_data) function, the model is used to predict a value based on the input data. The input and output to run typically use JSON as serialization and de-serialization format but you are not limited to that.

In [None]:
%%writefile score.py
import os
import json
import pandas as pd
import numpy as np
import tensorflow as tf
from transformers import BertTokenizer, TFBertForSequenceClassification
from azureml.core import Model

labels = ['c#', '.net', 'java', 'asp.net', 'c++', 'javascript', 'php', 'python', 'sql', 'sql-server']

def init():
    global tokenizer, model
    model_dir = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'exports')
    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
    model = TFBertForSequenceClassification.from_pretrained(model_dir, num_labels=len(labels))

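def run(raw_data):
    text = json.loads(raw_data)['text']
    inputs = tokenizer.encode_plus(text, add_special_tokens=True, return_tensors='tf')
    predictions = model.predict(inputs)
    print(predictions)
    # Convert the NumPy output to nested lists so the response can be serialized to JSON
    return np.asarray(predictions).tolist()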

### Create Environment

You can now create and/or use an Environment object when deploying a Webservice. The Environment can have been previously registered with your Workspace, or it will be registered with it as a part of the Webservice deployment. Only Environments that were created using azureml-defaults version 1.0.48 or later will work with this new handling however.

In [None]:
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies 

# numpy and pandas come from conda; the remaining packages are installed with pip
myenv = CondaDependencies.create(conda_packages=['numpy','pandas'],
                                 pip_packages=['inference-schema[numpy-support]','azureml-defaults','tensorflow==2.0.0','transformers==2.0.0'])

with open("myenv.yml","w") as f:
    f.write(myenv.serialize_to_string())

Review the content of the `myenv.yml` file.

In [None]:
with open("myenv.yml","r") as f:
    print(f.read())

### Create configuration file

Create a deployment configuration file and specify the number of CPUs and gigabytes of RAM needed for your ACI container. While it depends on your model, the default of 1 core and 1 gigabyte of RAM is usually sufficient for many models. If you need more later, you have to recreate the image and redeploy the service.

In [None]:
from azureml.core.webservice import AciWebservice, Webservice

aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, 
                                               memory_gb=1, 
                                               tags={"model": "BERT",  "method" : "tensorflow"}, 
                                               description='Predict StackoverFlow tags with BERT')

## Create Inference Configuration

A source directory is now supported: you can upload an entire folder from your local machine as dependencies for the Webservice.
Note: in that case, the entry_script, conda_file, and extra_docker_file_steps paths are relative to the source_directory path.

Sample code for using a source directory:

```python
inference_config = InferenceConfig(source_directory="C:/abc",
                                   runtime= "python", 
                                   entry_script="x/y/score.py",
                                   conda_file="env/myenv.yml", 
                                   extra_docker_file_steps="helloworld.txt")
```

 - source_directory = the source path, as a string. The entire folder is added to the image, so it is easy to access any file within this folder or its subfolders
 - runtime = which runtime to use for the image. Currently supported runtimes are 'spark-py' and 'python'
 - entry_script = contains logic specific to initializing your model and running predictions
 - conda_file = manages conda and python package dependencies
 - extra_docker_file_steps = optional: any extra steps you want to inject into the Docker file
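
The relative-path rule can be illustrated with plain path joins. This is only a sketch of how the sample paths above map onto the source directory; it is not how the SDK resolves them internally, and the helper name `resolve_in_source_directory` is hypothetical.

```python
import posixpath

def resolve_in_source_directory(source_directory, relative_path):
    """Join a relative path onto the source directory and normalize the result."""
    return posixpath.normpath(posixpath.join(source_directory, relative_path))

# Paths taken from the sample InferenceConfig above.
source_directory = "C:/abc"
sample_paths = [
    ("entry_script", "x/y/score.py"),
    ("conda_file", "env/myenv.yml"),
    ("extra_docker_file_steps", "helloworld.txt"),
]

for name, relative_path in sample_paths:
    print(name, "->", resolve_in_source_directory(source_directory, relative_path))
```

A path that starts with `../` resolves outside the source directory, so files referenced that way are not part of the uploaded folder.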

In [None]:
from azureml.core.model import InferenceConfig

inference_config = InferenceConfig(source_directory=".",
                                   runtime= "python", 
                                   entry_script="score.py",
                                   conda_file="myenv.yml")  # relative to source_directory, where myenv.yml was written

### Deploy in ACI
Estimated time to complete: **about 5-10 minutes**

Configure the image and deploy. The following code goes through these steps:

* Build an image using:
   * The scoring file (`score.py`)
   * The environment file (`myenv.yml`)
   * The model file
* Register that image under the workspace. 
* Send the image to the ACI container.
* Start up a container in ACI using the image.
* Get the web service HTTP endpoint.

In [None]:
%%time
from azureml.core.webservice import Webservice
from azureml.exceptions import WebserviceException

aci_service_name = 'bert-stackoverflow-aciservice-onnx'

try:
    # ACI service names must be unique within the subscription, so look up any
    # existing service with this name and delete it before deploying again
    service = Webservice(ws, name=aci_service_name)
    if service:
        service.delete()
except WebserviceException:
    # No existing service with this name was found; nothing to delete
    pass

service = Model.deploy(ws, aci_service_name, [model], inference_config, aciconfig)

service.wait_for_deployment(True)
print(service.state)

This is the scoring web service endpoint:

In [None]:
print(service.scoring_uri)

## Test the deployed model

Let's test the deployed model. Pick a sample sentence about TensorFlow 2.0 and send it to the web service hosted in ACI. Note that here we use the `run` API in the SDK to invoke the service. You can also make raw HTTP calls using any HTTP tool, such as curl.

After the invocation, we print the returned predictions.

In [None]:
%%time
import json
test_sample = json.dumps({
    'text': 'I need help with importing a module with tensorflow 2.0'
})

#test_sample_encoded = bytes(test_sample, encoding='utf8')
prediction = service.run(input_data=test_sample)
print(prediction)

#### Using HTTP call

We can retrieve the API keys used for accessing the HTTP endpoint.

In [None]:
# retrieve the API keys; two keys are generated
key1, key2 = service.get_keys()
print(key1)

We can now construct a raw HTTP request and send it to the service. Don't forget to add the key to the HTTP header.

In [None]:
import requests

# send the sample sentence as a JSON request body
input_data = "{\"text\": \"I need help with importing a module with tensorflow 2.0\"}"

headers = {'Content-Type':'application/json', 'Authorization': 'Bearer ' + key1}

resp = requests.post(service.scoring_uri, input_data, headers=headers)

print("POST to url", service.scoring_uri)
print("prediction:", resp.text)
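
The same request can be assembled with only the standard library, which helps when `requests` is not installed or when you want to inspect the request before sending it. The sketch below builds the request but does not send it. The URI and key are placeholders standing in for `service.scoring_uri` and `key1`, and the helper name `build_scoring_request` is an assumption for illustration.

```python
import json
import urllib.request

def build_scoring_request(scoring_uri, api_key, text):
    """Build (but do not send) a POST request with a JSON body and a Bearer key header."""
    body = json.dumps({'text': text}).encode('utf-8')
    headers = {
        'Content-Type': 'application/json',
        'Authorization': 'Bearer ' + api_key,
    }
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method='POST')

request = build_scoring_request(
    'http://example.invalid/score',  # placeholder for service.scoring_uri
    'placeholder-key',               # placeholder for key1
    'I need help with importing a module with tensorflow 2.0',
)

# Inspect the request before sending it.
print(request.get_method(), request.full_url)
print(request.data.decode('utf-8'))

# To actually send it against a live endpoint:
# print(urllib.request.urlopen(request).read().decode('utf-8'))
```

Building the body with `json.dumps` avoids hand-escaping quotes in the payload string.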

#### Summary of workspace
Let's look at the workspace after the web service was deployed. You should see

* a registered model named 'bert-stackoverflow-onnx' and with the id 'bert-stackoverflow-onnx:1'
* a webservice called 'bert-stackoverflow-aciservice-onnx' with some scoring URL

In [None]:
models = ws.models
for name, model in models.items():
    print("Model: {}, ID: {}".format(name, model.id))
    
webservices = ws.webservices
for name, webservice in webservices.items():
    print("Webservice: {}, scoring URI: {}".format(name, webservice.scoring_uri))

### View ACI Logs (debugging when something goes wrong)
**Tip: If something goes wrong with the deployment, the first thing to look at is the service logs, which you can print by running the following cell:**

In [None]:
import pprint
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(service.get_logs())

## Delete ACI to clean up
You can delete the ACI deployment with a simple delete API call.

In [None]:
service.delete()