Copyright (c) Microsoft Corporation. All rights reserved.  
Licensed under the MIT License.

# Inference Bert Model for High Performance with ONNX Runtime on AzureML #

This tutorial takes a pre-trained BERT model, converts it to ONNX, and deploys the ONNX model with ONNX Runtime through AzureML.
In the following sections, we use a BERT model fine-tuned on the Stanford Question Answering Dataset (SQuAD) as an example. The BERT SQuAD model is used in question answering scenarios, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable.

## Contents

**Prerequisites** to set up your Azure ML work environments

**Obtain model and convert to ONNX**

**Deploy Bert model using ONNX Runtime and AzureML**

## Prerequisites

To run on AzureML, you need:
* Azure subscription
* Azure Machine Learning Workspace
* the Azure Machine Learning SDK

You might also find the following resources useful:
* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning
* The [Azure Portal](https://portal.azure.com) allows you to track the status of your deployments.

In [1]:
# To install dependencies directly run the following
!pip install torch
!pip install transformers
!pip install azureml-core
!pip install onnxruntime
!pip install matplotlib

# To create a Jupyter kernel from your conda environment, run the following, replacing <kernel name> with your own name
#   conda install -c anaconda ipykernel
#   python -m ipykernel install --user --name=<kernel name>

Collecting azureml.core
  Using cached azureml_core-1.40.0-py3-none-any.whl (2.7 MB)
ERROR: azureml-widgets 1.34.0 has requirement azureml-core~=1.34.0, but you'll have azureml-core 1.40.0 which is incompatible.
ERROR: azureml-train-core 1.34.0 has requirement azureml-core~=1.34.0, but you'll have azureml-core 1.40.0 which is incompatible.
ERROR: azureml-train-automl-runtime 1.34.0.post1 has requirement azureml-core~=1.34.0, but you'll have azureml-core 1.40.0 which is incompatible.
ERROR: azureml-train-automl-client 1.34.0 has requirement azureml-core~=1.34.0, but you'll have azureml-core 1.40.0 which is incompatible.
ERROR: azureml-tensorboard 1.34.0 has requirement azureml-core~=1.34.0, but you'll have azureml-core 1.40.0 which is incompatible.
ERROR: azureml-telemetry 1.34.0 has requirement azureml-core~=1.34.0, but you'll have azureml-core 1.40.0 which is incompatible.
ERROR: azureml-sdk 1.34.0 has requirement azureml-core

## Obtain and convert PyTorch model to ONNX format

In the code below, we obtain a BERT model fine-tuned for question answering on the SQuAD dataset from HuggingFace.

If you'd like to pre-train a BERT model from scratch, follow the instructions in
[Pretraining of the BERT model](https://github.com/microsoft/AzureML-BERT/blob/master/pretrain/PyTorch/notebooks/BERT_Pretrain.ipynb). 
And if you would like to fine-tune the model with your own dataset, refer to  [AzureML Bert Eval Squad](https://github.com/microsoft/AzureML-BERT/blob/master/finetune/PyTorch/notebooks/BERT_Eval_SQUAD.ipynb)
or [AzureML Bert Eval GLUE](https://github.com/microsoft/AzureML-BERT/blob/master/finetune/PyTorch/notebooks/BERT_Eval_GLUE.ipynb).


### Define the tokenizer and model

In [1]:
model_name = "bert-large-uncased-whole-word-masking-finetuned-squad"
model_path = "./" + model_name + ".onnx"

from transformers import BertTokenizer, BertForQuestionAnswering

model = BertForQuestionAnswering.from_pretrained(model_name)

In [7]:
max_seq_len = 1024

### Sample input and question

In [8]:
#question = "What is a major importance of Southern California in relation to California and the United States?"
#context = "Southern California, often abbreviated SoCal, is a geographic and cultural region that generally comprises California's southernmost 10 counties. The region is traditionally described as \"eight counties\", based on demographics and economic ties: Imperial, Los Angeles, Orange, Riverside, San Bernardino, San Diego, Santa Barbara, and Ventura. The more extensive 10-county definition, including Kern and San Luis Obispo counties, is also used based on historical political divisions. Southern California is a major economic center for the state of California and the United States."
question = "What is my name"
context = "My name is Natalie and my friend's name is Jane"

### Run the PyTorch model

Define the preprocessing code, which encodes the input question and context into token ids and segment ids. This function also returns the list of tokens in the input, which is used during postprocessing to map the output of the model back to a phrase.

Define the postprocessing code, which takes the start and end logits from the model output, picks the highest-scoring start and end token positions, checks that the end does not come before the start, and maps the tokens in between back into words to form an answer phrase.

Run the PyTorch model
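The postprocessing step described above can be sketched in plain Python, without loading the model. This is a minimal illustration only: the function names `argmax` and `tokens_to_answer` are made up for this sketch, and the tokens and logits are hand-written values rather than real model output.

```python
def argmax(values):
    """Return the index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])


def tokens_to_answer(tokens, start_logits, end_logits):
    """Map start/end logits back to an answer phrase, or None if the span is invalid."""
    start = argmax(start_logits)
    end = argmax(end_logits)
    if end < start:
        return None  # the predicted end comes before the predicted start
    answer = tokens[start]
    for token in tokens[start + 1:end + 1]:
        if token.startswith("##"):
            answer += token[2:]      # WordPiece continuation: glue onto the previous piece
        else:
            answer += " " + token    # a new word: separate with a space
    return answer


# Hypothetical tokens and logits, chosen by hand for illustration (not model output)
tokens = ["[CLS]", "what", "is", "my", "name", "[SEP]",
          "my", "name", "is", "nat", "##alie", "[SEP]"]
start_logits = [0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2, 0.1, 0.3, 5.0, 0.4, 0.0]
end_logits = [0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.2, 0.1, 0.5, 6.0, 0.0]

answer = tokens_to_answer(tokens, start_logits, end_logits)
print(answer)  # natalie
```

Swapping the two logit lists puts the best end position before the best start position, so the function returns `None`; that is the case the scoring script reports as an unanswerable question.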

### Export the model

In [3]:
import torch

model_path = "./" + model_name + ".onnx"

# Set the model to inference mode.
# It is important to call model.eval() or model.train(False) before exporting the model,
# to turn the model to inference mode. This is required since operators like dropout or batchnorm
# behave differently in inference and training mode.
model.eval()

# Generate dummy inputs to the model. Adjust if necessary
inputs = {
        'input_ids':   torch.randint(32, [1, 32], dtype=torch.long), # list of numerical ids for the tokenized text
        'token_type_ids':  torch.ones([1, 32], dtype=torch.long)     # dummy list of ones
    }

symbolic_names = {0: 'batch_size', 1: 'max_seq_len'}
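torch.onnx.export(model,                                         # model being run
                  (inputs['input_ids'],
                   {'token_type_ids': inputs['token_type_ids']}), # positional inputs, then a dict of keyword inputs.
                                                                 # The second positional argument of the model's forward()
                                                                 # is attention_mask, so token_type_ids is passed by name
                  model_path,                                    # where to save the model (can be a file or file-like object)
                  opset_version=11,                              # the ONNX opset version to export the model to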
                  do_constant_folding=True,                      # whether to execute constant folding for optimization
                  input_names=['input_ids', 
                               'segment_ids'],                   # the model's input names
                  output_names=['start_logits', "end_logits"],   # the model's output names
                  dynamic_axes={'input_ids': symbolic_names,              
                                'segment_ids' : symbolic_names,
                                'start_logits' : symbolic_names, 
                                'end_logits': symbolic_names})   # variable length axes

## Run the ONNX model with ONNX Runtime



The following code runs the ONNX model with ONNX Runtime. You can test it locally before deploying it to Azure Machine Learning.

The `init()` function is called at startup, performing the one-off operations such as creating the tokenizer and the ONNX Runtime session.

The `run()` function is called for each request when we run the model using the Azure ML web service.
Add the necessary `preprocess()` and `postprocess()` steps.

The following score.py file assumes the inputs will be in the format of the example above. 
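The `init()`/`run()` contract can be sketched without any model. In this minimal illustration, `EchoSession` is a hypothetical stand-in for the ONNX Runtime session and a whitespace split stands in for the tokenizer; only the shape of the two entry points mirrors the scoring script.

```python
import json

session = None  # set once by init(), reused by every call to run()


class EchoSession:
    """Hypothetical stand-in for an inference session, used only in this sketch."""

    def run(self, output_names, model_inputs):
        # Return the number of tokens received instead of real logits
        return [len(model_inputs["input_ids"][0])]


def init():
    """Called once at startup: create expensive, reusable objects here."""
    global session
    session = EchoSession()


def run(raw_data):
    """Called per request: parse the JSON body and return a JSON-serializable result."""
    inputs = json.loads(raw_data)
    # Whitespace split stands in for the real tokenizer
    input_ids = (inputs["question"] + " " + inputs["context"]).split()
    outputs = session.run(["token_count"], {"input_ids": [input_ids]})
    return {"token_count": outputs[0]}


init()
request = json.dumps({"question": "What is my name", "context": "My name is Natalie"})
result = run(request)
print(result)  # {'token_count': 8}
```

Keeping the session in a module-level variable means the cost of creating it is paid once in `init()` rather than on every request.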

In [10]:
%%writefile score.py
import os
import logging
import json
import numpy as np
import onnxruntime
import transformers
import torch


def preprocess(question, context):
    print("Question:", question)
    print("Context: ", context)
    encoded_input = tokenizer(question, context)
    tokens = tokenizer.convert_ids_to_tokens(encoded_input.input_ids)
    print(tokens)
    return (encoded_input.input_ids, encoded_input.token_type_ids, tokens)

def postprocess(tokens, start, end):
    print("Start:", start)
    print("End:", end)
    results = {}
    answer_start = np.argmax(start)
    answer_end = np.argmax(end)
    print("Start: ", answer_start)
    print("End: ", answer_end)
    if answer_end >= answer_start:
        answer = tokens[answer_start]
        for i in range(answer_start+1, answer_end+1):
            if tokens[i][0:2] == "##":
                answer += tokens[i][2:]
            else:
                answer += " " + tokens[i]
        results['answer'] = answer.capitalize()
    else:
        results['error'] = "I am unable to find the answer to this question. Can you please ask another question?"
    return results

def init():
    global tokenizer, session, model

    model_name = "bert-large-uncased-whole-word-masking-finetuned-squad"
    model = transformers.BertForQuestionAnswering.from_pretrained(model_name)

    # AZUREML_MODEL_DIR points to your deployed model(s). If multiple models are deployed, build the path as
    # model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), '$MODEL_NAME/$VERSION/$MODEL_FILE_NAME')
    model_dir = os.getenv('AZUREML_MODEL_DIR')
    if model_dir is None:
        # Not running in Azure ML: fall back to the current directory for local testing
        model_dir = "./"
    model_path = os.path.join(model_dir, model_name + ".onnx")

    # Create the tokenizer
    tokenizer = transformers.BertTokenizer.from_pretrained(model_name)

    # Create an ONNX Runtime session to run the ONNX model
    session = onnxruntime.InferenceSession(model_path, providers=["CPUExecutionProvider"])  


def run_pytorch(raw_data):
    inputs = json.loads(raw_data)

    model.eval()

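    # Use %s placeholders: extra positional arguments without a placeholder cause a logging format error
    logging.info("Question: %s", inputs["question"])
    logging.info("Context: %s", inputs["context"])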

    input_ids, segment_ids, tokens = preprocess(inputs["question"], inputs["context"])

    print(input_ids)

    model_outputs = model(torch.tensor([input_ids]),  token_type_ids=torch.tensor([segment_ids]))

    outputs = postprocess(tokens, model_outputs.start_logits.detach().numpy(), model_outputs.end_logits.detach().numpy())

    print(outputs)


def run(raw_data):
    logging.info("Request received")
    inputs = json.loads(raw_data)

    logging.info(inputs)

    # Preprocess the question and context into tokenized ids
    input_ids, segment_ids, tokens = preprocess(inputs["question"], inputs["context"])

    print(input_ids)
    
    # Format the inputs for ONNX Runtime
    model_inputs = {
        'input_ids':   [input_ids], 
        'segment_ids': [segment_ids]
        }
                  
    outputs = session.run(['start_logits', 'end_logits'], model_inputs)
    
    # Post process the output of the model into an answer (or an error if the question could not be answered)
    return postprocess(tokens, outputs[0], outputs[1])


if __name__ == '__main__':
    init()

    #input = "{\"question\": \"What is my name?\", \"context\": \"My name is Natalie, and my sister's name is also Nathalie and my brother's name is dufas and my friend's name is Remy\"}"

    #input = "{\"question\": \"What is Dolly Parton's middle name?\", \"context\": \"Dolly Rebecca Parton (born January 19, 1946) is an American singer-songwriter, actress, and businesswoman, known primarily for her work in country music. After achieving success as a songwriter for others, Parton made her album debut in 1967 with Hello, I'm Dolly, which led to success during the remainder of the 1960s (both as a solo artist and with a series of duet albums with Porter Wagoner), before her sales and chart peak came during the 1970s and continued into the 1980s. Parton's albums in the 1990s did not sell as well, but she achieved commercial success again in the new millennium and has released albums on various independent labels since 2000, including her own label, Dolly Records. She has sold more than 100 million records worldwide.\"}"

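    # Named sample_request rather than input to avoid shadowing the built-in input()
    sample_request = "{\"question\": \"What is Dolly Parton's middle name?\", \"context\": \"Dolly Rebecca Parton is an American singer-songwriter\"}"

    run_pytorch(sample_request)
    print(run(sample_request))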



Overwriting score.py


In [11]:
%run score.py

{'question': 'What is my name', 'answer': 'Natalie'}


## Deploy model with ONNX Runtime through AzureML

Now that we have the ONNX model and the code to run it with ONNX Runtime, we can deploy it using Azure ML.



## Check your environment

In [12]:
# Check core SDK version number
import azureml.core
import onnxruntime
import torch
import transformers

print("Transformers version: ", transformers.__version__)
torch_version = torch.__version__
print("Torch (ONNX exporter) version: ", torch_version)
print("Azure SDK version:", azureml.core.VERSION)
print("ONNX Runtime version: ", onnxruntime.__version__)


Transformers version:  4.17.0
Torch (ONNX exporter) version:  1.10.0
Azure SDK version: 1.40.0
ONNX Runtime version:  1.11.0


### Load your Azure ML workspace

We begin by instantiating a workspace object from the existing workspace created earlier in the configuration notebook.

In [13]:
from azureml.core import Workspace

ws = Workspace.from_config()
print(ws.name, ws.location, ws.resource_group, sep = '\n')

ort_training_dev
australiaeast
onnx_training
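`Workspace.from_config()` reads the workspace details from a `config.json` file containing the `subscription_id`, `resource_group`, and `workspace_name` keys. As a rough sketch, a file can be checked for those keys before calling the SDK; the helper name `missing_workspace_keys` and the placeholder values below are assumptions made for this illustration.

```python
import json
import os
import tempfile

REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def missing_workspace_keys(config_path):
    """Return the required keys that are absent or empty in a workspace config file."""
    with open(config_path) as f:
        config = json.load(f)
    return [key for key in REQUIRED_KEYS if not config.get(key)]


# Placeholder values for illustration only; workspace_name is left out on purpose
example = {"subscription_id": "<subscription id>", "resource_group": "<resource group>"}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.json")
    with open(path, "w") as f:
        json.dump(example, f)
    missing = missing_workspace_keys(path)

print(missing)  # ['workspace_name']
```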


## Register your model with Azure ML

Now we upload the model and register it in the workspace.


In [14]:
from azureml.core.model import Model

model = Model.register(model_path = model_path,                 # Local ONNX model to upload and register as a model
                       model_name = model_name,                 # Name of the registered model in your workspace.
                       model_framework=Model.Framework.ONNX ,   # Framework used to create the model.
                       model_framework_version=torch_version,   # Version of PyTorch used to export the ONNX model.
                       tags = {"onnx": "demo"},
                       description = "HuggingFace Bert model fine-tuned with SQuAd and exported from PyTorch",
                       workspace = ws)

./bert-large-uncased-whole-word-masking-finetuned-squad.onnx
bert-large-uncased-whole-word-masking-finetuned-squad
Registering model bert-large-uncased-whole-word-masking-finetuned-squad


#### Displaying your registered models

You can list out all the models that you have registered in this workspace.

In [15]:
models = ws.models
for name, m in models.items():
    print("Name:", name,"\tVersion:", m.version, "\tDescription:", m.description, m.tags)
    
#     # If you'd like to delete the models from workspace
#     model_to_delete = Model(ws, name)
#     model_to_delete.delete()

Name: hf-gpt2.onnx 	Version: 1 	Description: ONNX version of base HuggingFace GPT-2 {}
Name: hf-gpt2.pt 	Version: 1 	Description: GPT-2 model saved from pre-trained HuggingFace {}
Name: pytorch-hf-gpt-onnx-int8 	Version: 1 	Description: None {}
Name: pytorch-hf-gpt2-wikitext103 	Version: 1 	Description: None {}
Name: pt-ort-hf-gpt2-wt103-full 	Version: 1 	Description: HuggingFace GPT-2 fine-tuned with PyTorch ORT using Wikitext103 {}
Name: sample-densenet-onnx-model 	Version: 1 	Description: None {}
Name: bert-large-uncased-whole-word-masking-finetuned-squad 	Version: 2 	Description: HuggingFace Bert model fine-tuned with SQuAd and exported from PyTorch {'onnx': 'demo'}


## Deploy the model 

We are now going to deploy our ONNX model on Azure ML using ONNX Runtime.

Firstly we will test the deployment using an Azure Container Instance, then deploy the model for production using an Azure ML endpoint.



### Dependencies

We create a YAML file that specifies the dependencies of the inference application.

In [23]:
from azureml.core.conda_dependencies import CondaDependencies 

myenv = CondaDependencies.create(pip_packages=["numpy","onnxruntime","transformers", "torch", "azureml-core", "azureml-defaults", "socks"])

with open("myenv.yml","w") as f:
    f.write(myenv.serialize_to_string())

We're all set! Let's get our model chugging.

## Deploy Model as Webservice on Azure Container Instance

The following cells create the inference and deployment configurations and then deploy the service. The deployment itself will likely take several minutes to run.

In [24]:
from random import randint

from azureml.core.webservice import Webservice
from azureml.core.webservice import AciWebservice
from azureml.core.model import InferenceConfig
from azureml.core.environment import Environment

myenv = Environment.from_conda_specification(name="myenv", file_path="myenv.yml")
inference_config = InferenceConfig(entry_script="score.py", environment=myenv)

In [25]:
aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, 
                                               memory_gb = 4, 
                                               tags = {'demo': 'onnx'}, 
                                               description = 'Web service for Bert-squad-large-uncased ONNX model')


In [26]:
# ACI deployment names must be 32 characters or less
aci_service_name = model_name[:28] + '-' + str(randint(0,100))
print("ACI service name: ", aci_service_name)

aci_service = Model.deploy(ws, 
                           aci_service_name, 
                           [model], 
                           inference_config, 
                           aciconfig)

aci_service.wait_for_deployment(True)
print("ACI service state: ", aci_service.state)

ACI service name:  bert-large-uncased-whole-wor-57
Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2022-04-08 19:06:09+00:00 Creating Container Registry if not exists.
2022-04-08 19:06:09+00:00 Registering the environment.
2022-04-08 19:06:11+00:00 Building image..
2022-04-08 19:12:27+00:00 Generating deployment configuration..
2022-04-08 19:12:29+00:00 Submitting deployment to compute..
2022-04-08 19:12:43+00:00 Checking the status of deployment bert-large-uncased-whole-wor-57..
2022-04-08 19:15:30+00:00 Checking the status of inference endpoint bert-large-uncased-whole-wor-57.
Failed


WebserviceException: WebserviceException:
	Message: Service deployment polling reached non-successful terminal state, current service state: Failed
Operation ID: 5614dcfb-01bc-440e-957f-3d29b4e5f764
More information can be found using '.get_logs()'
Error:
{
  "code": "AciDeploymentFailed",
  "statusCode": 400,
  "message": "Aci Deployment failed with exception: Your container application crashed. This may be caused by errors in your scoring file's init() function.
	1. Please check the logs for your container instance: bert-large-uncased-whole-wor-57. From the AML SDK, you can run print(service.get_logs()) if you have service object to fetch the logs.
	2. You can interactively debug your scoring file locally. Please refer to https://docs.microsoft.com/azure/machine-learning/how-to-debug-visual-studio-code#debug-and-troubleshoot-deployments for more information.
	3. You can also try to run image orttrainingdf7604408.azurecr.io/azureml/azureml_a4c082e32e5658087810cef35b655aa2 locally. Please refer to https://aka.ms/debugimage#service-launch-fails for more information.",
  "details": [
    {
      "code": "CrashLoopBackOff",
      "message": "Your container application crashed. This may be caused by errors in your scoring file's init() function.
	1. Please check the logs for your container instance: bert-large-uncased-whole-wor-57. From the AML SDK, you can run print(service.get_logs()) if you have service object to fetch the logs.
	2. You can interactively debug your scoring file locally. Please refer to https://docs.microsoft.com/azure/machine-learning/how-to-debug-visual-studio-code#debug-and-troubleshoot-deployments for more information.
	3. You can also try to run image orttrainingdf7604408.azurecr.io/azureml/azureml_a4c082e32e5658087810cef35b655aa2 locally. Please refer to https://aka.ms/debugimage#service-launch-fails for more information."
    },
    {
      "code": "AciDeploymentFailed",
      "message": "Your container application crashed. Please follow the steps to debug:
	1. From the AML SDK, you can run print(service.get_logs()) if you have service object to fetch the logs. Please refer to https://aka.ms/debugimage#dockerlog for more information.
	2. If your container application crashed. This may be caused by errors in your scoring file's init() function. You can try debugging locally first. Please refer to https://aka.ms/debugimage#debug-locally for more information.
	3. You can also interactively debug your scoring file locally. Please refer to https://docs.microsoft.com/azure/machine-learning/how-to-debug-visual-studio-code#debug-and-troubleshoot-deployments for more information.
	4. View the diagnostic events to check status of container, it may help you to debug the issue.
"RestartCount": 3
"CurrentState": {"state":"Waiting","startTime":null,"exitCode":null,"finishTime":null,"detailStatus":"CrashLoopBackOff: Back-off restarting failed"}
"PreviousState": {"state":"Terminated","startTime":"2022-04-08T19:18:48.073Z","exitCode":111,"finishTime":"2022-04-08T19:19:32.595Z","detailStatus":"Error"}
"Events":
{"count":1,"firstTimestamp":"2022-04-08T19:12:47Z","lastTimestamp":"2022-04-08T19:12:47Z","name":"Pulling","message":"pulling image "orttrainingdf7604408.azurecr.io/azureml/azureml_a4c082e32e5658087810cef35b655aa2@sha256:6b50a385b65eff99d469848ef04872eb611f4f4e0005a37a948d69a8d007055e"","type":"Normal"}
{"count":1,"firstTimestamp":"2022-04-08T19:14:30Z","lastTimestamp":"2022-04-08T19:14:30Z","name":"Pulled","message":"Successfully pulled image "orttrainingdf7604408.azurecr.io/azureml/azureml_a4c082e32e5658087810cef35b655aa2@sha256:6b50a385b65eff99d469848ef04872eb611f4f4e0005a37a948d69a8d007055e"","type":"Normal"}
{"count":4,"firstTimestamp":"2022-04-08T19:15:21Z","lastTimestamp":"2022-04-08T19:18:48Z","name":"Started","message":"Started container","type":"Normal"}
{"count":4,"firstTimestamp":"2022-04-08T19:16:07Z","lastTimestamp":"2022-04-08T19:19:32Z","name":"Killing","message":"Killing container with id 563254c556a9d45aa8278833deb47c37a5baebd03087c7cf1f0025ff3313c18e.","type":"Normal"}
"
    }
  ]
}
	InnerException None

If the deployment fails, as in the output above, check the container logs with `get_logs()`. Make sure to delete your `aci_service` before trying again.

In [None]:
if aci_service.state != 'Healthy':
    # run this command for debugging.
    print(aci_service.get_logs())
    aci_service.delete()

## Success!

If you've made it this far, you've deployed a working web service that does question answering using an ONNX model. You can get the URL for the web service with the code below.

In [None]:
print(aci_service.scoring_uri)

## Run inference on the BERT model using the web service

**Input**: A question and a context paragraph, sent as a JSON object with `question` and `context` keys, which is the format that `run()` in score.py parses.

**Task**: The model predicts the start and end tokens of the span in the context paragraph that most likely answers the question.

**Output**: The best answer to the question, or an error message if no answer could be found.
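The web service can also be called over plain HTTP by posting the same JSON body to the scoring URI. The sketch below only builds the request and does not send it; the helper name `build_scoring_request` and the placeholder URI are assumptions for illustration, and a real call would use the value of `aci_service.scoring_uri`.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, question, context):
    """Build (but do not send) a POST request carrying the JSON body that run() parses."""
    body = json.dumps({"question": question, "context": context}).encode("utf-8")
    return urllib.request.Request(
        scoring_uri,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URI for illustration only
request = build_scoring_request(
    "http://example.invalid/score",
    "What is my name",
    "My name is Natalie and my friend's name is Jane",
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To send it against a live service:
#   with urllib.request.urlopen(request) as response:
#       print(json.load(response))
```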

In [None]:
import json

# Build the request in the format that run() in score.py expects,
# using the sample question and context defined earlier
inputs_json = {"question": question, "context": context}

print("========= INPUT DATA =========")
print(json.dumps(inputs_json, indent=2))
azure_result = aci_service.run(json.dumps(inputs_json))
print("\n")
print("========= RESULT =========")
print(json.dumps(azure_result, indent=2))

In [None]:
# run() returns a dictionary with either an 'answer' key or an 'error' key
print('========================================')
print("Question: ", inputs_json["question"])
if "answer" in azure_result:
    print("Best Answer: ", azure_result["answer"])
else:
    print("Error: ", azure_result["error"])
print('========================================')

When you are eventually done using the web service, remember to delete it.

In [None]:
aci_service.delete()