Copyright (C) 2022-2023, Intel Corporation

SPDX-License-Identifier: MIT

# ONNX Runtime Inference using AzureML

In this sample we run inference for a question-answering use case with a quantized BERT model, using the OpenVINO Execution Provider through Optimum ONNX Runtime. The quantized BERT model is generated through Quantization Aware Training.

In the question-answering scenario, the model takes a question and a piece of text called the context, and produces an answer to the question extracted from the context. The questions and contexts are tokenized, encoded, and fed as inputs into the transformer model. The model outputs the most likely start and end tokens of the answer within the context, which are then mapped back into words.
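
The span selection described above can be sketched in plain Python. This is a minimal illustration, not the code used by the inference script: the function name `best_answer_span`, the `max_answer_len` limit, and the toy logits are all assumptions made for the example.

```python
from typing import List, Tuple


def best_answer_span(start_logits: List[float],
                     end_logits: List[float],
                     max_answer_len: int = 30) -> Tuple[int, int]:
    """Return the (start, end) token indices with the highest combined score."""
    best_score = float("-inf")
    best_span = (0, 0)
    for start, start_score in enumerate(start_logits):
        # Only consider ends at or after the start, within the length limit.
        last = min(start + max_answer_len, len(end_logits))
        for end in range(start, last):
            score = start_score + end_logits[end]
            if score > best_score:
                best_score = score
                best_span = (start, end)
    return best_span


# Toy tokens and made-up logits (not real model output).
tokens = ["the", "required", "two", "-", "thirds", "majority", "to", "pass"]
start_logits = [0.1, 0.3, 4.0, 0.2, 0.5, 0.4, 0.1, 0.0]
end_logits = [0.0, 0.2, 0.6, 0.3, 1.0, 3.5, 0.2, 0.1]

start, end = best_answer_span(start_logits, end_logits)
answer = " ".join(tokens[start:end + 1])
print(answer)
```

In a real pipeline the selected token span is converted back to text by the tokenizer rather than by joining tokens with spaces.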

# Prerequisites
To run on AzureML, you will need:

- Azure subscription
- Azure Machine Learning Workspace (see this notebook for creation of the workspace if you do not already have one: AzureML configuration notebook)
- the Azure Machine Learning SDK
- the Azure CLI and the Azure Machine Learning CLI extension (version later than 2.2.2)

The following resources can be of help:

- Understand the [architecture and terms](https://learn.microsoft.com/en-us/azure/machine-learning/concept-azure-machine-learning-v2?tabs=cli) introduced by Azure Machine Learning
- The [Azure Portal](https://portal.azure.com/#home) allows you to track the status of your deployments.

This notebook is based on the examples in the following links:  
- [Train with custom image](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-train-with-custom-image)
- [Quickstart create resources](https://learn.microsoft.com/en-us/azure/machine-learning/quickstart-create-resources)

# Getting Required Files
Download the inference script and the sample input file from the repository [here](https://github.com/intel/nlp-training-and-inference-openvino/tree/v1.1).

In [1]:
import os
SCRIPT_DIR = "bert_inference_onnxruntime/"
if not os.path.exists(SCRIPT_DIR):
    os.mkdir(SCRIPT_DIR)

if not os.path.exists(SCRIPT_DIR+"bert_inference_optimum_ort_ovep.py"):
    !cd bert_inference_onnxruntime/ && wget https://raw.githubusercontent.com/intel/nlp-training-and-inference-openvino/v1.1/question-answering-bert-qat/onnxovep_optimum_inference/bert_inference_optimum_ort_ovep.py
if not os.path.exists(SCRIPT_DIR+"input.csv"):
    !cd bert_inference_onnxruntime/ && wget https://raw.githubusercontent.com/intel/nlp-training-and-inference-openvino/v1.1/question-answering-bert-qat/onnxovep_optimum_inference/data/input.csv
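
The `!wget` lines above rely on IPython shell magic. Outside a notebook, the same "download only if missing" step could be sketched with the standard library. The helper name `fetch_if_missing` is hypothetical; `BASE_URL` is simply the common prefix of the two URLs used above.

```python
import os
import urllib.request

# Common prefix of the two wget URLs used in the cell above.
BASE_URL = ("https://raw.githubusercontent.com/intel/"
            "nlp-training-and-inference-openvino/v1.1/"
            "question-answering-bert-qat/onnxovep_optimum_inference/")


def fetch_if_missing(relative_url: str, target_dir: str) -> str:
    """Download BASE_URL + relative_url into target_dir unless it already exists."""
    os.makedirs(target_dir, exist_ok=True)
    local_path = os.path.join(target_dir, os.path.basename(relative_url))
    if not os.path.exists(local_path):
        # Network access is only needed when the file is not present locally.
        urllib.request.urlretrieve(BASE_URL + relative_url, local_path)
    return local_path
```

For example, `fetch_if_missing("data/input.csv", SCRIPT_DIR)` would place `input.csv` in the script folder and return its local path.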



# Import necessary Libraries
First, we import the libraries needed to configure and submit the job.

In [2]:
from pathlib import Path
from azureml.core import Workspace
from azureml.core import ScriptRunConfig, Experiment, Environment

# Initialize Workspace

The Azure Machine Learning workspace is the top-level resource for the service. It gives you a centralized place to work with all the artifacts that you create. In the Python SDK, you can access the workspace artifacts by creating a Workspace object.

In [3]:
from azureml.core import Workspace
ws = Workspace.from_config()

# Create or attach a compute target

A compute target is the machine on which we intend to run our code. It can be a compute instance or a compute cluster.  
Here we use a compute cluster. If the cluster already exists, it is attached to our workspace; otherwise a new cluster is created according to the given specification and attached to our workspace.

In [None]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Choose a name for your cluster.
cluster_name = "cpu-clusters4"

try:
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing compute target.')
except ComputeTargetException:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D16DS_V4',
                                                           max_nodes=4)
    # Create the cluster.
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)

    compute_target.wait_for_completion(show_output=True)

# Use get_status() to get a detailed status for the current AmlCompute.
print(compute_target.get_status().serialize())

# Getting the Paths right
All scripts and files present in the `script_dir` folder are uploaded to the compute target, data stores are mounted or copied, and the script is executed.  
Outputs from stdout and the ./logs folder are streamed to the run history and can be used to monitor the run. For further details, see [here](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-train-pytorch#what-happens-during-run-execution).

In [5]:
script_dir = "bert_inference_onnxruntime/"
script_name = "bert_inference_optimum_ort_ovep.py"
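
Since everything under `script_dir` is uploaded, it can help to check what the folder contains before submitting. The following is a small sketch using only the standard library; the helper names `list_snapshot_files` and `total_size_mb` are made up for this example and are not part of the Azure ML SDK.

```python
import os
from typing import List, Tuple


def list_snapshot_files(folder: str) -> List[Tuple[str, int]]:
    """Return sorted (relative_path, size_in_bytes) pairs for every file under folder."""
    entries = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            full_path = os.path.join(root, name)
            rel_path = os.path.relpath(full_path, folder)
            entries.append((rel_path, os.path.getsize(full_path)))
    return sorted(entries)


def total_size_mb(entries: List[Tuple[str, int]]) -> float:
    """Sum the file sizes and convert bytes to mebibytes."""
    return sum(size for _path, size in entries) / (1024 * 1024)
```

Calling `list_snapshot_files(script_dir)` should list at least the inference script and `input.csv`; large or unexpected files in the folder are worth removing before the job is submitted.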

## Azure ML Environment Declarations

We assign names to the environment and the experiment for easier tracking and monitoring.

In [6]:
environment_name = "int8_inf-example"
experiment_name = "int8_inf-test"

## Environment Definition

The environment definition lets us specify a custom Docker environment with all the required dependencies, so that our script runs as expected.

In [7]:
# Copyright (C) 2022-2023 Intel Corporation
# SPDX-License-Identifier: MIT
# Specify Docker steps as a string. 
dockerfile = r"""
FROM openvino/ubuntu20_runtime:2022.2.0

USER root

RUN apt-get update && apt-get install -y \
    python3.8 \
    python3.8-venv; \
    rm -rf /var/lib/apt/lists/*;

RUN update-alternatives --install /usr/bin/python python /usr/bin/python3.8 70; \
    update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.8 70;

RUN apt-get update
RUN apt-get install -y cifs-utils
RUN python3 -m pip install --upgrade pip
RUN python3 -m pip install --no-cache-dir onnxruntime-openvino==1.13.1 pandas==1.5.2
RUN python3 -m pip install --no-cache-dir optimum==1.5.1

USER openvino

"""

## Creating Environment

In [8]:
env = Environment(environment_name)
env.docker.base_image = None
env.docker.base_dockerfile = dockerfile
env.python.user_managed_dependencies = True

## Mounting Azure Storage for easier Access.
- We need the access keys of our storage account.
- The keys can be viewed as described [here](https://learn.microsoft.com/en-us/azure/storage/common/storage-account-keys-manage?toc=%2Fazure%2Fstorage%2Fblobs%2Ftoc.json&tabs=azure-portal#view-account-access-keys).
- We mount the storage in the container for easy access to large files.

## Creating job config for running the model


We define the set of commands that will be used to perform our desired task:

1. Mounting the Azure file share in the container and listing the model and input paths as a quick check.
2. Running the inference using `OpenVINOExecutionProvider`.

## Azure Credentials
Here we declare the variables used to connect to our Azure storage. The storage account name and key are available in the Azure portal under Storage Account > Security + networking > Access keys. The file share name is available under the corresponding file share > Settings > Properties.

In [9]:
# Storage account name
STORAGEACCT= ""
# Storage account key
STORAGEKEY = ''
# Share name
SHARE = ''

MOUNT_PATH = "/mnt/MyAzureFileShare"
modelname = "bert-large-uncased-whole-word-masking-finetuned-squad"

# Path to the Finetuned INT8 ONNX Model directory.
# e.g. modelpath = f"{MOUNT_PATH}/models"
modelpath = f"{MOUNT_PATH}/"
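
To avoid saving the storage key in the notebook itself, the values could instead be read from environment variables. This is only a sketch: the variable names `AZ_STORAGE_ACCOUNT`, `AZ_STORAGE_KEY`, and `AZ_FILE_SHARE` and the helper `load_storage_credentials` are hypothetical, not names defined by Azure.

```python
import os
from typing import Mapping, Optional, Tuple


def load_storage_credentials(
        env: Optional[Mapping[str, str]] = None) -> Tuple[str, str, str]:
    """Return (account, key, share) read from environment variables.

    The variable names are hypothetical; use whatever names suit your setup.
    """
    env = os.environ if env is None else env
    names = ("AZ_STORAGE_ACCOUNT", "AZ_STORAGE_KEY", "AZ_FILE_SHARE")
    missing = [name for name in names if not env.get(name)]
    if missing:
        # Fail early instead of submitting a job that cannot mount the share.
        raise ValueError("Missing environment variables: " + ", ".join(missing))
    return env[names[0]], env[names[1]], env[names[2]]
```

With the variables set in the shell that starts the notebook, `STORAGEACCT, STORAGEKEY, SHARE = load_storage_credentials()` would replace the hard-coded assignments.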


## Azure Storage Paths (Models & Inputs)
Here we are declaring the paths to the quantized model and the inputs file.
The input file is a csv file with 2 columns:
- Context
- Question

The script reads this file, runs inference on each row, and saves the corresponding outputs as `outputs.csv`.
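
A file with this two-column layout can be produced with Python's `csv` module. The sketch below assumes a header row named `Context,Question`; whether the inference script expects a header is not stated here, so check the sample `input.csv` before relying on it. The helper name `write_input_csv` and the sample row are illustrative only.

```python
import csv
from typing import Iterable, Tuple


def write_input_csv(path: str, rows: Iterable[Tuple[str, str]]) -> None:
    """Write (context, question) pairs with the context in the first column."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        # Header row is an assumption; drop it if the script expects data only.
        writer.writerow(["Context", "Question"])
        writer.writerows(rows)


# Illustrative sample row, not taken from the repository's input.csv.
sample_rows = [
    ("The South Hall added 80,000 square feet of exhibit space.",
     "How much exhibit space did the South Hall add?"),
]
```

The `csv` writer quotes fields that contain commas, so contexts like the one above stay in a single column.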

## Custom Single Inputs

You can also pass a single sample input for testing.  
In the cell below, the variables `context` and `question` serve this purpose.  
These variables are passed as arguments to the inference script. If they are empty strings,
the default behaviour is to read the input CSV file; otherwise they are processed for question answering.


In [10]:
# -------------------------------------------------------------------------- #
# Multiple User Input - using csv file
# Path to Input csv file. In input csv, first field should be context and second field should be question.
# e.g. inputpath = f"{MOUNT_PATH}/input.csv"
inputpath = f"{MOUNT_PATH}/"
# Path to Output csv file.
# e.g. outputpath = f"{MOUNT_PATH}/output.csv"
outputpath = f"{MOUNT_PATH}/"

# -------------------------------------------------------------------------- #
# Single User Input - using context and question parameters
# if you are providing input csv please pass empty string to context and question(e.g context='""' ,question ='""')
context = """ "In its early years, the new convention center failed to meet attendance and revenue expectations.[12] By 2002, many Silicon Valley businesses were choosing the much larger Moscone Center in San Francisco over the San Jose Convention Center due to the latter's limited space. A ballot measure to finance an expansion via a hotel tax failed to reach the required two-thirds majority to pass. In June 2005, Team San Jose built the South Hall, a $6.77 million, blue and white tent, adding 80,000 square feet (7,400 m2) of exhibit space" """
question = """ "how many votes did the ballot measure need?" """
# -------------------------------------------------------------------------- #

provider = "OpenVINOExecutionProvider"

command = """
mkdir -p {mount_path}
mount -t cifs //{storageacct}.file.core.windows.net/{share} {mount_path} -o vers=3.0,username={storageacct},password={storagekey},dir_mode=0777,file_mode=0777,serverino

ls -al {modelpath}
echo Checking Model Path...
ls -al {inputpath}

python bert_inference_optimum_ort_ovep.py --modelname {modelname} \
    --modelpath {modelpath} \
    --provider {provider} \
    --inputpath {inputpath} \
    --outputpath {outputpath} \
    --context {context} \
    --question {question}
""".format(modelpath = modelpath,
           modelname = modelname,
           provider = provider,
           mount_path = MOUNT_PATH,
           inputpath = inputpath,
           outputpath = outputpath,
           context=context,
           question=question,
           storageacct = STORAGEACCT,
           storagekey = STORAGEKEY,
           share = SHARE)
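
The command above wraps `context` and `question` in double quotes, and a POSIX shell still expands `$` inside double quotes. The sample context contains `$6.77`, so the shell would likely treat `$6` as a positional parameter and alter the text. One way to guard against this is `shlex.quote` from the standard library. The helper `build_inference_command` below is a hypothetical sketch, not part of the original notebook.

```python
import shlex


def build_inference_command(script: str, **options: str) -> str:
    """Build a shell command line with every option value safely quoted."""
    parts = ["python", script]
    for name, value in options.items():
        parts.append("--" + name)
        # shlex.quote wraps the value in single quotes when it contains
        # characters the shell would otherwise interpret.
        parts.append(shlex.quote(str(value)))
    return " ".join(parts)


cmd = build_inference_command(
    "bert_inference_optimum_ort_ovep.py",
    context="a $6.77 million tent",
    question="",
)
print(cmd)
```

An empty string is quoted as `''`, so the "no single input" case still passes an explicit empty argument to the script.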

# Create job config

The job config defines how we want to execute our inference job. We pass the following information to the `ScriptRunConfig` object to initialize the job config instance:
- `source_directory` $\rightarrow$ All the contents of this source directory are copied to the compute target.  
- `command` $\rightarrow$ The set of commands we wish to execute to perform the task.
- `environment` $\rightarrow$ The environment in which the commands are executed.
- `compute_target` $\rightarrow$ The compute target (i.e. cluster or instance) on which the commands run.

### Running the Script

In [None]:
src = ScriptRunConfig(source_directory=script_dir,
                      command=command,
                      environment=env,
                      compute_target=cluster_name)

## Submit job
After submitting the job, we can see logs from the Outputs + logs tab of the Web View link generated.

In [None]:
exp = Experiment(ws, experiment_name)
run = exp.submit(src, tags={"tag": "OVEP"})
run.wait_for_completion(show_output=True)

### Accessing the user output logs

In [13]:
# The last file listed for the run is taken to be the user log.
log_name = run.get_file_names()[-1]
local_log_path = 'logs/' + log_name
run.download_file(name=log_name, output_file_path=local_log_path)
with open(local_log_path, 'r') as f:
    logs = f.readlines()

### Printing the output logs

In [None]:
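# Find the first line that starts with 'Inference'; fall back to the whole log if none does.
start = next((i for i, log in enumerate(logs) if log.startswith('Inference')), 0)

# Lines from readlines() already end with a newline, so no extra separator is needed.
print(*logs[start:], sep='')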