Copyright (C) 2022, Intel Corporation

SPDX-License-Identifier: Apache-2.0

# Quantization Aware Training using AzureML

In this sample, we demonstrate quantization-aware training (QAT) using the Neural Network Compression Framework (NNCF).  
We use a BERT model from the Hugging Face Hub (`transformers` library) for a question-answering use case.
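
As a rough intuition for what quantization-aware training simulates, the sketch below rounds a few float values to 8-bit integers and maps them back to floats, so the rounding error becomes visible. It assumes a generic symmetric, per-tensor scheme; the function name `fake_quantize` and the sample values are illustrative only, and NNCF's actual implementation may differ.

```python
def fake_quantize(values, num_bits=8):
    """Quantize floats to signed integers and map them back to floats.

    Illustrative symmetric, per-tensor scheme (an assumption, not NNCF's code).
    """
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    # One scale for the whole list; fall back to 1.0 if every value is zero.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the representable range.
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    # Map the integer levels back to floats; this is what the next layer would see.
    dequantized = [q * scale for q in quantized]
    return quantized, dequantized, scale


sample_weights = [0.5, -1.27, 0.003, 1.0]  # hypothetical values
q_levels, dq_values, q_scale = fake_quantize(sample_weights)
print("integer levels:", q_levels)
print("dequantized   :", dq_values)
```

During QAT, the model is trained while this kind of rounding is applied in the forward pass, so the weights can adapt to the reduced precision.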


This notebook is based on the examples at the following links:  
- https://learn.microsoft.com/en-us/azure/machine-learning/how-to-train-with-custom-image
- https://learn.microsoft.com/en-us/azure/machine-learning/quickstart-create-resources


# Getting Required Files

The required training scripts are downloaded from [this repository branch](https://github.com/intel/nlp-training-and-inference-openvino/tree/bert_qa_azureml).

In [None]:
import os

SCRIPT_DIR = 'bert_qat/'
BASE_URL = ("https://raw.githubusercontent.com/intel/nlp-training-and-inference-openvino/"
            "bert_qa_azureml/question-answering-bert-qat/quantization_aware_training/training_scripts/")

# Create the script folder if it does not exist yet.
os.makedirs(SCRIPT_DIR, exist_ok=True)

# Download each training script only if it is not already present.
for filename in ("trainer_qa.py", "run_qa.py", "utils_qa.py"):
    if not os.path.exists(os.path.join(SCRIPT_DIR, filename)):
        !wget -P {SCRIPT_DIR} {BASE_URL}{filename}


# Import necessary Libraries

First, import the libraries needed to configure and submit the training job.

In [2]:
import os
from pathlib import Path
from azureml.core import Workspace
from azureml.core import ScriptRunConfig, Experiment, Environment

# Initialize Workspace

The Azure Machine Learning workspace is the top-level resource for the service. It gives you a centralized place to work with all the artifacts that you create. In the Python SDK, you can access the workspace artifacts by creating a Workspace object.

In [3]:
from azureml.core import Workspace
from azureml.core import Datastore
ws = Workspace.from_config()

# Create or attach a compute target

A compute target is the machine on which we intend to run our code. It can be a compute instance or a compute cluster.  
Here we use a compute cluster. If a cluster with the given name already exists in the workspace, it is reused; otherwise, a new cluster is created according to the specification below and attached to the workspace.

In [None]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Choose a name for your cluster.
cluster_name = "cpu-cluster4"

try:
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing compute target.')
except ComputeTargetException:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D16DS_V4',
                                                           max_nodes=4)
    # Create the cluster.
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)

    compute_target.wait_for_completion(show_output=True)

# Use get_status() to get a detailed status for the current AmlCompute.
print(compute_target.get_status().serialize())

# Setting the correct input paths

All scripts and files in the `script_dir` folder are uploaded to the compute target, datastores are mounted or copied, and the script is executed.  
Output from stdout and the `./logs` folder is streamed to the run history and can be used to monitor the run. For further details, see [what happens during run execution](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-train-pytorch#what-happens-during-run-execution).

In [5]:
script_dir = os.path.abspath(SCRIPT_DIR)+"/"
script_name = "run_qa.py"
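
Because the whole script folder is uploaded, it can help to check what the folder contains before submitting. The helper below is a minimal sketch using only the standard library; `summarize_folder` is a hypothetical name, and it does not account for any ignore files the service may honor.

```python
import os


def summarize_folder(folder):
    """Return (sorted relative paths, total size in bytes) for all files under `folder`."""
    paths, total_bytes = [], 0
    for root, _dirs, files in os.walk(folder):
        for name in files:
            full_path = os.path.join(root, name)
            # Store paths relative to the folder, as they would appear after upload.
            paths.append(os.path.relpath(full_path, folder))
            total_bytes += os.path.getsize(full_path)
    return sorted(paths), total_bytes
```

In this notebook it could be called as `summarize_folder(script_dir)`, which should list the three downloaded training scripts.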

# Environment Definition

An environment definition lets us specify a custom Docker environment with all the required dependencies, so that the script runs as expected.

## Location of the Dockerfile


In [6]:
# Copyright (C) 2021-2022 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
dockerfile = r"""
FROM openvino/ubuntu20_runtime:2022.2.0

USER root

RUN dpkg --get-selections | grep -v deinstall | awk '{print $1}' > base_packages.txt

RUN apt-get update && apt-get install -y wget\
    python3.8 \
    python3.8-venv; \
    rm -rf /var/lib/apt/lists/*;

RUN update-alternatives --install /usr/bin/python python /usr/bin/python3.8 70; \
    update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.8 70;

RUN apt-get update && apt-get install -y git

RUN python -m pip install --upgrade pip

RUN pip install --no-cache-dir "git+https://github.com/huggingface/optimum-intel.git@v1.5.2#egg=optimum-intel[openvino,nncf]"
RUN pip install --no-cache-dir "git+https://github.com/AlexKoff88/nncf_pytorch.git@ak/qdq_per_channel#egg=nncf"
RUN pip install --no-cache-dir "protobuf==3.19.4"
RUN pip install --no-cache-dir "seqeval==1.2.2"
RUN pip install --no-cache-dir "accelerate==0.15.0"
RUN pip install --no-cache-dir "evaluate==0.3.0"
RUN pip install --no-cache-dir "datasets==2.7.1"
RUN pip install --no-cache-dir "torch==1.12.0"
RUN pip install --no-cache-dir "openvino-dev==2022.2.0"
WORKDIR /home/training

RUN chown openvino -R /home/training

USER openvino

RUN ls -l
"""

## Azure ML settings
Assign names to the environment and the experiment for easier tracking and monitoring.

In [7]:
environment_name = "qat-example"
experiment_name = "qat-test"

## Creating Environment

In [8]:
env = Environment(environment_name)
env.docker.base_image = None
env.docker.base_dockerfile = dockerfile
env.python.user_managed_dependencies = True

## Arguments for Quantization Aware Training

To run quantization-aware training, we need to pass certain arguments to the script. The arguments are passed as a flat list in which each flag is followed by its value:  
`arguments = ['--first_arg', first_val, '--second_arg', second_val, ...]`

For further details, see the [Azure ML cheat sheet](https://azure.github.io/azureml-cheatsheets/docs/cheatsheets/python/v1/cheatsheet/).

In [9]:
model_name = "bert-large-uncased-whole-word-masking-finetuned-squad"
arguments = ["--model_name_or_path", model_name,
             "--dataset_name", "squad",
             "--do_train", True,
             "--do_eval", True,
             "--max_seq_length", 256,
             "--per_device_train_batch_size", 3,
             "--max_train_samples", 10,
             "--max_eval_samples", 10,
             "--learning_rate", 3e-5,
             "--num_train_epochs", 2,
             "--output_dir", "./outputs/bert_finetuned_model"]
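
The flat flag/value list above can also be generated from a dictionary, which makes it easier to change individual settings. This is a minimal sketch; `build_arguments`, `example_options`, and `example_arguments` are hypothetical names and not part of the Azure ML SDK.

```python
def build_arguments(options):
    """Flatten {"flag": value} into ["--flag", value, ...], keeping insertion order."""
    flat = []
    for name, value in options.items():
        flat.extend([f"--{name}", value])
    return flat


# A subset of the options used in this notebook, for illustration.
example_options = {
    "model_name_or_path": "bert-large-uncased-whole-word-masking-finetuned-squad",
    "dataset_name": "squad",
    "do_train": True,
    "max_seq_length": 256,
}
example_arguments = build_arguments(example_options)
print(example_arguments)
```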

## Create job config

The job config defines how the training script is executed. We need to pass the following information to the `ScriptRunConfig` object to initialize the job config instance:
- `source_directory` $\rightarrow$ The directory whose contents are copied to the compute target.  
- `script` $\rightarrow$ The Python script we wish to execute.  
- `arguments` $\rightarrow$ The arguments passed to the `script`.  
- `environment` $\rightarrow$ The environment in which the `script` is executed.
- `compute_target` $\rightarrow$ The compute target (i.e. cluster or instance) on which the `script` runs.

In [10]:
src = ScriptRunConfig(source_directory=script_dir,
                      script=script_name,
                      arguments=arguments,
                      environment=env,
                      compute_target=cluster_name)

# Submit job
After submitting the job, we can follow the logs in the Outputs + logs tab of the generated Web View link. Once the job is completed, the output model is available under Web View link > Outputs + logs > outputs.

In [None]:
run = Experiment(ws, experiment_name).submit(src)
run.wait_for_completion(show_output=True)

## Download the model outputs

In [None]:
# List every file recorded for the run, then download the exported model and its config.
print('filenames', run.get_file_names())
run.download_file(name='./outputs/bert_finetuned_model/model.onnx',
                  output_file_path='./models/model.onnx')
run.download_file(name='./outputs/bert_finetuned_model/config.json',
                  output_file_path='./models/config.json')
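
If more files need to be fetched, the remote-to-local mapping can be derived from the list of file names instead of being written out by hand. The sketch below uses a stand-in list in place of `run.get_file_names()`; `plan_downloads` and the sample entries are hypothetical.

```python
import os


def plan_downloads(file_names, remote_prefix, local_dir):
    """Map each run file under `remote_prefix` to a path inside `local_dir`."""
    plan = {}
    for name in file_names:
        if name.startswith(remote_prefix):
            # Keep the part of the path below the prefix.
            relative = name[len(remote_prefix):].lstrip("/")
            plan[name] = os.path.join(local_dir, relative)
    return plan


# Stand-in for the list returned by run.get_file_names(); entries are hypothetical.
sample_names = [
    "outputs/bert_finetuned_model/model.onnx",
    "outputs/bert_finetuned_model/config.json",
    "logs/azureml/stdout.txt",
]
download_plan = plan_downloads(sample_names, "outputs/bert_finetuned_model", "./models")
for remote, local in download_plan.items():
    print(remote, "->", local)
```

Each pair could then be passed to `run.download_file(name=remote, output_file_path=local)`.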