Copyright (C) 2022 Intel Corporation
 
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
 
http://www.apache.org/licenses/LICENSE-2.0
 
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions
and limitations under the License.
 

SPDX-License-Identifier: Apache-2.0

# General Description

Version: 1.0
Date: Sep 19, 2022

This notebook outlines how to train an NLP model on Intel CPUs using PyTorch with Intel Extension for PyTorch (IPEX) optimizations and a HuggingFace model on the Azure Machine Learning platform. A BERT base model is fine-tuned using HuggingFace's Trainer class with distributed training and IPEX optimization.
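In distributed data-parallel (DDP) training, each process trains on its own slice of the dataset. This notebook does not show how train.py or the HuggingFace Trainer shards the data internally. As an illustration only, the sketch below reproduces how PyTorch's `DistributedSampler` assigns indices when shuffling is disabled and `drop_last=False`. The function name `shard_indices` is hypothetical.

```python
import math
from typing import List

def shard_indices(num_samples: int, world_size: int, rank: int) -> List[int]:
    """Return the sample indices one DDP process would see (no shuffling).

    Mirrors the non-shuffled, drop_last=False behavior of PyTorch's
    DistributedSampler: the index list is padded by wrapping around to the
    start until it divides evenly, then each rank takes every
    world_size-th index starting at its own rank.
    """
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    indices = list(range(num_samples))
    per_rank = math.ceil(num_samples / world_size)
    total_size = per_rank * world_size
    padding = total_size - num_samples
    # Wrap around so that every rank receives the same number of samples.
    indices += (indices * math.ceil(padding / max(num_samples, 1)))[:padding]
    return indices[rank:total_size:world_size]

if __name__ == "__main__":
    # Two nodes with one process each, as in the cluster used later on.
    for r in range(2):
        print(r, shard_indices(10, 2, r))
```

Padding keeps the per-rank batch counts equal, so no process waits on the others at the end of an epoch; the cost is that a few samples are seen twice when the dataset size is not divisible by the number of processes.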

Users may build on parts of this code and customize them to suit their purposes.

# Prerequisite
Log in to Azure: open a terminal/console and run the command below. Follow the instructions shown in the terminal to complete the interactive authentication.

Command:
`az login`
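Step 1 asks for the Azure subscription ID. After logging in, it can be read from the active CLI session. The following is a minimal sketch, assuming the Azure CLI (`az`) is on the PATH; the helper name `current_subscription_id` is hypothetical, and it relies only on `az account show`, whose JSON output carries the subscription ID in its `id` field.

```python
import json
import shutil
import subprocess
from typing import Optional

def current_subscription_id() -> Optional[str]:
    """Return the subscription ID of the active Azure CLI login, or None.

    Runs `az account show`, which prints the active subscription as JSON.
    Returns None when the Azure CLI is not installed or nobody is logged in.
    """
    az = shutil.which("az")
    if az is None:
        return None
    result = subprocess.run([az, "account", "show", "--output", "json"],
                            capture_output=True, text=True)
    if result.returncode != 0:
        return None
    return json.loads(result.stdout).get("id")

if __name__ == "__main__":
    print(current_subscription_id() or "Not logged in: run 'az login' first.")
```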

# Step 1: Create an Azure Machine Learning workspace to perform the training.

Note that some organizations set Azure policies that control access to workspaces, storage accounts, etc. Such policies may cause the workspace creation in the code provided to be denied. If that is the case, users may create the workspace manually through the Azure Machine Learning website instead.

After creating the workspace, users can download config.json from the Azure Machine Learning website by clicking 'the_workspace_name' -> 'Overview' -> 'Download config.json'.

Then copy config.json into the notebook's working directory and re-run the code block below:

In [None]:
from azureml.core import Workspace

try:
    # Load the workspace described by config.json in the current directory
    ws = Workspace.from_config('./config.json')
    print('Loaded existing workspace configuration')
except Exception:
    # No usable config.json: create a new workspace and save its configuration
    ws = Workspace.create(name='intel_azureml_ws',
                          subscription_id='----USER AZURE SUBSCRIPTION ID----',  # Fill in the Azure subscription ID
                          resource_group='intel_azureml_resource',  # Created if it does not exist
                          create_resource_group=True,
                          location='westus2'
                          )
    ws.write_config(path="./", file_name="config.json")
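Because the cell above falls back to `Workspace.create` on any failure, a malformed config.json can silently trigger an attempt to create a new workspace. A quick check of the file before re-running the cell can help. This is a minimal sketch, assuming the file downloaded from the workspace 'Overview' page with its usual `subscription_id`, `resource_group`, and `workspace_name` keys; the helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import List

# Keys found in the config.json downloaded from the workspace Overview page.
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")

def missing_workspace_keys(config: dict) -> List[str]:
    """Return the required keys that are absent or empty in a parsed config."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]

def load_workspace_config(path: str = "./config.json") -> dict:
    """Load config.json and raise a readable error if it is incomplete."""
    config = json.loads(Path(path).read_text())
    missing = missing_workspace_keys(config)
    if missing:
        raise ValueError(f"{path} is missing: {', '.join(missing)}")
    return config
```

Calling `load_workspace_config()` first turns a missing or truncated file into an explicit error (`FileNotFoundError`, `json.JSONDecodeError`, or `ValueError`) instead of a fall-through to workspace creation.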

# Step 2: Create an environment by building a Docker image

The following code builds a Docker image that the cluster loads as its runtime environment. The Dockerfile contains all the packages needed for HuggingFace model training with Intel's optimizations.

In [None]:
from azureml.core.environment import Environment

azure_ddp_ipex_hf_environment = Environment(name="azure_ddp_ipex_hf_environment")

# Specify the Docker build steps as a string.
dockerfile = r"""
FROM mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04

# Install tzdata non-interactively so the build does not stop at a time-zone prompt
RUN apt-get update && DEBIAN_FRONTEND=noninteractive TZ=Etc/UTC apt-get -y install tzdata

# Install the necessary system packages
RUN apt-get update && \
    apt-get install wget -y && \
    apt-get install python3-pip -y && \
    apt-get install libgl1 -y && \
    apt-get install python3-opencv -y && \
    apt-get install git -y && \
    apt-get install build-essential -y && \
    apt-get install libtool -y && \
    apt-get install autoconf -y && \
    apt-get install unzip -y && \
    apt-get install libssl-dev -y

# Install PyTorch and supporting Python packages
RUN pip install torch==1.12.1
RUN pip install cerberus==1.3.4
RUN pip install flatbuffers==2.0
RUN pip install h5py==3.7.0
RUN pip install numpy==1.23.1
RUN pip install packaging==21.3
RUN pip install sympy==1.10.1
RUN pip install setuptools==63.2.0

# Select the pure-Python protobuf implementation
ENV PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION='python'

# Install IPEX, the HuggingFace libraries, and the remaining dependencies
RUN pip install intel-extension-for-pytorch==1.12.100
RUN pip install transformers==4.21.1
RUN pip install datasets==2.4.0
RUN pip install pandas==1.2.5
RUN pip install PyYAML==5.4.1
RUN pip install neural-compressor==1.14
RUN pip install onnxruntime==1.12.1
RUN pip install onnx==1.12.0
# Install the latest package for Azure ML inference/training
RUN pip install azureml-defaults
# Install protobuf last so that no later install changes its version
RUN pip install protobuf==3.20.1
"""

# Set the base image to None, because the image is defined by the Dockerfile.
azure_ddp_ipex_hf_environment.docker.base_image = None
azure_ddp_ipex_hf_environment.docker.base_dockerfile = dockerfile
# The Dockerfile installs all Python packages, so Azure ML should not manage them.
azure_ddp_ipex_hf_environment.python.user_managed_dependencies = True
azure_ddp_ipex_hf_environment.register(workspace=ws)

# Build the Docker image
build = azure_ddp_ipex_hf_environment.build(workspace=ws)
build.wait_for_completion(show_output=True)
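The Dockerfile above installs protobuf last because earlier installs could pull in a different version. To confirm that the pins actually survived in the built image, something like the following could be run inside the container (for example, at the start of train.py). This is a sketch: the helper name and the subset of packages checked are illustrative choices, and local version labels such as `+cpu` are ignored when comparing.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional

# An illustrative subset of the pins from the Dockerfile.
EXPECTED_PINS = {
    "torch": "1.12.1",
    "intel-extension-for-pytorch": "1.12.100",
    "transformers": "4.21.1",
    "datasets": "2.4.0",
    "protobuf": "3.20.1",
}

def find_version_mismatches(pins: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return {package: installed version, or None if absent} for unmet pins."""
    mismatches: Dict[str, Optional[str]] = {}
    for package, expected in pins.items():
        try:
            installed: Optional[str] = version(package)
        except PackageNotFoundError:
            installed = None
        # Compare without local version labels such as "+cpu".
        if installed is None or installed.split("+", 1)[0] != expected:
            mismatches[package] = installed
    return mismatches

if __name__ == "__main__":
    print(find_version_mismatches(EXPECTED_PINS) or "All pinned versions match.")
```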

# Step 3: Retrieve the built docker image and set it as the runtime environment of the cluster

In [None]:
from azureml.core import Environment

# Retrieve the environment registered and built in Step 2
azure_ddp_ipex_hf_environment = Environment.get(ws, 'azure_ddp_ipex_hf_environment')
# Python packages are installed by the Dockerfile, not managed by Azure ML
azure_ddp_ipex_hf_environment.python.user_managed_dependencies = True

# Step 4: Create a cluster for the distributed training
The following code block creates a cluster for the distributed training. Users are encouraged to change the variables 'node_type' and 'num_of_nodes' to set the cluster's VM type and size. It is recommended to use Intel Ice Lake CPUs or newer generations to take full advantage of oneDNN and the VNNI instructions.

More information about node types (VM sizes) can be found here:
https://docs.microsoft.com/en-us/azure/machine-learning/concept-compute-target

In [None]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Please change the following 3 variables to suit the use case
cpu_cluster_name = "cpuCluster2xD64DSV4"
node_type = 'STANDARD_D64DS_V4'
num_of_nodes = 2

try:
    # Reuse the cluster if one with this name already exists
    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)
except ComputeTargetException:
    print(f"No existing cluster named '{cpu_cluster_name}'. Creating a new cluster.")
    # Ddsv4-series VMs run on the 3rd Generation Intel® Xeon® Platinum 8370C (Ice Lake)
    # or the Intel® Xeon® Platinum 8272CL (Cascade Lake).
    compute_config = AmlCompute.provisioning_configuration(vm_size=node_type, max_nodes=num_of_nodes)
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)

cpu_cluster.wait_for_completion(show_output=True)
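Because Ddsv4-series VMs may be placed on either Ice Lake or Cascade Lake hardware, it can be useful to check what a node actually provides. A minimal, Linux-only sketch that parses `/proc/cpuinfo` (for example, from inside train.py) and looks for the `avx512_vnni` flag; the helper names are hypothetical.

```python
from pathlib import Path
from typing import Set

def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Collect the CPU feature flags listed in /proc/cpuinfo text."""
    flags: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    return flags

def has_vnni(flags: Set[str]) -> bool:
    """True if the CPU advertises AVX-512 VNNI (Linux flag name 'avx512_vnni')."""
    return "avx512_vnni" in flags

if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print("AVX-512 VNNI available:", has_vnni(cpu_flags(cpuinfo.read_text())))
```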

# Step 5: Start the distributed training
The following code block instructs Azure ML to run the training job on the created cluster using the registered environment. The Gloo backend, which supports CPU-based communication, is recommended for the PyTorch DDP training.
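With `PyTorchConfiguration`, Azure ML launches the requested processes and, according to its distributed-training documentation, sets the standard torch.distributed environment variables (`MASTER_ADDR`, `MASTER_PORT`, `WORLD_SIZE`, `NODE_RANK`, `RANK`, `LOCAL_RANK`) in each of them. With one process per node, `WORLD_SIZE` equals `num_of_nodes`. The sketch below shows how a training script might read these values; `DistInfo` and `read_dist_info` are hypothetical names, and whether train.py initializes the process group itself or leaves that to the HuggingFace Trainer is not shown in this notebook.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

@dataclass(frozen=True)
class DistInfo:
    rank: int
    local_rank: int
    world_size: int
    master_addr: str
    master_port: str

def read_dist_info(env: Optional[Mapping[str, str]] = None) -> DistInfo:
    """Read the standard torch.distributed environment variables.

    The defaults describe a single, non-distributed process so the same
    script can also run locally.
    """
    env = os.environ if env is None else env
    return DistInfo(
        rank=int(env.get("RANK", "0")),
        local_rank=int(env.get("LOCAL_RANK", "0")),
        world_size=int(env.get("WORLD_SIZE", "1")),
        master_addr=env.get("MASTER_ADDR", "127.0.0.1"),
        master_port=env.get("MASTER_PORT", "29500"),
    )

# In a training script one would then typically do (requires torch):
#   import torch.distributed as dist
#   info = read_dist_info()
#   if info.world_size > 1:
#       dist.init_process_group(backend="gloo")  # default env:// init reads the variables above
```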

Users are also encouraged to modify train.py to change the training behavior or adapt it to their use case.
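The contents of train.py are not shown here. As a starting point for customizing it, the following is a hypothetical argument parser that accepts the arguments this notebook passes (`--epochs`, `--model_name`, `--sm-model-dir`); the defaults are assumptions, not values taken from the actual script.

```python
import argparse
from typing import List, Optional

def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Hypothetical parser for the arguments this notebook passes to train.py."""
    parser = argparse.ArgumentParser(description="BERT fine-tuning (sketch)")
    parser.add_argument("--epochs", type=int, default=3)
    parser.add_argument("--model_name", type=str, default="bert-base-uncased")
    # argparse stores '--sm-model-dir' as the attribute 'sm_model_dir'.
    parser.add_argument("--sm-model-dir", type=str, default="./outputs")
    return parser.parse_args(argv)

if __name__ == "__main__":
    print(parse_args(["--epochs", "3", "--model_name", "bert-base-uncased",
                      "--sm-model-dir", "~/output"]))
```

argparse does not expand `~`, so if the script needs `~/output` as a real path and the argument reaches it without shell expansion, `os.path.expanduser` may be required.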

Once training completes, the trained model should be downloaded automatically to './fp32_model_output'. If it is not, users can find the PyTorch model files (in HuggingFace format) on the Azure Machine Learning website under 'work_space_name' -> 'Jobs' -> 'the_jobs_id' -> 'Outputs + logs' -> 'outputs'.

In [None]:
from azureml.core import ScriptRunConfig, Experiment
from azureml.core.runconfig import PyTorchConfiguration

distr_config = PyTorchConfiguration(communication_backend='Gloo', process_count=num_of_nodes, node_count=num_of_nodes)
compute_target = cpu_cluster

script_params = [
    '--epochs',
    '3',
    '--model_name',
    'bert-base-uncased',
    '--sm-model-dir',
    '~/output'
]

run_config = ScriptRunConfig(
    source_directory='../src/training_container/',
    script='train.py',
    compute_target=compute_target,
    environment=azure_ddp_ipex_hf_environment,
    distributed_job_config=distr_config,
    arguments=script_params
)

# Submit the run configuration to start the job
run = Experiment(ws, "IntelIPEX_HuggingFace_DDP").submit(run_config)
run.wait_for_completion(show_output=True)
print(run.get_file_names())
# Download everything the job saved under 'outputs' into the local directory
run.download_files(prefix='outputs', output_directory='./fp32_model_output')
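To locate the model inside the downloaded folder (the job's output prefix may add a subdirectory), a small search can help. This sketch assumes train.py saves the model with `save_pretrained()`, which in the transformers 4.21 release line pinned in the Dockerfile writes `config.json` and `pytorch_model.bin`; the helper name `find_model_dirs` is hypothetical.

```python
from pathlib import Path
from typing import List

# Files written by save_pretrained() for a PyTorch model (transformers 4.21 line).
EXPECTED_FILES = ("config.json", "pytorch_model.bin")

def find_model_dirs(root: str) -> List[Path]:
    """Return every directory under root that holds a HuggingFace-format model."""
    root_path = Path(root)
    return sorted(
        config.parent
        for config in root_path.rglob("config.json")
        if all((config.parent / name).is_file() for name in EXPECTED_FILES)
    )

if __name__ == "__main__":
    for model_dir in find_model_dirs("./fp32_model_output"):
        print("Model found in:", model_dir)
```

A directory found this way could then be passed to the matching `from_pretrained` class in transformers; which model class train.py uses is not shown in this notebook.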