# Accelerate finetuning of GPT2 model for Language Modeling task using ONNX Runtime Training
This notebook walks through using ONNX Runtime (ORT) Training in Azure Machine Learning service to fine-tune [GPT2](https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) models. The example fine-tunes the GPT2 PyTorch model maintained at https://github.com/huggingface/transformers.
Specifically, we showcase using ORT to fine-tune the [pretrained GPT2-medium](https://huggingface.co/transformers/pretrained_models.html) model, which has 345M parameters.

Steps:
- Initialize an AzureML workspace
- Register a datastore to use preprocessed data for training
- Create an AzureML experiment
- Provision a compute target
- Create a PyTorch Estimator
- Configure and Run

### Prerequisites
If you are using an Azure Machine Learning [Compute Instance](https://docs.microsoft.com/en-us/azure/machine-learning/concept-compute-instance), you are all set. Otherwise, set up your environment by installing the AzureML Python SDK to run this notebook. If you haven't already, first work through the [How to use Estimator in Azure ML](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/training-with-deep-learning/how-to-use-estimator/how-to-use-estimator.ipynb) notebook to establish your connection to the AzureML Workspace.

Refer to instructions at https://github.com/microsoft/onnxruntime-training-examples/blob/master/huggingface-gpt2/README.md before running the steps below.

### Check SDK installation

In [None]:
import os
import requests
import sys
import re

# AzureML libraries
import azureml.core
from azureml.core import Experiment, Workspace, Datastore, Run
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.container_registry import ContainerRegistry
from azureml.core.runconfig import MpiConfiguration, RunConfiguration, DEFAULT_GPU_IMAGE
from azureml.train.dnn import PyTorch
from azureml.train.estimator import Estimator
from azureml.widgets import RunDetails

from azure.common.client_factory import get_client_from_cli_profile
from azure.mgmt.containerregistry import ContainerRegistryManagementClient

# Check core SDK version number
print("SDK version:", azureml.core.VERSION)
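
If the import cell above fails, it can help to find out which packages are missing before reinstalling anything. The following is a minimal sketch that uses only the standard library; `missing_modules` is a hypothetical helper name, and the module list simply mirrors the imports used in this notebook.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be located."""
    missing = []
    for name in names:
        try:
            # For dotted names, find_spec imports the parent package first,
            # so a missing parent surfaces as ModuleNotFoundError.
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            missing.append(name)
    return missing


# Modules imported by this notebook.
required = [
    "azureml.core",
    "azureml.train.dnn",
    "azureml.widgets",
    "azure.common.client_factory",
    "azure.mgmt.containerregistry",
]
print("Missing modules:", missing_modules(required) or "none")
```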

### AzureML Workspace setup

In [None]:
# Create or retrieve Azure machine learning workspace
# see https://docs.microsoft.com/en-us/python/api/overview/azure/ml/?view=azure-ml-py
ws = Workspace.get(name="myworkspace", subscription_id='<azure-subscription-id>', resource_group='myresourcegroup')

# Print workspace attributes
print('Workspace name: ' + ws.name, 
      'Workspace region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep = '\n')

### Register Datastore
Before running the step below, data prepared using the instructions at https://github.com/microsoft/onnxruntime-training-examples/blob/master/huggingface-gpt2/README.md should be transferred to an Azure Blob container referenced in the `Datastore` registration step. Refer to the documentation at https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-access-data for details on using data in Azure ML experiments.

In [None]:
# Create a datastore from blob storage containing training data.
# Consult README.md for instructions on downloading and uploading training data.
ds = Datastore.register_azure_blob_container(workspace=ws, 
                                             datastore_name='<datastore-name>',
                                             account_name='<storage-account-name>', 
                                             account_key='<storage-account-key>',
                                             container_name='<storage-container-name>')
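
The registration cell above embeds the storage account key directly in the notebook. One way to avoid this is to read the key from an environment variable. A minimal sketch follows; the helper `require_env` and the variable name `AZURE_STORAGE_KEY` are assumptions made for illustration and are not part of the AzureML SDK.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or fail loudly."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Hypothetical usage: pass the result as account_key when registering
# the datastore, instead of pasting the key into the notebook.
# account_key = require_env("AZURE_STORAGE_KEY")
```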

In [None]:
# Print datastore attributes
print('Datastore name: ' + ds.name, 
      'Container name: ' + ds.container_name, 
      'Datastore type: ' + ds.datastore_type, 
      'Workspace name: ' + ds.workspace.name, sep = '\n')

### Create AzureML Compute Cluster
This recipe is supported on Azure Machine Learning Service using 16 x Standard_NC24rs_v3 or 8 x Standard_ND40rs_v2 VMs. The next step creates an AzureML Compute cluster of Standard_ND40rs_v2 GPU VMs with the specified name if it doesn't already exist in your workspace.

In [None]:
# Create GPU cluster
gpu_cluster_name = "ortgptfinetune" 
try:
    gpu_compute_target = ComputeTarget(workspace=ws, name=gpu_cluster_name)
    print('Found existing compute target.')
except ComputeTargetException:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='Standard_ND40rs_v2', min_nodes=0, max_nodes=8)
    gpu_compute_target = ComputeTarget.create(ws, gpu_cluster_name, compute_config)
    gpu_compute_target.wait_for_completion(show_output=True)
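
Both supported cluster shapes add up to the same number of GPUs. The arithmetic can be sketched as below, using the per-VM GPU counts listed in the VM SKU table of this notebook (4 for Standard_NC24rs_v3, 8 for Standard_ND40rs_v2); `total_gpus` is a hypothetical helper.

```python
# GPUs per VM, as listed in the VM SKU table of this notebook.
GPUS_PER_VM = {"Standard_NC24rs_v3": 4, "Standard_ND40rs_v2": 8}


def total_gpus(vm_size, node_count):
    """Total number of GPUs in a cluster of `node_count` VMs of `vm_size`."""
    return GPUS_PER_VM[vm_size] * node_count


for vm_size, nodes in [("Standard_NC24rs_v3", 16), ("Standard_ND40rs_v2", 8)]:
    print(vm_size, "x", nodes, "->", total_gpus(vm_size, nodes), "GPUs")
```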

In [None]:
# Create experiment for training
experiment_name = 'gpt2_medium-ort-finetuning'
experiment = Experiment(ws, name=experiment_name)

### Create Estimator
Notes before running the following step:
* Update the `--train_data_file` and `--eval_data_file` arguments in the following step so that the paths passed to `ds.path(...)` point to the training and evaluation data in your datastore.
* If you followed the instructions at https://github.com/microsoft/onnxruntime-training-examples/blob/master/huggingface-gpt2/README.md to prepare data, make sure that the data and other files that are not code or config are moved out of the `workspace` directory. Data files should have been moved to a `Datastore` for use in training.
* Replace `<tagged-onnxruntime-gpt-container>` with the tag of the built Docker image pushed to a container registry. Similarly, replace `<azure-subscription-id>` and `<container-registry-resource-group>` with the container registry's subscription ID and resource group.
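
The estimator step splits an Azure Container Registry image reference into a registry address, a registry name, and an image name with tag. The sketch below shows that split in isolation; `split_acr_image` is a hypothetical helper, and the example image is the one used in the notebook's own comments. The dots are escaped so that they match literally.

```python
import re

# Matches <registry-name>.azurecr.io/<image>[:tag]
ACR_PATTERN = re.compile(r"^((\w+)\.azurecr\.io)/(.*)")


def split_acr_image(image):
    """Return (registry_address, registry_name, image), or None for non-ACR images."""
    match = ACR_PATTERN.match(image)
    if match is None:
        return None
    return match.group(1), match.group(2), match.group(3)


print(split_acr_image("onnxtraining.azurecr.io/onnxruntime-gpt:latest"))
print(split_acr_image("onnxruntime-gpt:latest"))
```

An image that is not hosted on `azurecr.io` yields `None`, in which case no registry credentials need to be looked up.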


| VM SKU             | GPU memory (per GPU) | gpu_count | ORT_batch_size |
| ------------------ |:--------------------:|:---------:|:--------------:|
| Standard_ND40rs_v2 | 32 GB                | 8         | 4              |
| Standard_NC24rs_v3 | 16 GB                | 4         | 1              |
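
The per-GPU batch size in the table combines with the GPU count, the node count, and gradient accumulation to give the number of samples consumed per optimizer step. A rough sketch, assuming plain data parallelism with one process per GPU; `samples_per_step` is a hypothetical helper, and the numbers are the ones configured in the estimator cell of this notebook.

```python
def samples_per_step(per_gpu_batch_size, gpus_per_node, node_count, grad_accum_steps):
    """Samples contributing to one optimizer update under data parallelism."""
    return per_gpu_batch_size * gpus_per_node * node_count * grad_accum_steps


# Standard_ND40rs_v2: per-GPU batch 4, 8 GPUs per node,
# node_count=4, gradient_accumulation_steps=4
print(samples_per_step(4, 8, 4, 4))  # 512
```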



In [None]:
# This directory should contain run_language_modeling_ort.py after the files are copied over, following the instructions at
# https://github.com/microsoft/onnxruntime-training-examples/blob/master/huggingface-gpt2/README.md
project_folder = '/path/to/onnxruntime-training-examples/huggingface-gpt2/transformers/examples'

container_image = '<tagged-onnxruntime-gpt-container>'
subscription_id = '<azure-subscription-id>'
container_registry_resource_group = '<container-registry-resource-group>'
registry_details = None

# Raw string with escaped dots so that '.' matches literally
acr = re.match(r'^((\w+)\.azurecr\.io)/(.*)', container_image)
if acr:
    # Extract the relevant parts from the container image
    #   e.g. onnxtraining.azurecr.io/onnxruntime-gpt:latest
    registry_address = acr.group(1) # onnxtraining.azurecr.io
    registry_name = acr.group(2) # onnxtraining
    container_image = acr.group(3) # onnxruntime-gpt:latest

    registry_client = get_client_from_cli_profile(ContainerRegistryManagementClient, subscription_id=subscription_id)
    registry_credentials = registry_client.registries.list_credentials(container_registry_resource_group, registry_name)

    registry_details = ContainerRegistry()
    registry_details.address = registry_address
    registry_details.username = registry_credentials.username
    registry_details.password = registry_credentials.passwords[0].value

# Set MPI configuration.
# Processes per node should equal the GPU count of the VM SKU:
# 8 for Standard_ND40rs_v2 (ND series), 4 for Standard_NC24rs_v3 (NC series).
mpi = MpiConfiguration()
mpi.process_count_per_node = 8

import uuid
output_id = uuid.uuid1().hex

# Define the script parameters.
# To run using PyTorch instead of ORT, remove the --ort_trainer flag.
script_params = {
    '--model_type' : 'gpt2-medium', 
    '--model_name_or_path' : 'gpt2-medium', 
    '--tokenizer_name' : 'gpt2-medium', 
    '--config_name' : 'gpt2-medium', 
    '--do_eval' : '', 
    '--do_train': '', 
    '--train_data_file' : ds.path('benchmarking/WIKI/wikitext-2/wiki.train.tokens').as_mount(),
    '--eval_data_file' : ds.path('benchmarking/WIKI/wikitext-2/wiki.valid.tokens').as_mount(), 
    '--output_dir' : ds.path(f'output/{experiment_name}/{output_id}/').as_mount(), 
    '--per_gpu_train_batch_size' : '4', 
    '--per_gpu_eval_batch_size' : '4', 
    '--gradient_accumulation_steps' : '4',
    '--block_size' : '1024', 
    '--weight_decay' : '0.01', 
    '--overwrite_output_dir' : '', 
    '--num_train_epochs' : '5',
    '--ort_trainer' : ''
    }

# Define the AzureML Estimator that describes how to run the ORT experiment.
# Consult https://docs.microsoft.com/en-us/azure/machine-learning/how-to-train-ml-models
# The training and evaluation data paths are set in script_params above.
estimator_ort = PyTorch(source_directory=project_folder,

                    # Compute configuration
                    compute_target = gpu_compute_target,
                    node_count=4,
                    distributed_training = mpi,
                    use_gpu = True,
                    
                    # supply Docker image
                    use_docker = True,
                    custom_docker_image = container_image,
                    image_registry_details=registry_details,
                    user_managed = True,
                    
                    # Training script parameters
                    script_params = script_params,
                    entry_script = 'run_language_modeling.py',
                   )
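
Entries in `script_params` whose value is an empty string, such as `--do_train` and `--ort_trainer`, act as bare flags. The sketch below illustrates how such a dictionary maps onto a command line. `to_argv` is a hypothetical helper written for illustration and is not the AzureML implementation; the datastore mount entries are left out because AzureML resolves them at run time.

```python
def to_argv(script_params):
    """Flatten a {flag: value} dict into an argv-style list.

    An empty-string value means the flag takes no argument.
    """
    argv = []
    for flag, value in script_params.items():
        argv.append(flag)
        if value != "":
            argv.append(str(value))
    return argv


example_params = {
    "--model_type": "gpt2-medium",
    "--do_train": "",
    "--block_size": "1024",
    "--ort_trainer": "",
}
print(" ".join(to_argv(example_params)))
```

Printing the flattened list before submitting is a cheap way to spot a misspelled flag name.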

### Run AzureML experiment

In [None]:
# Submit ORT run (check logs from Outputs + logs tab of corresponding link)
run = experiment.submit(estimator_ort)
RunDetails(run).show()
print(run.get_portal_url())
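
After submission, the run can also be followed from code rather than from the widget. The following is a minimal sketch that relies on the `wait_for_completion`, `get_status`, `get_metrics`, and `get_portal_url` methods of the AzureML `Run` object; `summarize_run` is a hypothetical helper name. By default the SDK raises an exception from `wait_for_completion` if the run fails.

```python
def summarize_run(run, show_output=False):
    """Block until `run` finishes, then return its status and logged metrics."""
    run.wait_for_completion(show_output=show_output)
    return {
        "status": run.get_status(),
        "metrics": run.get_metrics(),
        "portal_url": run.get_portal_url(),
    }


# Hypothetical usage with the run submitted above:
# summary = summarize_run(run)
# print(summary["status"], summary["metrics"])
```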