# Walkthrough: training a model on Azure ML with DeepSpeed and registering the resulting model

**Goals**: demonstrate how to fine-tune an NLP model on a training cluster with DeepSpeed. 

This notebook provides a minimal example of configuring and launching a model training run on Azure ML, using DeepSpeed as an accelerator for distributed training. We rely on a minimal external training script, `src/train.py`, that is intended as a starting point for adaptation to other workflows. Here we'll:

- Import a configuration file that sets parameters for our training run
- Connect to Azure ML and locate the DeepSpeed environment we've built there
- Configure a run, using a minimal external training script
- Submit our run to a compute cluster
- Look at a completed run and the model it registered 

First we import the relevant libraries.

In [2]:
import yaml
import azureml.core
import transformers

## Using an external config.yml

The `src/config.yml` file contains most of the settings that need to be customized for a new job type. Reading through it gives a sense of how a run is configured and what information to consider when defining a new experiment. Storing the values used by this notebook and by `train.py` in a config file, rather than passing them as arguments, also makes them far easier to track: our `train.py` script copies the input config file to the run outputs, preserving it with the run for later reference.


In [3]:
with open('src/config.yml', 'r') as f:
    config = yaml.safe_load(f)
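
Because the rest of the notebook indexes directly into `config`, a missing setting only surfaces as a `KeyError` several cells later. A minimal sketch of an up-front check follows. The key names are the ones this notebook reads; the helper `missing_config_keys` and the example values are hypothetical and not part of `src/train.py`.

```python
# Keys this notebook reads from the loaded config; "training_args" is nested.
REQUIRED_KEYS = (
    "environment", "experiment", "pytorch_configuration", "source_directory",
    "training_command", "compute_target", "model", "task", "metric",
)
REQUIRED_TRAINING_ARGS = ("num_train_epochs", "per_device_train_batch_size")


def missing_config_keys(config):
    """Return the names of expected settings that are absent from `config`."""
    missing = [key for key in REQUIRED_KEYS if key not in config]
    training_args = config.get("training_args") or {}
    missing += [
        f"training_args.{key}"
        for key in REQUIRED_TRAINING_ARGS
        if key not in training_args
    ]
    return missing


# Hypothetical, deliberately incomplete config used only to exercise the check.
example_config = {
    "experiment": "training-quickstart",
    "model": "some-model-name",
    "training_args": {"num_train_epochs": 3},
}
print(missing_config_keys(example_config))
```

Calling `missing_config_keys(config)` right after loading the file and stopping if the list is non-empty keeps configuration mistakes from costing a cluster submission.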

## Connect to the workspace and environment 

As before, the Azure ML workspace hosts our compute, dataset, future models, and environment. We instantiate a connection to it using a configuration file that is provided automatically on Azure ML compute instances, but which [we must create](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-configure-environment#workspace) ourselves if we run this notebook on a local desktop or laptop.

In [4]:
workspace = azureml.core.Workspace.from_config()

The environment we'll connect to is named in the `config.yml` file. It is hosted as a Docker image in an Azure Container Registry and is already built, so the run does not need to rebuild it before starting, which greatly decreases the time a run takes to begin. We can examine the environment within AzureML studio.

In [5]:
azureml.core.Environment.list(workspace).keys()

dict_keys(['aml-scikit-learn', 'ray-on-aml-3734189943', 'ray-on-aml-2220177819', 'deepspeed-transformers', 'deepspeed-transformers-datasets', 'AzureML-ACPT-pytorch-1.13-py38-cuda11.7-gpu'])

In [6]:
# The environment name could instead be read from the config:
# environment = azureml.core.Environment.get(workspace, config['environment'])
env_name = "deepspeed-transformers-datasets"
environment = azureml.core.Environment.get(workspace, name=env_name)
print(environment)

Environment(Name: deepspeed-transformers-datasets,
Version: 4)


## Experiment

Connect to (or create) the experiment that will host the training run we'll launch. A single experiment can host many runs, each exploring a different set of parameters, architecture, or other approach to the same problem. Metrics from multiple runs within a single experiment can be plotted against each other in AzureML studio.

In [7]:
experiment = azureml.core.Experiment(workspace, config['experiment'])

## Configure and submit the run

Our run requires several configuration components. It is worth examining the relevant entries in `src/config.yml` to see what we are passing, and the structure of `src/train.py` to see what the training script does on each node. A summary:

- a distributed job config that controls the underlying PyTorch parallelization process; here this means the number of machines and the number of GPUs per machine
- a [ScriptRunConfig](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.scriptrunconfig?view=azure-ml-py) that describes the run to the AzureML run controller, where we glue together the compute target, the environment, and the training command
- a command run by the ScriptRunConfig, here a simple call out to `src/train.py`
- an implicit DeepSpeed configuration, referenced within `src/train.py`
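
To make the distributed job config concrete: `PyTorchConfiguration` accepts a `node_count` and a total `process_count`, and Azure ML launches `process_count / node_count` processes on each node (one per node when `process_count` is omitted). The sketch below only does that arithmetic. The helper `processes_per_node` and the example values are hypothetical and may not match the `pytorch_configuration` entry in `src/config.yml`.

```python
def processes_per_node(pytorch_configuration):
    """Derive how many worker processes each node runs from the job settings."""
    node_count = pytorch_configuration.get("node_count", 1)
    # When process_count is omitted, one process is launched per node.
    process_count = pytorch_configuration.get("process_count") or node_count
    if process_count % node_count != 0:
        raise ValueError("process_count must be a multiple of node_count")
    return process_count // node_count


# Hypothetical settings: 2 machines with 4 GPUs each, one process per GPU.
example = {"node_count": 2, "process_count": 8, "communication_backend": "Nccl"}
print(processes_per_node(example))  # 4
```

For GPU training the per-node result would normally equal the number of GPUs on each machine in the compute target.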

In [8]:
distributed_job_config = azureml.core.runconfig.PyTorchConfiguration(**config['pytorch_configuration'])
aml_config = azureml.core.ScriptRunConfig(
    source_directory=config['source_directory'],
    command=config['training_command'],
    environment=environment,
    compute_target=config['compute_target'],
    distributed_job_config=distributed_job_config,
)
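
The DeepSpeed configuration listed in the summary above is not shown in this notebook. As a rough illustration of what such a file can look like, the sketch below writes a small JSON document with a few commonly used DeepSpeed settings. Everything here is an assumption: the file name `ds_config.json`, the chosen values, and the use of `"auto"`, which is only meaningful if the training script relies on the Hugging Face `Trainer` DeepSpeed integration to fill those values in.

```python
import json
import tempfile
from pathlib import Path

# Illustrative values only; the settings the real `src/train.py` uses are not shown here.
deepspeed_config = {
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "fp16": {"enabled": "auto"},
    "zero_optimization": {"stage": 2},
}


def write_deepspeed_config(directory, settings):
    """Serialize the settings to a JSON file and return its path."""
    path = Path(directory) / "ds_config.json"
    path.write_text(json.dumps(settings, indent=2))
    return path


# Write to a temporary directory and read it back to confirm the round trip.
with tempfile.TemporaryDirectory() as tmp:
    config_path = write_deepspeed_config(tmp, deepspeed_config)
    reloaded = json.loads(config_path.read_text())
    print(reloaded["zero_optimization"]["stage"])
```

Keeping this file inside the source directory means it is uploaded with the training script and travels with the run.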

With the run configured, we submit it and attach metadata tags that will help us understand it later. Each tag's value can also be found in `src/config.yml`, which is included with the run's output, but writing the tags here as well makes them visible within the AzureML studio interface. The URL printed below links to the AzureML studio entry for this run.

In [9]:
run = experiment.submit(aml_config)
run.set_tags({
    "model": config['model'],
    "task": config['task'],
    "metric": config['metric'],
    "environment": config['environment'],
    "num_train_epochs": str(config['training_args']['num_train_epochs']),
    "batch_size": str(config['training_args']['per_device_train_batch_size']),
})

print(f"View run details:\n{run.get_portal_url()}")

View run details:
https://ml.azure.com/runs/training-quickstart_1701749848_41715b81?wsid=/subscriptions/f3692ca7-e0d1-4bd3-92f8-49832ab6be7d/resourcegroups/eyast-rg/workspaces/xcs224&tid=16b3c013-d300-468d-ac64-7eda0820b6d3


## Working with the run output

Now we wait for the run to complete, checking AzureML studio or calling `run.get_status()` to verify that it is done.

In [None]:
print(f"Run status: {run.get_status()}")
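
Rather than re-running the status cell by hand, the check can be wrapped in a polling loop. The SDK's `run.wait_for_completion()` already provides blocking behavior; the sketch below only illustrates the idea with a stand-in status source so that it runs without a workspace. The helper `wait_for_terminal_status` is hypothetical, and the set of terminal states is an assumption based on the statuses Azure ML commonly reports.

```python
import time

# Assumed terminal states reported by `run.get_status()`.
TERMINAL_STATES = {"Completed", "Failed", "Canceled"}


def wait_for_terminal_status(get_status, poll_seconds=30, max_polls=240, sleep=time.sleep):
    """Call `get_status` until it returns a terminal state or polls run out."""
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"run still not finished after {max_polls} polls")


# Stand-in for `run.get_status`; the no-op sleep keeps the example instant.
fake_statuses = iter(["Queued", "Running", "Running", "Completed"])
final = wait_for_terminal_status(lambda: next(fake_statuses), sleep=lambda seconds: None)
print(final)
```

With a real run, the call would be `wait_for_terminal_status(run.get_status)`.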

In [None]:
print(f"Ending {config['metric']}: {run.get_metrics()['eval_'+config['metric']][-1]:.2f}")
print(f"View run details:\n{run.get_portal_url()}")
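
One caveat with indexing `[-1]` as above: `run.get_metrics()` returns a list for a metric that was logged several times, but a single scalar for a metric logged only once, in which case the index fails. A small sketch that handles both shapes follows. The helper `latest_metric_value` and the metric name `eval_matthews_correlation` are hypothetical examples.

```python
def latest_metric_value(metrics, name):
    """Return the most recent value of a metric logged once or many times."""
    value = metrics[name]
    if isinstance(value, (list, tuple)):
        if not value:
            raise ValueError(f"metric {name!r} has no logged values")
        return value[-1]
    return value


# Hypothetical metric dictionaries standing in for `run.get_metrics()`.
logged_per_epoch = {"eval_matthews_correlation": [0.41, 0.52, 0.55]}
logged_once = {"eval_matthews_correlation": 0.55}
print(latest_metric_value(logged_per_epoch, "eval_matthews_correlation"))
print(latest_metric_value(logged_once, "eval_matthews_correlation"))
```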

We can now download the model that was registered. The model name and version, which together uniquely identify it, are stored as metadata tags on the run.

In [None]:
# Tags are stored as strings, so convert the version to the integer Model expects.
aml_model = azureml.core.Model(
    workspace,
    name=run.tags['registered_model_name'],
    version=int(run.tags['registered_model_version']),
)
aml_model.download(exist_ok=True);

With the model copied to our local drive, we can convert it to ONNX format, push it to the Triton model serving container, ready it for compression, or perform some testing inference with it as we do now. We load the model and the associated tokenizer using the transformers library that wrote them out in `src/train.py` and chain them into a classification pipeline.


In [None]:
tokenizer = transformers.AutoTokenizer.from_pretrained("model")
# id2label is documented as a dict mapping class index to label name.
model = transformers.AutoModelForSequenceClassification.from_pretrained(
    "model", id2label={0: 'negative', 1: 'positive'})
pipeline = transformers.TextClassificationPipeline(model=model, tokenizer=tokenizer)

The task this model was trained on, [CoLA](https://nyu-mll.github.io/CoLA/) (the Corpus of Linguistic Acceptability), asks the model to classify sentences as grammatical or ungrammatical. We'll feed it some examples.

In [None]:
sentences = ("i like pie", "books sent other the students")
for sentence in sentences:
    print(f"Is '{sentence}' grammatical?: {pipeline(sentence)[0]['label']}")
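
For a single input string, a text classification pipeline returns a list containing one dictionary with a `label` and a `score`. The loop above prints only the label; a small sketch that also reports the score is shown below. The helper `describe_prediction` and the example prediction are hypothetical, standing in for real pipeline output so the snippet runs without the downloaded model.

```python
def describe_prediction(sentence, prediction):
    """Format the top prediction for a sentence, including its score."""
    top = prediction[0]
    return f"'{sentence}' -> {top['label']} (score {top['score']:.2f})"


# Made-up output in the shape a text classification pipeline returns.
fake_prediction = [{"label": "positive", "score": 0.9312}]
print(describe_prediction("i like pie", fake_prediction))
```

With the real pipeline, the call would be `describe_prediction(sentence, pipeline(sentence))`.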

In this notebook we've:

- connected to an AzureML workspace
- started a training run using a specified training script and environment
- retrieved results from that run
- downloaded the registered model and used it to run some test inference