Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

# BERT fine-tuning with Azure ML

## Parameters

In [1]:
# subscription_id = 'a89228ac-8bf4-4646-9d2c-442b1cb5d622'
# resource_group = 'lauri-ml'
# aml_workspace = 'lauri-ml'
# cluster_name = 'bert-gpu'
# azure_dataset_name = 'Azure Services Dataset'
# azure_dataset_path = 'azure-service-classifier/data'
# azure_dataset_descr = 'Dataset containing azure related posts on Stackoverflow'
# data_dir = 'C:/Users/LauriLehman/Documents/Projects/bert-ms-lauri/1-Training/data'
dataset_remotepath = 'bert-testing/training/'
dataset_name = 'BERT training'
experiment_name = 'bert-classifier-test'
training_script = 'scripts/train.py'
model_output_dir = '../outputs/model'
# local_python_path = 'C:\\Users\\LauriLehman\\AppData\\Local\\Programs\\Python\\Python36\\python.exe'
local_python_path = 'C:/Users/LauriLehman/AppData/Local/Microsoft/WindowsApps/python.exe'

## Dependencies

In [2]:
from azureml.core import Workspace, Dataset, Experiment, Environment
from azureml.core.environment import CondaDependencies # PythonSection, DockerSection
from azureml.core.runconfig import RunConfiguration, DEFAULT_GPU_IMAGE
from azureml.data.data_reference import DataReference
from azureml.widgets import RunDetails
from azureml.train.dnn import TensorFlow

## Environment status check

### Check Python version

In [3]:
import sys

print('Python runtime version: {}.{}.{}'.format(sys.version_info[0], sys.version_info[1], sys.version_info[2]))

Python runtime version: 3.7.6


### Check Azure Machine Learning Python SDK version

This tutorial requires version 1.0.69 or higher. Let's check the version of the SDK:

In [4]:
import azureml.core

print('Azure Machine Learning Python SDK version:', azureml.core.VERSION)

Azure Machine Learning Python SDK version: 1.0.85
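
The minimum-version requirement can also be checked in code rather than by eye. The following is a minimal sketch, assuming the SDK version string is purely numeric and dot-separated (as in the output above); `parse_version` and `meets_minimum` are hypothetical helper names, not part of the SDK.

```python
def parse_version(text):
    """Turn a dotted version string such as '1.0.85' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, required="1.0.69"):
    """Return True when the installed version is at least the required one."""
    return parse_version(installed) >= parse_version(required)


print(meets_minimum("1.0.85"))  # the version reported above
print(meets_minimum("1.0.9"))   # older than 1.0.69, despite sorting later as a string
```

In the notebook, the call would be `meets_minimum(azureml.core.VERSION)`. Comparing integer tuples instead of raw strings avoids ordering mistakes such as treating `1.0.9` as newer than `1.0.69`.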


## Connect To Workspace

In [None]:
# aml_workspace = Workspace(
#     subscription_id=subscription_id, resource_group=resource_group, workspace_name=aml_workspace
# )
# aml_workspace.write_config()

In [3]:
# from azureml.core.authentication import InteractiveLoginAuthentication
# interactive_auth = InteractiveLoginAuthentication()
# workspace = Workspace.from_config(auth=interactive_auth)

workspace = Workspace.from_config()
print('Workspace name: ' + workspace.name, 
      'Azure region: ' + workspace.location, 
      'Subscription id: ' + workspace.subscription_id, 
      'Resource group: ' + workspace.resource_group, sep = '\n')

Workspace name: lauri-ml
Azure region: westeurope
Subscription id: a89228ac-8bf4-4646-9d2c-442b1cb5d622
Resource group: lauri-ml


## Choose Compute Target

If the compute target has already been created, then you (and other users in your workspace) can directly run this cell.

In [12]:
# compute_target = workspace.compute_targets[cluster_name]

You can also use a local compute target:

In [4]:
local_runcfg = RunConfiguration()
compute_target = local_runcfg.target

In [18]:
# python_cfg = PythonSection()
# python_cfg.interpreter_path = local_python_path
# python_cfg.user_managed_dependencies = True
# aml_env = Environment('local-lauri')
# aml_env.python = python_cfg

# local_runcfg.environment.python.interpreter_path = local_python_path
# local_runcfg.environment.python.user_managed_dependencies = True

In [9]:
# docker_cfg = DockerSection()
# docker_cfg.enabled = True
# # docker_cfg.base_image = 'base-gpu:openmpi3.1.2-cuda10.1-cudnn7-ubuntu18.04'
# # docker_cfg.base_image = 'mcr.microsoft.com\azureml\base-gpu:openmpi3.1.2-cuda10.1-cudnn7-ubuntu18.04'
# docker_cfg.base_image = 'mcr.microsoft.com/azureml/base-gpu:openmpi3.1.2-cuda10.1-cudnn7-ubuntu18.04'
# aml_env = Environment('local-lauri')
# aml_env.docker = docker_cfg

local_runcfg.environment = Environment(name='tfenv')

local_runcfg.environment.docker.enabled = True
docker_image = 'mcr.microsoft.com/azureml/base-gpu:openmpi3.1.2-cuda10.1-cudnn7-ubuntu18.04'
# docker_image = DEFAULT_GPU_IMAGE
local_runcfg.environment.docker.base_image = docker_image
# local_runcfg.environment.python.user_managed_dependencies = True
# local_runcfg.environment.python.interpreter_path = local_python_path
# ====
# conda_dep = CondaDependencies()
# conda_dep.add_conda_package('azureml-dataprep[fuse,pandas]')
# conda_dep.add_conda_package('transformers')
# conda_dep.add_tensorflow_conda_package(core_type='gpu', version=None)
# ====
# local_runcfg.environment.python.conda_dependencies.add_conda_package('azureml-dataprep[fuse,pandas]')
# local_runcfg.environment.python.conda_dependencies.add_conda_package('transformers')
# local_runcfg.environment.python.conda_dependencies.add_tensorflow_conda_package(core_type='gpu', version='2.0')

local_runcfg.environment.python.conda_dependencies.add_pip_package('azureml-dataprep[fuse,pandas]')
local_runcfg.environment.python.conda_dependencies.add_pip_package('transformers')
local_runcfg.environment.python.conda_dependencies.add_tensorflow_pip_package(core_type='gpu', version='2.0')

# local_runcfg.environment.python.conda_dependencies = CondaDependencies('C:/Users/LauriLehman/OneDrive - Kompozure Oy/Projektit/AzureMLDeploy/conda-tfenv-deps.txt')

In [10]:
aml_env = local_runcfg.environment

If the datastore has already been registered, then you (and other users in your workspace) can directly run this cell.

In [8]:
# datastore = workspace.datastores[datastore_name]

If the dataset has already been registered, then you (and other users in your workspace) can directly run this cell.

In [6]:
# azure_dataset = workspace.datasets[azure_dataset_name]

## Debugging in TensorFlow 2.0 Eager Mode

Eager mode, which is enabled by default in TensorFlow 2.0, makes models easier to understand and debug. Let's start by configuring our remote debugging environment.

#### Configure VS Code Remote connection to Notebook VM

* **ACTION**: Install [Microsoft VS Code](https://code.visualstudio.com/) on your local machine.

* **ACTION**: Follow this [configuration guide](https://github.com/danielsc/azureml-debug-training/blob/master/Setting%20up%20VSCode%20Remote%20on%20an%20AzureML%20Notebook%20VM.md) to set up a VS Code Remote connection to the Notebook VM.

#### Debug training code using step-by-step debugger

* **ACTION**: Open a Remote VS Code session to your Notebook VM.
* **ACTION**: Open the file `/home/azureuser/cloudfiles/code/<username>/bert-stack-overflow/1-Training/train_eager.py`.
* **ACTION**: Set a breakpoint in the file and start a Python debugging session.


## Perform Experiment

Now that we have our compute target, dataset, and training script working locally, it is time to scale up so that the script can run faster. We will start by creating an [experiment](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.experiment.experiment?view=azure-ml-py). An experiment is a grouping of many runs from a specified script. All runs in this tutorial will be performed under the same experiment. 

In [7]:
experiment = Experiment(workspace, name=experiment_name)

#### Create TensorFlow Estimator

The Azure Machine Learning Python SDK Estimator classes allow you to easily construct run configurations for your experiments. They allow you to define parameters such as the training script to run, the compute target to run it on, framework versions, additional package requirements, etc.

You can also use a generic [Estimator](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.estimator.estimator?view=azure-ml-py) to submit training scripts that use any learning framework you choose.

For popular libraries like PyTorch and TensorFlow, you can use their framework-specific estimators. We will use the [TensorFlow Estimator](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.dnn.tensorflow?view=azure-ml-py) for our experiment.

In [8]:
input_name = 'bert_training'

datastore = workspace.get_default_datastore()
dataref = DataReference(
    datastore=datastore,
    data_reference_name=input_name,
    path_on_datastore=dataset_remotepath
)
# dataset = workspace.datasets[dataset_name]

In [11]:
estimator = TensorFlow(
    source_directory='.',
    entry_script=training_script,
    compute_target=compute_target, 
    script_params = {
        # '--data_dir': dataset.as_named_input(input_name).as_mount(),
        # '--data_dir': data_dir,
        '--data_dir': dataref.as_download(),
        '--max_seq_length': 128,
        '--batch_size': 32,
        '--learning_rate': 3e-5,
        '--steps_per_epoch': 150,
        '--num_epochs': 3,
        '--export_dir': model_output_dir
    },
    framework_version='2.0',
    environment_definition=aml_env
    # use_gpu=True,
    # use_docker=False,
    # pip_packages=['transformers', 'azureml-dataprep[fuse,pandas]']
    # pip_packages=['transformers==2.0.0', 'azureml-dataprep[fuse,pandas]==1.1.38']
)



In [12]:
print(estimator.run_config)

{
    "script": "scripts/train.py",
    "arguments": [
        "--data_dir",
        "$AZUREML_DATAREFERENCE_bert_training",
        "--max_seq_length",
        128,
        "--batch_size",
        32,
        "--learning_rate",
        3e-05,
        "--steps_per_epoch",
        150,
        "--num_epochs",
        3,
        "--export_dir",
        "../outputs/model"
    ],
    "target": "local",
    "framework": "Python",
    "communicator": "None",
    "maxRunDurationSeconds": null,
    "nodeCount": 1,
    "environment": {
        "name": "tfenv",
        "version": null,
        "environmentVariables": {
            "EXAMPLE_ENV_VAR": "EXAMPLE_VALUE"
        },
        "python": {
            "userManagedDependencies": false,
            "interpreterPath": "python",
            "condaDependenciesFile": null,
            "baseCondaEnvironment": null,
            "condaDependencies": {
                "name": "project_environment",
                "dependencies": [
                 

A quick description for each of the parameters we have just defined:

- `source_directory`: This specifies the root directory of our source code. 
- `entry_script`: This specifies the training script to run. It should be relative to the source_directory.
- `compute_target`: This specifies the compute target to run the job on. We will use the one created earlier.
- `script_params`: This specifies the input parameters to the training script. Please note:

    1) *azure_dataset.as_named_input('azureservicedata').as_mount()* mounts the dataset to the remote compute and provides the path to the dataset on our datastore. 
    
    2) All outputs from the training script must be written to an './outputs' directory, as this is the only directory that will be saved to the run. 
    
    
- `framework_version`: This specifies the version of TensorFlow to use. Use `TensorFlow.get_supported_versions()` to see all supported versions.
- `use_gpu`: This will use the GPU on the compute target for training if set to True.
- `pip_packages`: This allows you to define any additional libraries to install before training.
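
To make the `script_params` entry more concrete, here is a minimal sketch of how a training script could receive those arguments on its side. The real `scripts/train.py` is not shown in this notebook and may use a different flag library; the function name `parse_args`, the defaults, and the sample path `/tmp/data` are assumptions for illustration only.

```python
import argparse


def parse_args(argv=None):
    """Parse the same flags that the estimator passes via script_params."""
    parser = argparse.ArgumentParser(description="Hypothetical BERT fine-tuning entry point")
    parser.add_argument("--data_dir", type=str, required=True)
    parser.add_argument("--max_seq_length", type=int, default=128)
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--learning_rate", type=float, default=3e-5)
    parser.add_argument("--steps_per_epoch", type=int, default=150)
    parser.add_argument("--num_epochs", type=int, default=3)
    # Defaults to a folder under ./outputs so that the files are saved to the run.
    parser.add_argument("--export_dir", type=str, default="./outputs/model")
    return parser.parse_args(argv)


# Simulate the command line that the estimator would build.
args = parse_args([
    "--data_dir", "/tmp/data",
    "--learning_rate", "3e-5",
    "--num_epochs", "3",
])
print(args.data_dir, args.learning_rate, args.num_epochs, args.export_dir)
```

Each key in `script_params` becomes a command-line flag and each value becomes its argument, so the flag names in the script have to match the dictionary keys exactly.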

#### 2) Add Metrics Logging

So we were able to clone a TensorFlow 2.0 project and run it without any changes. However, with larger-scale projects we would want to log some metrics to make it easier to monitor the performance of our model. 

We can do this by adding a few lines of code into our training script:

```python
# 1) Import SDK Run object
from azureml.core.run import Run

# 2) Get current service context
run = Run.get_context()

# 3) Log the metrics that we want
run.log('val_accuracy', float(logs.get('val_accuracy')))
run.log('accuracy', float(logs.get('accuracy')))
```
We've created a *train_logging.py* script that includes logging metrics as shown above. 

*  **ACTION**: Explore _train_logging.py_ using [Azure ML studio > Notebooks tab](images/azuremlstudio-notebooks-explore.png)
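
The `logs` variable in the snippet above is the dictionary of metrics that Keras hands to callback hooks at the end of each epoch. The following sketch shows that pattern without requiring TensorFlow or the Azure ML SDK. `MetricLogger` and `FakeRun` are hypothetical names; in a real script the class would presumably subclass `tf.keras.callbacks.Callback` and receive the object returned by `Run.get_context()`.

```python
class MetricLogger:
    """Forwards selected Keras-style epoch metrics to an Azure ML style run object."""

    def __init__(self, run, metric_names=("accuracy", "val_accuracy")):
        self.run = run
        self.metric_names = metric_names

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        for name in self.metric_names:
            value = logs.get(name)
            if value is not None:  # skip metrics that were not reported this epoch
                self.run.log(name, float(value))


class FakeRun:
    """Stand-in for azureml.core.run.Run so the sketch runs without the SDK."""

    def __init__(self):
        self.logged = []

    def log(self, name, value):
        self.logged.append((name, value))


fake_run = FakeRun()
logger = MetricLogger(fake_run)
logger.on_epoch_end(0, {"accuracy": 0.81, "val_accuracy": 0.78, "loss": 0.5})
print(fake_run.logged)
```

The `None` check matters because `float(logs.get(...))` raises a `TypeError` when a metric is missing, for example when no validation data is supplied.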

#### 1) Submit First Run 

We can now train our model by submitting the estimator object as a [run](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.run.run?view=azure-ml-py).

Note: Docker must be running before the run is submitted.

In [13]:
run = experiment.submit(estimator)

We can view the current status of the run and stream the logs from within the notebook.

In [25]:
RunDetails(run).show()

_UserRunWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', '…

You can cancel a run at any time, which will stop the run and scale down the nodes in the compute target.

In [None]:
# run.cancel()

Now if we view the current details of the run, we will notice that the logged metrics are plotted as graphs.

#### 3) Monitoring metrics with Tensorboard

Tensorboard is a popular visualization tool for deep learning training, and it is built into the TensorFlow framework. We can easily track metrics in Tensorboard format by adding a Tensorboard callback to the **fit** function call.
```python
    # Add callback to record Tensorboard events
    model.fit(train_dataset, epochs=FLAGS.num_epochs, 
              steps_per_epoch=FLAGS.steps_per_epoch, validation_data=valid_dataset, 
              callbacks=[
                  AmlLogger(),
                  tf.keras.callbacks.TensorBoard(update_freq='batch')]
             )
```

#### Launch Tensorboard
Azure ML service provides built-in integration with Tensorboard through the **tensorboard** package.

While the run is in progress (or after it has completed), we can start Tensorboard with the run as its target, and it will begin streaming logs.

In [24]:
# from azureml.tensorboard import Tensorboard

# The Tensorboard constructor takes an array of runs, so be sure and pass it in as a single-element array here
# tb = Tensorboard([run])

# If successful, start() returns a string with the URI of the instance.
# tb.start()

https://lauri-notebooks-6006.westeurope.notebooks.azureml.net


'https://lauri-notebooks-6006.westeurope.notebooks.azureml.net'

#### Stop Tensorboard
When you're done, make sure to call the stop() method of the Tensorboard object, or it will stay running even after your job completes.

In [25]:
# tb.stop()

## Check the model performance

The last training run produced a model of decent accuracy. Let's test it out and see what it does. First, let's check what files our latest training run produced and download the model files.

#### Download model files

In [26]:
run.get_file_names()

['azureml-logs/20_image_build_log.txt',
 'azureml-logs/55_azureml-execution-tvmps_09fffe7baed25fd27e6e55140e10b7eafd54a9383e58feb74aa4d80ac7920972_d.txt',
 'azureml-logs/65_job_prep-tvmps_09fffe7baed25fd27e6e55140e10b7eafd54a9383e58feb74aa4d80ac7920972_d.txt',
 'azureml-logs/70_driver_log.txt',
 'azureml-logs/75_job_post-tvmps_09fffe7baed25fd27e6e55140e10b7eafd54a9383e58feb74aa4d80ac7920972_d.txt',
 'azureml-logs/process_info.json',
 'azureml-logs/process_status.json',
 'logs/azureml/129_azureml.log',
 'logs/azureml/job_prep_azureml.log',
 'logs/azureml/job_release_azureml.log',
 'logs/train/events.out.tfevents.1581356306.66f9c6bf321f438ca2c8589644c412b6000000.129.11285.v2',
 'logs/train/events.out.tfevents.1581356333.66f9c6bf321f438ca2c8589644c412b6000000.profile-empty',
 'logs/train/plugins/profile/2020-02-10_17-38-53/local.trace',
 'logs/validation/events.out.tfevents.1581356679.66f9c6bf321f438ca2c8589644c412b6000000.129.42775.v2',
 'outputs/model/config.json',
 'outputs/model/tf_mo

In [27]:
run.download_files(prefix='../outputs/model')

# If you haven't finished training the model then just download pre-made model from datastore
# datastore.download('./',prefix="azure-service-classifier/model")

#### Instantiate the model

The next step is to import our model class and instantiate the fine-tuned model from the model files.

In [28]:
from model import TFBertForMultiClassification
from transformers import BertTokenizer
import tensorflow as tf

In [29]:
def encode_example(text, max_seq_length):
    # Encode inputs using tokenizer
    inputs = tokenizer.encode_plus(
        text,
        add_special_tokens=True,
        max_length=max_seq_length
        )
    input_ids, token_type_ids = inputs["input_ids"], inputs["token_type_ids"]
    # The mask has 1 for real tokens and 0 for padding tokens. Only real tokens are attended to.
    attention_mask = [1] * len(input_ids)
    # Zero-pad up to the sequence length.
    padding_length = max_seq_length - len(input_ids)
    input_ids = input_ids + ([0] * padding_length)
    attention_mask = attention_mask + ([0] * padding_length)
    token_type_ids = token_type_ids + ([0] * padding_length)
    
    return input_ids, attention_mask, token_type_ids
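
To see what the padding step in `encode_example` produces, here is a small sketch that applies the same logic to a hand-written list of token ids, so it runs without a tokenizer. `pad_to_length` is a hypothetical helper and the token ids are made up; a real tokenizer would produce different values.

```python
def pad_to_length(input_ids, max_seq_length, pad_id=0):
    """Pad token ids and build the matching attention mask, as encode_example does."""
    attention_mask = [1] * len(input_ids)
    padding_length = max_seq_length - len(input_ids)
    input_ids = input_ids + [pad_id] * padding_length
    attention_mask = attention_mask + [0] * padding_length
    return input_ids, attention_mask


# Four made-up token ids padded to a sequence length of eight.
ids, mask = pad_to_length([101, 7592, 2088, 102], max_seq_length=8)
print(ids)   # [101, 7592, 2088, 102, 0, 0, 0, 0]
print(mask)  # [1, 1, 1, 1, 0, 0, 0, 0]
```

Every input ends up with the same length, and the mask tells the model which positions hold real tokens.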

In [30]:
labels = ['azure-web-app-service', 'azure-storage', 'azure-devops', 'azure-virtual-machine', 'azure-functions']
# Load model and tokenizer
# loaded_model = TFBertForMultiClassification.from_pretrained('azure-service-classifier/model', num_labels=len(labels))
loaded_model = TFBertForMultiClassification.from_pretrained('../outputs/model', num_labels=len(labels))
tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
print("Model loaded from disk.")

Model loaded from disk.


#### Define prediction function

Using the model object, we can interpret new questions and predict which Azure service they talk about. To do that conveniently, we'll define a **predict** function.

In [31]:
# Prediction function
def predict(question):
    input_ids, attention_mask, token_type_ids = encode_example(question, 128)
    predictions = loaded_model.predict({
        'input_ids': tf.convert_to_tensor([input_ids], dtype=tf.int32),
        'attention_mask': tf.convert_to_tensor([attention_mask], dtype=tf.int32),
        'token_type_ids': tf.convert_to_tensor([token_type_ids], dtype=tf.int32)
    })
    # Scores for the single question in the batch
    scores = predictions[0]
    prediction = labels[scores.argmax().item()]
    probability = scores.max()
    print('Prediction: {}'.format(prediction))
    print('Probability: {}'.format(probability))
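
The function above reports only the top label. When two services are plausible for the same question, it can help to look at the runners-up as well. The following sketch ranks all labels by score in plain Python. `rank_labels` is a hypothetical helper, and the scores are invented for illustration; they are not output from the trained model.

```python
labels = ['azure-web-app-service', 'azure-storage', 'azure-devops',
          'azure-virtual-machine', 'azure-functions']


def rank_labels(scores, labels, top_k=3):
    """Return the top_k (label, score) pairs, highest score first."""
    pairs = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return pairs[:top_k]


# Hypothetical scores for one question; not produced by the model.
example_scores = [0.18, 0.15, 0.26, 0.24, 0.17]
for label, score in rank_labels(example_scores, labels):
    print('{}: {:.2f}'.format(label, score))
```

With the real model, the first row of the `predict` output (`predictions[0]`) would take the place of `example_scores`.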

#### Experiment with our new model

Now we can easily test responses of the model to new inputs. 
*  **ACTION**: Invent your own input for one of the 5 services our model understands: 'azure-web-app-service', 'azure-storage', 'azure-devops', 'azure-virtual-machine', 'azure-functions'.

In [32]:
# Route question
predict("How can I specify Service Principal in devops pipeline when deploying virtual machine")

Prediction: azure-devops
Probability: 0.2559393048286438


In [33]:
# Now a trickier case - the opposite
predict("How can virtual machine trigger devops pipeline")

Prediction: azure-devops
Probability: 0.2656690180301666


## Distributed Training Across Multiple GPUs

Distributed training allows us to train across multiple nodes if your cluster allows it. Azure Machine Learning service helps manage the infrastructure for training distributed jobs. All we have to do is add the following parameters to our estimator object in order to enable this:

- `node_count`: The number of nodes to run this job across. Our cluster has a maximum node limit of 2, so we can set this number up to 2.
- `process_count_per_node`: The number of processes to enable per node. The nodes in our cluster have 2 GPUs each. We will set this value to 2, which allows us to distribute the load across both GPUs. Using multi-GPU nodes is beneficial because the communication bandwidth within a single machine is higher than between machines.
- `distributed_training`: The backend to use for our distributed job. We will be using an MPI (Message Passing Interface) backend which is used by Horovod framework.

We use [Horovod](https://github.com/horovod/horovod), a framework that allows us to easily modify our existing training script to run across multiple nodes/GPUs. The distributed training script is saved as *train_horovod.py*.

*  **ACTION**: Explore _train_horovod.py_ using [Azure ML studio > Notebooks tab](images/azuremlstudio-notebooks-explore.png)
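
The two sizing parameters multiply: the job starts `node_count * process_count_per_node` worker processes. The sketch below works through that arithmetic for the values used in this section. It assumes the usual data-parallel setup in which every worker consumes its own batch of `batch_size` examples per step; *train_horovod.py* is not shown here, so treat that as an assumption. `distributed_layout` is a hypothetical helper.

```python
def distributed_layout(node_count, process_count_per_node, batch_size):
    """Summarize how many workers run and how many examples one step covers."""
    workers = node_count * process_count_per_node
    return {
        "workers": workers,
        "global_batch_size": workers * batch_size,
    }


# One node with two processes, each using a per-process batch size of 32.
layout = distributed_layout(node_count=1, process_count_per_node=2, batch_size=32)
print(layout)  # {'workers': 2, 'global_batch_size': 64}
```

Under that assumption, raising `node_count` to 2 would give four workers and 128 examples per step, which is why distributed scripts often adjust the learning rate or the number of steps per epoch.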

We can submit this run in the same way that we did with the others, but with the additional parameters.

In [34]:
from azureml.train.dnn import Mpi

In [37]:
mpi_estimator = TensorFlow(source_directory='./',
                        entry_script='train_horovod.py',compute_target=compute_target,
                        script_params = {
                              '--data_dir': azure_dataset.as_named_input('azureservicedata').as_mount(),
                              '--max_seq_length': 128,
                              '--batch_size': 32,
                              '--learning_rate': 3e-5,
                              '--steps_per_epoch': 150,
                              '--num_epochs': 3,
                              '--export_dir':'./outputs/model'
                        },
                        framework_version='2.0',
                        node_count=1,
                        distributed_training=Mpi(process_count_per_node=2),
                        use_gpu=True,
                        pip_packages=['transformers==2.0.0', 'azureml-dataprep[fuse,pandas]==1.1.38'])

In [None]:
# mpi_run = experiment.submit(mpi_estimator)

Once again, we can view the current details of the run. 

In [38]:
# RunDetails(mpi_run).show()

_UserRunWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', '…

Once the run completes, note the time it took. It should be around 5 minutes. As you can see, by moving to cloud GPUs and using distributed training, we managed to reduce the training time of our model from more than an hour to 5 minutes. This greatly improves the speed of experimentation and innovation.

## Tune Hyperparameters Using Hyperdrive

So far we have been using default hyperparameter values, but in practice we would need to tune these values to optimize performance. Azure Machine Learning service provides several methods for tuning hyperparameters using different strategies.

The first step is to choose the parameter space that we want to search. We have a few choices to make here:

- **Parameter Sampling Method**: This is how we select the combinations of parameters to sample. Azure Machine Learning service offers [RandomParameterSampling](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.hyperdrive.randomparametersampling?view=azure-ml-py), [GridParameterSampling](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.hyperdrive.gridparametersampling?view=azure-ml-py), and [BayesianParameterSampling](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.hyperdrive.bayesianparametersampling?view=azure-ml-py). We will use the `GridParameterSampling` method.
- **Parameters To Search**: We will be searching for optimal combinations of `learning_rate` and `num_epochs`.
- **Parameter Expressions**: This defines the [functions that can be used to describe a hyperparameter search space](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.hyperdrive.parameter_expressions?view=azure-ml-py), which can be discrete or continuous. We will be using a `discrete set of choices`.

The following code allows us to define these options.

In [None]:
from azureml.train.hyperdrive import GridParameterSampling
from azureml.train.hyperdrive.parameter_expressions import choice

In [39]:
param_sampling = GridParameterSampling( {
        '--learning_rate': choice(3e-5, 3e-4),
        '--num_epochs': choice(3, 4)
    }
)
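
Grid sampling tries every combination of the listed values. The sketch below enumerates the same search space with `itertools.product` to show which combinations that implies. Plain Python lists stand in for the `choice` expressions, so this only illustrates the idea and does not show how HyperDrive schedules runs.

```python
from itertools import product

# Same values as in param_sampling, written as plain lists.
search_space = {
    '--learning_rate': [3e-5, 3e-4],
    '--num_epochs': [3, 4],
}

names = list(search_space)
grid = [dict(zip(names, values)) for values in product(*search_space.values())]

for combination in grid:
    print(combination)
print(len(grid))  # 4
```

Two learning rates times two epoch counts give four combinations, so any run limit larger than four cannot add further combinations to this grid.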

The next step is to define how we want to measure our performance. We do so by specifying two classes:

- **[PrimaryMetricGoal](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.hyperdrive.primarymetricgoal?view=azure-ml-py)**: We want to `MAXIMIZE` the `val_accuracy` that is logged in our training script.
- **[BanditPolicy](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.hyperdrive.banditpolicy?view=azure-ml-py)**: A policy for early termination so that jobs which don't show promising results will stop automatically.

In [None]:
from azureml.train.hyperdrive import BanditPolicy
from azureml.train.hyperdrive import PrimaryMetricGoal

In [40]:
primary_metric_name='val_accuracy'
primary_metric_goal=PrimaryMetricGoal.MAXIMIZE

early_termination_policy = BanditPolicy(slack_factor = 0.1, evaluation_interval=1, delay_evaluation=2)
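
As a rough illustration of what `slack_factor` means when the goal is to maximize: the Azure ML documentation describes the policy as cancelling runs whose metric falls below the best run's metric divided by `(1 + slack_factor)`. The sketch below reproduces only that arithmetic. `bandit_cutoff` and `is_terminated` are hypothetical helpers, the metric values are invented, and the timing controls `evaluation_interval` and `delay_evaluation` are not modeled.

```python
def bandit_cutoff(best_metric, slack_factor):
    """Lowest metric value that survives when the goal is to maximize."""
    return best_metric / (1 + slack_factor)


def is_terminated(run_metric, best_metric, slack_factor=0.1):
    """True when a run falls below the slack allowed around the best run."""
    return run_metric < bandit_cutoff(best_metric, slack_factor)


best = 0.80  # hypothetical best val_accuracy at some evaluation interval
print(round(bandit_cutoff(best, 0.1), 4))  # 0.7273
print(is_terminated(0.70, best))  # True
print(is_terminated(0.75, best))  # False
```

With `slack_factor = 0.1`, a run has to stay within roughly 9% of the best `val_accuracy` seen so far to keep running.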

We define an estimator as usual, but this time without the script parameters that we are planning to search.

In [41]:
estimator4 = TensorFlow(source_directory='./',
                        entry_script='train_logging.py',
                        compute_target=compute_target,
                        script_params = {
                              '--data_dir': azure_dataset.as_named_input('azureservicedata').as_mount(),
                              '--max_seq_length': 128,
                              '--batch_size': 32,
                              '--steps_per_epoch': 150,
                              '--export_dir':'./outputs/model',
                        },
                        framework_version='2.0',
                        use_gpu=True,
                        pip_packages=['transformers==2.0.0', 'azureml-dataprep[fuse,pandas]==1.1.38'])

Finally, we add all our parameters in a [HyperDriveConfig](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.hyperdrive.hyperdriveconfig?view=azure-ml-py) class and submit it as a run. 

In [None]:
from azureml.train.hyperdrive import HyperDriveConfig

In [42]:
hyperdrive_run_config = HyperDriveConfig(
    estimator=estimator4,
    hyperparameter_sampling=param_sampling, 
    policy=early_termination_policy,
    primary_metric_name=primary_metric_name, 
    primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
    max_total_runs=10,
    max_concurrent_runs=2
)

In [None]:
run4 = experiment.submit(hyperdrive_run_config)

When we view the details of our run this time, we will see information and metrics for every run in our hyperparameter tuning.

In [43]:
RunDetails(run4).show()

_HyperDriveWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO'…

We can retrieve the best run based on our defined metric.

In [44]:
best_run = run4.get_best_run_by_primary_metric()

## Register Model

A registered [model](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.model(class)?view=azure-ml-py) is a reference to the directory or files that make up your model. After registering a model, you and other people in your workspace can easily access and deploy it without having to run the training script again. 

We need to define the following parameters to register a model:

- `model_name`: The name for your model. If the model name already exists in the workspace, it will create a new version for the model.
- `model_path`: The path to where the model is stored. In our case, this was the *export_dir* defined in our estimators.
- `description`: A description for the model.

Let's register the best run from our hyperparameter tuning.

In [46]:
# model_name = 'azure-service-classifier'
model_name = experiment_name

In [47]:
model = best_run.register_model(model_name=model_name, 
                                model_path='./outputs/model',
                                datasets=[('train, test, validation data', azure_dataset)],
                                description='BERT model for classifying azure services on stackoverflow posts.')

We have registered the model with a Dataset reference. 
* **ACTION**: Check dataset to model link in **Azure ML studio > Datasets tab > Azure Service Dataset**.