# Part 3: Publish Azure Machine Learning Pipeline to Train BERT Model

## Overview of Part 3
This exercise assumes that you have already trained and deployed a machine learning model and are now ready to manage its end-to-end lifecycle. [MLOps](https://docs.microsoft.com/azure/machine-learning/service/concept-model-management-and-deployment) can help you automatically deploy your model as a web application while enforcing quality benchmarks, strict version control, and model monitoring, and while providing an audit trail. In this part, you define a two-step pipeline (training followed by evaluation) and publish it to your workspace.

The different components of the workshop are as follows:

- Part 1: [Preparing Data and Model Training](https://github.com/microsoft/bert-stack-overflow/blob/master/1-Training/AzureServiceClassifier_Training.ipynb)
- Part 2: [Inferencing and Deploying a Model](https://github.com/microsoft/bert-stack-overflow/blob/master/2-Inferencing/AzureServiceClassifier_Inferencing.ipynb)
- Part 3: [Setting Up a Pipeline Using MLOps](https://github.com/microsoft/bert-stack-overflow/tree/master/3-ML-Ops)
- Part 4: [Explaining Your Model Interpretability](https://github.com/microsoft/bert-stack-overflow/blob/master/4-Interpretibility/IBMEmployeeAttritionClassifier_Interpretability.ipynb)

## Connect to Workspace

In [None]:
from azureml.core import Workspace

# Load the workspace described by a local config.json file
ws = Workspace.from_config()
print('Workspace name: ' + ws.name,
      'Azure region: ' + ws.location,
      'Subscription id: ' + ws.subscription_id,
      'Resource group: ' + ws.resource_group, sep='\n')
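
`Workspace.from_config()` reads a `config.json` file that identifies the workspace, typically located in the working directory or in a `.azureml` subfolder. As a minimal sketch of what that file contains, the snippet below writes one to a temporary directory. The helper name `write_workspace_config` is hypothetical and the three values are placeholders, not real identifiers.

```python
import json
import tempfile
from pathlib import Path


def write_workspace_config(directory, subscription_id, resource_group, workspace_name):
    """Write a config.json holding the three keys that identify a workspace."""
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    path = Path(directory) / "config.json"
    path.write_text(json.dumps(config, indent=2))
    return path


# Placeholder values for illustration only; substitute your own.
demo_dir = tempfile.mkdtemp()
config_path = write_workspace_config(
    demo_dir,
    subscription_id="<your-subscription-id>",
    resource_group="<your-resource-group>",
    workspace_name="<your-workspace-name>",
)
print(config_path.read_text())
```

The same file can also be downloaded from the workspace overview page in the Azure portal, which avoids typing the identifiers by hand.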

## Compute Target

In [None]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

aml_compute_target = "gpu-nc8-t4"
try:
    # Reuse the compute target if it already exists in the workspace
    aml_compute = AmlCompute(ws, aml_compute_target)
    print("Found existing compute target.")
except ComputeTargetException:
    print("Creating new compute target.")

    # NOTE: STANDARD_D2_V2 is a CPU-only VM size, although the target name and
    # the tensorflow-gpu dependency used later suggest a GPU size is intended.
    provisioning_config = AmlCompute.provisioning_configuration(vm_size="STANDARD_D2_V2",
                                                                min_nodes=0,
                                                                max_nodes=2)
    aml_compute = ComputeTarget.create(ws, aml_compute_target, provisioning_config)
    aml_compute.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)

print("Azure Machine Learning Compute attached")

## Pipeline-specific SDK imports

In [None]:
from azureml.core import Dataset, Datastore
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.runconfig import RunConfiguration
from azureml.pipeline.core import Pipeline
from azureml.pipeline.core.graph import PipelineParameter
from azureml.pipeline.steps import PythonScriptStep

## Define Parameters for Pipeline

In [None]:
# Pipeline parameters: the defaults can be overridden each time the pipeline is submitted
model_name = PipelineParameter(name="model_name", default_value="azure-service-classifier")
max_seq_length = PipelineParameter(name="max_seq_length", default_value=128)
learning_rate = PipelineParameter(name="learning_rate", default_value=3e-5)
num_epochs = PipelineParameter(name="num_epochs", default_value=3)
# NOTE: export_dir is defined here, but the training step passes a literal path instead
export_dir = PipelineParameter(name="export_dir", default_value="./outputs/exports")
batch_size = PipelineParameter(name="batch_size", default_value=32)
steps_per_epoch = PipelineParameter(name="steps_per_epoch", default_value=5)
# Identifier of the build that triggered the run; defaults to 0
build_id = PipelineParameter(name="build_id", default_value=0)

In [None]:
from azureml.core import Dataset

# Get a dataset by name
train_ds = Dataset.get_by_name(workspace=ws, name='Azure Services Dataset')

## Creating Steps in a Pipeline

In [None]:
from azureml.core.runconfig import RunConfiguration
from azureml.core.conda_dependencies import CondaDependencies

# Run configuration shared by both steps; pins the SDK and framework versions
aml_run_config = RunConfiguration(conda_dependencies=CondaDependencies.create(
    conda_packages=['numpy', 'pandas',
                    'scikit-learn', 'keras'],
    pip_packages=['azureml-core==1.25.0',
                  'azureml-defaults==1.25.0',
                  'azureml-telemetry==1.25.0',
                  'azureml-train-restclients-hyperdrive==1.25.0',
                  'azureml-train-core==1.25.0',
                  'azureml-dataprep',
                  'tensorflow-gpu==2.0.0',
                  'transformers==2.0.0',
                  'absl-py',
                  'h5py<3.0.0'])
)

aml_run_config

In [None]:
source_directory = './scripts'

# Training step: script_name is resolved relative to source_directory
trainStep = PythonScriptStep(name='Train_step',
                             script_name='./training/train.py',
                             arguments=['--data_dir', train_ds.as_named_input('azureservicedata').as_mount(),
                                        '--max_seq_length', max_seq_length,
                                        '--batch_size', batch_size,
                                        '--learning_rate', learning_rate,
                                        '--steps_per_epoch', steps_per_epoch,
                                        '--num_epochs', num_epochs,
                                        '--export_dir', './outputs/model'],
                             compute_target=aml_compute,
                             source_directory=source_directory,
                             runconfig=aml_run_config,
                             allow_reuse=False)
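
The `arguments` list of the training step reaches `train.py` as ordinary command-line flags, with the pipeline parameters and the dataset mount replaced by concrete values at run time. The actual `train.py` is not shown in this notebook, so the following is only a sketch of how such a script might parse those flags. The function name `parse_train_args`, the defaults, and the sample mount path are assumptions made for illustration.

```python
import argparse


def parse_train_args(argv=None):
    """Parse the flags that the training step passes on the command line."""
    parser = argparse.ArgumentParser(description="Hypothetical train.py argument parsing")
    parser.add_argument("--data_dir", type=str, required=True)
    parser.add_argument("--max_seq_length", type=int, default=128)
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--learning_rate", type=float, default=3e-5)
    parser.add_argument("--steps_per_epoch", type=int, default=5)
    parser.add_argument("--num_epochs", type=int, default=3)
    parser.add_argument("--export_dir", type=str, default="./outputs/model")
    return parser.parse_args(argv)


# Simulate the argument list the script would receive once the dataset mount
# path and the pipeline parameter values have been substituted.
args = parse_train_args([
    "--data_dir", "/tmp/azureservicedata",  # placeholder mount path
    "--max_seq_length", "128",
    "--batch_size", "32",
    "--learning_rate", "3e-5",
    "--steps_per_epoch", "5",
    "--num_epochs", "3",
    "--export_dir", "./outputs/model",
])
print(args.learning_rate, args.num_epochs, args.export_dir)
```

Declaring a `type` for each flag matters because every value arrives as a string, regardless of the type of the pipeline parameter's default.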


In [None]:
# Evaluation step: receives the build id and the model name as arguments
evalStep = PythonScriptStep(name='Eval_step',
                            script_name='./evaluate/evaluate_model.py',
                            arguments=['--build_id', build_id,
                                       '--model_name', model_name],
                            compute_target=aml_compute,
                            source_directory=source_directory,
                            runconfig=aml_run_config,
                            allow_reuse=False)
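
The contents of `evaluate_model.py` are not shown here. One common pattern for an evaluation step in an MLOps pipeline is a quality gate that promotes a newly trained model only when it beats the one currently in use. The sketch below illustrates that idea under stated assumptions: the function `should_promote` and the use of accuracy as the metric are hypothetical and are not taken from the workshop's script.

```python
def should_promote(new_accuracy, current_accuracy=None):
    """Return True when the new model should replace the current one.

    current_accuracy is None when no model has been registered yet.
    """
    if current_accuracy is None:
        return True
    return new_accuracy > current_accuracy


print(should_promote(0.91))        # no existing model
print(should_promote(0.91, 0.88))  # new model is better
print(should_promote(0.85, 0.88))  # new model is worse
```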

In [None]:
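# Eval_step must wait for Train_step. Only the final step is listed, which relies
# on the pipeline picking up Train_step through this dependency.
evalStep.run_after(trainStep)
steps = [evalStep]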
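
Listing only `evalStep` works because the `run_after` call records `trainStep` as an upstream dependency. The toy model below sketches how an execution order can be derived from such dependencies. `ToyStep` and `execution_order` are hypothetical stand-ins written for this illustration; they are not the SDK's classes or its actual graph-building code.

```python
class ToyStep:
    """Stand-in for a pipeline step; NOT the azureml PipelineStep class."""

    def __init__(self, name):
        self.name = name
        self.upstream = []

    def run_after(self, step):
        # Record that this step depends on the given step
        self.upstream.append(step)


def execution_order(listed_steps):
    """Return step names so that every step follows its upstream steps."""
    order, seen = [], set()

    def visit(step):
        if step.name in seen:
            return
        seen.add(step.name)
        for dependency in step.upstream:
            visit(dependency)
        order.append(step.name)

    for step in listed_steps:
        visit(step)
    return order


train = ToyStep("Train_step")
evaluate = ToyStep("Eval_step")
evaluate.run_after(train)

print(execution_order([evaluate]))  # → ['Train_step', 'Eval_step']
```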

In [None]:
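train_pipeline = Pipeline(workspace=ws, steps=steps)
train_pipeline.validate()  # check the pipeline graph for errors before publishing
published_pipeline = train_pipeline.publish(name='AzureServiceClassifier_BERT_Training')
print('Published pipeline id:', published_pipeline.id)
print('REST endpoint:', published_pipeline.endpoint)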
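
Once published, the pipeline can be triggered over REST, and the pipeline parameters defined earlier can be overridden per run through the request body. The sketch below only builds that body and does not send it. The helper `build_trigger_payload`, the experiment name, and the override values are assumptions chosen for illustration.

```python
import json


def build_trigger_payload(experiment_name, **parameter_overrides):
    """Build the JSON body for a POST to a published pipeline's REST endpoint."""
    return {
        "ExperimentName": experiment_name,
        "ParameterAssignments": parameter_overrides,
    }


payload = build_trigger_payload(
    "bert-training-demo",  # hypothetical experiment name
    num_epochs=5,
    learning_rate=2e-5,
    build_id=42,  # hypothetical build identifier
)
print(json.dumps(payload, indent=2))

# To submit for real (not executed here), POST this payload to
# published_pipeline.endpoint with an Azure AD bearer token in the headers.
```

Parameters that are left out of `ParameterAssignments` keep the default values given when the `PipelineParameter` objects were created.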