## MLOps with Azure ML Pipelines

This notebook builds an ML Pipeline for training and registration. ML Pipelines help you build, optimize, and manage your machine learning workflow.

ML Pipelines encapsulate a workflow for a machine learning task.  Tasks often include:
- Data Prep
- Training 
- Publishing Models
- Deployment of Models

First, we set some key variables that are used throughout the notebook.

In [1]:
registered_env_name = "experiment_env"
experiment_folder = 'remote_train_pipeline'
dataset_prefix_name = 'exp'
cluster_name = "mm-cluster"

Import required packages

In [2]:
# Import required packages
from azureml.core import Workspace, Experiment, Datastore, Environment, Dataset
from azureml.core.compute import ComputeTarget, AmlCompute, DataFactoryCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.core.runconfig import RunConfiguration
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.runconfig import DEFAULT_CPU_IMAGE
from azureml.pipeline.core import Pipeline, PipelineParameter, PipelineData
from azureml.pipeline.steps import PythonScriptStep
from azureml.data.output_dataset_config import OutputTabularDatasetConfig, OutputDatasetConfig, OutputFileDatasetConfig
from azureml.data.datapath import DataPath
from azureml.data.data_reference import DataReference
from azureml.data.sql_data_reference import SqlDataReference
from azureml.pipeline.steps import DataTransferStep
import logging
from azureml.core.model import Model
from azureml.exceptions import WebserviceException

### Connect to the workspace and create a cluster for running the AML Pipeline

Connect to the AML workspace and the default datastore. To run an AML Pipeline, we need a compute cluster, so we create one if it does not already exist.

In [3]:
# Connect to AML Workspace
ws = Workspace.from_config()

# Get the default datastore
default_ds = ws.get_default_datastore()

# Select the AML compute cluster
try:
    # Check for existing compute target
    pipeline_cluster = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    # If it doesn't already exist, create it
    try:
        compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS11_V2', max_nodes=2)
        pipeline_cluster = ComputeTarget.create(ws, cluster_name, compute_config)
        pipeline_cluster.wait_for_completion(show_output=True)
    except Exception as ex:
        print(ex)

Found existing cluster, use it.


In [4]:
try:
    initial_model = Model(ws, 'diabetes_model_remote')
    inital_model_version = initial_model.version
except WebserviceException :
    inital_model_version = 0
print('inital_model_version = ' + str(inital_model_version))

inital_model_version = 13


## Create Run configuration

The RunConfiguration defines the environment used across all the Python steps. There are a variety of ways of setting up an environment. An environment holds the Python packages that your code needs to execute on a compute cluster.

In [5]:
import os
# Create a folder for the pipeline step files
os.makedirs(experiment_folder, exist_ok=True)

print(experiment_folder)

remote_train_pipeline


In [6]:
#conda_yml_file = '../configuration/environment.yml'
conda_yml_file = './'+ experiment_folder+ '/environment.yml'

In [7]:
%%writefile $conda_yml_file
name: experiment_env
dependencies:
- python=3.6.2
- scikit-learn
- ipykernel
- matplotlib
- pandas
- pip
- pip:
  - azureml-defaults
  - pyarrow
  - azureml-monitoring
  - azureml-interpret
  - inference-schema
  - joblib
  - azure-ml-api-sdk

Overwriting ./remote_train_pipeline/environment.yml


In [8]:
# Create a Python environment for the experiment (from a .yml file)
env = Environment.from_conda_specification("experiment_env", conda_yml_file)


run_config = RunConfiguration()
run_config.docker.use_docker = True
run_config.environment = env
run_config.environment.docker.base_image = DEFAULT_CPU_IMAGE

In [9]:
registered_env_name

'experiment_env'

In [10]:
from azureml.core import Environment
from azureml.core.runconfig import RunConfiguration

# Create a Python environment for the experiment (from a .yml file)
experiment_env = Environment.from_conda_specification(registered_env_name, conda_yml_file)

# Register the environment 
experiment_env.register(workspace=ws)
registered_env = Environment.get(ws, registered_env_name)

# Create a new runconfig object for the pipeline
pipeline_run_config = RunConfiguration()

# Use the compute you created above. 
pipeline_run_config.target = pipeline_cluster

# Assign the environment to the run configuration
pipeline_run_config.environment = registered_env

print ("Run configuration created.")

Run configuration created.


## Define Output datasets


The **OutputFileDatasetConfig** object is a special kind of data reference used for interim storage locations that are passed between pipeline steps. You create one and use it as the output of one step and the input of the next. Note that you need to pass it as a script argument so your code can access the datastore location it references.

In all cases we specify the datastore that should hold the datasets and register each dataset when its step completes. Registration is optional and can be disabled by removing the register_on_complete() call.

Registered datasets can be viewed in the Datasets tab of the AML Portal.
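
To make the naming convention concrete, the sketch below builds the destination folder and the registered name for each output from `dataset_prefix_name`. It uses plain strings only: the helper `output_names` is hypothetical and not part of the Azure ML SDK, and the `{run-id}` placeholder is left literal on the assumption that the service fills it in for each run.

```python
# Hypothetical helper (not an Azure ML API): derive the storage folder and
# the registered dataset name used for each pipeline output.
dataset_prefix_name = 'exp'


def output_names(prefix, stage):
    """Return (destination_folder, registered_name) for one output stage."""
    # '{run-id}' stays literal here; it is assumed to be filled in per run.
    destination = f"{prefix}_{stage.lower()}_data/{{run-id}}"
    registered_name = f"{prefix}_{stage}_Data"
    return destination, registered_name


names = {stage: output_names(dataset_prefix_name, stage)
         for stage in ('Raw', 'Training', 'Testing')}

for stage, (destination, registered_name) in names.items():
    print(f"{stage}: folder={destination} registered as {registered_name}")
```

Keeping the run ID in the folder path means each pipeline run writes to its own location, so earlier outputs are not overwritten.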

In [11]:
#get data from storage location and save to exp_raw_data
exp_raw_data       = OutputFileDatasetConfig(name='Exp_Raw_Data', destination=(default_ds, dataset_prefix_name + '_raw_data/{run-id}')).read_delimited_files().register_on_complete(name= dataset_prefix_name + '_Raw_Data')

#data split into testing and training
exp_training_data  = OutputFileDatasetConfig(name='Exp_Training_Data', destination=(default_ds, dataset_prefix_name + '_training_data/{run-id}')).read_delimited_files().register_on_complete(name=dataset_prefix_name + '_Training_Data')
exp_testing_data   = OutputFileDatasetConfig(name='Exp_Testing_Data', destination=(default_ds, dataset_prefix_name + '_testing_data/{run-id}')).read_delimited_files().register_on_complete(name=dataset_prefix_name + '_Testing_Data')

## Define Pipeline Data

Data used in a pipeline can be **produced by one step** and **consumed in another step** by providing a PipelineData object as an output of one step and an input of one or more subsequent steps.

This can be leveraged for moving a model from one step into another for model evaluation.

### Create Python Script Step

In [12]:
get_data_step = PythonScriptStep(
    name='Get Data',
    script_name='get_data.py',
    arguments =['--exp_raw_data', exp_raw_data],
    outputs=[exp_raw_data],
    compute_target=pipeline_cluster,
    source_directory='./' + experiment_folder,
    allow_reuse=False,
    runconfig=pipeline_run_config
)

In [13]:
%%writefile ./$experiment_folder/get_data.py

from azureml.core import Run, Workspace, Datastore, Dataset
from azureml.data.datapath import DataPath
import pandas as pd
import os
import argparse
from sklearn import preprocessing
import numpy as np

#Parse input arguments
#command-line parsing module 
parser = argparse.ArgumentParser("Get data from the datastore and register it in the AML workspace")
parser.add_argument('--exp_raw_data', dest='exp_raw_data', required=True)

args, _ = parser.parse_known_args()
exp_raw_dataset = args.exp_raw_data

#Get current run
current_run = Run.get_context()

#Get associated AML workspace
ws = current_run.experiment.workspace

#Connect to default data store
ds = ws.get_default_datastore()

tab_data_set = Dataset.Tabular.from_delimited_files(path=(ds, 'diabetes-data/*.csv'))

raw_df = tab_data_set.to_pandas_dataframe()

#Make directory on mounted storage
os.makedirs(exp_raw_dataset, exist_ok=True)

#this will allow us to register the dataset on completion
raw_df.to_csv(os.path.join(exp_raw_dataset, 'exp_raw_data.csv'), index=False)

Overwriting ./remote_train_pipeline/get_data.py
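
The hand-off used by `get_data.py` can be tried locally without Azure ML. The sketch below is a minimal stand-in under stated assumptions: a temporary directory plays the role of the mounted output location, the rows are made-up illustrative values, and `write_raw_data` is a hypothetical helper. It shows how the output folder arrives as a command-line argument and how `parse_known_args` tolerates extra arguments.

```python
import argparse
import csv
import os
import tempfile


def write_raw_data(argv, rows):
    """Parse the output folder from argv and write rows to a CSV inside it."""
    parser = argparse.ArgumentParser("Write raw data to an output folder")
    parser.add_argument('--exp_raw_data', dest='exp_raw_data', required=True)
    # parse_known_args returns unrecognized arguments instead of failing on them
    args, _unknown = parser.parse_known_args(argv)

    os.makedirs(args.exp_raw_data, exist_ok=True)
    out_path = os.path.join(args.exp_raw_data, 'exp_raw_data.csv')
    with open(out_path, 'w', newline='') as f:
        csv.writer(f).writerows(rows)
    return out_path


# A temporary directory stands in for the location the pipeline would pass in
with tempfile.TemporaryDirectory() as tmp:
    sample_rows = [['Age', 'Diabetic'], [50, 0], [31, 1]]  # illustrative values
    written = write_raw_data(
        ['--exp_raw_data', os.path.join(tmp, 'raw'), '--extra_flag', 'ignored'],
        sample_rows,
    )
    with open(written, newline='') as f:
        read_back = list(csv.reader(f))

print(read_back)
```

Because the script only writes files into the folder it is given, the same code runs unchanged whether the folder is a local directory or storage mounted by the pipeline.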


### Split Data Step

In [14]:
split_scale_step = PythonScriptStep(
    name='Split  Raw Data',
    script_name='split.py',
    arguments =['--exp_training_data', exp_training_data,
                '--exp_testing_data', exp_testing_data],
    inputs=[exp_raw_data.as_input(name='Exp_Raw_Data')],
    outputs=[exp_training_data, exp_testing_data],
    compute_target=pipeline_cluster,
    source_directory='./' + experiment_folder,
    allow_reuse=False,
    runconfig=pipeline_run_config
)

In [15]:
%%writefile ./$experiment_folder/split.py

from azureml.core import Run, Workspace, Datastore, Dataset
from azureml.data.datapath import DataPath
import os
import argparse

import pandas as pd
import numpy as np
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import joblib
from numpy.random import seed

#Parse input arguments
parser = argparse.ArgumentParser("Split raw data into train/test and scale appropriately")

parser.add_argument('--exp_training_data', dest='exp_training_data', required=True)
parser.add_argument('--exp_testing_data', dest='exp_testing_data', required=True)


args, _ = parser.parse_known_args()
exp_training_data = args.exp_training_data
exp_testing_data = args.exp_testing_data


#Get current run
current_run = Run.get_context()

#Get associated AML workspace
ws = current_run.experiment.workspace

# Read input dataset to pandas dataframe
raw_dataset = current_run.input_datasets['Exp_Raw_Data']
raw_df = raw_dataset.to_pandas_dataframe()


for col in raw_df.columns:
    missing = raw_df[col].isnull()
    num_missing = np.sum(missing)
    if num_missing > 0:  
        raw_df['quality_{}_ismissing'.format(col)] = missing


print(raw_df.columns)

#Split data into training set and test set
df_train, df_test = train_test_split(raw_df, test_size=0.3, random_state=0)





# Save the training and testing splits to their own output locations
os.makedirs(exp_training_data, exist_ok=True)
os.makedirs(exp_testing_data, exist_ok=True)

df_train.to_csv(os.path.join(exp_training_data, 'exp_training_data.csv'), index=False)
df_test.to_csv(os.path.join(exp_testing_data, 'exp_testing_data.csv'), index=False)


Overwriting ./remote_train_pipeline/split.py
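
The two transformations in `split.py` (flagging missing values, then holding out 30% of the rows) can be illustrated with the standard library alone. This is a simplified stand-in, not the pandas and scikit-learn code above: `add_missing_flags` and `split_rows` are hypothetical helpers, the rows are made up, and the seeded shuffle will not reproduce the exact rows that `train_test_split` selects.

```python
import random


def add_missing_flags(rows):
    """Add a 'quality_<col>_ismissing' flag for every column that has gaps."""
    columns = list(rows[0].keys())
    for col in columns:
        if any(row[col] is None for row in rows):
            for row in rows:
                row[f'quality_{col}_ismissing'] = row[col] is None
    return rows


def split_rows(rows, test_size=0.3, seed=0):
    """Shuffle with a fixed seed, then hold out test_size of the rows."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


# Ten illustrative rows; one BMI value is missing
rows = [{'Age': 20 + i, 'BMI': None if i == 3 else 25.0 + i} for i in range(10)]
rows = add_missing_flags(rows)
train_rows, test_rows = split_rows(rows)

print(len(train_rows), len(test_rows))
print(sorted(rows[0].keys()))
```

Only columns that actually contain missing values get an indicator column, which matches the `num_missing > 0` check in the script.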


### Training Step

In [17]:
#Train on the registered train/test datasets and pass the model file to the next step

model_file = PipelineData(name='model_file', datastore=default_ds)

#by specifying as input, it does not need to be included in the arguments
train_model_step = PythonScriptStep(
    name='Train',
    script_name='train.py',
    arguments =['--model_file_output', model_file],
    inputs=[
            exp_training_data.as_input(name='Exp_Training_Data'),
            exp_testing_data.as_input(name='Exp_Testing_Data'),
           ],
    outputs = [model_file],
    compute_target=pipeline_cluster,
    source_directory='./' + experiment_folder,
    allow_reuse=False,
    runconfig=pipeline_run_config
)


In [18]:
%%writefile ./$experiment_folder/train.py

from azureml.core import Run, Workspace, Datastore, Dataset
from azureml.data.datapath import DataPath
import os
import argparse
import shutil

import pandas as pd
import numpy as np
from sklearn import preprocessing
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.metrics import roc_curve


import matplotlib.pyplot as plt
import joblib
from numpy.random import seed


#Parse input arguments
parser = argparse.ArgumentParser("Train Logistic Regression model")
parser.add_argument('--model_file_output', dest='model_file_output', required=True)


args, _ = parser.parse_known_args()
model_file_output = args.model_file_output

def converttypes(df):
    cols = df.columns
    for c in cols:
        df[c] = pd.to_numeric(df[c], errors = 'coerce')

    print('data types')
    print(df.dtypes)
    return df


#Get current run
run = Run.get_context()

#Get associated AML workspace
ws = run.experiment.workspace

# Read input dataset to pandas dataframe
X_train_dataset = run.input_datasets['Exp_Training_Data'].to_pandas_dataframe()
X_test_dataset = run.input_datasets['Exp_Testing_Data'].to_pandas_dataframe()

X_train_dataset = converttypes(X_train_dataset)
X_test_dataset = converttypes(X_test_dataset)


print(type(X_train_dataset))

# Separate features and labels
X_train, y_train = X_train_dataset[['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree','Age']].values, X_train_dataset['Diabetic'].values
X_test, y_test   = X_test_dataset[['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree','Age']].values, X_test_dataset['Diabetic'].values

print('***********')
print(type(X_train))
print(type(X_test))
print(type(y_train[0]))
print(type(y_test[0]))
print('**********')
# Set regularization hyperparameter
reg = 0.01

# Train a logistic regression model
print('Training a logistic regression model with regularization rate of', reg)
run.log('Regularization Rate', float(reg))
model = LogisticRegression(C=1/reg, solver="liblinear").fit(X_train, y_train)

# calculate accuracy
y_hat = model.predict(X_test)
acc = np.average(y_hat == y_test)
print('Accuracy:', acc)
run.log('Accuracy', float(acc))

print('y_hat[0] is of type: ' + str(type(y_hat[0])))

# calculate AUC
y_scores = model.predict_proba(X_test)
auc = roc_auc_score(y_test,y_scores[:,1])
print('AUC: ' + str(auc))
run.log('AUC', float(auc))

run.parent.log(name='AUC', value=float(auc))
run.parent.log(name='Accuracy', value=float(acc))

# Save the trained model in the outputs folder
os.makedirs('./outputs', exist_ok=True)
joblib.dump(value=model, filename='./outputs/diabetes_model_remote.pkl')

os.makedirs(model_file_output, exist_ok=True)

shutil.copyfile('./outputs/diabetes_model_remote.pkl', os.path.join(model_file_output, 'diabetes_model_remote.pkl'))


Overwriting ./remote_train_pipeline/train.py
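
The two metrics logged by `train.py` are easy to check by hand. The sketch below recomputes them in plain Python on four made-up predictions. The `accuracy` and `roc_auc` functions are illustrative re-implementations, not the scikit-learn code used above: AUC is computed as the share of (positive, negative) pairs in which the positive example receives the higher score.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


def roc_auc(y_true, y_score):
    """Share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative labels and positive-class scores (not from the diabetes data)
y_test = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]
y_hat = [1 if s >= 0.5 else 0 for s in y_scores]  # threshold at 0.5

acc = accuracy(y_test, y_hat)
auc = roc_auc(y_test, y_scores)
print('Accuracy:', acc)
print('AUC:', auc)
```

Both values come out to 0.75 here: three of four thresholded predictions are correct, and three of the four positive/negative pairs are ranked correctly. Accuracy depends on the chosen threshold, whereas AUC depends only on the ordering of the scores.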


### Evaluate Model Step

In [19]:
#Evaluate and register model here
#Compare metrics from current model and register if better than current
#best model


deploy_file = PipelineData(name='deploy_file', datastore=default_ds)

evaluate_and_register_step = PythonScriptStep(
    name='Evaluate and Register Model',
    script_name='evaluate_and_register.py',
    arguments=[
        '--model_file', model_file,
        '--deploy_file_output', deploy_file,       
    ],
    inputs=[model_file.as_input('model_file'),
            exp_training_data.as_input(name='Exp_Training_Data'),
            exp_testing_data.as_input(name='Exp_Testing_Data')
           ],
    outputs=[ deploy_file],
    compute_target=pipeline_cluster,
    source_directory='./' + experiment_folder,
    allow_reuse=False,
    runconfig=pipeline_run_config
)

In [20]:
%%writefile ./$experiment_folder/evaluate_and_register.py

from azureml.core import Run, Workspace, Datastore, Dataset
from azureml.core.model import Model
from azureml.data.datapath import DataPath

import joblib
import os
import argparse
import shutil
import pandas as pd

from interpret.ext.blackbox import TabularExplainer
from azureml.interpret import ExplanationClient
from azureml.interpret.scoring.scoring_explainer import LinearScoringExplainer, save

from azureml.core.model import InferenceConfig
from azureml.core.compute import ComputeTarget, AksCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.core.webservice import Webservice, AksWebservice


parser = argparse.ArgumentParser("Evaluate model and register if more performant")

parser.add_argument('--model_file', type=str, required=True)
parser.add_argument('--deploy_file_output', type=str, help='File passing in pipeline to deploy')

args, _ = parser.parse_known_args()

deploy_file = args.deploy_file_output
model_file = args.model_file

def converttypes(df):
    cols = df.columns
    for c in cols:
        df[c] = pd.to_numeric(df[c], errors = 'coerce')

    print('data types')
    print(df.dtypes)
    return df

def model_explain():
    #load training data
    X_train_dataset = run.input_datasets['Exp_Training_Data'].to_pandas_dataframe()
    X_test_dataset = run.input_datasets['Exp_Testing_Data'].to_pandas_dataframe()
    
    X_test_dataset = converttypes(X_test_dataset)
    X_train_dataset = converttypes(X_train_dataset)
    
    X_train, y_train = X_train_dataset[['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree','Age']].values, X_train_dataset['Diabetic'].values
    X_test, y_test   = X_test_dataset[['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree','Age']].values, X_test_dataset['Diabetic'].values

    
    #load the model
    model_list = Model.list(ws, name=model_name, latest=True)
    model_path = model_list[0].download(exist_ok=True)
    model = joblib.load(model_path)

    #https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/explain-model/azure-integration/scoring-time/train_explain.py
    # create an explanation client to store the explanation (contrib API)
    client = ExplanationClient.from_run(run)

    # create an explainer to validate or debug the model
    tabular_explainer = TabularExplainer(model,
                                         initialization_examples=X_train,
                                         features=['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree','Age'],
                                         classes=[0, 1])
                                         #transformations=transformations)

    # explain overall model predictions (global explanation)
    # passing in test dataset for evaluation examples - note it must be a representative sample of the original data
    # more data (e.g. x_train) will likely lead to higher accuracy, but at a time cost
    global_explanation = tabular_explainer.explain_global(X_test)

    # uploading model explanation data for storage or visualization
    comment = 'Global explanation on classification model trained'
    client.upload_model_explanation(global_explanation, comment=comment, model_id=model_reg.id)



#Get current run
run = Run.get_context()

#Get associated AML workspace
ws = run.experiment.workspace

#Get default datastore
ds = ws.get_default_datastore()


#Get metrics associated with the current step run
metrics = run.get_metrics()

print('current run metrics')
for key in metrics.keys():
        print(key, metrics.get(key))
print('\n')


print('parent run metrics')
#Get metrics associated with current parent run
metrics = run.parent.get_metrics()

for key in metrics.keys():
        print(key, metrics.get(key))
print('\n')

current_model_AUC = float(metrics['AUC'])
current_model_accuracy = float(metrics['Accuracy'])

# Get current model from workspace
model_name = 'diabetes_model_remote'
model_description = 'Diabetes model remote'
model_list = Model.list(ws, name=model_name, latest=True)
first_registration = len(model_list)==0

updated_tags = {'AUC': current_model_AUC}

print('updated tags')
print(updated_tags)

# Copy  training outputs to relative path for registration



relative_model_path = 'outputs'
run.upload_folder(name=relative_model_path, path=model_file)



#If no model exists register the current model
if first_registration:
    print('First model registration.')
    model_reg = run.register_model(model_path='outputs/diabetes_model_remote.pkl', model_name=model_name,
                   tags=updated_tags,
                   properties={'AUC': current_model_AUC})

    model_explain()
else:
    #If a model has been registered previously, check to see if current model 
    #performs better. If so, register it.
    print(dir(model_list[0]))
    if float(model_list[0].tags['AUC']) < current_model_AUC:
        print('New model performs better than existing model. Register it.')

        model_reg = run.register_model(model_path='outputs/diabetes_model_remote.pkl', model_name=model_name,
                   tags=updated_tags,
                   properties={'AUC': current_model_AUC, 'Accuracy': current_model_accuracy})

        model_explain()
        
        # Record the deployment decision for downstream steps
        with open(deploy_file, 'w+') as f:
            f.write('deploy')
    
    else:
        print('New model does not perform better than existing model. Cancel run.')
        
        with open(deploy_file, 'w+') as f:
            f.write(('no deployment'))
            
        run.cancel()

Overwriting ./remote_train_pipeline/evaluate_and_register.py
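
The register-or-skip logic in the script can be isolated as a small pure function, which makes it easy to test without a workspace. The sketch below is a hypothetical helper, `registration_decision`, and not part of the Azure ML SDK. One assumption differs from the script: the helper returns a `'deploy'` marker for a first registration, while the script writes the marker file only in the comparison branch.

```python
def registration_decision(existing_auc, new_auc):
    """Return (register, deploy_marker) for a newly trained model.

    existing_auc is None when no model has been registered yet; otherwise it
    is the AUC tag of the latest registered model (a string or a number).
    """
    if existing_auc is None:
        # First registration: nothing to compare against (assumed deployable)
        return True, 'deploy'
    if float(existing_auc) < new_auc:
        # Only a strictly better AUC replaces the registered model
        return True, 'deploy'
    return False, 'no deployment'


print(registration_decision(None, 0.85))
print(registration_decision('0.85', 0.87))
print(registration_decision('0.85', 0.85))
```

The comparison is strict, so a retrained model with an identical AUC is not registered again. The `float()` conversion mirrors the script, which converts the stored AUC tag before comparing.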


## Create Pipeline
Create an Azure ML Pipeline by specifying the steps to be executed. Note: execution order follows the dataset dependencies between steps, so no step executes until all of its input datasets have been generated.
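
How an execution order falls out of data dependencies can be sketched in a few lines. The example below is a simplified model, not the Azure ML scheduler: each step declares the named data it consumes and produces (names loosely mirror this notebook), and a hypothetical `execution_order` helper repeatedly releases the steps whose inputs are all available.

```python
# Steps are listed out of order on purpose; the order is derived from the data.
steps = {
    'Evaluate and Register Model': {
        'inputs': ['model_file', 'Exp_Training_Data', 'Exp_Testing_Data'],
        'outputs': ['deploy_file'],
    },
    'Train': {
        'inputs': ['Exp_Training_Data', 'Exp_Testing_Data'],
        'outputs': ['model_file'],
    },
    'Get Data': {'inputs': [], 'outputs': ['Exp_Raw_Data']},
    'Split Raw Data': {
        'inputs': ['Exp_Raw_Data'],
        'outputs': ['Exp_Training_Data', 'Exp_Testing_Data'],
    },
}


def execution_order(steps):
    """Order steps so that every producer runs before its consumers."""
    producer = {out: name for name, spec in steps.items() for out in spec['outputs']}
    # For each step, the set of steps it is still waiting on
    pending = {name: {producer[i] for i in spec['inputs']}
               for name, spec in steps.items()}
    order = []
    while pending:
        ready = sorted(name for name, deps in pending.items() if not deps)
        if not ready:
            raise ValueError('cyclic dependency between steps')
        for name in ready:
            order.append(name)
            del pending[name]
        for deps in pending.values():
            deps.difference_update(ready)
    return order


order = execution_order(steps)
print(order)
```

For this pipeline the dependencies form a chain, so there is only one valid order: Get Data, Split Raw Data, Train, Evaluate and Register Model. Steps with no dependency between them would become ready in the same round and could run in parallel.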

In [21]:
pipeline = Pipeline(workspace=ws, steps=[get_data_step, split_scale_step, train_model_step, evaluate_and_register_step])

In [22]:
experiment = Experiment(ws, 'ML_Automation_RemotePipelineTraining')
run = experiment.submit(pipeline)


Created step Get Data [b2fc1b37][855be803-32d0-4424-9c19-4276de518983], (This step will run and generate new outputs)
Created step Split  Raw Data [e7e190a7][9e48b1dd-7be6-47f9-bebb-f9286cf30a5d], (This step will run and generate new outputs)

Created step Train [83d569a3][9325daf1-8f37-4759-bdab-1291375ec2d4], (This step will run and generate new outputs)
Created step Evaluate and Register Model [98fcb753][704f9e18-19df-46cd-9676-49084f871dde], (This step will run and generate new outputs)
Submitted PipelineRun 353cd6c7-1227-437c-8eba-d7c237e23941
Link to Azure Machine Learning Portal: https://ml.azure.com/runs/353cd6c7-1227-437c-8eba-d7c237e23941?wsid=/subscriptions/5da07161-3770-4a4b-aa43-418cbbb627cf/resourcegroups/mm-aml-dev-ops-rg/workspaces/mm-aml-dev-ops&tid=72f988bf-86f1-41af-91ab-2d7cd011db47


In [23]:
run.wait_for_completion(show_output=True)

PipelineRunId: 353cd6c7-1227-437c-8eba-d7c237e23941
Link to Azure Machine Learning Portal: https://ml.azure.com/runs/353cd6c7-1227-437c-8eba-d7c237e23941?wsid=/subscriptions/5da07161-3770-4a4b-aa43-418cbbb627cf/resourcegroups/mm-aml-dev-ops-rg/workspaces/mm-aml-dev-ops&tid=72f988bf-86f1-41af-91ab-2d7cd011db47
PipelineRun Status: Running


StepRunId: 98d9330f-4a07-4e0c-8eda-453e76632931
Link to Azure Machine Learning Portal: https://ml.azure.com/runs/98d9330f-4a07-4e0c-8eda-453e76632931?wsid=/subscriptions/5da07161-3770-4a4b-aa43-418cbbb627cf/resourcegroups/mm-aml-dev-ops-rg/workspaces/mm-aml-dev-ops&tid=72f988bf-86f1-41af-91ab-2d7cd011db47
StepRun( Get Data ) Status: Running

StepRun(Get Data) Execution Summary
StepRun( Get Data ) Status: Finished
{'runId': '98d9330f-4a07-4e0c-8eda-453e76632931', 'target': 'mm-cluster', 'status': 'Completed', 'startTimeUtc': '2022-02-01T05:30:15.251833Z', 'endTimeUtc': '2022-02-01T05:32:08.621234Z', 'services': {}, 'properties': {'ContentSnapshotId': '




StepRunId: ea6d68e4-0c97-49fd-a02b-62b844a32814
Link to Azure Machine Learning Portal: https://ml.azure.com/runs/ea6d68e4-0c97-49fd-a02b-62b844a32814?wsid=/subscriptions/5da07161-3770-4a4b-aa43-418cbbb627cf/resourcegroups/mm-aml-dev-ops-rg/workspaces/mm-aml-dev-ops&tid=72f988bf-86f1-41af-91ab-2d7cd011db47
StepRun( Split  Raw Data ) Status: Running

StepRun(Split  Raw Data) Execution Summary
StepRun( Split  Raw Data ) Status: Finished
{'runId': 'ea6d68e4-0c97-49fd-a02b-62b844a32814', 'target': 'mm-cluster', 'status': 'Completed', 'startTimeUtc': '2022-02-01T05:32:20.286981Z', 'endTimeUtc': '2022-02-01T05:32:45.752697Z', 'services': {}, 'properties': {'ContentSnapshotId': 'c0ad4eb0-f79d-4177-befd-e7f35e581eb1', 'StepType': 'PythonScriptStep', 'ComputeTargetType': 'AmlCompute', 'azureml.moduleid': '9e48b1dd-7be6-47f9-bebb-f9286cf30a5d', 'azureml.moduleName': 'Split  Raw Data', 'azureml.runsource': 'azureml.StepRun', 'azureml.nodeid': 'e7e190a7', 'azureml.pipelinerunid': '353cd6c7-1227-




StepRunId: c8f0500e-4a4a-455a-aefd-0919cceb7b30
Link to Azure Machine Learning Portal: https://ml.azure.com/runs/c8f0500e-4a4a-455a-aefd-0919cceb7b30?wsid=/subscriptions/5da07161-3770-4a4b-aa43-418cbbb627cf/resourcegroups/mm-aml-dev-ops-rg/workspaces/mm-aml-dev-ops&tid=72f988bf-86f1-41af-91ab-2d7cd011db47
StepRun( Train ) Status: Running

StepRun(Train) Execution Summary
StepRun( Train ) Status: Finished
{'runId': 'c8f0500e-4a4a-455a-aefd-0919cceb7b30', 'target': 'mm-cluster', 'status': 'Completed', 'startTimeUtc': '2022-02-01T05:32:55.580788Z', 'endTimeUtc': '2022-02-01T05:33:20.774872Z', 'services': {}, 'properties': {'ContentSnapshotId': 'c0ad4eb0-f79d-4177-befd-e7f35e581eb1', 'StepType': 'PythonScriptStep', 'ComputeTargetType': 'AmlCompute', 'azureml.moduleid': '9325daf1-8f37-4759-bdab-1291375ec2d4', 'azureml.moduleName': 'Train', 'azureml.runsource': 'azureml.StepRun', 'azureml.nodeid': '83d569a3', 'azureml.pipelinerunid': '353cd6c7-1227-437c-8eba-d7c237e23941', 'azureml.pipeli




StepRunId: 60d9fde4-b74f-4bdf-975a-6f31005c034c
Link to Azure Machine Learning Portal: https://ml.azure.com/runs/60d9fde4-b74f-4bdf-975a-6f31005c034c?wsid=/subscriptions/5da07161-3770-4a4b-aa43-418cbbb627cf/resourcegroups/mm-aml-dev-ops-rg/workspaces/mm-aml-dev-ops&tid=72f988bf-86f1-41af-91ab-2d7cd011db47
StepRun( Evaluate and Register Model ) Status: Running

StepRun(Evaluate and Register Model) Execution Summary
StepRun( Evaluate and Register Model ) Status: Finished
{'runId': '60d9fde4-b74f-4bdf-975a-6f31005c034c', 'target': 'mm-cluster', 'status': 'Completed', 'startTimeUtc': '2022-02-01T05:33:32.069644Z', 'endTimeUtc': '2022-02-01T05:33:58.712607Z', 'services': {}, 'properties': {'ContentSnapshotId': 'c0ad4eb0-f79d-4177-befd-e7f35e581eb1', 'StepType': 'PythonScriptStep', 'ComputeTargetType': 'AmlCompute', 'azureml.moduleid': '704f9e18-19df-46cd-9676-49084f871dde', 'azureml.moduleName': 'Evaluate and Register Model', 'azureml.runsource': 'azureml.StepRun', 'azureml.nodeid': '98f



PipelineRun Execution Summary
PipelineRun Status: Finished
{'runId': '353cd6c7-1227-437c-8eba-d7c237e23941', 'status': 'Completed', 'startTimeUtc': '2022-02-01T05:29:54.52319Z', 'endTimeUtc': '2022-02-01T05:34:00.693475Z', 'services': {}, 'properties': {'azureml.runsource': 'azureml.PipelineRun', 'runSource': 'SDK', 'runType': 'SDK', 'azureml.parameters': '{}', 'azureml.continue_on_step_failure': 'False', 'azureml.pipelineComponent': 'pipelinerun'}, 'inputDatasets': [], 'outputDatasets': [], 'logFiles': {'logs/azureml/executionlogs.txt': 'https://mmamldevops9020263291.blob.core.windows.net/azureml/ExperimentRun/dcid.353cd6c7-1227-437c-8eba-d7c237e23941/logs/azureml/executionlogs.txt?sv=2019-07-07&sr=b&sig=VhBcCBZw27%2BtIdBTyjGZ7M4uE6cQ3pGkL56tayjhLr0%3D&skoid=6e96e716-19f5-4664-a48c-bccfc5f7e7f7&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2022-02-01T04%3A33%3A19Z&ske=2022-02-02T12%3A43%3A19Z&sks=b&skv=2019-07-07&st=2022-02-01T05%3A22%3A14Z&se=2022-02-01T13%3A32%3A14Z&sp=r', 'logs/

'Finished'

In [24]:
import json

try:
    final_model = Model(ws, 'diabetes_model_remote')
    final_model_version = final_model.version
except WebserviceException :
    final_model_version = 0
    
print('inital_model_version = ' + str(inital_model_version))
print('final_model_version= ' + str(final_model_version))

status = run.get_status()
run_details = run.get_details()

print((run_details))
print(run_details['runId'])

inital_model_version = 13
final_model_version= 14
{'runId': '353cd6c7-1227-437c-8eba-d7c237e23941', 'status': 'Completed', 'startTimeUtc': '2022-02-01T05:29:54.52319Z', 'endTimeUtc': '2022-02-01T05:34:00.693475Z', 'services': {}, 'properties': {'azureml.runsource': 'azureml.PipelineRun', 'runSource': 'SDK', 'runType': 'SDK', 'azureml.parameters': '{}', 'azureml.continue_on_step_failure': 'False', 'azureml.pipelineComponent': 'pipelinerun'}, 'inputDatasets': [], 'outputDatasets': [], 'logFiles': {'logs/azureml/executionlogs.txt': 'https://mmamldevops9020263291.blob.core.windows.net/azureml/ExperimentRun/dcid.353cd6c7-1227-437c-8eba-d7c237e23941/logs/azureml/executionlogs.txt?sv=2019-07-07&sr=b&sig=VhBcCBZw27%2BtIdBTyjGZ7M4uE6cQ3pGkL56tayjhLr0%3D&skoid=6e96e716-19f5-4664-a48c-bccfc5f7e7f7&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2022-02-01T04%3A33%3A19Z&ske=2022-02-02T12%3A43%3A19Z&sks=b&skv=2019-07-07&st=2022-02-01T05%3A22%3A14Z&se=2022-02-01T13%3A32%3A14Z&sp=r', 'logs/azureml/std

## Compare Results

In [25]:
import json
import shutil
import os

outputfolder = run_details['runId']
os.makedirs(outputfolder, exist_ok=True)

if (final_model_version != inital_model_version):
    print('new model registered')
    with open(os.path.join(outputfolder, 'deploy_details.json'), "w+") as f:
        f.write(str(final_model))
    model_name = 'diabetes_model_remote'
    model_description = 'Diabetes model remote'
    model_list = Model.list(ws, name=model_name, latest=True)
    model_path = model_list[0].download(exist_ok=True)
    # Copy from the path returned by download() rather than a hard-coded file name
    shutil.copyfile(model_path, os.path.join(outputfolder, 'diabetes_model_remote.pkl'))
    
with open(os.path.join(outputfolder, 'run_details.json'), "w+") as f:
    print(run_details)
    # Write valid JSON; default=str guards against non-serializable values
    json.dump(run_details, f, default=str)

with open(os.path.join(outputfolder, "run_number.json"), "w+") as f:
    f.write(run_details['runId'])

new model registered
{'runId': '353cd6c7-1227-437c-8eba-d7c237e23941', 'status': 'Completed', 'startTimeUtc': '2022-02-01T05:29:54.52319Z', 'endTimeUtc': '2022-02-01T05:34:00.693475Z', 'services': {}, 'properties': {'azureml.runsource': 'azureml.PipelineRun', 'runSource': 'SDK', 'runType': 'SDK', 'azureml.parameters': '{}', 'azureml.continue_on_step_failure': 'False', 'azureml.pipelineComponent': 'pipelinerun'}, 'inputDatasets': [], 'outputDatasets': [], 'logFiles': {'logs/azureml/executionlogs.txt': 'https://mmamldevops9020263291.blob.core.windows.net/azureml/ExperimentRun/dcid.353cd6c7-1227-437c-8eba-d7c237e23941/logs/azureml/executionlogs.txt?sv=2019-07-07&sr=b&sig=VhBcCBZw27%2BtIdBTyjGZ7M4uE6cQ3pGkL56tayjhLr0%3D&skoid=6e96e716-19f5-4664-a48c-bccfc5f7e7f7&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2022-02-01T04%3A33%3A19Z&ske=2022-02-02T12%3A43%3A19Z&sks=b&skv=2019-07-07&st=2022-02-01T05%3A22%3A14Z&se=2022-02-01T13%3A32%3A14Z&sp=r', 'logs/azureml/stderrlogs.txt': 'https://mmamld