## MLOps with Azure ML Pipelines

This notebook builds an Azure ML pipeline for model training and registration. ML Pipelines help you build, optimize, and manage machine learning workflows.

ML Pipelines encapsulate a workflow for a machine learning task.  Tasks often include:
- Data Prep
- Training 
- Publishing Models
- Deployment of Models

First, set some key variables that are used throughout the notebook.

In [20]:
registered_env_name = "experiment_env"
experiment_folder = 'exp_pipeline'
dataset_prefix_name = 'exp'
cluster_name = "mm-cluster"

Import required packages

In [21]:
# Import required packages
import logging

from azureml.core import Workspace, Experiment, Datastore, Environment, Dataset
from azureml.core.compute import ComputeTarget, AmlCompute, DataFactoryCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.core.runconfig import RunConfiguration, DEFAULT_CPU_IMAGE
from azureml.core.conda_dependencies import CondaDependencies
from azureml.pipeline.core import Pipeline, PipelineParameter, PipelineData
from azureml.pipeline.steps import PythonScriptStep, DataTransferStep
from azureml.data.output_dataset_config import OutputTabularDatasetConfig, OutputDatasetConfig, OutputFileDatasetConfig
from azureml.data.datapath import DataPath
from azureml.data.data_reference import DataReference
from azureml.data.sql_data_reference import SqlDataReference

### Connect to the workspace and create a cluster for running the AML Pipeline

Connect to the AML workspace and the default datastore. Running an AML Pipeline requires a compute target, so the cell below reuses the cluster named in `cluster_name` if it already exists and creates it otherwise.

In [22]:
# Connect to AML Workspace
ws = Workspace.from_config()

# Get the default datastore
default_ds = ws.get_default_datastore()

# Select the AML compute cluster
# (ComputeTarget, AmlCompute and ComputeTargetException were imported above)


try:
    # Check for existing compute target
    pipeline_cluster = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    # If it doesn't already exist, create it
    try:
        compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS11_V2', max_nodes=2)
        pipeline_cluster = ComputeTarget.create(ws, cluster_name, compute_config)
        pipeline_cluster.wait_for_completion(show_output=True)
    except Exception as ex:
        print(ex)

Found existing cluster, use it.


## Create Run configuration

The RunConfiguration defines the environment used by the Python steps. There are several ways to set up an environment; here it is built from a conda specification file. An environment holds the Python packages your code needs to execute on a compute cluster.

In [23]:
conda_yml_file = '../configuration/environment.yml'

In [24]:
%%writefile $conda_yml_file
name: experiment_env
dependencies:
- python=3.6.2
- scikit-learn
- ipykernel
- matplotlib
- pandas
- pip
- pip:
  - azureml-defaults
  - pyarrow

Overwriting ../configuration/environment.yml


In [25]:
# Create a Python environment for the experiment (from a .yml file)
env = Environment.from_conda_specification("experiment_env", conda_yml_file)


run_config = RunConfiguration()
run_config.docker.use_docker = True
run_config.environment = env
run_config.environment.docker.base_image = DEFAULT_CPU_IMAGE

In [7]:
import os
# Create a folder for the pipeline step files
os.makedirs(experiment_folder, exist_ok=True)

print(experiment_folder)

exp_pipeline


In [8]:
registered_env_name

'experiment_env'

In [9]:
from azureml.core import Environment
from azureml.core.runconfig import RunConfiguration

# Create a Python environment for the experiment (from a .yml file)
experiment_env = Environment.from_conda_specification(registered_env_name, conda_yml_file)

# Register the environment 
experiment_env.register(workspace=ws)
registered_env = Environment.get(ws, registered_env_name)

# Create a new runconfig object for the pipeline
pipeline_run_config = RunConfiguration()

# Use the compute you created above. 
pipeline_run_config.target = pipeline_cluster

# Assign the environment to the run configuration
pipeline_run_config.environment = registered_env

print ("Run configuration created.")

Run configuration created.


## Define Output datasets


The **OutputFileDatasetConfig** object is a special kind of data reference used for interim storage locations that can be passed between pipeline steps. Each one is declared as the output of the step that writes it and as the input of the step that reads it. Note that you also need to pass it as a script argument so your code can access the datastore location it refers to.

In all cases we specify the datastore path that should hold the dataset. The `register_on_complete()` call registers the dataset when the producing step finishes; remove that call if you do not want the dataset registered.

Registered datasets can be viewed in the Datasets tab of the AML Portal.
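
To illustrate why the output object has to be passed as a script argument, here is a minimal local sketch of the receiving side. It assumes the argument reaches the script as a plain directory path; the path `/tmp/mounted/exp_raw_data` and the extra flag `--unrelated` are hypothetical stand-ins, not values produced by AML.

```python
import argparse

def parse_output_dir(argv):
    """Parse the output location the same way the step scripts in this notebook do."""
    parser = argparse.ArgumentParser("Sketch: read an output path from script arguments")
    parser.add_argument('--exp_raw_data', dest='exp_raw_data', required=True)
    # parse_known_args collects undeclared arguments in a second return value
    # instead of exiting with an error, as parse_args would.
    args, unknown = parser.parse_known_args(argv)
    return args.exp_raw_data, unknown

# Hypothetical command line; in a pipeline the real path is supplied at run time.
output_dir, extras = parse_output_dir(
    ['--exp_raw_data', '/tmp/mounted/exp_raw_data', '--unrelated', '1'])
print(output_dir)
print(extras)
```

The script then treats `output_dir` as an ordinary folder: it creates it with `os.makedirs` and writes CSV files into it.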

In [10]:
#get data from storage location and save to exp_raw_data
exp_raw_data       = OutputFileDatasetConfig(name='Exp_Raw_Data', destination=(default_ds, dataset_prefix_name + '_raw_data/{run-id}')).read_delimited_files().register_on_complete(name= dataset_prefix_name + '_Raw_Data')

#data split into testing and training
exp_training_data  = OutputFileDatasetConfig(name='Exp_Training_Data', destination=(default_ds, dataset_prefix_name + '_training_data/{run-id}')).read_delimited_files().register_on_complete(name=dataset_prefix_name + '_Training_Data')
exp_testing_data   = OutputFileDatasetConfig(name='Exp_Testing_Data', destination=(default_ds, dataset_prefix_name + '_testing_data/{run-id}')).read_delimited_files().register_on_complete(name=dataset_prefix_name + '_Testing_Data')
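
The destination paths above are built by plain string concatenation, with `{run-id}` left in the string as a placeholder. The sketch below only mimics that substitution locally to show the resulting folder layout; the `resolve` helper and the run id `example-run-001` are hypothetical and are not part of the SDK.

```python
dataset_prefix_name = 'exp'

def destination_template(suffix):
    """Build the datastore-relative path template for one output dataset."""
    return dataset_prefix_name + '_' + suffix + '/{run-id}'

def resolve(template, run_id):
    """Hypothetical stand-in for the placeholder substitution done at run time."""
    return template.replace('{run-id}', run_id)

templates = {name: destination_template(name)
             for name in ('raw_data', 'training_data', 'testing_data')}

for name, template in templates.items():
    print(template, '->', resolve(template, 'example-run-001'))
```

Under this assumption, each pipeline run writes into its own subfolder, so outputs from earlier runs are not overwritten.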

## Define Pipeline Data

Data used in a pipeline can be **produced by one step** and **consumed in another step** by providing a PipelineData object as an output of one step and an input of one or more subsequent steps.

Here it is used to pass the trained model from the training step to the evaluation step.
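
The producer/consumer handoff can be sketched locally with a temporary directory standing in for the PipelineData location. This is only an illustration of the pattern, not the SDK mechanism: the functions `train_step` and `evaluate_step` and the toy dictionary "model" are hypothetical, and `pickle` is used in place of `joblib` to keep the sketch free of third-party packages.

```python
import os
import pickle
import tempfile

def train_step(output_dir):
    """Producer: write a (toy) model file into the shared output directory."""
    os.makedirs(output_dir, exist_ok=True)
    toy_model = {'coef': [0.5, -1.2], 'intercept': 0.1}  # placeholder, not a real model
    with open(os.path.join(output_dir, 'diabetes_model.pkl'), 'wb') as f:
        pickle.dump(toy_model, f)

def evaluate_step(input_dir):
    """Consumer: read the model file back from the same location."""
    with open(os.path.join(input_dir, 'diabetes_model.pkl'), 'rb') as f:
        return pickle.load(f)

with tempfile.TemporaryDirectory() as shared_dir:
    # Both steps receive the same path, just as both pipeline steps
    # receive the same PipelineData object as a script argument.
    handoff_dir = os.path.join(shared_dir, 'exp_trained_model_pipeline_data')
    train_step(handoff_dir)
    loaded = evaluate_step(handoff_dir)

print(loaded)
```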

In [11]:
exp_trained_model_pipeline_data = PipelineData(name='exp_trained_model_pipeline_data', datastore=default_ds)

In [12]:
%%writefile ./$experiment_folder/get_data.py

from azureml.core import Run, Workspace, Datastore, Dataset
from azureml.data.datapath import DataPath
import pandas as pd
import os
import argparse
from sklearn import preprocessing
import numpy as np

#Parse input arguments
#command-line parsing module 
parser = argparse.ArgumentParser("Get data from and register in AML workspace")
parser.add_argument('--exp_raw_data', dest='exp_raw_data', required=True)

args, _ = parser.parse_known_args()
exp_raw_dataset = args.exp_raw_data

#Get current run
current_run = Run.get_context()

#Get associated AML workspace
ws = current_run.experiment.workspace

#Connect to default data store
ds = ws.get_default_datastore()

tab_data_set = Dataset.Tabular.from_delimited_files(path=(ds, 'diabetes-data/*.csv'))

raw_df = tab_data_set.to_pandas_dataframe()

#Make directory on mounted storage
os.makedirs(exp_raw_dataset, exist_ok=True)

#this will allow us to register the dataset on completion
raw_df.to_csv(os.path.join(exp_raw_dataset, 'exp_raw_data.csv'), index=False)

Overwriting ./exp_pipeline/get_data.py


In [13]:
%%writefile ./$experiment_folder/split.py

from azureml.core import Run, Workspace, Datastore, Dataset
from azureml.data.datapath import DataPath
import os
import argparse

import pandas as pd
import numpy as np
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import joblib
from numpy.random import seed

#Parse input arguments
parser = argparse.ArgumentParser("Split raw data into train and test sets")

parser.add_argument('--exp_training_data', dest='exp_training_data', required=True)
parser.add_argument('--exp_testing_data', dest='exp_testing_data', required=True)


args, _ = parser.parse_known_args()
exp_training_data = args.exp_training_data
exp_testing_data = args.exp_testing_data


#Get current run
current_run = Run.get_context()

#Get associated AML workspace
ws = current_run.experiment.workspace

# Read input dataset to pandas dataframe
raw_dataset = current_run.input_datasets['Exp_Raw_Data']
raw_df = raw_dataset.to_pandas_dataframe()


# Add a boolean indicator column for every column that contains missing values
for col in raw_df.columns:
    missing = raw_df[col].isnull()
    num_missing = np.sum(missing)
    if num_missing > 0:
        raw_df['quality_{}_ismissing'.format(col)] = missing


print(raw_df.columns)

#Split data into training set and test set
df_train, df_test = train_test_split(raw_df, test_size=0.3, random_state=0)

# Save the training and test sets to their separate output locations
os.makedirs(exp_training_data, exist_ok=True)
os.makedirs(exp_testing_data, exist_ok=True)

df_train.to_csv(os.path.join(exp_training_data, 'exp_training_data.csv'), index=False)
df_test.to_csv(os.path.join(exp_testing_data, 'exp_testing_data.csv'), index=False)


Overwriting ./exp_pipeline/split.py
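
The missing-value loop in `split.py` can be easier to follow on a tiny example. The sketch below is a plain-Python stand-in that works on a list of dictionaries instead of a pandas DataFrame; the helper `add_missing_flags` and the two sample rows are hypothetical and only illustrate the flagging rule.

```python
def add_missing_flags(rows):
    """Add a quality_<col>_ismissing flag for every column that has a missing value."""
    columns = list(rows[0].keys())
    for col in columns:
        missing = [row[col] is None for row in rows]
        # Only columns with at least one missing value get an indicator.
        if sum(missing) > 0:
            for row, is_missing in zip(rows, missing):
                row['quality_{}_ismissing'.format(col)] = is_missing
    return rows

# Hypothetical sample rows: Age is missing once, BMI is never missing.
rows = [
    {'Age': 21, 'BMI': 30.1},
    {'Age': None, 'BMI': 24.7},
]
flagged = add_missing_flags(rows)
print(flagged)
```

Note that `train.py` selects its feature columns by name, so any indicator columns added this way are not used for training.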


In [14]:
%%writefile ./$experiment_folder/train.py

from azureml.core import Run, Workspace, Datastore, Dataset
from azureml.data.datapath import DataPath
import os
import argparse
import shutil

import pandas as pd
import numpy as np
from sklearn import preprocessing
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.metrics import roc_curve


import matplotlib.pyplot as plt
import joblib
from numpy.random import seed


#Parse input arguments
parser = argparse.ArgumentParser("Train Logistic Regression model")
parser.add_argument('--exp_trained_model_pipeline_data', dest='exp_trained_model_pipeline_data', required=True)

args, _ = parser.parse_known_args()
exp_trained_model_pipeline_data = args.exp_trained_model_pipeline_data

#Get current run
run = Run.get_context()

#Get associated AML workspace
ws = run.experiment.workspace

# Read input dataset to pandas dataframe
X_train_dataset = run.input_datasets['Exp_Training_Data'].to_pandas_dataframe()
X_test_dataset = run.input_datasets['Exp_Testing_Data'].to_pandas_dataframe()

print(type(X_train_dataset))

# Separate features and labels
feature_cols = ['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree','Age']
X_train, y_train = X_train_dataset[feature_cols].values, X_train_dataset['Diabetic'].values
X_test, y_test   = X_test_dataset[feature_cols].values, X_test_dataset['Diabetic'].values



# Set regularization hyperparameter
reg = 0.01

# Train a logistic regression model
# (np.float was removed in NumPy 1.24; the built-in float is equivalent)
print('Training a logistic regression model with regularization rate of', reg)
run.log('Regularization Rate', float(reg))
model = LogisticRegression(C=1/reg, solver="liblinear").fit(X_train, y_train)

# calculate accuracy
y_hat = model.predict(X_test)
acc = np.average(y_hat == y_test)
print('Accuracy:', acc)
run.log('Accuracy', float(acc))

# calculate AUC
y_scores = model.predict_proba(X_test)
auc = roc_auc_score(y_test,y_scores[:,1])
print('AUC: ' + str(auc))
run.log('AUC', float(auc))

# Also log the metrics to the parent pipeline run so the evaluation step can read them
run.parent.log(name='AUC', value=float(auc))
run.parent.log(name='Accuracy', value=float(acc))

# Save the trained model in the outputs folder
os.makedirs('./outputs', exist_ok=True)
joblib.dump(value=model, filename='./outputs/diabetes_model.pkl')

# Copy the model into the PipelineData location so the next step can read it
os.makedirs(exp_trained_model_pipeline_data, exist_ok=True)
shutil.copyfile('./outputs/diabetes_model.pkl', os.path.join(exp_trained_model_pipeline_data, 'diabetes_model.pkl'))



Overwriting ./exp_pipeline/train.py
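
To make the two logged metrics concrete, the sketch below recomputes them in plain Python on four made-up predictions. It is not the scikit-learn implementation: `accuracy` and `auc` are hypothetical helpers, the AUC uses the pairwise definition (the share of positive/negative pairs in which the positive example gets the higher score), and the 0.5 decision threshold is an assumption for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def auc(y_true, scores):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up labels and positive-class scores, for illustration only.
y_test = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]
y_hat = [1 if s >= 0.5 else 0 for s in y_scores]  # assumed 0.5 threshold

acc = accuracy(y_test, y_hat)
auc_value = auc(y_test, y_scores)

reg = 0.01
C = 1 / reg  # scikit-learn's C is the inverse of the regularization strength
print(acc, auc_value, C)
```

Because `C` is an inverse, the small regularization rate of 0.01 used in `train.py` corresponds to a large `C` and therefore to weak regularization.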


In [15]:
%%writefile ./$experiment_folder/evaluate_and_register.py

from azureml.core import Run, Workspace, Datastore, Dataset
from azureml.core.model import Model
from azureml.data.datapath import DataPath

import os
import argparse
import shutil

parser = argparse.ArgumentParser("Evaluate model and register if more performant")
parser.add_argument('--exp_trained_model_pipeline_data', type=str, required=True)

args, _ = parser.parse_known_args()
exp_trained_model_pipeline_data = args.exp_trained_model_pipeline_data


#Get current run
run = Run.get_context()

#Get associated AML workspace
ws = run.experiment.workspace

#Get default datastore
ds = ws.get_default_datastore()

#Get metrics associated with the current step run
metrics = run.get_metrics()

print('current run metrics')
for key in metrics.keys():
    print(key, metrics.get(key))
print('\n')


print('parent run metrics')
#Get metrics logged to the parent pipeline run by the training step
metrics = run.parent.get_metrics()

for key in metrics.keys():
    print(key, metrics.get(key))
print('\n')

current_model_AUC = float(metrics['AUC'])
current_model_accuracy = float(metrics['Accuracy'])

# Get current model from workspace
model_name = 'diabetes_model'
model_description = 'Diabetes model'
model_list = Model.list(ws, name=model_name, latest=True)
first_registration = len(model_list)==0

updated_tags = {'AUC': current_model_AUC}

print('updated tags')
print(updated_tags)

# Upload the trained model files to the run so they can be registered
relative_model_path = 'model_files'
run.upload_folder(name=relative_model_path, path=exp_trained_model_pipeline_data)


#If no model exists register the current model
if first_registration:
    print('First model registration.')
    run.register_model(model_path=relative_model_path, model_name=model_name,
                       tags=updated_tags,
                       properties={'AUC': current_model_AUC})
else:
    #If a model has been registered previously, check to see if current model
    #performs better. If so, register it.
    if float(model_list[0].tags['AUC']) < current_model_AUC:
        print('New model performs better than existing model. Register it.')
        run.register_model(model_path=relative_model_path, model_name=model_name,
                           tags=updated_tags,
                           properties={'AUC': current_model_AUC, 'Accuracy': current_model_accuracy})
    else:
        print('New model does not perform better than existing model. Cancel run.')
        run.cancel()

Overwriting ./exp_pipeline/evaluate_and_register.py
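
The registration rule in `evaluate_and_register.py` reduces to a single comparison, sketched here as a pure function so it can be checked without a workspace. The function `should_register` is hypothetical, and passing `None` for "no model registered yet" is an assumption of this sketch; the script itself checks whether `Model.list` returned an empty list.

```python
def should_register(existing_tags, new_auc):
    """Return True when no model exists yet or the new AUC beats the registered one."""
    if existing_tags is None:
        return True  # first registration
    # The stored tag is converted with float() before comparing, as in the script.
    return float(existing_tags['AUC']) < new_auc

print(should_register(None, 0.85))             # no model registered yet
print(should_register({'AUC': '0.80'}, 0.85))  # new model is better
print(should_register({'AUC': '0.90'}, 0.85))  # existing model is better
```

The comparison is strict, so a new model with exactly the same AUC as the registered one is not registered.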


## Create Pipeline steps

In [16]:
#Get raw data from the default datastore
#The output is registered as a tabular dataset on completion
get_data_step = PythonScriptStep(
    name='Get Data',
    script_name='get_data.py',
    arguments =['--exp_raw_data', exp_raw_data],
    outputs=[exp_raw_data],
    compute_target=pipeline_cluster,
    source_directory='./' + experiment_folder,
    allow_reuse=False,
    runconfig=run_config
)

#Add missing-value indicator columns to the raw data
#and then split it into test and train datasets
split_scale_step = PythonScriptStep(
    name='Split  Raw Data',
    script_name='split.py',
    arguments =['--exp_training_data', exp_training_data,
                '--exp_testing_data', exp_testing_data],
    inputs=[exp_raw_data.as_input(name='Exp_Raw_Data')],
    outputs=[exp_training_data, exp_testing_data],
    compute_target=pipeline_cluster,
    source_directory='./' + experiment_folder,
    allow_reuse=False,
    runconfig=run_config
)

#Train a logistic regression model using the train/test datasets as inputs
#Accuracy and AUC are logged to the step run and to the parent pipeline run
#The trained model is written to the PipelineData output
train_model_step = PythonScriptStep(
    name='Train',
    script_name='train.py',
    arguments =['--exp_trained_model_pipeline_data', exp_trained_model_pipeline_data],
    inputs=[exp_training_data.as_input(name='Exp_Training_Data'),
            exp_testing_data.as_input(name='Exp_Testing_Data'),
           ],
    outputs=[exp_trained_model_pipeline_data],
    compute_target=pipeline_cluster,
    source_directory='./' + experiment_folder,
    allow_reuse=False,
    runconfig=run_config
)

#Evaluate and register model here
#Compare metrics from current model and register if better than current
#best model
evaluate_and_register_step = PythonScriptStep(
    name='Evaluate and Register Model',
    script_name='evaluate_and_register.py',
    arguments=['--exp_trained_model_pipeline_data', exp_trained_model_pipeline_data],
    inputs=[ exp_trained_model_pipeline_data.as_input('exp_trained_model_pipeline_data')],
    compute_target=pipeline_cluster,
    source_directory='./' + experiment_folder,
    allow_reuse=False,
    runconfig=run_config
)

## Create Pipeline
Create an Azure ML Pipeline by specifying the steps to be executed. Note: execution order follows the dataset dependencies between steps, so no step runs until all of its input datasets have been generated.
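
The dependency-driven ordering can be illustrated with the standard library. The sketch below is not how the pipeline service schedules steps; it simply derives an order from the inputs and outputs declared for the four steps in this notebook, using `graphlib.TopologicalSorter` (Python 3.9 or later). The `steps` dictionary is a hand-written summary assumed for illustration.

```python
from graphlib import TopologicalSorter

# Datasets each step consumes and produces, listed out of order on purpose.
steps = {
    'Evaluate and Register Model': {'inputs': ['exp_trained_model_pipeline_data'],
                                    'outputs': []},
    'Train': {'inputs': ['Exp_Training_Data', 'Exp_Testing_Data'],
              'outputs': ['exp_trained_model_pipeline_data']},
    'Split  Raw Data': {'inputs': ['Exp_Raw_Data'],
                        'outputs': ['Exp_Training_Data', 'Exp_Testing_Data']},
    'Get Data': {'inputs': [], 'outputs': ['Exp_Raw_Data']},
}

# Map every dataset to the step that produces it.
producer = {dataset: name for name, io in steps.items() for dataset in io['outputs']}

# A step depends on the producers of all of its inputs.
dependencies = {name: {producer[dataset] for dataset in io['inputs']}
                for name, io in steps.items()}

order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Because each step here depends on exactly one earlier step, the order is fully determined regardless of how the steps are listed when the `Pipeline` object is created.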

In [17]:
pipeline = Pipeline(workspace=ws, steps=[get_data_step, split_scale_step, train_model_step, evaluate_and_register_step])

In [18]:
experiment = Experiment(ws, 'diabetes-train-experiment')
run = experiment.submit(pipeline)
run.wait_for_completion(show_output=True)

Created step Get Data [c9140548][855f2e6a-6cc1-4d65-bfe9-afb28cf08bd7], (This step will run and generate new outputs)
Created step Split  Raw Data [f6fec1b6][ad76a5ff-69b9-461a-b507-d8cd2af282ed], (This step will run and generate new outputs)
Created step Train [1c0dfdb4][4ba6d523-f90a-4224-a202-8f5bb9f6deed], (This step will run and generate new outputs)
Created step Evaluate and Register Model [bb29b2a5][fcb4ca5c-4d2a-4dc1-b6d4-87e1687ed0e4], (This step will run and generate new outputs)
Submitted PipelineRun 353eb952-7a5a-4315-b01c-93e38bcd2b56
Link to Azure Machine Learning Portal: https://ml.azure.com/runs/353eb952-7a5a-4315-b01c-93e38bcd2b56?wsid=/subscriptions/5da07161-3770-4a4b-aa43-418cbbb627cf/resourcegroups/mm-aml-dev2-rg/workspaces/mm-aml-dev2&tid=72f988bf-86f1-41af-91ab-2d7cd011db47
PipelineRunId: 353eb952-7a5a-4315-b01c-93e38bcd2b56
Link to Azure Machine Learning Portal: https://ml.azure.com/runs/353eb952-7a5a-4315-b01c-93e38bcd2b56?wsid=/subscriptions/5da07161-3770-4a4b-


Streaming azureml-logs/75_job_post-tvmps_3f38fe6a2e2b2d50a57c30cd65f0e09b774353f334fff87c50398ba50db6f17d_d.txt
[2021-11-16T16:24:34.939925] Entering job release
[2021-11-16T16:24:35.898473] Starting job release
[2021-11-16T16:24:35.899458] Logging experiment finalizing status in history service.[2021-11-16T16:24:35.899766] job release stage : upload_datastore starting...
Starting the daemon thread to refresh tokens in background for process with pid = 239

[2021-11-16T16:24:35.904629] Entering context manager injector.
[2021-11-16T16:24:35.907506] job release stage : upload_datastore completed...
[2021-11-16T16:24:35.907786] job release stage : start importing azureml.history._tracking in run_history_release.
[2021-11-16T16:24:35.908045] job release stage : execute_job_release starting...
[2021-11-16T16:24:35.930232] job release stage : copy_batchai_cached_logs starting...
[2021-11-16T16:24:35.930875] job release stage : copy_batchai_cached_logs completed...
[2021-11-16T16:24:36.0202




StepRunId: 2dfc3702-d486-4ced-b6f6-1d4193a55c82
Link to Azure Machine Learning Portal: https://ml.azure.com/runs/2dfc3702-d486-4ced-b6f6-1d4193a55c82?wsid=/subscriptions/5da07161-3770-4a4b-aa43-418cbbb627cf/resourcegroups/mm-aml-dev2-rg/workspaces/mm-aml-dev2&tid=72f988bf-86f1-41af-91ab-2d7cd011db47
StepRun( Split  Raw Data ) Status: Running

Streaming azureml-logs/55_azureml-execution-tvmps_3f38fe6a2e2b2d50a57c30cd65f0e09b774353f334fff87c50398ba50db6f17d_d.txt
2021-11-16T16:25:56Z Successfully mounted a/an Blobfuse File System at /mnt/batch/tasks/shared/LS_root/jobs/mm-aml-dev2/azureml/2dfc3702-d486-4ced-b6f6-1d4193a55c82/mounts/workspaceblobstore
2021-11-16T16:25:57Z The vmsize standard_ds11_v2 is not a GPU VM, skipping get GPU count by running nvidia-smi command.
2021-11-16T16:25:57Z Starting output-watcher...
2021-11-16T16:25:58Z Executing 'Copy ACR Details file' on 10.0.0.5
2021-11-16T16:25:58Z Copy ACR Details file succeeded on 10.0.0.5. Output: 
>>>   
>>>   
2021-11-16T16:25


StepRun(Split  Raw Data) Execution Summary
StepRun( Split  Raw Data ) Status: Finished
{'runId': '2dfc3702-d486-4ced-b6f6-1d4193a55c82', 'target': 'mm-cluster', 'status': 'Completed', 'startTimeUtc': '2021-11-16T16:25:55.223992Z', 'endTimeUtc': '2021-11-16T16:26:53.978804Z', 'services': {}, 'properties': {'ContentSnapshotId': '7d8c19dd-2959-4a2c-a48d-383b8771e621', 'StepType': 'PythonScriptStep', 'ComputeTargetType': 'AmlCompute', 'azureml.moduleid': 'ad76a5ff-69b9-461a-b507-d8cd2af282ed', 'azureml.runsource': 'azureml.StepRun', 'azureml.nodeid': 'f6fec1b6', 'azureml.pipelinerunid': '353eb952-7a5a-4315-b01c-93e38bcd2b56', 'azureml.pipeline': '353eb952-7a5a-4315-b01c-93e38bcd2b56', 'azureml.pipelineComponent': 'masterescloud', '_azureml.ComputeTargetType': 'amlcompute', 'ProcessInfoFile': 'azureml-logs/process_info.json', 'ProcessStatusFile': 'azureml-logs/process_status.json'}, 'inputDatasets': [{'dataset': {'id': '6f49cc1b-22b4-45d3-ac07-ee58fb43fb13'}, 'consumptionDetails': {'type':




StepRunId: 520fcaf6-80e7-4379-9d3c-3f74160c1b05
Link to Azure Machine Learning Portal: https://ml.azure.com/runs/520fcaf6-80e7-4379-9d3c-3f74160c1b05?wsid=/subscriptions/5da07161-3770-4a4b-aa43-418cbbb627cf/resourcegroups/mm-aml-dev2-rg/workspaces/mm-aml-dev2&tid=72f988bf-86f1-41af-91ab-2d7cd011db47
StepRun( Train ) Status: Running

Streaming azureml-logs/55_azureml-execution-tvmps_3f38fe6a2e2b2d50a57c30cd65f0e09b774353f334fff87c50398ba50db6f17d_d.txt
2021-11-16T16:27:55Z Successfully mounted a/an Blobfuse File System at /mnt/batch/tasks/shared/LS_root/jobs/mm-aml-dev2/azureml/520fcaf6-80e7-4379-9d3c-3f74160c1b05/mounts/workspaceblobstore
2021-11-16T16:27:55Z The vmsize standard_ds11_v2 is not a GPU VM, skipping get GPU count by running nvidia-smi command.
2021-11-16T16:27:55Z Starting output-watcher...
2021-11-16T16:27:55Z IsDedicatedCompute == True, won't poll for Low Pri Preemption
2021-11-16T16:27:56Z Executing 'Copy ACR Details file' on 10.0.0.5
2021-11-16T16:27:56Z Copy ACR De


Streaming azureml-logs/75_job_post-tvmps_3f38fe6a2e2b2d50a57c30cd65f0e09b774353f334fff87c50398ba50db6f17d_d.txt
[2021-11-16T16:28:19.757545] Entering job release
[2021-11-16T16:28:21.016887] Starting job release
[2021-11-16T16:28:21.017985] Logging experiment finalizing status in history service.
Starting the daemon thread to refresh tokens in background for process with pid = 338
[2021-11-16T16:28:21.018522] job release stage : upload_datastore starting...
[2021-11-16T16:28:21.019219] Entering context manager injector.
[2021-11-16T16:28:21.023976] job release stage : start importing azureml.history._tracking in run_history_release.
[2021-11-16T16:28:21.026883] job release stage : copy_batchai_cached_logs starting...
[2021-11-16T16:28:21.033987] job release stage : execute_job_release starting...
[2021-11-16T16:28:21.034342] job release stage : copy_batchai_cached_logs completed...
[2021-11-16T16:28:21.056828] job release stage : upload_datastore completed...
[2021-11-16T16:28:21.1282




StepRunId: c51806c8-9991-4530-a0f8-a979944fd17b
Link to Azure Machine Learning Portal: https://ml.azure.com/runs/c51806c8-9991-4530-a0f8-a979944fd17b?wsid=/subscriptions/5da07161-3770-4a4b-aa43-418cbbb627cf/resourcegroups/mm-aml-dev2-rg/workspaces/mm-aml-dev2&tid=72f988bf-86f1-41af-91ab-2d7cd011db47
StepRun( Evaluate and Register Model ) Status: Running

Streaming azureml-logs/55_azureml-execution-tvmps_3f38fe6a2e2b2d50a57c30cd65f0e09b774353f334fff87c50398ba50db6f17d_d.txt
2021-11-16T16:29:25Z Successfully mounted a/an Blobfuse File System at /mnt/batch/tasks/shared/LS_root/jobs/mm-aml-dev2/azureml/c51806c8-9991-4530-a0f8-a979944fd17b/mounts/workspaceblobstore
2021-11-16T16:29:25Z The vmsize standard_ds11_v2 is not a GPU VM, skipping get GPU count by running nvidia-smi command.
2021-11-16T16:29:25Z Starting output-watcher...
2021-11-16T16:29:25Z IsDedicatedCompute == True, won't poll for Low Pri Preemption
2021-11-16T16:29:25Z Executing 'Copy ACR Details file' on 10.0.0.5
2021-11-16


Streaming azureml-logs/75_job_post-tvmps_3f38fe6a2e2b2d50a57c30cd65f0e09b774353f334fff87c50398ba50db6f17d_d.txt
[2021-11-16T16:29:40.005657] Entering job release
[2021-11-16T16:29:41.317084] Starting job release
[2021-11-16T16:29:41.318396] Logging experiment finalizing status in history service.
Starting the daemon thread to refresh tokens in background for process with pid = 130
[2021-11-16T16:29:41.320123] job release stage : upload_datastore starting...
[2021-11-16T16:29:41.321358] Entering context manager injector.
[2021-11-16T16:29:41.326690] job release stage : start importing azureml.history._tracking in run_history_release.
[2021-11-16T16:29:41.336552] job release stage : execute_job_release starting...
[2021-11-16T16:29:41.359810] job release stage : copy_batchai_cached_logs starting...
[2021-11-16T16:29:41.361988] job release stage : copy_batchai_cached_logs completed...
[2021-11-16T16:29:41.366473] job release stage : upload_datastore completed...
[2021-11-16T16:29:41.4341



PipelineRun Execution Summary
PipelineRun Status: Finished
{'runId': '353eb952-7a5a-4315-b01c-93e38bcd2b56', 'status': 'Completed', 'startTimeUtc': '2021-11-16T16:20:36.444555Z', 'endTimeUtc': '2021-11-16T16:29:53.520925Z', 'services': {}, 'properties': {'azureml.runsource': 'azureml.PipelineRun', 'runSource': 'SDK', 'runType': 'SDK', 'azureml.parameters': '{}', 'azureml.continue_on_step_failure': 'False', 'azureml.pipelineComponent': 'pipelinerun'}, 'inputDatasets': [], 'outputDatasets': [], 'logFiles': {'logs/azureml/executionlogs.txt': 'https://mmamldev24709630538.blob.core.windows.net/azureml/ExperimentRun/dcid.353eb952-7a5a-4315-b01c-93e38bcd2b56/logs/azureml/executionlogs.txt?sv=2019-07-07&sr=b&sig=T%2FiKd3fo8EvxhTJ9WwZQ78bA26zPZF1jZxE4h9DKPo4%3D&skoid=b7a3bef5-d355-4fee-9d06-a4a027869af0&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2021-11-16T15%3A23%3A14Z&ske=2021-11-17T23%3A33%3A14Z&sks=b&skv=2019-07-07&st=2021-11-16T16%3A19%3A54Z&se=2021-11-17T00%3A29%3A54Z&sp=r', 'logs/a

'Finished'

## Publish Pipeline

In [19]:
published_pipeline = pipeline.publish(name = 'Diabetes Training Pipeline',
                                     description = 'Pipeline that trains, evaluates, and registers a diabetes classification model.',
                                     continue_on_step_failure = False)

In [None]:
published_pipeline
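
A published pipeline can be triggered over REST by posting a JSON body that names the experiment to run under. The sketch below only assembles such a request and does not send it. The endpoint URL and token are placeholders, `build_trigger_request` is a hypothetical helper, and the assumption is that the real endpoint comes from `published_pipeline.endpoint` and the header from an Azure authentication object.

```python
import json

def build_trigger_request(endpoint, bearer_token, experiment_name):
    """Assemble (but do not send) a request that would start a published pipeline."""
    headers = {
        'Authorization': 'Bearer ' + bearer_token,
        'Content-Type': 'application/json',
    }
    # This pipeline defines no PipelineParameter objects, so only the
    # experiment name is included in the body.
    body = {'ExperimentName': experiment_name}
    return {'url': endpoint, 'headers': headers, 'data': json.dumps(body)}

request = build_trigger_request(
    endpoint='https://example.invalid/pipelines/placeholder-id',  # placeholder URL
    bearer_token='PLACEHOLDER_TOKEN',                             # placeholder token
    experiment_name='diabetes-train-experiment')
print(request['data'])
```

The resulting dictionary could be passed to an HTTP client of your choice once the placeholders are replaced with real values.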

In [None]:
# from azureml.pipeline.core import ScheduleRecurrence, Schedule

# # Submit the Pipeline every Monday at 00:00 UTC
# recurrence = ScheduleRecurrence(frequency="Week", interval=1, week_days=["Monday"], time_of_day="00:00")
# weekly_schedule = Schedule.create(ws, name="weekly-diabetes-training", 
#                                   description="Based on time",
#                                   pipeline_id=published_pipeline.id, 
#                                   experiment_name='mslearn-diabetes-pipeline', 
#                                   recurrence=recurrence)
# print('Pipeline scheduled.')