## MLOps with Azure ML Pipelines

ML Pipeline - Training & Registration.  ML Pipelines can help you to build, optimize and manage your machine learning workflow. 

ML Pipelines encapsulate a workflow for a machine learning task.  Tasks often include:
- Data Prep
- Training 
- Publishing Models
- Deployment of Models

First we will set some key variables to be leveraged inside the notebook

In [1]:
registered_env_name = "experiment_env"
experiment_folder = 'exp_train_pipeline'
dataset_prefix_name = 'exp'
cluster_name = "mm-cluster"

Import required packages

In [2]:
# Import required packages
from azureml.core import Workspace, Experiment, Datastore, Environment, Dataset
from azureml.core.compute import ComputeTarget, AmlCompute, DataFactoryCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.core.runconfig import RunConfiguration
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.runconfig import DEFAULT_CPU_IMAGE
from azureml.pipeline.core import Pipeline, PipelineParameter, PipelineData
from azureml.pipeline.steps import PythonScriptStep
from azureml.pipeline.core import PipelineParameter, PipelineData
from azureml.data.output_dataset_config import OutputTabularDatasetConfig, OutputDatasetConfig, OutputFileDatasetConfig
from azureml.data.datapath import DataPath
from azureml.data.data_reference import DataReference
from azureml.data.sql_data_reference import SqlDataReference
from azureml.pipeline.steps import DataTransferStep
import logging

### Connect to the workspace and create a cluster for running the AML Pipeline

Connect to the AML workspace and the default datastore. To run an AML Pipeline, we will want to create compute if a compute cluster is not already available

In [3]:
# Connect to AML Workspace
ws = Workspace.from_config()

# Get the default datastore
default_ds = ws.get_default_datastore()

#Select AML Compute Cluster
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException


try:
    # Check for existing compute target
    pipeline_cluster = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    # If it doesn't already exist, create it
    try:
        compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS11_V2', max_nodes=2)
        pipeline_cluster = ComputeTarget.create(ws, cluster_name, compute_config)
        pipeline_cluster.wait_for_completion(show_output=True)
    except Exception as ex:
        print(ex)

Found existing cluster, use it.


## Create Run configuration

The RunConfiguration defines the environment used across all the python steps.  There are a variety of ways of setting up an environment.  An environment holds the required python packages needed for your code to execute on a compute cluster

Overwriting ../configuration/environment.yml


In [None]:
#copy file from ../configuration/environment.yml to 

In [7]:
import os
# Create a folder for the pipeline step files
os.makedirs(experiment_folder, exist_ok=True)

print(experiment_folder)

exp_train_pipeline


In [77]:
#conda_yml_file = '../configuration/environment.yml'
conda_yml_file = './'+ experiment_folder+ '/environment.yml'

In [78]:
%%writefile $conda_yml_file
name: experiment_env
dependencies:
- python=3.6.2
- scikit-learn
- ipykernel
- matplotlib
- pandas
- pip
- pip:
  - azureml-defaults
  - pyarrow
  - azureml-monitoring
  - azureml-interpret
  - inference-schema
  - joblib
  - azure-ml-api-sdk

Overwriting ./exp_train_pipeline/environment.yml


In [79]:
# Create a Python environment for the experiment (from a .yml file)
env = Environment.from_conda_specification("experiment_env", conda_yml_file)


run_config = RunConfiguration()
run_config.docker.use_docker = True
run_config.environment = env
run_config.environment.docker.base_image = DEFAULT_CPU_IMAGE

In [80]:
registered_env_name

'experiment_env'

In [81]:
from azureml.core import Environment
from azureml.core.runconfig import RunConfiguration

# Create a Python environment for the experiment (from a .yml file)
experiment_env = Environment.from_conda_specification(registered_env_name, conda_yml_file)

# Register the environment 
experiment_env.register(workspace=ws)
registered_env = Environment.get(ws, registered_env_name)

# Create a new runconfig object for the pipeline
pipeline_run_config = RunConfiguration()

# Use the compute you created above. 
pipeline_run_config.target = pipeline_cluster

# Assign the environment to the run configuration
pipeline_run_config.environment = registered_env

print ("Run configuration created.")

Run configuration created.


## Define Output datasets


The **OutputFileDatasetConfig** object is a special kind of data reference that is used for interim storage locations that can be passed between pipeline steps, so you'll create one and use at as the output for the first step and the input for the second step. Note that you need to pass it as a script argument so your code can access the datastore location referenced by the data reference. 

Note, in all cases we specify the datastore that should hold the datasets and whether they should be registered following step completion or not. This can optionally be disabled by removing the register_on_complete() call.

These can be viewed in the Datasets tab directly in the AML Portal

In [82]:
#get data from storage location and save to exp_raw_data
exp_raw_data       = OutputFileDatasetConfig(name='Exp_Raw_Data', destination=(default_ds, dataset_prefix_name + '_raw_data/{run-id}')).read_delimited_files().register_on_complete(name= dataset_prefix_name + '_Raw_Data')

#data split into testing and training
exp_training_data  = OutputFileDatasetConfig(name='Exp_Training_Data', destination=(default_ds, dataset_prefix_name + '_training_data/{run-id}')).read_delimited_files().register_on_complete(name=dataset_prefix_name + '_Training_Data')
exp_testing_data   = OutputFileDatasetConfig(name='Exp_Testing_Data', destination=(default_ds, dataset_prefix_name + '_testing_data/{run-id}')).read_delimited_files().register_on_complete(name=dataset_prefix_name + '_Testing_Data')

## Define Pipeline Data

Data used in pipeline can be **produced by one step** and **consumed in another step** by providing a PipelineData object as an output of one step and an input of one or more subsequent steps

This can be leveraged for moving a model from one step into another for model evaluation

### Create Python Script Step

In [83]:
get_data_step = PythonScriptStep(
    name='Get Data',
    script_name='get_data.py',
    arguments =['--exp_raw_data', exp_raw_data],
    outputs=[exp_raw_data],
    compute_target=pipeline_cluster,
    source_directory='./' + experiment_folder,
    allow_reuse=False,
    runconfig=pipeline_run_config
)

In [84]:
%%writefile ./$experiment_folder/get_data.py

from azureml.core import Run, Workspace, Datastore, Dataset
from azureml.data.datapath import DataPath
import pandas as pd
import os
import argparse
from sklearn import preprocessing
import numpy as np

#Parse input arguments
#command-line parsing module 
parser = argparse.ArgumentParser("Get data from and register in AML workspace")
parser.add_argument('--exp_raw_data', dest='exp_raw_data', required=True)

args, _ = parser.parse_known_args()
exp_raw_dataset = args.exp_raw_data

#Get current run
current_run = Run.get_context()

#Get associated AML workspace
ws = current_run.experiment.workspace

#Connect to default data store
ds = ws.get_default_datastore()

tab_data_set = Dataset.Tabular.from_delimited_files(path=(ds, 'diabetes-data/*.csv'))

raw_df = tab_data_set.to_pandas_dataframe()

#Make directory on mounted storage
os.makedirs(exp_raw_dataset, exist_ok=True)

#this will allow us to register the dataset on completion
raw_df.to_csv(os.path.join(exp_raw_dataset, 'exp_raw_data.csv'), index=False)

Overwriting ./exp_train_pipeline/get_data.py


### Split Data Step

In [85]:
split_scale_step = PythonScriptStep(
    name='Split  Raw Data',
    script_name='split.py',
    arguments =['--exp_training_data', exp_training_data,
                '--exp_testing_data', exp_testing_data],
    inputs=[exp_raw_data.as_input(name='Exp_Raw_Data')],
    outputs=[exp_training_data, exp_testing_data],
    compute_target=pipeline_cluster,
    source_directory='./' + experiment_folder,
    allow_reuse=False,
    runconfig=pipeline_run_config
)

In [86]:
%%writefile ./$experiment_folder/split.py

from azureml.core import Run, Workspace, Datastore, Dataset
from azureml.data.datapath import DataPath
import os
import argparse

import pandas as pd
import numpy as np
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import joblib
from numpy.random import seed

#Parse input arguments
parser = argparse.ArgumentParser("Split raw data into train/test and scale appropriately")

parser.add_argument('--exp_training_data', dest='exp_training_data', required=True)
parser.add_argument('--exp_testing_data', dest='exp_testing_data', required=True)


args, _ = parser.parse_known_args()
exp_training_data = args.exp_training_data
exp_testing_data = args.exp_testing_data


#Get current run
current_run = Run.get_context()

#Get associated AML workspace
ws = current_run.experiment.workspace

# Read input dataset to pandas dataframe
raw_datset = current_run.input_datasets['Exp_Raw_Data']
raw_df = raw_datset.to_pandas_dataframe()


for col in raw_df.columns:
    missing = raw_df[col].isnull()
    num_missing = np.sum(missing)
    if num_missing > 0:  
        raw_df['quality_{}_ismissing'.format(col)] = missing


print(raw_df.columns)

#Split data into training set and test set
df_train, df_test = train_test_split(raw_df, test_size=0.3, random_state=0)





# Save train data to both train and test (reflects the usage pattern in this sample. Note: test/train sets are typically distinct data).
os.makedirs(exp_training_data, exist_ok=True)
os.makedirs(exp_testing_data, exist_ok=True)

df_train.to_csv(os.path.join(exp_training_data, 'exp_training_data.csv'), index=False)
df_test.to_csv(os.path.join(exp_testing_data, 'exp_testing_data.csv'), index=False)


Overwriting ./exp_train_pipeline/split.py


In [87]:
### TrainingStep

In [88]:
#Raw data will be preprocessed and registered as train/test datasets

model_file = PipelineData(name='model_file', datastore=default_ds)

#by specifying as input, it does not need to be included in the arguments
train_model_step = PythonScriptStep(
    name='Train',
    script_name='train.py',
    arguments =['--model_file_output', model_file],
    inputs=[
            exp_training_data.as_input(name='Exp_Training_Data'),
            exp_testing_data.as_input(name='Exp_Testing_Data'),
           ],
    outputs = [model_file],
    compute_target=pipeline_cluster,
    source_directory='./' + experiment_folder,
    allow_reuse=False,
    runconfig=pipeline_run_config
)


In [89]:
%%writefile ./$experiment_folder/train.py

from azureml.core import Run, Workspace, Datastore, Dataset
from azureml.data.datapath import DataPath
import os
import argparse
import shutil

import pandas as pd
import numpy as np
from sklearn import preprocessing
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.metrics import roc_curve


import matplotlib.pyplot as plt
import joblib
from numpy.random import seed


#Parse input arguments
parser = argparse.ArgumentParser("Train Logistic Regression model")
parser.add_argument('--model_file_output', dest='model_file_output', required=True)


args, _ = parser.parse_known_args()
model_file_output = args.model_file_output


#Get current run
run = Run.get_context()

#Get associated AML workspace
ws = run.experiment.workspace

# Read input dataset to pandas dataframe
X_train_dataset = run.input_datasets['Exp_Training_Data'].to_pandas_dataframe()
X_test_dataset = run.input_datasets['Exp_Testing_Data'].to_pandas_dataframe()

print(type(X_train_dataset))

# Separate features and labels
X_train, y_train = X_train_dataset[['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree','Age']].values, X_train_dataset['Diabetic'].values
X_test, y_test   = X_test_dataset[['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree','Age']].values, X_test_dataset['Diabetic'].values



# Set regularization hyperparameter
reg = 0.01

# Train a logistic regression model
print('Training a logistic regression model with regularization rate of', reg)
run.log('Regularization Rate',  np.float(reg))
model = LogisticRegression(C=1/reg, solver="liblinear").fit(X_train, y_train)

# calculate accuracy
y_hat = model.predict(X_test)
acc = np.average(y_hat == y_test)
print('Accuracy:', acc)
run.log('Accuracy', np.float(acc))

# calculate AUC
y_scores = model.predict_proba(X_test)
auc = roc_auc_score(y_test,y_scores[:,1])
print('AUC: ' + str(auc))
run.log('AUC', np.float(auc))

run.parent.log(name='AUC', value=np.float(auc))
run.parent.log(name='Accuracy', value=np.float(acc))

# Save the trained model in the outputs folder
os.makedirs('./outputs', exist_ok=True)
joblib.dump(value=model, filename='./outputs/diabetes_model_remote.pkl')

os.makedirs(model_file_output, exist_ok=True)

shutil.copyfile('./outputs/diabetes_model_remote.pkl', os.path.join(model_file_output, 'diabetes_model_remote.pkl'))


Overwriting ./exp_train_pipeline/train.py


### Evaluate Model Step

In [90]:
#Evaluate and register model here
#Compare metrics from current model and register if better than current
#best model


deploy_file = PipelineData(name='deploy_file', datastore=default_ds)

evaluate_and_register_step = PythonScriptStep(
    name='Evaluate and Register Model',
    script_name='evaluate_and_register.py',
    arguments=[
        '--model_file', model_file,
        '--deploy_file_output', deploy_file,       
    ],
    inputs=[model_file.as_input('model_file'),
            exp_training_data.as_input(name='Exp_Training_Data'),
            exp_testing_data.as_input(name='Exp_Testing_Data')
           ],
    outputs=[ deploy_file],
    compute_target=pipeline_cluster,
    source_directory='./' + experiment_folder,
    allow_reuse=False,
    runconfig=pipeline_run_config
)

In [91]:
%%writefile ./$experiment_folder/evaluate_and_register.py

from azureml.core import Run, Workspace, Datastore, Dataset
from azureml.core.model import Model
from azureml.data.datapath import DataPath

import joblib
import os
import argparse
import shutil
import pandas as pd

from interpret.ext.blackbox import TabularExplainer
from azureml.interpret import ExplanationClient
from azureml.interpret.scoring.scoring_explainer import LinearScoringExplainer, save

from azureml.core.model import InferenceConfig
from azureml.core.compute import ComputeTarget, AksCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.core.webservice import Webservice, AksWebservice


parser = argparse.ArgumentParser("Evaluate model and register if more performant")

parser.add_argument('--model_file', type=str, required=True)
parser.add_argument('--deploy_file_output', type=str, help='File passing in pipeline to deploy')

args, _ = parser.parse_known_args()

deploy_file = args.deploy_file_output
model_file = args.model_file

def converttypes(df):
    cols = df.columns
    for c in cols:
        df[c] = pd.to_numeric(df[c], errors = 'coerce')

    print('data types')
    print(df.dtypes)
    return df

def model_explain():
    #load trinning data
    X_train_dataset = run.input_datasets['Exp_Training_Data'].to_pandas_dataframe()
    X_test_dataset = run.input_datasets['Exp_Testing_Data'].to_pandas_dataframe()
    
    X_test_dataset = converttypes(X_test_dataset)
    X_train_dataset = converttypes(X_train_dataset)
    
    X_train, y_train = X_train_dataset[['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree','Age']].values, X_train_dataset['Diabetic'].values
    X_test, y_test   = X_test_dataset[['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree','Age']].values, X_test_dataset['Diabetic'].values

    
    #load the model
    model_list = Model.list(ws, name=model_name, latest=True)
    model_path = model_list[0].download(exist_ok=True)
    model = joblib.load(model_path)

    #https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/explain-model/azure-integration/scoring-time/train_explain.py
    # create an explanation client to store the explanation (contrib API)
    client = ExplanationClient.from_run(run)

    # create an explainer to validate or debug the model
    tabular_explainer = TabularExplainer(model,
                                         initialization_examples=X_train,
                                         features=['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree','Age'],
                                         classes=[0, 1])
                                         #transformations=transformations)

    # explain overall model predictions (global explanation)
    # passing in test dataset for evaluation examples - note it must be a representative sample of the original data
    # more data (e.g. x_train) will likely lead to higher accuracy, but at a time cost
    global_explanation = tabular_explainer.explain_global(X_test)

    # uploading model explanation data for storage or visualization
    comment = 'Global explanation on classification model trained'
    client.upload_model_explanation(global_explanation, comment=comment, model_id=model_reg.id)



#Get current run
run = Run.get_context()

#Get associated AML workspace
ws = run.experiment.workspace

#Get default datastore
ds = ws.get_default_datastore()


#Get metrics associated with current parent run
metrics = run.get_metrics()

print('current run metrics')
for key in metrics.keys():
        print(key, metrics.get(key))
print('\n')


print('parent run metrics')
#Get metrics associated with current parent run
metrics = run.parent.get_metrics()

for key in metrics.keys():
        print(key, metrics.get(key))
print('\n')

current_model_AUC = float(metrics['AUC'])
current_model_accuracy = float(metrics['Accuracy'])

# Get current model from workspace
model_name = 'diabetes_model_remote'
model_description = 'Diabetes model remote'
model_list = Model.list(ws, name=model_name, latest=True)
first_registration = len(model_list)==0

updated_tags = {'AUC': current_model_AUC}

print('updated tags')
print(updated_tags)

# Copy  training outputs to relative path for registration



relative_model_path = 'outputs'
run.upload_folder(name=relative_model_path, path=model_file)



#If no model exists register the current model
if first_registration:
    print('First model registration.')
    model_reg = run.register_model(model_path='outputs/diabetes_model_remote.pkl', model_name=model_name,
                   tags=updated_tags,
                   properties={'AUC': current_model_AUC})

    #model_explain()
else:
    #If a model has been registered previously, check to see if current model 
    #performs better. If so, register it.
    print(dir(model_list[0]))
    if float(model_list[0].tags['AUC']) < current_model_AUC:
        print('New model performs better than existing model. Register it.')

        model_reg = run.register_model(model_path='outputs/diabetes_model_remote.pkl', model_name=model_name,
                   tags=updated_tags,
                   properties={'AUC': current_model_AUC, 'Accuracy': current_model_accuracy})

        #model_explain()
        
        # Output accuracy to file
        with open(deploy_file, 'w+') as f:
            f.write(('deploy'))
    
    else:
        print('New model does not perform better than existing model. Cancel run.')
        
        with open(deploy_file, 'w+') as f:
            f.write(('no deployment'))
            
        run.cancel()

Overwriting ./exp_train_pipeline/evaluate_and_register.py


### Deploy ACI

In [92]:
exp_deploy_pipeline_data = PipelineData(
        name='scoring_url_file', 
        pipeline_output_name='scoring_url_file',
        datastore=default_ds,
        output_mode='mount',
        is_directory=False)

aci_service_name = 'diabetes-service-remote-training'
registered_model_name = 'diabetes_model_remote'

env_name = PipelineParameter(name='environment_name', default_value=registered_env_name)
service_name = PipelineParameter(name='service_name', default_value=aci_service_name)
model_name = PipelineParameter(name='model_name', default_value=registered_model_name)

deploy_test = PythonScriptStep(
    name='Deploy to ACI',
    script_name='deployACI.py',
    arguments=[
        '--deploy_file', deploy_file,
        '--environment_name', env_name,
        '--service_name', service_name,
        '--model_name', model_name
    ],
    inputs=[deploy_file.as_input('deploy_file')
    ],
    outputs=[exp_deploy_pipeline_data],
    compute_target=pipeline_cluster,
    source_directory='./' + experiment_folder,
    allow_reuse=False,
    runconfig=pipeline_run_config
)

In [104]:
%%writefile ./$experiment_folder/deployACI.py

import argparse
from azureml.core import Workspace, Environment
from azureml.core.model import Model
from azureml.core.run import Run
from azureml.core.model import InferenceConfig
from azureml.core.webservice import Webservice, AciWebservice
from azureml.exceptions import WebserviceException

parser = argparse.ArgumentParser(description='Deploy arg parser')
parser.add_argument('--environment_name', type=str,help='Environment name')
parser.add_argument('--service_name', type=str,help='service name')
parser.add_argument('--model_name', type=str,help='model name')
parser.add_argument('--deploy_file', type=str, help='File storing if model should be deployed')
parser.add_argument('--scoring_url_file', type=str, help='File storing the scoring url')


args = parser.parse_args()
environment_name = args.environment_name
deploy_file = args.deploy_file
scoring_url_file = args.scoring_url_file
service_name = args.service_name
model_name = args.model_name


run = Run.get_context()

#Get associated AML workspace
ws = run.experiment.workspace

model = Model(ws, model_name)

env = Environment.get(ws, environment_name)


inference_config = InferenceConfig(entry_script='score.py', environment=env)


# Deploy model
aci_config = AciWebservice.deploy_configuration(
            cpu_cores = 1, 
            memory_gb = 2, 
            tags = {'model': 'diabetes remote training'},
            auth_enabled=True,
            enable_app_insights=True,
            collect_model_data=True)

try:
    service = Webservice(ws, name=service_name)
    if service:
        service.delete()
except WebserviceException as e:
         print()

service = Model.deploy(ws, service_name, [model], inference_config, aci_config)
service.wait_for_deployment(True)
    

# Output scoring url
print(service.scoring_uri)
with open(scoring_url_file, 'w+') as f:
    f.write(service.scoring_uri)

Overwriting ./exp_train_pipeline/deployACI.py


In [105]:
%%writefile ./$experiment_folder/score.py

import json
import joblib
import numpy as np
from azureml.core.model import Model
from azureml.monitoring import ModelDataCollector
import time
import os


#version 2
# Called when the service is loaded
def init():
    global model
    #Print statement for appinsights custom traces:
    print ("model initialized" + time.strftime("%H:%M:%S"))
    # Get the path to the deployed model file and load it
    path = os.path.join(Model.get_model_path('diabetes_model_remote'))
    
    print(path)
    model = joblib.load(path)

    
    global inputs_dc, prediction_dc
    inputs_dc = ModelDataCollector("best_model", designation="inputs", feature_names=['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree','Age'])
    prediction_dc = ModelDataCollector("best_model", designation="predictions", feature_names=["Diabetic"])



# Called when a request is received
def run(raw_data):
    try:
        # Get the input data as a numpy array
        data = np.array(json.loads(raw_data)['data'])
        # Get a prediction from the model
        predictions = model.predict(data)
        print ("Prediction created" + time.strftime("%H:%M:%S"))
        # Get the corresponding classname for each prediction (0 or 1)
        classnames = ['not-diabetic', 'diabetic']
        predicted_classes = []
        for prediction in predictions:
            predicted_classes.append(classnames[prediction])
        # Return the predictions as JSON
        
         # Log the input and output data to appinsights:
        info = {
            "input": raw_data,
            "output": predicted_classes
            }
        print(json.dumps(info))
        
        inputs_dc.collect(data.tolist()) #this call is saving our input data into Azure Blob
        prediction_dc.collect(predicted_classes) #this call is saving our prediction data into Azure Blob

        
        return json.dumps(predicted_classes)
    except Exception as e:
        error = str(e)
        print (error + time.strftime("%H:%M:%S"))
        return error

Overwriting ./exp_train_pipeline/score.py


## Create Pipeline steps

## Create Pipeline
Create an Azure ML Pipeline by specifying the steps to be executed. Note: based on the dataset dependencies between steps, exection occurs logically such that no step will execute unless all of the necessary input datasets have been generated.

In [106]:
pipeline = Pipeline(workspace=ws, steps=[get_data_step, split_scale_step, train_model_step, evaluate_and_register_step, deploy_test])

In [107]:
experiment = Experiment(ws, 'AML_Automation_RemotePipelineTraining')
run = experiment.submit(pipeline)
run.wait_for_completion(show_output=True)

Created step Get Data [a1dfa377][f068c843-eaf8-4a85-ac0d-6df7223d3d89], (This step will run and generate new outputs)
Created step Split  Raw Data [849d806c][90375723-79a9-4857-b490-63db88470059], (This step will run and generate new outputs)
Created step Train [160d1250][a889e10e-813e-43ed-a3d2-79fb58665e1b], (This step will run and generate new outputs)
Created step Evaluate and Register Model [8dbc938f][cbfaa632-8a14-43b2-bc99-e607b1cf9802], (This step will run and generate new outputs)
Created step Deploy to ACI [ac372256][3d991ca8-1833-43ca-85f8-a33f88c6028d], (This step will run and generate new outputs)
Submitted PipelineRun 5817898b-b946-41e5-adec-bf7240e2d735
Link to Azure Machine Learning Portal: https://ml.azure.com/runs/5817898b-b946-41e5-adec-bf7240e2d735?wsid=/subscriptions/5da07161-3770-4a4b-aa43-418cbbb627cf/resourcegroups/mm-aml-dev-ops-rg/workspaces/mm-aml-dev-ops&tid=72f988bf-86f1-41af-91ab-2d7cd011db47
PipelineRunId: 5817898b-b946-41e5-adec-bf7240e2d735
Link to Azur




StepRunId: c1ff1898-7ec5-4e04-a745-297f1497dfd7
Link to Azure Machine Learning Portal: https://ml.azure.com/runs/c1ff1898-7ec5-4e04-a745-297f1497dfd7?wsid=/subscriptions/5da07161-3770-4a4b-aa43-418cbbb627cf/resourcegroups/mm-aml-dev-ops-rg/workspaces/mm-aml-dev-ops&tid=72f988bf-86f1-41af-91ab-2d7cd011db47
StepRun( Split  Raw Data ) Status: Running

StepRun(Split  Raw Data) Execution Summary
StepRun( Split  Raw Data ) Status: Finished
{'runId': 'c1ff1898-7ec5-4e04-a745-297f1497dfd7', 'target': 'mm-cluster', 'status': 'Completed', 'startTimeUtc': '2022-01-28T22:20:37.545355Z', 'endTimeUtc': '2022-01-28T22:21:06.436602Z', 'services': {}, 'properties': {'ContentSnapshotId': '994c4def-643a-4e6f-bcc0-8236f10ca6b1', 'StepType': 'PythonScriptStep', 'ComputeTargetType': 'AmlCompute', 'azureml.moduleid': '90375723-79a9-4857-b490-63db88470059', 'azureml.moduleName': 'Split  Raw Data', 'azureml.runsource': 'azureml.StepRun', 'azureml.nodeid': '849d806c', 'azureml.pipelinerunid': '5817898b-b946-




StepRunId: 4ce36af2-5781-41a7-87ac-c92d4da693a5
Link to Azure Machine Learning Portal: https://ml.azure.com/runs/4ce36af2-5781-41a7-87ac-c92d4da693a5?wsid=/subscriptions/5da07161-3770-4a4b-aa43-418cbbb627cf/resourcegroups/mm-aml-dev-ops-rg/workspaces/mm-aml-dev-ops&tid=72f988bf-86f1-41af-91ab-2d7cd011db47
StepRun( Train ) Status: NotStarted
StepRun( Train ) Status: Running

StepRun(Train) Execution Summary
StepRun( Train ) Status: Finished
{'runId': '4ce36af2-5781-41a7-87ac-c92d4da693a5', 'target': 'mm-cluster', 'status': 'Completed', 'startTimeUtc': '2022-01-28T22:21:18.996953Z', 'endTimeUtc': '2022-01-28T22:21:45.163652Z', 'services': {}, 'properties': {'ContentSnapshotId': '994c4def-643a-4e6f-bcc0-8236f10ca6b1', 'StepType': 'PythonScriptStep', 'ComputeTargetType': 'AmlCompute', 'azureml.moduleid': 'a889e10e-813e-43ed-a3d2-79fb58665e1b', 'azureml.moduleName': 'Train', 'azureml.runsource': 'azureml.StepRun', 'azureml.nodeid': '160d1250', 'azureml.pipelinerunid': '5817898b-b946-41e5




StepRunId: 66fe6dd6-47e2-4907-a13f-0922caa5dd49
Link to Azure Machine Learning Portal: https://ml.azure.com/runs/66fe6dd6-47e2-4907-a13f-0922caa5dd49?wsid=/subscriptions/5da07161-3770-4a4b-aa43-418cbbb627cf/resourcegroups/mm-aml-dev-ops-rg/workspaces/mm-aml-dev-ops&tid=72f988bf-86f1-41af-91ab-2d7cd011db47
StepRun( Evaluate and Register Model ) Status: Running

StepRun(Evaluate and Register Model) Execution Summary
StepRun( Evaluate and Register Model ) Status: Finished
{'runId': '66fe6dd6-47e2-4907-a13f-0922caa5dd49', 'target': 'mm-cluster', 'status': 'Completed', 'startTimeUtc': '2022-01-28T22:21:57.890408Z', 'endTimeUtc': '2022-01-28T22:22:12.446302Z', 'services': {}, 'properties': {'ContentSnapshotId': '994c4def-643a-4e6f-bcc0-8236f10ca6b1', 'StepType': 'PythonScriptStep', 'ComputeTargetType': 'AmlCompute', 'azureml.moduleid': 'cbfaa632-8a14-43b2-bc99-e607b1cf9802', 'azureml.moduleName': 'Evaluate and Register Model', 'azureml.runsource': 'azureml.StepRun', 'azureml.nodeid': '8db

ActivityFailedException: ActivityFailedException:
	Message: Activity Failed:
{
    "error": {
        "code": "UserError",
        "message": "{'code': ExecutionFailed, 'message': [{\"exit_code\":1,\"error_message\":\"Execution failed with error: Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.\\nRunning\\n2022-01-28 22:22:32+00:00 Creating Container Registry if not exists.\\n2022-01-28 22:22:33+00:00 Registering the environment.\\n2022-01-28 22:22:34+00:00 Generating deployment configuration.\\n2022-01-28 22:22:35+00:00 Submitting deployment to compute.\\n2022-01-28 22:22:39+00:00 Checking the status of deployment diabetes-service-remote-training..\\n2022-01-28 22:28:41+00:00 Checking the status of inference endpoint diabetes-service-remote-training.\\nSucceeded\\n[stderr]Traceback (most recent call last):\\n[stderr]  File \\\"deployACI.py\\\", line 61, in <module>\\n[stderr]    with open(scoring_url_file, 'w+') as f:\\n[stderr]TypeError: expected str, bytes or os.PathLike object, not NoneType\\n[stderr]\\nACI service creation operation finished, operation \\\"Succeeded\\\"\\nhttp://d776f11b-defe-49a2-a9b4-f652c488e854.eastus.azurecontainer.io/score\\nCleaning up all outstanding Run operations, waiting 300.0 seconds\\n5 items cleaning up...\\nCleanup took 0.46288228034973145 seconds\\n\",\"process_name\":\"/azureml-envs/azureml_3599150719ffbb71885ce3276211def7/bin/python\",\"error_file\":\"user_logs/std_log.txt\"}], 'target': , 'category': UserError, 'error_details': [{'key': exit_codes, 'value': 1}, ], 'inner_error': null}",
        "messageParameters": {},
        "details": []
    },
    "time": "0001-01-01T00:00:00.000Z"
}
	InnerException None
	ErrorResponse 
{
    "error": {
        "message": "Activity Failed:\n{\n    \"error\": {\n        \"code\": \"UserError\",\n        \"message\": \"{'code': ExecutionFailed, 'message': [{\\\"exit_code\\\":1,\\\"error_message\\\":\\\"Execution failed with error: Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.\\\\nRunning\\\\n2022-01-28 22:22:32+00:00 Creating Container Registry if not exists.\\\\n2022-01-28 22:22:33+00:00 Registering the environment.\\\\n2022-01-28 22:22:34+00:00 Generating deployment configuration.\\\\n2022-01-28 22:22:35+00:00 Submitting deployment to compute.\\\\n2022-01-28 22:22:39+00:00 Checking the status of deployment diabetes-service-remote-training..\\\\n2022-01-28 22:28:41+00:00 Checking the status of inference endpoint diabetes-service-remote-training.\\\\nSucceeded\\\\n[stderr]Traceback (most recent call last):\\\\n[stderr]  File \\\\\\\"deployACI.py\\\\\\\", line 61, in <module>\\\\n[stderr]    with open(scoring_url_file, 'w+') as f:\\\\n[stderr]TypeError: expected str, bytes or os.PathLike object, not NoneType\\\\n[stderr]\\\\nACI service creation operation finished, operation \\\\\\\"Succeeded\\\\\\\"\\\\nhttp://d776f11b-defe-49a2-a9b4-f652c488e854.eastus.azurecontainer.io/score\\\\nCleaning up all outstanding Run operations, waiting 300.0 seconds\\\\n5 items cleaning up...\\\\nCleanup took 0.46288228034973145 seconds\\\\n\\\",\\\"process_name\\\":\\\"/azureml-envs/azureml_3599150719ffbb71885ce3276211def7/bin/python\\\",\\\"error_file\\\":\\\"user_logs/std_log.txt\\\"}], 'target': , 'category': UserError, 'error_details': [{'key': exit_codes, 'value': 1}, ], 'inner_error': null}\",\n        \"messageParameters\": {},\n        \"details\": []\n    },\n    \"time\": \"0001-01-01T00:00:00.000Z\"\n}"
    }
}

## Publish Pipeline

In [None]:
published_pipeline = pipeline.publish(name = 'Diabetes Training Pipeline',
                                     description = 'Pipeline that generates batch predictions using a registered trained model.',
                                     continue_on_step_failure = False)

In [None]:
published_pipeline

In [None]:
# from azureml.pipeline.core import ScheduleRecurrence, Schedule

# # Submit the Pipeline every Monday at 00:00 UTC
# recurrence = ScheduleRecurrence(frequency="Week", interval=1, week_days=["Monday"], time_of_day="00:00")
# weekly_schedule = Schedule.create(ws, name="weekly-diabetes-training", 
#                                   description="Based on time",
#                                   pipeline_id=published_pipeline.id, 
#                                   experiment_name='mslearn-diabetes-pipeline', 
#                                   recurrence=recurrence)
# print('Pipeline scheduled.')