# Creating an Azure Machine Learning Pipeline

You can perform the various steps required to ingest data, train a model, and register the model individually by using the Azure ML SDK to run script-based experiments. However, in an enterprise environment it is common to encapsulate the sequence of discrete steps required to build a machine learning solution into a *pipeline* that can be run on one or more compute targets, either on demand by a user, from an automated build process, or on a schedule.

In this lab, you'll bring together all of these elements to create a simple pipeline that trains and registers a model.

## Connect to Your Workspace

The first thing you need to do is to connect to your workspace using the Azure ML SDK.

> **Note**: If the authenticated session with your Azure subscription has expired since you completed the previous exercise, you'll be prompted to reauthenticate.

In [1]:
import azureml.core
from azureml.core import Workspace

# Load the workspace from the saved config file
ws = Workspace.from_config()
print('Ready to use Azure ML {} to work with {}'.format(azureml.core.VERSION, ws.name))

Ready to use Azure ML 1.18.0 to work with nikhilsuthardp100


## Prepare the Training Data

You can use local data files to train a model, but when running training workloads automatically on cloud-based compute, it makes more sense to store the data centrally in the cloud and ingest it into the training script wherever it happens to be running.

In this lab, you'll upload the training data to a *datastore* and define a *dataset* that can be used to access the data from a training script. For simplicity, you'll upload the data to the default datastore for your Azure Machine Learning workspace - this is an Azure Storage blob container that was created when you provisioned the workspace. In a real solution, you'd likely register a datastore that references the cloud location where you typically store your data. You'll then create a *tabular* dataset that references the CSV files you uploaded.

In [2]:
from azureml.core import Dataset

default_ds = ws.get_default_datastore()

if 'diabetes dataset' not in ws.datasets:
    default_ds.upload_files(files=['./data/diabetes.csv', './data/diabetes2.csv'], # Upload the diabetes csv files in /data
                        target_path='diabetes-data/', # Put it in a folder path in the datastore
                        overwrite=True, # Replace existing files of the same name
                        show_progress=True)

    # Create a tabular dataset from the path on the datastore (this may take a short while)
    tab_data_set = Dataset.Tabular.from_delimited_files(path=(default_ds, 'diabetes-data/*.csv'))

    # Register the tabular dataset
    try:
        tab_data_set = tab_data_set.register(workspace=ws, 
                                name='diabetes dataset',
                                description='diabetes data',
                                tags = {'format':'CSV'},
                                create_new_version=True)
        print('Dataset registered.')
    except Exception as ex:
        print(ex)
else:
    print('Dataset already registered.')

Dataset already registered.


## Create Scripts for Pipeline Steps

Now you're ready to start work on your pipeline. Pipelines consist of one or more *steps*, which can be Python scripts, or specialized steps like an Auto ML training estimator or a data transfer step that copies data from one location to another. Each step can run in its own compute context.

In this exercise, you'll build a simple pipeline that contains an estimator step (to train a model) and a Python script step (to register the trained model). Start by creating a folder to contain the scripts for each step.

In [3]:
import os
# Create a folder for the pipeline step files
experiment_folder = 'diabetes_pipeline'
os.makedirs(experiment_folder, exist_ok=True)

print(experiment_folder)

diabetes_pipeline


The first step will use an estimator to run a training script. The code in the following cell creates this script for you. Note that the script includes a parameter named **output_folder**, which references the folder where the trained model should be saved.

In [4]:
%%writefile $experiment_folder/train_diabetes.py
# Import libraries
from azureml.core import Run
import argparse
import os
import pandas as pd
import numpy as np
import joblib
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

# Get parameters
parser = argparse.ArgumentParser()
parser.add_argument('--output_folder', type=str, dest='output_folder', default="diabetes_model", help='output folder')
args = parser.parse_args()
output_folder = args.output_folder

# Get the experiment run context
run = Run.get_context()

# load the diabetes data (passed as an input dataset)
print("Loading Data...")
diabetes = run.input_datasets['diabetes_train'].to_pandas_dataframe()

# Separate features and labels
X, y = diabetes[['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree','Age']].values, diabetes['Diabetic'].values

# Split data into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

# Train a decision tree model
print('Training a decision tree model')
model = DecisionTreeClassifier().fit(X_train, y_train)

# calculate accuracy
y_hat = model.predict(X_test)
acc = np.average(y_hat == y_test)
print('Accuracy:', acc)
run.log('Accuracy', float(acc))

# calculate AUC
y_scores = model.predict_proba(X_test)
auc = roc_auc_score(y_test,y_scores[:,1])
print('AUC: ' + str(auc))
run.log('AUC', float(auc))

# Save the trained model
os.makedirs(output_folder, exist_ok=True)
output_path = output_folder + "/model.pkl"
joblib.dump(value=model, filename=output_path)

run.complete()

Writing diabetes_pipeline/train_diabetes.py
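
To see how the **output_folder** parameter reaches the training script, the following minimal sketch reuses the same argument definition but parses an explicit argument list instead of the real command line. The helper name `parse_output_folder` and the path `/tmp/model_folder` are made up for illustration; in the pipeline, Azure ML supplies the actual path.

```python
import argparse

def parse_output_folder(argv):
    """Parse --output_folder from an explicit argument list (illustrative helper)."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--output_folder', type=str, dest='output_folder',
                        default="diabetes_model", help='output folder')
    return parser.parse_args(argv).output_folder

# With no argument, the default defined in the training script applies
print(parse_output_folder([]))

# With an argument, the supplied (hypothetical) path is used instead
print(parse_output_folder(['--output_folder', '/tmp/model_folder']))
```

Because the script falls back to a default folder name, it can also be run locally without any arguments while you are testing it.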


The script for the second step of the pipeline will load the model from where it was saved, and then register it in the workspace. It includes a single **model_folder** parameter that contains the path to the folder where the model was saved by the previous step.

In [5]:
%%writefile $experiment_folder/register_diabetes.py
# Import libraries
import argparse
import joblib
from azureml.core import Workspace, Model, Run

# Get parameters
parser = argparse.ArgumentParser()
parser.add_argument('--model_folder', type=str, dest='model_folder', default="diabetes_model", help='model location')
args = parser.parse_args()
model_folder = args.model_folder

# Get the experiment run context
run = Run.get_context()

# load the model
print("Loading model from " + model_folder)
model_file = model_folder + "/model.pkl"
model = joblib.load(model_file)

Model.register(workspace=run.experiment.workspace,
               model_path = model_file,
               model_name = 'diabetes_model',
               tags={'Training context':'Pipeline'})

run.complete()

Writing diabetes_pipeline/register_diabetes.py


## Prepare a Compute Environment for the Pipeline Steps

The pipeline will eventually be published and run on demand, so it needs a compute environment in which to run. In this exercise, you'll use the same compute for both steps, but it's important to realize that each step runs independently, so you could specify a different compute context for each step if appropriate.

First, you need a compute target. In this case, you create an Azure Machine Learning compute cluster in your workspace (or use an existing one if you have created it previously).

> **Important**: Change the value assigned to *cluster_name* in the code below to a unique name for your compute cluster before running it! Cluster names must be globally unique names between 2 and 16 characters in length. Valid characters are letters, digits, and the - character.

In [6]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

cluster_name = "nikhilvmcluster"

try:
    # Check for existing compute target
    pipeline_cluster = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    # If it doesn't already exist, create it
    try:
        compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS11_V2', max_nodes=2)
        pipeline_cluster = ComputeTarget.create(ws, cluster_name, compute_config)
        pipeline_cluster.wait_for_completion(show_output=True)
    except Exception as ex:
        print(ex)
    

Found existing cluster, use it.
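
The naming rule in the note above can be checked before you try to provision anything. This is a rough sketch that encodes only what the note states (2 to 16 characters; letters, digits, and the - character). The function name `is_valid_cluster_name` is hypothetical, and the service may enforce additional rules that this check does not cover.

```python
import re

# Pattern derived only from the note above: 2 to 16 characters,
# each a letter, digit, or '-'. Assumption: the service may apply
# further restrictions that are not checked here.
_CLUSTER_NAME_PATTERN = re.compile(r'[A-Za-z0-9-]{2,16}')

def is_valid_cluster_name(name):
    """Return True if name satisfies the length and character rules."""
    return _CLUSTER_NAME_PATTERN.fullmatch(name) is not None

print(is_valid_cluster_name("nikhilvmcluster"))       # 15 characters, allowed characters only
print(is_valid_cluster_name("your-compute-cluster"))  # 20 characters, too long
```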


The compute will require a Python environment with the necessary package dependencies installed, so we'll create a run configuration.

In [7]:
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.runconfig import RunConfiguration

# Create a Python environment for the experiment
diabetes_env = Environment("diabetes-pipeline-env")
diabetes_env.python.user_managed_dependencies = False # Let Azure ML manage dependencies
diabetes_env.docker.enabled = True # Use a docker container

# Create a set of package dependencies
diabetes_packages = CondaDependencies.create(conda_packages=['scikit-learn','pandas'],
                                             pip_packages=['azureml-defaults','azureml-dataprep[pandas]'])

# Add the dependencies to the environment
diabetes_env.python.conda_dependencies = diabetes_packages

# Register the environment (just in case you want to use it again)
diabetes_env.register(workspace=ws)
registered_env = Environment.get(ws, 'diabetes-pipeline-env')

# Create a new runconfig object for the pipeline
pipeline_run_config = RunConfiguration()

# Use the compute you created above. 
pipeline_run_config.target = pipeline_cluster

# Assign the environment to the run configuration
pipeline_run_config.environment = registered_env

print ("Run configuration created.")

Run configuration created.


## Create and Run a Pipeline

Now you're ready to define and run the pipeline.

First, you need to define the steps for the pipeline, and any data references that need to be passed between them. In this case, the first step must write the model to a folder that can be read by the second step. Since the steps will be run on remote compute (and in fact, could each be run on different compute), the folder path must be passed as a data reference to a location in a datastore within the workspace. The **PipelineData** object is a special kind of data reference that is used to pass data from the output of one pipeline step to the input of another, creating a dependency between them. You'll create one and use it as the output for the first step and the input for the second step. Note that you also need to pass it as a script argument so your code can access the datastore location referenced by the data reference.

In [8]:
from azureml.pipeline.core import PipelineData
from azureml.pipeline.steps import PythonScriptStep, EstimatorStep
from azureml.train.estimator import Estimator

# Get the training dataset
diabetes_ds = ws.datasets.get("diabetes dataset")

# Create a PipelineData (temporary Data Reference) for the model folder
model_folder = PipelineData("model_folder", datastore=ws.get_default_datastore())

estimator = Estimator(source_directory=experiment_folder,
                        compute_target = pipeline_cluster,
                        environment_definition=pipeline_run_config.environment,
                        entry_script='train_diabetes.py')

# Step 1, run the estimator to train the model
train_step = EstimatorStep(name = "Train Model",
                           estimator=estimator, 
                           estimator_entry_script_arguments=['--output_folder', model_folder],
                           inputs=[diabetes_ds.as_named_input('diabetes_train')],
                           outputs=[model_folder],
                           compute_target = pipeline_cluster,
                           allow_reuse = True)

# Step 2, run the model registration script
register_step = PythonScriptStep(name = "Register Model",
                                source_directory = experiment_folder,
                                script_name = "register_diabetes.py",
                                arguments = ['--model_folder', model_folder],
                                inputs=[model_folder],
                                compute_target = pipeline_cluster,
                                runconfig = pipeline_run_config,
                                allow_reuse = True)

print("Pipeline steps defined")

Pipeline steps defined
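
The hand-off that **PipelineData** arranges can be pictured with a purely local analogy: one function writes a file into a folder, and a second function receives the same folder path and reads the file back. The sketch below uses only the Python standard library. The functions `fake_train` and `fake_register` and the placeholder "model" are invented for illustration, and a temporary directory stands in for the datastore location that Azure ML would provide.

```python
import os
import pickle
import tempfile

def fake_train(output_folder):
    """Stand-in for train_diabetes.py: saves a 'model' into output_folder."""
    os.makedirs(output_folder, exist_ok=True)
    model = {"kind": "placeholder-model"}  # not a real trained model
    with open(os.path.join(output_folder, "model.pkl"), "wb") as f:
        pickle.dump(model, f)

def fake_register(model_folder):
    """Stand-in for register_diabetes.py: loads the model from model_folder."""
    with open(os.path.join(model_folder, "model.pkl"), "rb") as f:
        return pickle.load(f)

# A temporary directory plays the role of the shared datastore location
with tempfile.TemporaryDirectory() as root:
    shared_folder = os.path.join(root, "model_folder")
    fake_train(shared_folder)               # first step writes its output
    loaded = fake_register(shared_folder)   # second step reads it as input
    print(loaded)
```

The second function can only succeed after the first has written its file, which mirrors the dependency that the shared data reference creates between the two pipeline steps.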


OK, now you're ready to build the pipeline from the steps you've defined and run it as an experiment.

> **Note**: This may take a while. The training cluster must be started and configured with the Python environment before the scripts can be run. Now might be a good time to take a coffee break!

In [9]:
from azureml.core import Experiment
from azureml.pipeline.core import Pipeline

# Construct the pipeline
pipeline_steps = [train_step, register_step]
pipeline = Pipeline(workspace = ws, steps=pipeline_steps)
print("Pipeline is built.")

# Create an experiment and run the pipeline
experiment = Experiment(workspace = ws, name = 'diabetes-training-pipeline')
pipeline_run = experiment.submit(pipeline, regenerate_outputs=True)
print("Pipeline submitted for execution.")
pipeline_run.wait_for_completion(show_output=True)

Pipeline is built.
Created step Train Model [590f6e0b][07bd421b-156f-4fcb-9659-c969f786c8f1], (This step will run and generate new outputs)Created step Register Model [c136bfe3][5538de8b-7f8a-40bc-9bb0-8eb9268fae56], (This step will run and generate new outputs)

Submitted PipelineRun fa014266-d11a-4f39-b299-35392b892075
Link to Azure Machine Learning Portal: https://ml.azure.com/experiments/diabetes-training-pipeline/runs/fa014266-d11a-4f39-b299-35392b892075?wsid=/subscriptions/71bfcf50-7e10-4546-9c9a-fd4f1ee42434/resourcegroups/nikhil-suthardp100/workspaces/nikhilsuthardp100
Pipeline submitted for execution.
PipelineRunId: fa014266-d11a-4f39-b299-35392b892075
Link to Azure Machine Learning Portal: https://ml.azure.com/experiments/diabetes-training-pipeline/runs/fa014266-d11a-4f39-b299-35392b892075?wsid=/subscriptions/71bfcf50-7e10-4546-9c9a-fd4f1ee42434/resourcegroups/nikhil-suthardp100/workspaces/nikhilsuthardp100
PipelineRun Status: NotStarted
PipelineRun Status: Running


StepRunI

aece08fd27fc: Pushed
5e1805eb9eb5: Pushed
dcc0cc99372e: Pushed
2817caf0a082: Pushed
87c128261339: Pushed
41a253a417e6: Pushed
8dab94e6d05c: Pushed
4caea5ef1f0b: Pushed
e06660e80cf4: Pushed

55cf4caf203a: Pushed
latest: digest: sha256:e24632faa61a197298cfbb748d9f148fcce15d72ec2c97c3cae1ab06f89a0a92 size: 4095
2020/11/25 10:21:25 Successfully pushed image: ed4462f89ccc45679b9210c6758878a5.azurecr.io/azureml/azureml_4c436a746d1f86e10a04b3459370ec37:latest
2020/11/25 10:21:25 Step ID: acb_step_0 marked as successful (elapsed time in seconds: 208.160232)
2020/11/25 10:21:25 Populating digests for step ID: acb_step_0...
2020/11/25 10:21:27 Successfully populated digests for step ID: acb_step_0
2020/11/25 10:21:27 Step ID: acb_step_1 marked as successful (elapsed time in seconds: 145.058186)
2020/11/25 10:21:27 The following dependencies were found:
2020/11/25 10:21:27 
- image:
    registry: ed4462f89ccc45679b9210c6758878a5.azurecr.io
    repository: azureml/azureml_4c436a746d1f86e10a04b3459


StepRun(Train Model) Execution Summary
StepRun( Train Model ) Status: Finished
{'runId': '327d0507-d004-4017-9fde-f49b4761a284', 'target': 'nikhilvmcluster', 'status': 'Completed', 'startTimeUtc': '2020-11-25T10:25:41.472542Z', 'endTimeUtc': '2020-11-25T10:27:43.989187Z', 'properties': {'azureml.runsource': 'azureml.StepRun', 'ContentSnapshotId': '5af7ce67-c6d7-4d64-96ee-7820e11f7625', 'StepType': 'PythonScriptStep', 'ComputeTargetType': 'AmlCompute', 'azureml.moduleid': '07bd421b-156f-4fcb-9659-c969f786c8f1', 'azureml.nodeid': '590f6e0b', 'azureml.pipelinerunid': 'fa014266-d11a-4f39-b299-35392b892075', '_azureml.ComputeTargetType': 'amlcompute', 'ProcessInfoFile': 'azureml-logs/process_info.json', 'ProcessStatusFile': 'azureml-logs/process_status.json'}, 'inputDatasets': [{'dataset': {'id': '47094f00-3593-45f5-97d5-f8ebb5db829c'}, 'consumptionDetails': {'type': 'RunInput', 'inputName': 'diabetes_train', 'mechanism': 'Direct'}}], 'outputDatasets': [], 'runDefinition': {'script': 'trai




StepRunId: a57478ab-8bca-4e45-b82d-bcb60b5f3573
Link to Azure Machine Learning Portal: https://ml.azure.com/experiments/diabetes-training-pipeline/runs/a57478ab-8bca-4e45-b82d-bcb60b5f3573?wsid=/subscriptions/71bfcf50-7e10-4546-9c9a-fd4f1ee42434/resourcegroups/nikhil-suthardp100/workspaces/nikhilsuthardp100
StepRun( Register Model ) Status: Running

Streaming azureml-logs/55_azureml-execution-tvmps_a527a08f0b4e0c38d884e9d58aae7f365886dabc5956acd3a998d99c4ff0eb71_d.txt
2020-11-25T10:28:06Z Starting output-watcher...
2020-11-25T10:28:06Z IsDedicatedCompute == True, won't poll for Low Pri Preemption
2020-11-25T10:28:07Z Executing 'Copy ACR Details file' on 10.0.0.5
2020-11-25T10:28:07Z Copy ACR Details file succeeded on 10.0.0.5. Output: 
>>>   
>>>   
Login Succeeded
Using default tag: latest
latest: Pulling from azureml/azureml_4c436a746d1f86e10a04b3459370ec37
Digest: sha256:e24632faa61a197298cfbb748d9f148fcce15d72ec2c97c3cae1ab06f89a0a92
Status: Image is up to date for ed4462f89ccc4


Streaming azureml-logs/75_job_post-tvmps_a527a08f0b4e0c38d884e9d58aae7f365886dabc5956acd3a998d99c4ff0eb71_d.txt
[2020-11-25T10:28:25.521958] Entering job release
[2020-11-25T10:28:26.646912] Starting job release
[2020-11-25T10:28:26.701839] Logging experiment finalizing status in history service.
Starting the daemon thread to refresh tokens in background for process with pid = 139
[2020-11-25T10:28:26.702451] job release stage : upload_datastore starting...
[2020-11-25T10:28:26.706133] job release stage : start importing azureml.history._tracking in run_history_release.
[2020-11-25T10:28:26.715261] job release stage : execute_job_release starting...
[2020-11-25T10:28:26.715452] job release stage : copy_batchai_cached_logs starting...
[2020-11-25T10:28:26.715488] job release stage : copy_batchai_cached_logs completed...
[2020-11-25T10:28:26.716901] Entering context manager injector.
[2020-11-25T10:28:26.967877] job release stage : send_run_telemetry starting...
[2020-11-25T10:28:27.130



PipelineRun Execution Summary
PipelineRun Status: Finished
{'runId': 'fa014266-d11a-4f39-b299-35392b892075', 'status': 'Completed', 'startTimeUtc': '2020-11-25T10:15:00.881715Z', 'endTimeUtc': '2020-11-25T10:28:47.316608Z', 'properties': {'azureml.runsource': 'azureml.PipelineRun', 'runSource': 'SDK', 'runType': 'SDK', 'azureml.parameters': '{}'}, 'inputDatasets': [], 'outputDatasets': [], 'logFiles': {'logs/azureml/executionlogs.txt': 'https://nikhilsuthardp7211953111.blob.core.windows.net/azureml/ExperimentRun/dcid.fa014266-d11a-4f39-b299-35392b892075/logs/azureml/executionlogs.txt?sv=2019-02-02&sr=b&sig=TmskTn1Sp9H2hw1IDAcM8RigB7Nr%2FTIEJbTIXyQOV38%3D&st=2020-11-25T10%3A05%3A08Z&se=2020-11-25T18%3A15%3A08Z&sp=r', 'logs/azureml/stderrlogs.txt': 'https://nikhilsuthardp7211953111.blob.core.windows.net/azureml/ExperimentRun/dcid.fa014266-d11a-4f39-b299-35392b892075/logs/azureml/stderrlogs.txt?sv=2019-02-02&sr=b&sig=L8gITAP4j12xIJ%2B0HwciAHaqsZbJs2AE48xTKDx2NfE%3D&st=2020-11-25T10%3A05

'Finished'

The output from the pipeline experiment will be displayed as it runs. Keep an eye on the kernel indicator at the top right of the page; when it turns from **&#9899;** to **&#9711;**, the code has finished running. You can also monitor pipeline runs on the **Experiments** page in [Azure Machine Learning studio](https://ml.azure.com).

When the pipeline has finished, a new model should be registered with a *Training context* tag indicating it was trained in a pipeline. Run the following code to verify this.

In [10]:
from azureml.core import Model

for model in Model.list(ws):
    print(model.name, 'version:', model.version)
    for tag_name in model.tags:
        tag = model.tags[tag_name]
        print ('\t',tag_name, ':', tag)
    for prop_name in model.properties:
        prop = model.properties[prop_name]
        print ('\t',prop_name, ':', prop)
    print('\n')

diabetes_model version: 3
	 Training context : Pipeline


diabetes_model version: 2
	 Training context : Parameterized SKLearn Estimator
	 AUC : 0.8483904671874223
	 Accuracy : 0.7736666666666666


diabetes_model version: 1
	 Training context : Estimator
	 AUC : 0.8484929598487486
	 Accuracy : 0.774


amlstudio-predict-diabetes version: 1
	 CreatedByAMLStudio : true


amlstudio-predict-auto-price version: 1
	 CreatedByAMLStudio : true


AutoMLcd52161af2 version: 1




## Publish the Pipeline

Now that you've created a pipeline and verified it works, you can publish it as a REST service.

In [11]:
published_pipeline = pipeline.publish(name="Diabetes_Training_Pipeline",
                                      description="Trains diabetes model",
                                      version="1.0")
rest_endpoint = published_pipeline.endpoint
print(rest_endpoint)

https://eastus2.api.azureml.ms/pipelines/v1.0/subscriptions/71bfcf50-7e10-4546-9c9a-fd4f1ee42434/resourceGroups/nikhil-suthardp100/providers/Microsoft.MachineLearningServices/workspaces/nikhilsuthardp100/PipelineRuns/PipelineSubmit/07a089a6-aef4-416e-a1a2-b708f43cb3bc


To use the endpoint, client applications need to make a REST call over HTTP. This request must be authenticated, so an authorization header is required. A real application would require a service principal with which to be authenticated, but to test this out, we'll use the authorization header from your current connection to your Azure workspace, which you can get using the following code:

In [12]:
from azureml.core.authentication import InteractiveLoginAuthentication

interactive_auth = InteractiveLoginAuthentication()
auth_header = interactive_auth.get_authentication_header()

Now you're ready to call the REST interface. The pipeline runs asynchronously, so you'll get an identifier back, which you can use to track the pipeline experiment as it runs:

In [13]:
import requests
experiment_name = 'Run-diabetes-pipeline'

response = requests.post(rest_endpoint, 
                         headers=auth_header, 
                         json={"ExperimentName": experiment_name})
run_id = response.json()["Id"]
run_id

'c779854e-d68d-4a06-bdaa-8a0279faae58'
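
The call above assumes the request succeeded and that the response contains an `Id` field. A more defensive client might check both before continuing. The helper below is a hypothetical sketch: its name is invented, it takes the values you would get from `response.status_code` and `response.json()`, and it relies only on the `Id` key used above, without assuming anything about the shape of an error response.

```python
def extract_run_id(status_code, payload):
    """Return the run ID from a parsed JSON response, or raise a descriptive error.

    status_code and payload correspond to response.status_code and
    response.json(); this helper itself is hypothetical.
    """
    if not 200 <= status_code < 300:
        raise RuntimeError(
            "Pipeline submission failed with HTTP {}: {}".format(status_code, payload))
    if "Id" not in payload:
        raise KeyError("Response did not contain an 'Id' field")
    return payload["Id"]

# Example with a made-up payload
print(extract_run_id(200, {"Id": "example-run-id"}))
```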

Since you have the run ID, you can use it to wait for the run to complete.

> **Note**: The pipeline should complete quickly, because each step was configured to allow output reuse. This was done primarily for convenience and to save time in this example. In reality, you'd likely want the first step to run every time in case the data has changed, and trigger the subsequent steps only if the output from step one changes.

In [14]:
from azureml.pipeline.core.run import PipelineRun

published_pipeline_run = PipelineRun(ws.experiments[experiment_name], run_id)
published_pipeline_run.wait_for_completion(show_output=True)

PipelineRunId: c779854e-d68d-4a06-bdaa-8a0279faae58
Link to Azure Machine Learning Portal: https://ml.azure.com/experiments/Run-diabetes-pipeline/runs/c779854e-d68d-4a06-bdaa-8a0279faae58?wsid=/subscriptions/71bfcf50-7e10-4546-9c9a-fd4f1ee42434/resourcegroups/nikhil-suthardp100/workspaces/nikhilsuthardp100
PipelineRun Status: Running


StepRunId: f137d214-f2e7-4b84-92f8-8aa5e708ae31
Link to Azure Machine Learning Portal: https://ml.azure.com/experiments/Run-diabetes-pipeline/runs/f137d214-f2e7-4b84-92f8-8aa5e708ae31?wsid=/subscriptions/71bfcf50-7e10-4546-9c9a-fd4f1ee42434/resourcegroups/nikhil-suthardp100/workspaces/nikhilsuthardp100
StepRun( Train Model ) Status: NotStarted

Streaming azureml-logs/55_azureml-execution-tvmps_a527a08f0b4e0c38d884e9d58aae7f365886dabc5956acd3a998d99c4ff0eb71_d.txt
2020-11-25T10:30:24Z Starting output-watcher...
2020-11-25T10:30:24Z IsDedicatedCompute == True, won't poll for Low Pri Preemption
StepRun( Train Model ) Status: Running
2020-11-25T10:30:24Z Exe


Streaming azureml-logs/70_driver_log.txt
2020/11/25 10:30:35 Attempt 1 of http call to http://10.0.0.5:16384/sendlogstoartifacts/info
2020/11/25 10:30:35 Attempt 1 of http call to http://10.0.0.5:16384/sendlogstoartifacts/status
[2020-11-25T10:30:36.500471] Entering context manager injector.
[context_manager_injector.py] Command line Options: Namespace(inject=['ProjectPythonPath:context_managers.ProjectPythonPath', 'RunHistory:context_managers.RunHistory', 'TrackUserError:context_managers.TrackUserError'], invocation=['train_diabetes.py', '--output_folder', '/mnt/batch/tasks/shared/LS_root/jobs/nikhilsuthardp100/azureml/f137d214-f2e7-4b84-92f8-8aa5e708ae31/mounts/workspaceblobstore/azureml/f137d214-f2e7-4b84-92f8-8aa5e708ae31/model_folder'])
Script type = None
Starting the daemon thread to refresh tokens in background for process with pid = 106
Entering Run History Context Manager.
Current directory:  /mnt/batch/tasks/shared/LS_root/jobs/nikhilsuthardp100/azureml/f137d214-f2e7-4b84-92




StepRunId: 8a79253b-2dd7-4503-b45a-f85714ed873d
Link to Azure Machine Learning Portal: https://ml.azure.com/experiments/Run-diabetes-pipeline/runs/8a79253b-2dd7-4503-b45a-f85714ed873d?wsid=/subscriptions/71bfcf50-7e10-4546-9c9a-fd4f1ee42434/resourcegroups/nikhil-suthardp100/workspaces/nikhilsuthardp100
StepRun( Register Model ) Status: NotStarted
StepRun( Register Model ) Status: Running

Streaming azureml-logs/55_azureml-execution-tvmps_a527a08f0b4e0c38d884e9d58aae7f365886dabc5956acd3a998d99c4ff0eb71_d.txt
2020-11-25T10:31:33Z Starting output-watcher...
2020-11-25T10:31:33Z IsDedicatedCompute == True, won't poll for Low Pri Preemption
2020-11-25T10:31:33Z Executing 'Copy ACR Details file' on 10.0.0.5
2020-11-25T10:31:33Z Copy ACR Details file succeeded on 10.0.0.5. Output: 
>>>   
>>>   
Login Succeeded
Using default tag: latest
latest: Pulling from azureml/azureml_4c436a746d1f86e10a04b3459370ec37
Digest: sha256:e24632faa61a197298cfbb748d9f148fcce15d72ec2c97c3cae1ab06f89a0a92
Statu

2020-11-25T10:31:49Z job exited with code 0
2020-11-25T10:31:50Z Executing 'JobRelease task' on 10.0.0.5
2020-11-25T10:31:54Z JobRelease task succeeded on 10.0.0.5. Output: 
>>>   2020/11/25 10:31:50 Starting App Insight Logger for task:  jobRelease
>>>   2020/11/25 10:31:50 Version: 3.0.01417.0012 Branch: 56 Commit: d244ddd
>>>   2020/11/25 10:31:50 runSpecialJobTask: os.GetEnv constants.StdouterrDir: /mnt/batch/tasks/shared/LS_root/jobs/nikhilsuthardp100/azureml/8a79253b-2dd7-4503-b45a-f85714ed873d/mounts/workspaceblobstore/azureml/8a79253b-2dd7-4503-b45a-f85714ed873d/azureml_compute_logs
>>>   2020/11/25 10:31:50 runSpecialJobTask: Raw cmd for postprocessing is passed is: export AZ_BATCHAI_RUN_STATUS='SUCCEEDED';export AZ_BATCHAI_LOG_UPLOAD_FAILED='false';/azureml-envs/azureml_76d0c57fc1c2b25401278bcbc2779419/bin/python $AZ_BATCHAI_JOB_MOUNT_ROOT/workspaceblobstore/azureml/8a79253b-2dd7-4503-b45a-f85714ed873d/azureml-setup/job_release.py -i DataStoreCopy:context_managers.DataStores




PipelineRun Execution Summary
PipelineRun Status: Finished
{'runId': 'c779854e-d68d-4a06-bdaa-8a0279faae58', 'status': 'Completed', 'startTimeUtc': '2020-11-25T10:29:57.402105Z', 'endTimeUtc': '2020-11-25T10:32:06.580729Z', 'properties': {'azureml.runsource': 'azureml.PipelineRun', 'runSource': 'Unavailable', 'runType': 'HTTP', 'azureml.parameters': '{}', 'azureml.pipelineid': '07a089a6-aef4-416e-a1a2-b708f43cb3bc'}, 'inputDatasets': [], 'logFiles': {'logs/azureml/executionlogs.txt': 'https://nikhilsuthardp7211953111.blob.core.windows.net/azureml/ExperimentRun/dcid.c779854e-d68d-4a06-bdaa-8a0279faae58/logs/azureml/executionlogs.txt?sv=2019-02-02&sr=b&sig=722N70E0Qu8ov362XvqzTTLukTrZHQjeA0osk9BlESU%3D&st=2020-11-25T10%3A20%3A32Z&se=2020-11-25T18%3A30%3A32Z&sp=r', 'logs/azureml/stderrlogs.txt': 'https://nikhilsuthardp7211953111.blob.core.windows.net/azureml/ExperimentRun/dcid.c779854e-d68d-4a06-bdaa-8a0279faae58/logs/azureml/stderrlogs.txt?sv=2019-02-02&sr=b&sig=6H2vBwonVIwDxeLPz%2BuXk

'Finished'

This is a simple example, designed to demonstrate the principle. In reality, you could build more sophisticated logic into the pipeline steps - for example, evaluating the model against some test data to calculate a performance metric like AUC or accuracy, comparing the metric to that of any previously registered versions of the model, and only registering the new model if it performs better.
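
As a rough illustration of that "only register if better" logic, the sketch below compares a new AUC value with the AUC values recorded for earlier versions of a model. It is plain Python rather than SDK code: `should_register` is a hypothetical helper, and the list of `(name, properties)` pairs is a stand-in for what `Model.list(ws)` exposes through `model.name` and `model.properties`. The sample values are taken from the model listing shown earlier, and the sketch assumes property values may be stored as strings.

```python
def should_register(new_auc, registered_models, model_name="diabetes_model"):
    """Return True if new_auc beats every recorded AUC for model_name.

    registered_models is a list of (name, properties) pairs; versions
    without an 'AUC' property are ignored in the comparison.
    """
    previous = [float(props["AUC"])
                for name, props in registered_models
                if name == model_name and "AUC" in props]
    return not previous or new_auc > max(previous)

existing = [
    ("diabetes_model", {}),  # version 3: no metrics recorded
    ("diabetes_model", {"AUC": "0.8483904671874223", "Accuracy": "0.7736666666666666"}),
    ("diabetes_model", {"AUC": "0.8484929598487486", "Accuracy": "0.774"}),
]

print(should_register(0.85, existing))  # higher than both recorded AUC values
print(should_register(0.84, existing))  # lower than both recorded AUC values
```

In a real pipeline step, a check like this would only be possible if the registration step also records the metrics as model properties, which the registration script in this lab does not do.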

You can use the [Azure Machine Learning extension for Azure DevOps](https://marketplace.visualstudio.com/items?itemName=ms-air-aiagility.vss-services-azureml) to combine Azure ML pipelines with Azure DevOps pipelines (yes, it *is* confusing that they have the same name!) and integrate model retraining into a *continuous integration/continuous deployment (CI/CD)* process. For example, you could use an Azure DevOps *build* pipeline to trigger an Azure ML pipeline that trains and registers a model, and when the model is registered it could trigger an Azure DevOps *release* pipeline that deploys the model as a web service, along with the application or service that consumes the model.