This notebook is designed to create and execute an **Azure Machine Learning (AML) Pipeline** that automates the process of:

1. **Data preparation**: Pre-processing the data.
2. **Model training**: Training a machine learning model.
3. **Model registration**: Saving the trained model into the Azure workspace for later deployment or use.

### Why is this notebook important?
- **Automation**: It streamlines the end-to-end machine learning workflow, encapsulating the steps into a reusable and scalable pipeline. This is useful for recurring tasks, such as weekly model retraining.
- **Modularity**: Each step in the pipeline (data preparation, training, registration) is independent, allowing flexibility in defining compute environments and reusing outputs.
- **Scalability**: The pipeline can be executed on multiple compute nodes, allowing efficient handling of large datasets and complex models.
- **Reproducibility**: The pipeline structure ensures that the steps and configurations are well-defined, allowing the same process to be repeated consistently.
- **Integration with CI/CD**: The pipeline can be integrated with continuous integration and deployment pipelines (e.g., Azure DevOps), automating the entire machine learning lifecycle from data ingestion to deployment.

### What is the pipeline doing?
1. **Installing necessary dependencies**: The notebook installs the latest version of the AML SDK.
2. **Connecting to Azure Workspace**: The notebook connects to an Azure Machine Learning workspace to manage datasets, compute resources, and experiments.
3. **Data Preparation Step**: It loads the diabetes dataset, performs preprocessing (e.g., normalizing data, handling missing values), and saves the prepared data for the next step.
4. **Model Training Step**: It trains a machine learning model (Decision Tree in this case) using the preprocessed data, logs metrics (accuracy, AUC), and saves the model.
5. **Model Registration**: The trained model is registered in the Azure workspace, making it available for deployment.
6. **Pipeline Creation**: The steps are combined into a pipeline, which can be executed either manually or on a schedule.
7. **Pipeline Scheduling**: The pipeline is scheduled to run weekly, ensuring that the model is periodically retrained with updated data.

### Why is it useful?
- **Efficient Workflow**: This notebook automates the repetitive tasks of data preparation, model training, and model registration, saving time and reducing manual errors.
- **Scheduled Retraining**: It ensures that your model stays up to date with new data, maintaining its performance over time.
- **Collaboration**: The registered model and pipeline can be shared within a team, making it easier for multiple stakeholders to collaborate on a project.
- **Seamless Deployment**: The pipeline can be integrated with a CI/CD process, enabling automated deployment once the model is registered.

# Create a Pipeline

You can perform the various steps required to ingest data, train a model, and register the model individually by using the Azure ML SDK to run script-based experiments. However, in an enterprise environment it is common to encapsulate the sequence of discrete steps required to build a machine learning solution into a *pipeline* that can be run on one or more compute targets, either on-demand by a user, from an automated build process, or on a schedule.

In this notebook, you'll bring together all of these elements to create a simple pipeline that pre-processes data and then trains and registers a model.

## Install the Azure Machine Learning SDK

The Azure Machine Learning SDK is updated frequently. Run the following cell to upgrade to the latest release, along with the additional package to support notebook widgets.

In [1]:
!pip install --upgrade azureml-sdk azureml-widgets



## Connect to your workspace

With the latest version of the SDK installed, now you're ready to connect to your workspace.

> **Note**: If you haven't already established an authenticated session with your Azure subscription, you'll be prompted to authenticate by clicking a link, entering an authentication code, and signing into Azure.

In [2]:
import azureml.core
from azureml.core import Workspace

# Load the workspace from the saved config file
ws = Workspace.from_config()
print('Ready to use Azure ML {} to work with {}'.format(azureml.core.VERSION, ws.name))

Ready to use Azure ML 1.57.0 to work with coursera


What it does: This block imports necessary libraries, connects to your Azure Machine Learning workspace using a configuration file, and prints a message confirming the connection.
Why it's useful: The workspace is the central place for managing your data, models, and compute resources in Azure ML.

## Prepare data

In your pipeline, you'll use a dataset containing details of diabetes patients. Run the cell below to create this dataset (if you created it in previously, the code will find the existing version)

In [3]:
from azureml.core import Dataset

default_ds = ws.get_default_datastore()

if 'diabetes dataset' not in ws.datasets:
    default_ds.upload_files(files=['./data/diabetes.csv', './data/diabetes2.csv'], # Upload the diabetes csv files in /data
                        target_path='diabetes-data/', # Put it in a folder path in the datastore
                        overwrite=True, # Replace existing files of the same name
                        show_progress=True)

    #Create a tabular dataset from the path on the datastore (this may take a short while)
    tab_data_set = Dataset.Tabular.from_delimited_files(path=(default_ds, 'diabetes-data/*.csv'))

    # Register the tabular dataset
    try:
        tab_data_set = tab_data_set.register(workspace=ws, 
                                name='diabetes dataset',
                                description='diabetes data',
                                tags = {'format':'CSV'},
                                create_new_version=True)
        print('Dataset registered.')
    except Exception as ex:
        print(ex)
else:
    print('Dataset already registered.')

Dataset already registered.


What it does: This block checks if the diabetes dataset is already registered in the workspace. If not, it uploads the specified CSV files to Azure's default datastore, creates a tabular dataset, and registers it in the workspace.
Why it's useful: By registering the dataset, you can easily access and use it in your machine learning tasks without needing to upload it again.

## Create scripts for pipeline steps

Pipelines consist of one or more *steps*, which can be Python scripts, or specialized steps like a data transfer step that copies data from one location to another. Each step can run in its own compute context. In this exercise, you'll build a simple pipeline that contains two Python script steps: one to pre-process some training data, and another to use the pre-processed data to train and register a model.

First, let's create a folder for the script files we'll use in the pipeline steps.

In [4]:
import os
# Create a folder for the pipeline step files
experiment_folder = 'diabetes_pipeline'
os.makedirs(experiment_folder, exist_ok=True)

print(experiment_folder)

diabetes_pipeline


What it does: This block creates a directory called diabetes_pipeline to store the Python scripts that will define the pipeline steps.
Why it's useful: Organizing your code in a specific folder helps keep your workspace tidy and makes it easier to manage the pipeline scripts.

Now let's create the first script, which will read data from the diabetes dataset and apply some simple pre-processing to remove any rows with missing data and normalize the numeric features so they're on a similar scale.

The script includes a argument named **--prepped-data**, which references the folder where the resulting data should be saved.

In [5]:
%%writefile $experiment_folder/prep_diabetes.py
# Import libraries
import os
import argparse
import pandas as pd
from azureml.core import Run
from sklearn.preprocessing import MinMaxScaler

# Get parameters
parser = argparse.ArgumentParser()
parser.add_argument("--input-data", type=str, dest='raw_dataset_id', help='raw dataset')
parser.add_argument('--prepped-data', type=str, dest='prepped_data', default='prepped_data', help='Folder for results')
args = parser.parse_args()
save_folder = args.prepped_data

# Get the experiment run context
run = Run.get_context()

# load the data (passed as an input dataset)
print("Loading Data...")
diabetes = run.input_datasets['raw_data'].to_pandas_dataframe()

# Log raw row count
row_count = (len(diabetes))
run.log('raw_rows', row_count)

# remove nulls
diabetes = diabetes.dropna()

# Normalize the numeric columns
scaler = MinMaxScaler()
num_cols = ['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree']
diabetes[num_cols] = scaler.fit_transform(diabetes[num_cols])

# Log processed rows
row_count = (len(diabetes))
run.log('processed_rows', row_count)

# Save the prepped data
print("Saving Data...")
os.makedirs(save_folder, exist_ok=True)
save_path = os.path.join(save_folder,'data.csv')
diabetes.to_csv(save_path, index=False, header=True)

# End the run
run.complete()

Writing diabetes_pipeline/prep_diabetes.py


What it does: This script (prep_diabetes.py) preprocesses the diabetes dataset. It:
Loads the dataset.
Removes any rows with missing values.
Normalizes certain numerical features to a scale between 0 and 1.
Saves the preprocessed data to a specified folder.
Why it's useful: Data preprocessing is a critical step in machine learning. This script ensures that the model is trained on clean and normalized data, which helps improve model performance.

Now you can create the script for the second step, which will train a model. The script includes a argument named **--training-folder**, which references the folder where the prepared data was saved by the previous step.

In [6]:
%%writefile $experiment_folder/train_diabetes.py
# Import libraries
from azureml.core import Run, Model
import argparse
import pandas as pd
import numpy as np
import joblib
import os
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score
from sklearn.metrics import roc_curve
import matplotlib.pyplot as plt

# Get parameters
parser = argparse.ArgumentParser()
parser.add_argument("--training-folder", type=str, dest='training_folder', help='training data folder')
args = parser.parse_args()
training_folder = args.training_folder

# Get the experiment run context
run = Run.get_context()

# load the prepared data file in the training folder
print("Loading Data...")
file_path = os.path.join(training_folder,'data.csv')
diabetes = pd.read_csv(file_path)

# Separate features and labels
X, y = diabetes[['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree','Age']].values, diabetes['Diabetic'].values

# Split data into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

# Train adecision tree model
print('Training a decision tree model...')
model = DecisionTreeClassifier().fit(X_train, y_train)

# calculate accuracy
y_hat = model.predict(X_test)
acc = np.average(y_hat == y_test)
print('Accuracy:', acc)
run.log('Accuracy', np.float(acc))

# calculate AUC
y_scores = model.predict_proba(X_test)
auc = roc_auc_score(y_test,y_scores[:,1])
print('AUC: ' + str(auc))
run.log('AUC', np.float(auc))

# plot ROC curve
fpr, tpr, thresholds = roc_curve(y_test, y_scores[:,1])
fig = plt.figure(figsize=(6, 4))
# Plot the diagonal 50% line
plt.plot([0, 1], [0, 1], 'k--')
# Plot the FPR and TPR achieved by our model
plt.plot(fpr, tpr)
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
run.log_image(name = "ROC", plot = fig)
plt.show()

# Save the trained model in the outputs folder
print("Saving model...")
os.makedirs('outputs', exist_ok=True)
model_file = os.path.join('outputs', 'diabetes_model.pkl')
joblib.dump(value=model, filename=model_file)

# Register the model
print('Registering model...')
Model.register(workspace=run.experiment.workspace,
               model_path = model_file,
               model_name = 'diabetes_model',
               tags={'Training context':'Pipeline'},
               properties={'AUC': np.float(auc), 'Accuracy': np.float(acc)})


run.complete()

Writing diabetes_pipeline/train_diabetes.py


What it does: This script (train_diabetes.py) trains a Decision Tree model using the preprocessed data. It:
Loads the prepared data.
Splits it into training and test sets.
Trains a Decision Tree classifier.
Evaluates the model's performance by calculating accuracy and AUC (Area Under the Curve).
Plots the ROC curve to visualize the model's performance.
Saves the trained model and registers it in the Azure workspace.
Why it's useful: This script automates the model training process, allowing you to quickly evaluate and register the model, making it available for deployment.

## Prepare a compute environment for the pipeline steps

In this exercise, you'll use the same compute for both steps, but it's important to realize that each step is run independently; so you could specify different compute contexts for each step if appropriate.

First, get the compute target you created in a previous lab (if it doesn't exist, it will be created).

> **Important**: Change *your-compute-cluster* to the name of your compute cluster in the code below before running it! Cluster names must be globally unique names between 2 to 16 characters in length. Valid characters are letters, digits, and the - character.

In [7]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

cluster_name = ""

try:
    # Check for existing compute target
    pipeline_cluster = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    # If it doesn't already exist, create it
    try:
        compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS11_V2', max_nodes=2)
        pipeline_cluster = ComputeTarget.create(ws, cluster_name, compute_config)
        pipeline_cluster.wait_for_completion(show_output=True)
    except Exception as ex:
        print(ex)
    

InProgress..
SucceededProvisioning operation finished, operation "Succeeded"
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


What it does: This block checks for an existing compute cluster. If it doesn’t exist, it creates a new one with specified VM size and maximum nodes.
Why it's useful: A compute cluster provides the resources (CPUs/GPUs) needed for running the pipeline steps, especially when dealing with large datasets or complex models.


The compute will require a Python environment with the necessary package dependencies installed, so you'll need to create a run configuration.

In [8]:
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.runconfig import RunConfiguration

# Create a Python environment for the experiment
diabetes_env = Environment("diabetes-pipeline-env")
diabetes_env.python.user_managed_dependencies = False # Let Azure ML manage dependencies
diabetes_env.docker.enabled = True # Use a docker container

# Create a set of package dependencies
diabetes_packages = CondaDependencies.create(conda_packages=['scikit-learn','ipykernel','matplotlib','pandas','pip'],
                                             pip_packages=['azureml-defaults','azureml-dataprep[pandas]','pyarrow'])

# Add the dependencies to the environment
diabetes_env.python.conda_dependencies = diabetes_packages

# Register the environment 
diabetes_env.register(workspace=ws)
registered_env = Environment.get(ws, 'diabetes-pipeline-env')

# Create a new runconfig object for the pipeline
pipeline_run_config = RunConfiguration()

# Use the compute you created above. 
pipeline_run_config.target = pipeline_cluster

# Assign the environment to the run configuration
pipeline_run_config.environment = registered_env

print ("Run configuration created.")

'enabled' is deprecated. Please use the azureml.core.runconfig.DockerConfiguration object with the 'use_docker' param instead.


Run configuration created.


## Create and run a pipeline

Now you're ready to create and run a pipeline.

First you need to define the steps for the pipeline, and any data references that need to passed between them. In this case, the first step must write the prepared data to a folder that can be read from by the second step. Since the steps will be run on remote compute (and in fact, could each be run on different compute), the folder path must be passed as a data reference to a location in a datastore within the workspace. The **PipelineData** object is a special kind of data reference that is used for interim storage locations that can be passed between pipeline steps, so you'll create one and use at as the output for the first step and the input for the second step. Note that you also need to pass it as a script argument so our code can access the datastore location referenced by the data reference.

In [9]:
from azureml.pipeline.core import PipelineData
from azureml.pipeline.steps import PythonScriptStep

# Get the training dataset
diabetes_ds = ws.datasets.get("diabetes dataset")

# Create a PipelineData (temporary Data Reference) for the model folder
prepped_data_folder = PipelineData("prepped_data_folder", datastore=ws.get_default_datastore())

# Step 1, Run the data prep script
train_step = PythonScriptStep(name = "Prepare Data",
                                source_directory = experiment_folder,
                                script_name = "prep_diabetes.py",
                                arguments = ['--input-data', diabetes_ds.as_named_input('raw_data'),
                                             '--prepped-data', prepped_data_folder],
                                outputs=[prepped_data_folder],
                                compute_target = pipeline_cluster,
                                runconfig = pipeline_run_config,
                                allow_reuse = True)

# Step 2, run the training script
register_step = PythonScriptStep(name = "Train and Register Model",
                                source_directory = experiment_folder,
                                script_name = "train_diabetes.py",
                                arguments = ['--training-folder', prepped_data_folder],
                                inputs=[prepped_data_folder],
                                compute_target = pipeline_cluster,
                                runconfig = pipeline_run_config,
                                allow_reuse = True)

print("Pipeline steps defined")

Pipeline steps defined


OK, you're ready build the pipeline from the steps you've defined and run it as an experiment.

In [10]:
from azureml.core import Experiment
from azureml.pipeline.core import Pipeline
from azureml.widgets import RunDetails

# Construct the pipeline
pipeline_steps = [train_step, register_step]
pipeline = Pipeline(workspace=ws, steps=pipeline_steps)
print("Pipeline is built.")

# Create an experiment and run the pipeline
experiment = Experiment(workspace=ws, name = 'mslearn-diabetes-pipeline')
pipeline_run = experiment.submit(pipeline, regenerate_outputs=True)
print("Pipeline submitted for execution.")
RunDetails(pipeline_run).show()
pipeline_run.wait_for_completion(show_output=True)

Pipeline is built.
Created step Prepare Data [3695b0ef][2300b8dd-686b-4777-a406-d54f70bb036c], (This step will run and generate new outputs)
Created step Train and Register Model [1fcc1c9d][e132ad5a-df1d-4972-8b5b-b6488113026c], (This step will run and generate new outputs)
Submitted PipelineRun b23a3757-6fe1-452d-bafb-e63188c7442d
Link to Azure Machine Learning Portal: https://ml.azure.com/runs/b23a3757-6fe1-452d-bafb-e63188c7442d?wsid=/subscriptions/69c8f7e5-d34a-43ca-8fc8-17cce3dd848d/resourcegroups/saifahmed.k-rg/workspaces/coursera&tid=44f1ecac-984f-479a-8b55-63bb294261f6
Pipeline submitted for execution.


2024-10-13 12:40:14.782417: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-10-13 12:40:15.636656: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-10-13 12:40:15.871588: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-10-13 12:40:17.715458: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


_PipelineWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', …

PipelineRunId: b23a3757-6fe1-452d-bafb-e63188c7442d
Link to Azure Machine Learning Portal: https://ml.azure.com/runs/b23a3757-6fe1-452d-bafb-e63188c7442d?wsid=/subscriptions/69c8f7e5-d34a-43ca-8fc8-17cce3dd848d/resourcegroups/saifahmed.k-rg/workspaces/coursera&tid=44f1ecac-984f-479a-8b55-63bb294261f6
PipelineRun Status: Running


StepRunId: 27e3507a-d999-4397-9e1e-4c23d5de3d95
Link to Azure Machine Learning Portal: https://ml.azure.com/runs/27e3507a-d999-4397-9e1e-4c23d5de3d95?wsid=/subscriptions/69c8f7e5-d34a-43ca-8fc8-17cce3dd848d/resourcegroups/saifahmed.k-rg/workspaces/coursera&tid=44f1ecac-984f-479a-8b55-63bb294261f6
StepRun( Prepare Data ) Status: Running

Streaming azureml-logs/20_image_build_log.txt
The run ID for the image build on serverless compute is imgbldrun_e0cd2ee
Additional logs for the run: https://ml.azure.com/experiments/id/prepare_image/runs/imgbldrun_e0cd2ee?wsid=/subscriptions/69c8f7e5-d34a-43ca-8fc8-17cce3dd848d/resourcegroups/saifahmed.k-rg/workspaces/Coursera&

Ran into a deserialization error. Ignoring since this is failsafe deserialization
Traceback (most recent call last):
  File "/anaconda/envs/azureml_py38/lib/python3.10/site-packages/msrest/serialization.py", line 1509, in failsafe_deserialize
    return self(target_obj, data, content_type=content_type)
  File "/anaconda/envs/azureml_py38/lib/python3.10/site-packages/msrest/serialization.py", line 1375, in __call__
    data = self._unpack_content(response_data, content_type)
  File "/anaconda/envs/azureml_py38/lib/python3.10/site-packages/msrest/serialization.py", line 1543, in _unpack_content
    raise ValueError("This pipeline didn't have the RawDeserializer policy; can't deserialize")
ValueError: This pipeline didn't have the RawDeserializer policy; can't deserialize
[NOT_SUPPORTED_API_USE_ATTEMPT] The [_DatasetClient.get] API has been deprecated and is no longer supported
Ran into a deserialization error. Ignoring since this is failsafe deserialization
Traceback (most recent call la

'Finished'

A graphical representation of the pipeline experiment will be displayed in the widget as it runs. keep an eye on the kernel indicator at the top right of the page, when it turns from **&#9899;** to **&#9711;**, the code has finished running. You can also monitor pipeline runs in the **Experiments** page in [Azure Machine Learning studio](https://ml.azure.com).

When the pipeline has finished, you can examine the metrics recorded by it's child runs.

In [11]:
for run in pipeline_run.get_children():
    print(run.name, ':')
    metrics = run.get_metrics()
    for metric_name in metrics:
        print('\t',metric_name, ":", metrics[metric_name])

Train and Register Model :
	 AUC : 0.8850872543910331
	 Accuracy : 0.9
	 ROC : aml://artifactId/ExperimentRun/dcid.45b02106-8813-46c6-8289-cbe0d6657fde/ROC_1728824235.png
Prepare Data :
	 raw_rows : 15000
	 processed_rows : 15000


Assuming the pipeline was successful, a new model should be registered with a *Training context* tag indicating it was trained in a pipeline. Run the following code to verify this.

In [12]:
from azureml.core import Model

for model in Model.list(ws):
    print(model.name, 'version:', model.version)
    for tag_name in model.tags:
        tag = model.tags[tag_name]
        print ('\t',tag_name, ':', tag)
    for prop_name in model.properties:
        prop = model.properties[prop_name]
        print ('\t',prop_name, ':', prop)
    print('\n')

diabetes_model version: 2
	 Training context : Pipeline
	 AUC : 0.8850872543910331
	 Accuracy : 0.9


diabetes_model version: 1
	 Training context : Compute cluster
	 AUC : 0.88375696004516
	 Accuracy : 0.8986666666666666


amlstudio-predict-penguin-clus version: 1
	 CreatedByAMLStudio : true


amlstudio-predict-diabetes version: 1
	 CreatedByAMLStudio : true


amlstudio-predict-auto-price-1 version: 1
	 CreatedByAMLStudio : true


amlstudio-predict-auto-price version: 1
	 CreatedByAMLStudio : true


stoicpear99fgs621 version: 1




## Publish the pipeline

After you've created and tested a pipeline, you can publish it as a REST service.

In [13]:
# Publish the pipeline from the run
published_pipeline = pipeline_run.publish_pipeline(
    name="diabetes-training-pipeline", description="Trains diabetes model", version="1.0")

published_pipeline

Name,Id,Status,Endpoint
diabetes-training-pipeline,c9e1fd4d-3ece-445d-8ff6-99b390a15710,Active,REST Endpoint


Note that the published pipeline has an endpoint, which you can see in the **Endpoints** page (on the **Pipeline Endpoints** tab) in [Azure Machine Learning studio](https://ml.azure.com). You can also find its URI as a property of the published pipeline object:

In [14]:
rest_endpoint = published_pipeline.endpoint
print(rest_endpoint)

https://eastus2.api.azureml.ms/pipelines/v1.0/subscriptions/69c8f7e5-d34a-43ca-8fc8-17cce3dd848d/resourceGroups/saifahmed.k-rg/providers/Microsoft.MachineLearningServices/workspaces/coursera/PipelineRuns/PipelineSubmit/c9e1fd4d-3ece-445d-8ff6-99b390a15710


## Call the pipeline endpoint

To use the endpoint, client applications need to make a REST call over HTTP. This request must be authenticated, so an authorization header is required. A real application would require a service principal with which to be authenticated, but to test this out, we'll use the authorization header from your current connection to your Azure workspace, which you can get using the following code:

In [15]:
from azureml.core.authentication import InteractiveLoginAuthentication

interactive_auth = InteractiveLoginAuthentication()
auth_header = interactive_auth.get_authentication_header()
print("Authentication header ready.")

Authentication header ready.


Now we're ready to call the REST interface. The pipeline runs asynchronously, so we'll get an identifier back, which we can use to track the pipeline experiment as it runs:

In [16]:
import requests

experiment_name = 'mslearn-diabetes-pipeline'

rest_endpoint = published_pipeline.endpoint
response = requests.post(rest_endpoint, 
                         headers=auth_header, 
                         json={"ExperimentName": experiment_name})
run_id = response.json()["Id"]
run_id

'88c39058-c256-49b1-accf-1ad3cffdd972'

Since you have the run ID, you can use it to wait for the run to complete.

> **Note**: The pipeline should complete quickly, because each step was configured to allow output reuse. This was done primarily for convenience and to save time in this course. In reality, you'd likely want the first step to run every time in case the data has changed, and trigger the subsequent steps only if the output from step one changes.

In [17]:
from azureml.pipeline.core.run import PipelineRun

published_pipeline_run = PipelineRun(ws.experiments[experiment_name], run_id)
pipeline_run.wait_for_completion(show_output=True)

PipelineRunId: b23a3757-6fe1-452d-bafb-e63188c7442d
Link to Azure Machine Learning Portal: https://ml.azure.com/runs/b23a3757-6fe1-452d-bafb-e63188c7442d?wsid=/subscriptions/69c8f7e5-d34a-43ca-8fc8-17cce3dd848d/resourcegroups/saifahmed.k-rg/workspaces/coursera&tid=44f1ecac-984f-479a-8b55-63bb294261f6

PipelineRun Execution Summary
PipelineRun Status: Finished
{'runId': 'b23a3757-6fe1-452d-bafb-e63188c7442d', 'status': 'Completed', 'startTimeUtc': '2024-10-13T12:39:48.857317Z', 'endTimeUtc': '2024-10-13T12:57:25.81599Z', 'services': {}, 'properties': {'azureml.runsource': 'azureml.PipelineRun', 'runSource': 'SDK', 'runType': 'SDK', 'azureml.parameters': '{}', 'azureml.continue_on_step_failure': 'False', 'azureml.continue_on_failed_optional_input': 'True', 'azureml.pipelineComponent': 'pipelinerun', 'azureml.pipelines.stages': '{"Initialization":null,"Execution":{"StartTime":"2024-10-13T12:39:49.1168221+00:00","EndTime":"2024-10-13T12:57:25.7108594+00:00","Status":"Finished"}}'}, 'inputD

'Finished'

## Schedule the Pipeline

Suppose the clinic for the diabetes patients collects new data each week, and adds it to the dataset. You could run the pipeline every week to retrain the model with the new data.

In [18]:
from azureml.pipeline.core import ScheduleRecurrence, Schedule

# Submit the Pipeline every Monday at 00:00 UTC
recurrence = ScheduleRecurrence(frequency="Week", interval=1, week_days=["Monday"], time_of_day="00:00")
weekly_schedule = Schedule.create(ws, name="weekly-diabetes-training", 
                                  description="Based on time",
                                  pipeline_id=published_pipeline.id, 
                                  experiment_name='mslearn-diabetes-pipeline', 
                                  recurrence=recurrence)
print('Pipeline scheduled.')

Pipeline scheduled.


You can retrieve the schedules that are defined in the workspace like this:

In [19]:
schedules = Schedule.list(ws)
schedules

[Pipeline(Name: weekly-diabetes-training,
 Id: da7e300c-8687-4961-b026-747bf7b3d0ed,
 Status: Active,
 Pipeline Id: c9e1fd4d-3ece-445d-8ff6-99b390a15710,
 Pipeline Endpoint Id: None,
 Recurrence Details: Runs at 0:00 on Monday every Week)]

You can check the latest run like this:

In [20]:
pipeline_experiment = ws.experiments.get('mslearn-diabetes-pipeline')
latest_run = list(pipeline_experiment.get_runs())[0]

latest_run.get_details()

{'runId': 'b23a3757-6fe1-452d-bafb-e63188c7442d',
 'status': 'Completed',
 'startTimeUtc': '2024-10-13T12:39:48.857317Z',
 'endTimeUtc': '2024-10-13T12:57:25.81599Z',
 'services': {},
 'properties': {'azureml.runsource': 'azureml.PipelineRun',
  'runSource': 'SDK',
  'runType': 'SDK',
  'azureml.parameters': '{}',
  'azureml.continue_on_step_failure': 'False',
  'azureml.continue_on_failed_optional_input': 'True',
  'azureml.pipelineComponent': 'pipelinerun',
  'azureml.pipelines.stages': '{"Initialization":null,"Execution":{"StartTime":"2024-10-13T12:39:49.1168221+00:00","EndTime":"2024-10-13T12:57:25.7108594+00:00","Status":"Finished"}}'},
 'inputDatasets': [],
 'outputDatasets': [],
 'logFiles': {'logs/azureml/executionlogs.txt': 'https://coursera1546744657.blob.core.windows.net/azureml/ExperimentRun/dcid.b23a3757-6fe1-452d-bafb-e63188c7442d/logs/azureml/executionlogs.txt?sv=2019-07-07&sr=b&sig=joHEwGdrxt31D6sj%2B3IU7J2kfblmcZ%2Fgkjq5LF6ejyc%3D&skoid=a0663b13-76d9-4a7b-bd33-d1f25a0d

This is a simple example, designed to demonstrate the principle. In reality, you could build more sophisticated logic into the pipeline steps - for example, evaluating the model against some test data to calculate a performance metric like AUC or accuracy, comparing the metric to that of any previously registered versions of the model, and only registering the new model if it performs better.

You can use the [Azure Machine Learning extension for Azure DevOps](https://marketplace.visualstudio.com/items?itemName=ms-air-aiagility.vss-services-azureml) to combine Azure ML pipelines with Azure DevOps pipelines (yes, it *is* confusing that they have the same name!) and integrate model retraining into a *continuous integration/continuous deployment (CI/CD)* process. For example you could use an Azure DevOps *build* pipeline to trigger an Azure ML pipeline that trains and registers a model, and when the model is registered it could trigger an Azure Devops *release* pipeline that deploys the model as a web service, along with the application or service that consumes the model.