# Work with Compute

When you run a script as an Azure Machine Learning experiment, you need to define the execution context for the experiment run. The execution context is made up of:

* The Python environment for the script, which must include all Python packages used in the script.
* The compute target on which the script will be run. This could be the local workstation from which the experiment run is initiated, or a remote compute target such as a training cluster that is provisioned on-demand.

In this notebook, you'll explore *environments* and *compute targets* for experiments.

## Install the Azure Machine Learning SDK

The Azure Machine Learning SDK is updated frequently. Run the following cell to upgrade to the latest release, along with the additional package to support notebook widgets.

This notebook is designed to show how to train and manage machine learning models using Azure Machine Learning (Azure ML), which is a cloud-based service that helps data scientists run experiments, manage models, and use scalable resources without worrying about hardware limitations. 


## Install the Necessary Tools (SDK):
The notebook starts by installing the Azure ML SDK, which is a toolkit that allows you to interact with Azure ML services from your local environment (like Jupyter Notebooks).

**Why?** So that you can easily connect to Azure's powerful machine learning tools and manage everything from your local computer.

In [1]:
!pip install --upgrade azureml-sdk azureml-widgets



## Connect to your workspace

With the latest version of the SDK installed, now you're ready to connect to your workspace.

> **Note**: If you haven't already established an authenticated session with your Azure subscription, you'll be prompted to authenticate by clicking a link, entering an authentication code, and signing into Azure.

## Connect to Azure Workspace:

Next, the notebook connects to your Azure ML Workspace, which is like your home base in the cloud where all your experiments, models, and data are stored.

**Why?** To make sure you can access the workspace where you will run your experiments and store your results.

In [2]:
import azureml.core
from azureml.core import Workspace

# Load the workspace from the saved config file
ws = Workspace.from_config()
print('Ready to use Azure ML {} to work with {}'.format(azureml.core.VERSION, ws.name))

Ready to use Azure ML 1.57.0 to work with coursera
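
`Workspace.from_config()` reads a `config.json` file that identifies the workspace. As a rough sketch using only the standard library, the code below writes a file of that shape to a temporary folder and checks that the three expected keys are present. The values are placeholders, and `missing_workspace_keys` is a hypothetical helper for illustration, not part of the SDK.

```python
import json
import tempfile
from pathlib import Path

# Keys that a workspace config.json file is expected to contain
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def missing_workspace_keys(config_path):
    """Return the required keys that are absent from a config file (hypothetical helper)."""
    config = json.loads(Path(config_path).read_text())
    return [key for key in REQUIRED_KEYS if key not in config]


# Placeholder values for illustration only
example_config = {
    "subscription_id": "00000000-0000-0000-0000-000000000000",
    "resource_group": "my-resource-group",
    "workspace_name": "my-workspace",
}

with tempfile.TemporaryDirectory() as folder:
    config_file = Path(folder) / "config.json"
    config_file.write_text(json.dumps(example_config, indent=4))
    missing = missing_workspace_keys(config_file)

print("Missing keys:", missing)
```

A check like this can help diagnose a failing `from_config()` call before you look at authentication.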


## Prepare data for an experiment

In this notebook, you'll use a dataset containing details of diabetes patients. Run the cell below to create this dataset (if it already exists, the code will find the existing version).

## Upload and Register Data:
This step uploads the diabetes CSV files to the default Azure ML datastore and registers them as a tabular dataset. The dataset is used to train the machine learning models.

**Why?** So that the data is available in the cloud and can be used for multiple experiments without uploading it every time.

In [3]:
from azureml.core import Dataset

default_ds = ws.get_default_datastore()

if 'diabetes dataset' not in ws.datasets:
    default_ds.upload_files(files=['./data/diabetes.csv', './data/diabetes2.csv'], # Upload the diabetes csv files in /data
                        target_path='diabetes-data/', # Put it in a folder path in the datastore
                        overwrite=True, # Replace existing files of the same name
                        show_progress=True)

    #Create a tabular dataset from the path on the datastore (this may take a short while)
    tab_data_set = Dataset.Tabular.from_delimited_files(path=(default_ds, 'diabetes-data/*.csv'))

    # Register the tabular dataset
    try:
        tab_data_set = tab_data_set.register(workspace=ws, 
                                name='diabetes dataset',
                                description='diabetes data',
                                tags = {'format':'CSV'},
                                create_new_version=True)
        print('Dataset registered.')
    except Exception as ex:
        print(ex)
else:
    print('Dataset already registered.')

Dataset already registered.


## Create a training script

Run the following two cells to create:
1. A folder for a new experiment
2. A training script file that uses **scikit-learn** to train a model and **matplotlib** to plot a ROC curve.

## Create a Training Script:
A Python script is created that uses a Logistic Regression model (from scikit-learn) to predict whether patients have diabetes based on their medical data. The script also calculates metrics like accuracy and AUC (a measure of model performance), and plots an ROC curve.

**Why?** The script is the heart of the machine learning model, and this is where the actual training happens. It defines the model, how the data is used, and what metrics will be tracked.

In [4]:
import os

# Create a folder for the experiment files
experiment_folder = 'diabetes_training_logistic'
os.makedirs(experiment_folder, exist_ok=True)
print(experiment_folder, 'folder created')

diabetes_training_logistic folder created


In [5]:
%%writefile $experiment_folder/diabetes_training.py
# Import libraries
import argparse
from azureml.core import Run
import pandas as pd
import numpy as np
import joblib
import os
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.metrics import roc_curve
import matplotlib.pyplot as plt

# Get script arguments
parser = argparse.ArgumentParser()
parser.add_argument('--regularization', type=float, dest='reg_rate', default=0.01, help='regularization rate')
parser.add_argument("--input-data", type=str, dest='training_dataset_id', help='training dataset')
args = parser.parse_args()

# Set regularization hyperparameter
reg = args.reg_rate

# Get the experiment run context
run = Run.get_context()

# load the diabetes data (passed as an input dataset)
print("Loading Data...")
diabetes = run.input_datasets['training_data'].to_pandas_dataframe()

# Separate features and labels
X, y = diabetes[['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree','Age']].values, diabetes['Diabetic'].values

# Split data into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

# Train a logistic regression model
print('Training a logistic regression model with regularization rate of', reg)
run.log('Regularization Rate', float(reg))
model = LogisticRegression(C=1/reg, solver="liblinear").fit(X_train, y_train)

# calculate accuracy
y_hat = model.predict(X_test)
acc = np.average(y_hat == y_test)
print('Accuracy:', acc)
run.log('Accuracy', float(acc))

# calculate AUC
y_scores = model.predict_proba(X_test)
auc = roc_auc_score(y_test,y_scores[:,1])
print('AUC: ' + str(auc))
run.log('AUC', float(auc))

# plot ROC curve
fpr, tpr, thresholds = roc_curve(y_test, y_scores[:,1])
fig = plt.figure(figsize=(6, 4))
# Plot the diagonal 50% line
plt.plot([0, 1], [0, 1], 'k--')
# Plot the FPR and TPR achieved by our model
plt.plot(fpr, tpr)
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
run.log_image(name = "ROC", plot = fig)
plt.show()

os.makedirs('outputs', exist_ok=True)
# note file saved in the outputs folder is automatically uploaded into experiment record
joblib.dump(value=model, filename='outputs/diabetes_model.pkl')

run.complete()

Overwriting diabetes_training_logistic/diabetes_training.py
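
To see how the script receives its hyperparameter, the following sketch reuses the two `add_argument` calls from the training script and parses a simulated argument list. It shows that `--regularization` lands in `args.reg_rate` (because of `dest`), and that the script passes its inverse to scikit-learn as `C`, so a larger regularization rate means a smaller `C`. The dataset ID string is a placeholder.

```python
import argparse

# Same argument definitions as the training script
parser = argparse.ArgumentParser()
parser.add_argument('--regularization', type=float, dest='reg_rate', default=0.01, help='regularization rate')
parser.add_argument('--input-data', type=str, dest='training_dataset_id', help='training dataset')

# Simulate the arguments that the run configuration passes to the script
args = parser.parse_args(['--regularization', '0.1', '--input-data', 'placeholder-dataset-id'])

reg = args.reg_rate
C = 1 / reg  # scikit-learn's C is the inverse of the regularization strength
print('reg_rate:', reg, 'C:', C)

# With no arguments, the default regularization rate applies
defaults = parser.parse_args([])
print('default reg_rate:', defaults.reg_rate)
```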


## Define an environment

When you run a Python script as an experiment in Azure Machine Learning, a Conda environment is created to define the execution context for the script. Azure Machine Learning provides a default environment that includes many common packages, including the **azureml-defaults** package that contains the libraries necessary for working with an experiment run, as well as popular packages like **pandas** and **numpy**.

You can also define your own environment and add packages by using **conda** or **pip**, to ensure your experiment has access to all the libraries it requires.

> **Note**: The conda dependencies are installed first, followed by the pip dependencies. Since the **pip** package is required to install the pip dependencies, it's good practice to include it in the conda dependencies (Azure ML will install it for you if you forget, but you'll see a warning in the log!)

## Define the Environment:
An Environment is set up in Azure. This defines all the packages (like scikit-learn, pandas, etc.) that the script needs to run. It ensures the right tools are available during training.

**Why?** To make sure the model runs in the right Python environment with all the necessary libraries installed. It makes the experiment reproducible, so you can run it again without worrying about missing dependencies.

In [6]:
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies

# Create a Python environment for the experiment
diabetes_env = Environment("diabetes-experiment-env")
diabetes_env.python.user_managed_dependencies = False # Let Azure ML manage dependencies
diabetes_env.docker.enabled = True # Use a docker container

# Create a set of package dependencies (conda or pip as required)
diabetes_packages = CondaDependencies.create(conda_packages=['scikit-learn','ipykernel','matplotlib','pandas','pip'],
                                             pip_packages=['azureml-sdk','pyarrow'])

# Add the dependencies to the environment
diabetes_env.python.conda_dependencies = diabetes_packages

print(diabetes_env.name, 'defined.')

'enabled' is deprecated. Please use the azureml.core.runconfig.DockerConfiguration object with the 'use_docker' param instead.


diabetes-experiment-env defined.
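
It can help to picture the conda specification that these dependencies describe. The sketch below builds an approximate equivalent as a plain dictionary, using the package lists from the cell above. It assumes the usual conda environment file layout, in which pip packages are nested under a `pip` entry at the end of the `dependencies` list. The specification that Azure ML actually generates is likely to contain more detail (such as channels and a Python version), so treat this as an illustration only.

```python
import json

# Package lists copied from the environment definition above
conda_packages = ['scikit-learn', 'ipykernel', 'matplotlib', 'pandas', 'pip']
pip_packages = ['azureml-sdk', 'pyarrow']

# Approximate shape of a conda environment specification:
# conda packages come first, and pip packages are nested under a 'pip' key
conda_spec = {
    'name': 'diabetes-experiment-env',
    'dependencies': conda_packages + [{'pip': pip_packages}],
}

print(json.dumps(conda_spec, indent=2))
```

Listing `pip` among the conda packages matters because the nested pip section can only be installed once pip itself is available in the environment.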


Now you can use the environment to run a script as an experiment.

The following code assigns the environment you created to a ScriptRunConfig and submits an experiment. As the experiment runs, observe the run details in the widget and in the **azureml-logs/60_control_log.txt** output log, where you'll see the conda environment being built.

## Run the Experiment:
The training script is then run as an experiment in Azure ML. You can monitor its progress, check results (like accuracy), and view visualizations (like the ROC curve).

**Why?** Azure ML helps you track every experiment, including its results and configurations, which is super useful for comparing different models.

In [7]:
from azureml.core import Experiment, ScriptRunConfig, Environment
from azureml.core.conda_dependencies import CondaDependencies
from azureml.widgets import RunDetails

# Get the training dataset
diabetes_ds = ws.datasets.get("diabetes dataset")

# Create a script config
script_config = ScriptRunConfig(source_directory=experiment_folder,
                                script='diabetes_training.py',
                                arguments = ['--regularization', 0.1, # Regularization rate parameter
                                             '--input-data', diabetes_ds.as_named_input('training_data')], # Reference to dataset
                                environment=diabetes_env) 

# submit the experiment
experiment_name = 'mslearn-train-diabetes'
experiment = Experiment(workspace=ws, name=experiment_name)
run = experiment.submit(config=script_config)
RunDetails(run).show()
run.wait_for_completion()

2024-10-13 11:01:18.136252: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-10-13 11:01:18.998908: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-10-13 11:01:19.256360: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-10-13 11:01:21.199623: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


_UserRunWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', '…

{'runId': 'mslearn-train-diabetes_1728817237_c8c59c09',
 'target': 'local',
 'status': 'Completed',
 'startTimeUtc': '2024-10-13T11:04:01.053718Z',
 'endTimeUtc': '2024-10-13T11:04:19.580812Z',
 'services': {},
 'properties': {'_azureml.ComputeTargetType': 'local',
  '_azureml.ClusterName': 'local',
  'ContentSnapshotId': 'ce35aee4-2bfd-42e4-9128-a2415a0622e3'},
 'inputDatasets': [{'dataset': {'id': 'c00880ed-3c23-4653-880a-8b6d907b9ab1'}, 'consumptionDetails': {'type': 'RunInput', 'inputName': 'training_data', 'mechanism': 'Direct'}}],
 'outputDatasets': [],
 'runDefinition': {'script': 'diabetes_training.py',
  'command': '',
  'useAbsolutePath': False,
  'arguments': ['--regularization',
   '0.1',
   '--input-data',
   'DatasetConsumptionConfig:training_data'],
  'sourceDirectoryDataStore': None,
  'framework': 'Python',
  'communicator': 'None',
  'target': 'local',
  'dataReferences': {},
  'data': {'training_data': {'dataLocation': {'dataset': {'id': 'c00880ed-3c23-4653-880a-8b6d

The experiment successfully used the environment, which included all of the packages it required. You can view the metrics and outputs from the experiment run in Azure Machine Learning studio, or by running the code below, including the model trained using **scikit-learn** and the ROC chart image generated using **matplotlib**.

In [8]:
# Get logged metrics
metrics = run.get_metrics()
for key in metrics.keys():
    print(key, metrics.get(key))
print('\n')
for file in run.get_file_names():
    print(file)

Regularization Rate 0.1
Accuracy 0.7893333333333333
AUC 0.8568929332039168
ROC aml://artifactId/ExperimentRun/dcid.mslearn-train-diabetes_1728817237_c8c59c09/ROC_1728817447.png


ROC_1728817447.png
azureml-logs/60_control_log.txt
azureml-logs/70_driver_log.txt
logs/azureml/7_azureml.log
outputs/diabetes_model.pkl
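
To make the logged **Accuracy** and **AUC** values less opaque, here is a library-free sketch of both metrics on a tiny made-up example. Accuracy is the fraction of correct predictions. AUC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counting as half. The training script uses scikit-learn's `roc_auc_score`, and for binary labels this pairwise definition should give the same value. The labels and scores below are illustrative, not taken from the diabetes data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


def auc(y_true, scores):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Made-up labels and predicted probabilities for the positive class
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

# Apply a 0.5 threshold to turn scores into class predictions
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print('Accuracy:', accuracy(y_true, y_pred))
print('AUC:', auc(y_true, scores))
```

Accuracy depends on the chosen threshold, while AUC depends only on how the scores rank the cases, which is why the two metrics can differ.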


## Register the environment

Having gone to the trouble of defining an environment with the packages you need, you can register it in the workspace.

## Reuse and Register the Environment:
After successfully running the experiment, the environment is registered so it can be reused. This saves time because you don't need to redefine it each time you run a similar experiment.

**Why?** It saves time and ensures consistency in future experiments that use the same setup.

In [9]:
# Register the environment
diabetes_env.register(workspace=ws)

{
    "assetId": "azureml://locations/eastus2/workspaces/fbff0866-df0b-44e6-ad41-a794abec9d1a/environments/diabetes-experiment-env/versions/1",
    "databricks": {
        "eggLibraries": [],
        "jarLibraries": [],
        "mavenLibraries": [],
        "pypiLibraries": [],
        "rcranLibraries": []
    },
    "docker": {
        "arguments": [],
        "baseDockerfile": null,
        "baseImage": "mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:20240709.v1",
        "baseImageRegistry": {
            "address": null,
            "password": null,
            "registryIdentity": null,
            "username": null
        },
        "buildContext": null,
        "enabled": true,
        "platform": {
            "architecture": "amd64",
            "os": "Linux"
        },
        "sharedVolumes": true,
        "shmSize": null
    },
    "environmentVariables": {
        "EXAMPLE_ENV_VAR": "EXAMPLE_VALUE"
    },
    "inferencingStackVersion": null,
    "name": "diabetes-exper

Note that the environment is registered with the name you assigned when you first created it (in this case, *diabetes-experiment-env*).

With the environment registered, you can reuse it for any scripts that have the same requirements. For example, let's create a folder and script to train a diabetes model using a different algorithm:

In [10]:
import os

# Create a folder for the experiment files
experiment_folder = 'diabetes_training_tree'
os.makedirs(experiment_folder, exist_ok=True)
print(experiment_folder, 'folder created')

diabetes_training_tree folder created


In [11]:
%%writefile $experiment_folder/diabetes_training.py
# Import libraries
import argparse
from azureml.core import Run
import pandas as pd
import numpy as np
import joblib
import os
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score
from sklearn.metrics import roc_curve
import matplotlib.pyplot as plt

# Get script arguments
parser = argparse.ArgumentParser()
parser.add_argument("--input-data", type=str, dest='training_dataset_id', help='training dataset')
args = parser.parse_args()

# Get the experiment run context
run = Run.get_context()

# load the diabetes data (passed as an input dataset)
print("Loading Data...")
diabetes = run.input_datasets['training_data'].to_pandas_dataframe()

# Separate features and labels
X, y = diabetes[['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree','Age']].values, diabetes['Diabetic'].values

# Split data into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

# Train a decision tree model
print('Training a decision tree model')
model = DecisionTreeClassifier().fit(X_train, y_train)

# calculate accuracy
y_hat = model.predict(X_test)
acc = np.average(y_hat == y_test)
print('Accuracy:', acc)
run.log('Accuracy', float(acc))

# calculate AUC
y_scores = model.predict_proba(X_test)
auc = roc_auc_score(y_test,y_scores[:,1])
print('AUC: ' + str(auc))
run.log('AUC', float(auc))

# plot ROC curve
fpr, tpr, thresholds = roc_curve(y_test, y_scores[:,1])
fig = plt.figure(figsize=(6, 4))
# Plot the diagonal 50% line
plt.plot([0, 1], [0, 1], 'k--')
# Plot the FPR and TPR achieved by our model
plt.plot(fpr, tpr)
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
run.log_image(name = "ROC", plot = fig)
plt.show()

os.makedirs('outputs', exist_ok=True)
# note file saved in the outputs folder is automatically uploaded into experiment record
joblib.dump(value=model, filename='outputs/diabetes_model.pkl')

run.complete()

Writing diabetes_training_tree/diabetes_training.py


Now you can retrieve the registered environment and use it in a new experiment that runs the alternative training script (there is no regularization parameter this time because a Decision Tree classifier doesn't require it).

In [12]:
from azureml.core import Experiment, ScriptRunConfig, Environment
from azureml.core.conda_dependencies import CondaDependencies
from azureml.widgets import RunDetails

# get the registered environment
registered_env = Environment.get(ws, 'diabetes-experiment-env')

# Get the training dataset
diabetes_ds = ws.datasets.get("diabetes dataset")

# Create a script config
script_config = ScriptRunConfig(source_directory=experiment_folder,
                              script='diabetes_training.py',
                              arguments = ['--input-data', diabetes_ds.as_named_input('training_data')], # Reference to dataset
                              environment=registered_env) 

# submit the experiment
experiment_name = 'mslearn-train-diabetes'
experiment = Experiment(workspace=ws, name=experiment_name)
run = experiment.submit(config=script_config)
RunDetails(run).show()
run.wait_for_completion()

_UserRunWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', '…

{'runId': 'mslearn-train-diabetes_1728817498_0922f3d2',
 'target': 'local',
 'status': 'Finalizing',
 'startTimeUtc': '2024-10-13T11:04:59.364504Z',
 'services': {},
 'properties': {'_azureml.ComputeTargetType': 'local',
  '_azureml.ClusterName': 'local',
  'ContentSnapshotId': '91de5276-bdd4-4693-971e-2094f5537261'},
 'inputDatasets': [{'dataset': {'id': 'c00880ed-3c23-4653-880a-8b6d907b9ab1'}, 'consumptionDetails': {'type': 'RunInput', 'inputName': 'training_data', 'mechanism': 'Direct'}}],
 'outputDatasets': [],
 'runDefinition': {'script': 'diabetes_training.py',
  'command': '',
  'useAbsolutePath': False,
  'arguments': ['--input-data', 'DatasetConsumptionConfig:training_data'],
  'sourceDirectoryDataStore': None,
  'framework': 'Python',
  'communicator': 'None',
  'target': 'local',
  'dataReferences': {},
  'data': {'training_data': {'dataLocation': {'dataset': {'id': 'c00880ed-3c23-4653-880a-8b6d907b9ab1',
      'name': 'diabetes dataset',
      'version': '3'},
     'dataPat

This time the experiment runs more quickly because a matching environment has been cached from the previous run, so it doesn't need to be recreated on the local compute. However, even on a different compute target, the same environment would be created and used - ensuring consistency for your experiment script execution context.
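
The caching idea can be illustrated with a toy model. In the sketch below, an environment definition is reduced to a fingerprint (a hash of its canonical JSON form), and a build is skipped whenever that fingerprint has been seen before. This is an assumption-laden illustration of the general technique, not a description of how Azure ML implements its cache. The names `environment_fingerprint`, `get_or_build`, and `built_images` are hypothetical.

```python
import hashlib
import json


def environment_fingerprint(definition):
    """Hash a definition's canonical JSON form (hypothetical helper)."""
    canonical = json.dumps(definition, sort_keys=True)
    return hashlib.sha256(canonical.encode('utf-8')).hexdigest()


# Toy cache that maps fingerprints to previously "built" images
built_images = {}


def get_or_build(definition):
    """Return (image_name, was_cached) for an environment definition."""
    key = environment_fingerprint(definition)
    if key in built_images:
        return built_images[key], True
    built_images[key] = 'image-' + key[:12]  # stand-in for a slow build step
    return built_images[key], False


definition = {
    'conda_packages': ['scikit-learn', 'ipykernel', 'matplotlib', 'pandas', 'pip'],
    'pip_packages': ['azureml-sdk', 'pyarrow'],
}

first_image, first_cached = get_or_build(definition)
second_image, second_cached = get_or_build(dict(definition))  # same content, new object

print(first_image, first_cached)
print(second_image, second_cached)
```

Because the fingerprint depends only on the content of the definition, any script that uses an identical environment can reuse the earlier build.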

Let's look at the metrics and outputs from the experiment.

In [13]:
# Get logged metrics
metrics = run.get_metrics()
for key in metrics.keys():
    print(key, metrics.get(key))
print('\n')
for file in run.get_file_names():
    print(file)

Accuracy 0.8975555555555556
AUC 0.8817800992920728
ROC aml://artifactId/ExperimentRun/dcid.mslearn-train-diabetes_1728817498_0922f3d2/ROC_1728817505.png


ROC_1728817505.png
azureml-logs/60_control_log.txt
azureml-logs/70_driver_log.txt
logs/azureml/8_azureml.log
outputs/diabetes_model.pkl


## View registered environments

In addition to registering your own environments, you can leverage pre-built "curated" environments for common experiment types. The following code lists all registered environments:

In [14]:
from azureml.core import Environment

envs = Environment.list(workspace=ws)
for env in envs:
    print("Name",env)

Name diabetes-experiment-env
Name AzureML-ACPT-pytorch-1.13-py38-cuda11.7-gpu


All curated environments have names that begin ***AzureML-*** (you can't use this prefix for your own environments).
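
As a small illustration of that naming convention, the following sketch separates curated names from custom ones by checking the prefix. It works on a plain list of names (the two shown in the output above) so that it runs without a workspace. `is_curated` is a hypothetical helper.

```python
CURATED_PREFIX = 'AzureML-'


def is_curated(name):
    """True when an environment name uses the reserved curated prefix."""
    return name.startswith(CURATED_PREFIX)


# Names taken from the output of the previous cell
names = ['diabetes-experiment-env', 'AzureML-ACPT-pytorch-1.13-py38-cuda11.7-gpu']

curated = [name for name in names if is_curated(name)]
custom = [name for name in names if not is_curated(name)]

print('Curated:', curated)
print('Custom:', custom)
```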

Let's explore the curated environments in more depth and see what packages are included in each of them.

In [16]:
for env in envs:
    if env.startswith("AzureML"):
        print("Name:", env)
        # Check if conda_dependencies exists and is not None
        conda_deps = getattr(envs[env].python, 'conda_dependencies', None)
        if conda_deps is not None:
            print("Packages:", conda_deps.serialize_to_string())
        else:
            print("No conda dependencies available.")


Name: AzureML-ACPT-pytorch-1.13-py38-cuda11.7-gpu
No conda dependencies available.


## Create a compute cluster

In many cases, your local compute resources may not be sufficient to process a complex or long-running experiment that needs to process a large volume of data, and you may want to take advantage of the ability to dynamically create and use compute resources in the cloud. Azure Machine Learning supports a range of compute targets, which you can define in your workspace and use to run experiments, paying for the resources only when you use them.

You can create a compute cluster in [Azure Machine Learning studio](https://ml.azure.com), or by using the Azure Machine Learning SDK. The following code cell checks your workspace for the existence of a compute cluster with a specified name, and if it doesn't exist, creates it.

> **Important**: Change the value of `cluster_name` in the code below to a suitable name for your compute cluster before running it. You can specify the name of an existing cluster if you have one. Cluster names must be globally unique and between 2 and 16 characters long. Valid characters are letters, digits, and the - character.

## Create a Compute Cluster:
A Compute Cluster is created. This is a group of cloud-based virtual machines that can run your experiments if your local computer isn’t powerful enough.

**Why?** This is useful when you have large datasets or complex models that require more computing power than your local machine can provide.
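
Before creating a cluster, you may want to check a candidate name against the naming rule stated in the note above (2 to 16 characters, using only letters, digits, and hyphens). The sketch below encodes just that rule as a regular expression. The service may enforce additional constraints that are not captured here, and `is_valid_cluster_name` is a hypothetical helper.

```python
import re

# 2 to 16 characters, each a letter, digit, or hyphen (the rule stated in the note)
CLUSTER_NAME_PATTERN = re.compile(r'[A-Za-z0-9-]{2,16}')


def is_valid_cluster_name(name):
    """Check a name against the stated length and character rule only."""
    return CLUSTER_NAME_PATTERN.fullmatch(name) is not None


for candidate in ['coursera3', 'a', 'my_cluster', 'a-name-that-is-too-long']:
    print(candidate, '->', is_valid_cluster_name(candidate))
```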

In [17]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

cluster_name = "coursera3"

try:
    # Check for existing compute target
    training_cluster = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    # If it doesn't already exist, create it
    try:
        compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS11_V2', max_nodes=2)
        training_cluster = ComputeTarget.create(ws, cluster_name, compute_config)
        training_cluster.wait_for_completion(show_output=True)
    except Exception as ex:
        print(ex)

Found existing cluster, use it.


## Run an experiment on remote compute

Now you're ready to re-run the experiment you ran previously, but this time on the compute cluster you created. 

> **Note**: The experiment will take quite a lot longer because a container image must be built with the conda environment, and then the cluster nodes must be started and the image deployed before the script can be run. For a simple experiment like the diabetes training script, this may seem inefficient; but imagine you needed to run a more complex experiment that takes several hours - dynamically creating more scalable compute may reduce the overall time significantly.

## Run the Experiment on the Cluster:
The experiment is rerun, but this time on the compute cluster. This allows you to scale up and run more powerful experiments in the cloud.

**Why?** Using the cluster ensures that you can handle larger workloads and process data faster by taking advantage of cloud resources.


In [18]:
# Create a script config
script_config = ScriptRunConfig(source_directory=experiment_folder,
                                script='diabetes_training.py',
                                arguments = ['--input-data', diabetes_ds.as_named_input('training_data')],
                                environment=registered_env,
                                compute_target=cluster_name) 

# submit the experiment
experiment_name = 'mslearn-train-diabetes'
experiment = Experiment(workspace=ws, name=experiment_name)
run = experiment.submit(config=script_config)
RunDetails(run).show()

_UserRunWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', '…

While you're waiting for the experiment to run, you can check on the status of the compute in the widget above or in [Azure Machine Learning studio](https://ml.azure.com). You can also check the status of the compute using the code below.

In [19]:
cluster_state = training_cluster.get_status()
print(cluster_state.allocation_state, cluster_state.current_node_count)

Steady 0
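
If you would rather not re-run the status cell by hand, a polling loop is one option. The sketch below is a generic helper that calls a status function repeatedly until a target state appears or a timeout expires. To keep it runnable without a workspace, it is driven by a made-up sequence of states. In the notebook, the `get_state` argument could instead wrap `training_cluster.get_status().allocation_state`. The helper name `wait_for_state` and its parameters are hypothetical.

```python
import time


def wait_for_state(get_state, target, timeout_seconds=60.0, poll_interval=1.0):
    """Poll get_state() until it returns target; return the states observed."""
    deadline = time.monotonic() + timeout_seconds
    seen = []
    while True:
        state = get_state()
        seen.append(state)
        if state == target:
            return seen
        if time.monotonic() >= deadline:
            raise TimeoutError('Timed out waiting for state ' + repr(target))
        time.sleep(poll_interval)


# Made-up sequence of states standing in for real cluster status calls
fake_states = iter(['Steady', 'Steady', 'Resizing', 'Steady'])

history = wait_for_state(lambda: next(fake_states), target='Resizing', poll_interval=0.0)
print(history)
```

With a real cluster, a poll interval of several seconds is more appropriate than the zero-second interval used here for the simulation.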


Note that it will take a while before the status changes from *steady* to *resizing* (now might be a good time to take a coffee break!). To block the kernel until the run completes, run the cell below.

In [20]:
run.wait_for_completion()

{'runId': 'mslearn-train-diabetes_1728818248_7efb7915',
 'target': 'coursera3',
 'status': 'Completed',
 'startTimeUtc': '2024-10-13T11:30:35.296681Z',
 'endTimeUtc': '2024-10-13T11:32:13.910148Z',
 'services': {},
 'properties': {'_azureml.ComputeTargetType': 'amlctrain',
  '_azureml.ClusterName': 'coursera3',
  'ContentSnapshotId': '91de5276-bdd4-4693-971e-2094f5537261',
  'ProcessInfoFile': 'azureml-logs/process_info.json',
  'ProcessStatusFile': 'azureml-logs/process_status.json'},
 'inputDatasets': [{'dataset': {'id': 'c00880ed-3c23-4653-880a-8b6d907b9ab1'}, 'consumptionDetails': {'type': 'RunInput', 'inputName': 'training_data', 'mechanism': 'Direct'}}],
 'outputDatasets': [],
 'runDefinition': {'script': 'diabetes_training.py',
  'command': '',
  'useAbsolutePath': False,
  'arguments': ['--input-data', 'DatasetConsumptionConfig:training_data'],
  'sourceDirectoryDataStore': None,
  'framework': 'Python',
  'communicator': 'None',
  'target': 'coursera3',
  'dataReferences': {},

After the experiment has finished, you can get the metrics and files generated by the experiment run. This time, the files will include logs for building the image and managing the compute.

In [21]:
# Get logged metrics
metrics = run.get_metrics()
for key in metrics.keys():
    print(key, metrics.get(key))
print('\n')
for file in run.get_file_names():
    print(file)

ROC aml://artifactId/ExperimentRun/dcid.mslearn-train-diabetes_1728818248_7efb7915/ROC_1728819118.png
Accuracy 0.8986666666666666
AUC 0.88375696004516


ROC_1728819118.png
azureml-logs/20_image_build_log.txt
logs/azureml/dataprep/0/rslex.log.2024-10-13-11
outputs/diabetes_model.pkl
system_logs/cs_capability/cs-capability.log
system_logs/hosttools_capability/hosttools-capability.log
system_logs/lifecycler/execution-wrapper.log
system_logs/lifecycler/lifecycler.log
system_logs/metrics_capability/metrics-capability.log
system_logs/snapshot_capability/snapshot-capability.log
user_logs/std_log.txt


Now you can register the model that was trained by the experiment.

## Register the Model:
Once the experiment finishes, the trained model is saved (or "registered") in Azure ML.

**Why?** The registered model is stored in the cloud and can be reused or deployed (for example, in an app) whenever needed.

In [22]:
from azureml.core import Model

# Register the model
run.register_model(model_path='outputs/diabetes_model.pkl', model_name='diabetes_model',
                   tags={'Training context':'Compute cluster'}, properties={'AUC': run.get_metrics()['AUC'], 'Accuracy': run.get_metrics()['Accuracy']})

# List registered models
for model in Model.list(ws):
    print(model.name, 'version:', model.version)
    for tag_name in model.tags:
        tag = model.tags[tag_name]
        print ('\t',tag_name, ':', tag)
    for prop_name in model.properties:
        prop = model.properties[prop_name]
        print ('\t',prop_name, ':', prop)
    print('\n')

diabetes_model version: 1
	 Training context : Compute cluster
	 AUC : 0.88375696004516
	 Accuracy : 0.8986666666666666


amlstudio-predict-penguin-clus version: 1
	 CreatedByAMLStudio : true


amlstudio-predict-diabetes version: 1
	 CreatedByAMLStudio : true


amlstudio-predict-auto-price-1 version: 1
	 CreatedByAMLStudio : true


amlstudio-predict-auto-price version: 1
	 CreatedByAMLStudio : true


stoicpear99fgs621 version: 1




## Why is this useful?

* **Simplifies experimentation:** You can keep track of every model you train, including its configuration, data, and results. This makes it easier to compare models and choose the best one.
* **Ensures reproducibility:** By defining environments and registering models, you make sure that your experiments can be easily repeated and shared with others, ensuring consistency.
* **Scales easily:** If you need more computational power, Azure ML allows you to scale up by creating compute clusters. This is especially useful for large datasets or complex models.
* **Centralized management:** Everything (data, models, experiments) is managed centrally in Azure, so you don’t have to worry about storing or losing files on your local machine.
* **Model deployment:** Once your model is trained and registered, Azure ML allows you to deploy it for real-world use. This means you can integrate it into applications, websites, or services.

## How can this be useful in real-world projects?

* **Data scientists:** Azure ML can help data scientists streamline their workflows, manage large datasets, and automate model training and deployment.
* **Machine learning teams:** Teams can collaborate by sharing environments, datasets, and models, making it easier to work together on large projects.
* **Businesses:** Companies can use Azure ML to develop and deploy predictive models for business intelligence, customer insights, or product recommendations, without worrying about managing hardware.

In summary, this notebook walks you through using Azure ML to handle all aspects of machine learning experiments, from preparing data to scaling experiments and saving trained models, all with the help of cloud resources.

> **More Information**:
>
> - For more information about environments in Azure Machine Learning, see [Create & use software environments in Azure Machine Learning](https://docs.microsoft.com/azure/machine-learning/how-to-use-environments).
> - For more information about compute targets in Azure Machine Learning, see the [What are compute targets in Azure Machine Learning?](https://docs.microsoft.com/azure/machine-learning/concept-compute-target).