# Training Models

The central goal of machine learning is to train predictive models that applications can use. In Azure Machine Learning, you can train models with scripts that use common machine learning frameworks such as Scikit-Learn, TensorFlow, PyTorch, and SparkML. You can run these training scripts as experiments to track metrics and outputs, in particular the trained models.

## Before You Start

Before you start this lab, ensure that you have completed the *Create an Azure Machine Learning Workspace* and *Create a Compute Instance* tasks in [Lab 1: Getting Started with Azure Machine Learning](./labdocs/Lab01.md). Then open this notebook in Jupyter on your Compute Instance.

## Connect to Your Workspace

First, connect to your workspace using the Azure ML SDK.

> **Note**: If you do not have a current authenticated session with your Azure subscription, you'll be prompted to authenticate. Follow the instructions to authenticate using the code provided.

In [1]:
from azureml import core
from azureml.core import Workspace

ws = Workspace.from_config()
print('Ready to use Azure ML {} to work with {}'.format(core.VERSION, ws.name))

Ready to use Azure ML 1.10.0 to work with workspace


## Use an Estimator to Run the Script as an Experiment

You can run experiment scripts using a **RunConfiguration** and a **ScriptRunConfig**, or you can use an **Estimator**, which abstracts both of these configurations in a single object.
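For comparison, the first option might look like the sketch below. It assumes the Azure ML SDK v1 (`azureml-core`) is installed; the helper name `build_script_run_config` is hypothetical, and the SDK imports sit inside the function so that defining it does not require the SDK.

```python
def build_script_run_config(source_directory, script, conda_packages):
    """Hypothetical helper: build a ScriptRunConfig for a local run."""
    # Imported lazily so that defining this helper does not require the SDK.
    from azureml.core import ScriptRunConfig
    from azureml.core.conda_dependencies import CondaDependencies
    from azureml.core.runconfig import RunConfiguration

    # The run configuration describes where the script runs and which packages it needs.
    run_config = RunConfiguration()
    run_config.target = 'local'
    run_config.environment.python.conda_dependencies = CondaDependencies.create(
        conda_packages=conda_packages
    )

    # The script run configuration ties the script to that run configuration.
    return ScriptRunConfig(
        source_directory=source_directory,
        script=script,
        run_config=run_config,
    )
```

The returned object could then be passed to `experiment.submit(config=...)` in the same way as an estimator.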

In this case, we'll use a generic **Estimator** object to run the training experiment. The default environment for this estimator does not include the **scikit-learn** package, so you need to add it to the configuration explicitly. The conda environment is built on demand the first time the estimator is used and is cached for future runs that use the same configuration, so the first run takes a little longer. Subsequent runs reuse the cached environment and complete more quickly.

In [2]:
from azureml.train import estimator

# Create an estimator that runs the training script on local compute
training_folder = 'diabetes-training'
config = estimator.Estimator(
    source_directory=training_folder,
    entry_script='diabetes_training.py',
    compute_target='local',
    conda_packages=['scikit-learn'],
)

# Create an experiment
experiment_name = 'diabetes-training'
experiment = core.Experiment(workspace=ws, name=experiment_name)

# Submit the run and stream its log output until it completes
run = experiment.submit(config=config)
run.wait_for_completion(show_output=True)

RunId: diabetes-training_1597255046_f4e3b34b
Web View: https://ml.azure.com/experiments/diabetes-training/runs/diabetes-training_1597255046_f4e3b34b?wsid=/subscriptions/84170def-2683-47c0-91ed-1f34057afd69/resourcegroups/resources/workspaces/workspace

Streaming azureml-logs/60_control_log.txt

[2020-08-12T17:57:28.517895] Using urllib.request Python 3.0 or later
Streaming log file azureml-logs/60_control_log.txt
Starting the daemon thread to refresh tokens in background for process with pid = 4132
Running: ['/bin/bash', '/tmp/azureml_runs/diabetes-training_1597255046_f4e3b34b/azureml-environment-setup/docker_env_checker.sh']

Found materialized image on target: azureml/azureml_18a2c352852de1e0e7ad8b589dd0927b


Logging experiment running status in history service.
Running: ['docker', 'run', '--name', 'diabetes-training_1597255046_f4e3b34b', '--rm', '-v', '/tmp/azureml_runs/diabetes-training_1597255046_f4e3b34b:/azureml-run', '--shm-size', '2g', '-e', 'EXAMPLE_ENV_VAR=EXAMPLE_VALUE', '

{'runId': 'diabetes-training_1597255046_f4e3b34b',
 'target': 'local',
 'status': 'Completed',
 'startTimeUtc': '2020-08-12T17:57:30.567187Z',
 'endTimeUtc': '2020-08-12T17:57:43.27416Z',
 'properties': {'_azureml.ComputeTargetType': 'local',
  'ContentSnapshotId': '55f28bbb-e4e8-43c7-a63f-cf41349a138d',
  'azureml.git.repository_uri': 'https://github.com/susumuasaga/mslearn-aml-labs',
  'mlflow.source.git.repoURL': 'https://github.com/susumuasaga/mslearn-aml-labs',
  'azureml.git.branch': 'master',
  'mlflow.source.git.branch': 'master',
  'azureml.git.commit': '7b9034780f2e35afd404a7b9a5292a4e60194f77',
  'mlflow.source.git.commit': '7b9034780f2e35afd404a7b9a5292a4e60194f77',
  'azureml.git.dirty': 'False'},
 'inputDatasets': [],
 'runDefinition': {'script': 'diabetes_training.py',
  'scriptType': None,
  'useAbsolutePath': False,
  'arguments': [],
  'sourceDirectoryDataStore': None,
  'framework': 'Python',
  'communicator': 'None',
  'target': 'local',
  'dataReferences': {},
  'd

As with any experiment run, you can use the **RunDetails** widget to view information about the run and get a link to it in Azure Machine Learning studio.

In [3]:
from azureml import widgets

widgets.RunDetails(run).show()

_UserRunWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', '…

You can also retrieve the metrics and outputs from the **Run** object.

In [4]:
# Get logged metrics
metrics = run.get_metrics()
for key, value in metrics.items():
    print(key, value)
print('\n')
# List the files generated by the run
for file in run.get_file_names():
    print(file)

Regularization Rate 0.01
Accuracy 0.774
AUC 0.8484929598487486


azureml-logs/60_control_log.txt
azureml-logs/70_driver_log.txt
logs/azureml/8_azureml.log
outputs/diabetes_model.pkl
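The file list above includes the trained model, which you may want to copy to local disk for inspection. The following is a minimal sketch that relies on the `download_file` method of the **Run** object; the helper name `download_model` and the `downloads` folder are assumptions made for illustration.

```python
import os


def download_model(run, remote_path='outputs/diabetes_model.pkl', local_dir='downloads'):
    """Download one output file from a run and return its local path."""
    # Create the (hypothetical) local target folder if it does not exist yet.
    os.makedirs(local_dir, exist_ok=True)
    local_path = os.path.join(local_dir, os.path.basename(remote_path))
    # Run.download_file copies a single file from the run record to local disk.
    run.download_file(name=remote_path, output_file_path=local_path)
    return local_path
```

Calling `download_model(run)` after the run completes should leave a copy of the model file in the local folder.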


## Register the Trained Model

Note that the outputs of the experiment include the trained model file (**diabetes_model.pkl**). You can register this model in your Azure Machine Learning workspace, making it possible to track model versions and retrieve them later.

In [5]:
# Register the model file produced by the run, recording its metrics as properties
run.register_model(
    model_path='outputs/diabetes_model.pkl', model_name='diabetes_model',
    tags={'Training context': 'Estimator'},
    properties={'AUC': metrics['AUC'], 'Accuracy': metrics['Accuracy']},
)

# List the registered models with their tags and properties
for model in core.Model.list(ws):
    print(model.name, 'version:', model.version)
    for tag_name, tag in model.tags.items():
        print('\t', tag_name, ':', tag)
    for prop_name, prop in model.properties.items():
        print('\t', prop_name, ':', prop)
    print('\n')

diabetes_model version: 3
	 Training context : Estimator
	 AUC : 0.8484929598487486
	 Accuracy : 0.774


diabetes_model version: 2
	 Training context : Estimator
	 AUC : 0.8483377282451863
	 Accuracy : 0.774


diabetes_model version: 1
	 Training context : Estimator
	 AUC : 0.8483377282451863
	 Accuracy : 0.774




## Use a Framework-Specific Estimator

You used a generic **Estimator** class to run the training script, but you can also take advantage of framework-specific estimators that include environment definitions for common machine learning frameworks. In this case, you're using Scikit-Learn, so you can use the **SKLearn** estimator. This means that you don't need to specify the **scikit-learn** package in the configuration.

> **Note**: Once again, the training experiment uses a new environment, which must be created the first time it is run.
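The estimator in the next cell also sets `script_params`, a dictionary that is passed to the entry script as command-line arguments (the run details further down record it as `'arguments': ['--reg_rate', '0.1']`). The contents of `diabetes_training.py` are not shown in this notebook, so the following is only a sketch of how such a script might read the value with `argparse`; the function name `parse_training_args` and the default of 0.01 are assumptions.

```python
import argparse


def parse_training_args(argv=None):
    """Parse the hyperparameters passed to a training script."""
    parser = argparse.ArgumentParser()
    # '--reg_rate' matches the key used in script_params; the default is an assumption.
    parser.add_argument('--reg_rate', type=float, dest='reg_rate', default=0.01)
    return parser.parse_args(argv)


# Simulate the argument list that the estimator would pass to the script.
args = parse_training_args(['--reg_rate', '0.1'])
print('Regularization rate:', args.reg_rate)
```

Inside a real training script you would call `parse_training_args()` with no arguments so that the values come from the command line.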

In [6]:
from azureml.train import sklearn

# Create an SKLearn estimator that passes the regularization rate to the script
training_folder = 'diabetes-training-params'
config = sklearn.SKLearn(
    source_directory=training_folder,
    entry_script='diabetes_training.py',
    script_params={'--reg_rate': 0.1},
    compute_target='local',
)

# Create an experiment
experiment_name = 'diabetes-training'
experiment = core.Experiment(workspace=ws, name=experiment_name)

# Submit the run
run = experiment.submit(config=config)

# Show the run details widget while the run is in progress
widgets.RunDetails(run).show()
run.wait_for_completion()

_UserRunWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', '…

{'runId': 'diabetes-training_1597256522_b3d9a8d6',
 'target': 'local',
 'status': 'Completed',
 'startTimeUtc': '2020-08-12T18:24:26.585943Z',
 'endTimeUtc': '2020-08-12T18:24:37.315532Z',
 'properties': {'_azureml.ComputeTargetType': 'local',
  'ContentSnapshotId': '3947fb4a-0cbe-46db-907f-86290d63d7b4',
  'azureml.git.repository_uri': 'https://github.com/susumuasaga/mslearn-aml-labs',
  'mlflow.source.git.repoURL': 'https://github.com/susumuasaga/mslearn-aml-labs',
  'azureml.git.branch': 'master',
  'mlflow.source.git.branch': 'master',
  'azureml.git.commit': 'd8f213ff6d4bc5c9fa726492f4e96a04002761ad',
  'mlflow.source.git.commit': 'd8f213ff6d4bc5c9fa726492f4e96a04002761ad',
  'azureml.git.dirty': 'False'},
 'inputDatasets': [],
 'runDefinition': {'script': 'diabetes_training.py',
  'scriptType': None,
  'useAbsolutePath': False,
  'arguments': ['--reg_rate', '0.1'],
  'sourceDirectoryDataStore': None,
  'framework': 'Python',
  'communicator': 'None',
  'target': 'local',
  'dataR

Once again, you can get the metrics and outputs from the run.

In [7]:
# Get logged metrics
metrics = run.get_metrics()
for key, value in metrics.items():
    print(key, value)
print('\n')
# List the files generated by the run
for file in run.get_file_names():
    print(file)

Regularization Rate 0.1
Accuracy 0.7736666666666666
AUC 0.8483904671874223


azureml-logs/60_control_log.txt
azureml-logs/70_driver_log.txt
logs/azureml/9_azureml.log
outputs/diabetes_model.pkl


## Register a New Version of the Model

Now that you've trained a new model, you can register it as a new version in the workspace.

In [9]:
# Register the new model under the same name, which creates a new version
run.register_model(
    model_path='outputs/diabetes_model.pkl', model_name='diabetes_model',
    tags={'Training context': 'Parameterized SKLearn Estimator'},
    properties={'AUC': metrics['AUC'], 'Accuracy': metrics['Accuracy']},
)

# List the registered models with their tags and properties
for model in core.Model.list(ws):
    print(model.name, 'version:', model.version)
    for tag_name, tag in model.tags.items():
        print('\t', tag_name, ':', tag)
    for prop_name, prop in model.properties.items():
        print('\t', prop_name, ':', prop)
    print('\n')

diabetes_model version: 5
	 Training context : Parameterized SKLearn Estimator
	 AUC : 0.8483904671874223
	 Accuracy : 0.7736666666666666


diabetes_model version: 4
	 Training context : Parameterized SKLearn Estimator
	 AUC : 0.8483904671874223
	 Accuracy : 0.7736666666666666


diabetes_model version: 3
	 Training context : Estimator
	 AUC : 0.8484929598487486
	 Accuracy : 0.774


diabetes_model version: 2
	 Training context : Estimator
	 AUC : 0.8483377282451863
	 Accuracy : 0.774


diabetes_model version: 1
	 Training context : Estimator
	 AUC : 0.8483377282451863
	 Accuracy : 0.774
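Because each version carries its metrics as properties, the versions can be compared programmatically. The following is a minimal sketch that uses the values from the listing above, copied into a plain dictionary so that it runs without a workspace; the helper name `best_version` is hypothetical, and the values are kept as strings on the assumption that the workspace returns property values as text.

```python
# Properties copied from the model listing above, keyed by model version.
listed_models = {
    5: {'AUC': '0.8483904671874223', 'Accuracy': '0.7736666666666666'},
    4: {'AUC': '0.8483904671874223', 'Accuracy': '0.7736666666666666'},
    3: {'AUC': '0.8484929598487486', 'Accuracy': '0.774'},
    2: {'AUC': '0.8483377282451863', 'Accuracy': '0.774'},
    1: {'AUC': '0.8483377282451863', 'Accuracy': '0.774'},
}


def best_version(models, metric='AUC'):
    """Return the version whose property `metric` is highest."""
    # float() handles properties stored either as numbers or as strings.
    return max(models, key=lambda version: float(models[version][metric]))


print('Best version by AUC:', best_version(listed_models))
```

With a live workspace, the same comparison could be applied to the `properties` of the objects returned by `core.Model.list(ws)`.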




## Clean Up

If you've finished exploring, you can close this notebook and shut down your Compute Instance.