# Hyperparameter Tuning using HyperDrive

TODO: Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

In [6]:
from azureml.core import Workspace, Experiment
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.widgets import RunDetails
from azureml.train.sklearn import SKLearn
from azureml.train.hyperdrive.run import PrimaryMetricGoal
from azureml.train.hyperdrive.policy import BanditPolicy
from azureml.train.hyperdrive.sampling import RandomParameterSampling
from azureml.train.hyperdrive.runconfig import HyperDriveConfig
from azureml.train.hyperdrive.parameter_expressions import uniform, choice
import os
import joblib
from azureml.core.dataset import Dataset
from azureml.data.dataset_factory import TabularDatasetFactory

## Dataset

TODO: Get data. In the cell below, write code to access the data you will be using in this project. Remember that the dataset needs to be external.

In [7]:
# Load the workspace from the saved configuration file
ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\n')

# Create (or reuse) the experiment that will hold the HyperDrive runs
experiment_name = 'hyperdrive-heart-experiment'
experiment = Experiment(ws, experiment_name)
run = experiment.start_logging()

quick-starts-ws-147669
aml-quickstarts-147669
southcentralus
48a74bb7-9950-4cc1-9caa-5d50f995cc55


In [8]:
example_dataset = 'https://raw.githubusercontent.com/santosh-gatech/nd00333-capstone/master/starter_file/heart_failure_clinical_records_dataset.csv'
dataset = Dataset.Tabular.from_delimited_files(example_dataset)

In [9]:
# NOTE: update the cluster name to match the existing cluster
# Choose a name for your CPU cluster
amlcompute_cluster_name = "cpu-cluster-exp"

# Verify that cluster does not exist already
try:
    compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(
        vm_size='STANDARD_D2_V2',  # for GPU, use "STANDARD_NC6"
        # vm_priority='lowpriority',  # optional
        max_nodes=4,
    )
    compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)

compute_target.wait_for_completion(show_output=True)

Found existing cluster, use it.
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


## Hyperdrive Configuration

TODO: Explain the model you are using and the reasons for choosing the different hyperparameters, termination policy, and config settings.

In [14]:
# TODO: Create an early termination policy. This is not required if you are using Bayesian sampling.
early_termination_policy = BanditPolicy(evaluation_interval=2, slack_factor=0.1)

#TODO: Create the different params that you will be using during training
param_sampling = RandomParameterSampling(
    {
        "--C": uniform(0.5,1),
        "--max_iter": choice(50, 100)
    }
)

#TODO: Create your estimator and hyperdrive config
# a SKLearn estimator for use with train.py
est = SKLearn(source_directory='./', compute_target=compute_target, entry_script='train.py')

hyperdrive_run_config = HyperDriveConfig(
    estimator=est,
    hyperparameter_sampling=param_sampling,
    policy=early_termination_policy,
    primary_metric_name='Accuracy',
    primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
    max_total_runs=5,
    max_concurrent_runs=4
)
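
To make the termination policy above more concrete, here is a minimal plain-Python sketch of the cutoff that a Bandit policy with `slack_factor=0.1` implies when the goal is to maximize the metric. It follows the rule given in the Azure ML documentation (a run is cancelled when its metric falls below `best / (1 + slack_factor)`); the helper names `bandit_cutoff` and `should_terminate` and the example accuracies are hypothetical and are not part of the SDK or of this experiment's results.

```python
def bandit_cutoff(best_metric, slack_factor):
    """Lowest metric value that still survives when the goal is to maximize."""
    return best_metric / (1 + slack_factor)


def should_terminate(run_metric, best_metric, slack_factor=0.1):
    """Return True if a run falls outside the allowed slack of the best run."""
    return run_metric < bandit_cutoff(best_metric, slack_factor)


# Hypothetical accuracies, chosen only to illustrate the rule
best = 0.85
cutoff = bandit_cutoff(best, 0.1)

print(f"cutoff: {cutoff:.4f}")
print("run at 0.75 terminated:", should_terminate(0.75, best))
print("run at 0.80 terminated:", should_terminate(0.80, best))
```

With `evaluation_interval=2`, the policy is applied every second time the training script reports the primary metric, so a smaller `slack_factor` cancels underperforming runs more aggressively.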



In [15]:
#TODO: Submit your experiment
hyperdrive_run = experiment.submit(config=hyperdrive_run_config, show_output=True)
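
Each child run launched by this submission invokes `train.py` with the sampled values as command-line arguments. `train.py` itself is not shown in this notebook, so the following is only a sketch, under that assumption, of how such a script might read them with `argparse`. The flag names and the example argument list are taken from the best-run output shown later in this notebook; the `parse_args` helper, the defaults, and the help strings are hypothetical.

```python
import argparse


def parse_args(argv):
    """Parse the hyperparameters that HyperDrive passes on the command line."""
    parser = argparse.ArgumentParser()
    # Defaults and help strings are assumptions for illustration only
    parser.add_argument("--C", type=float, default=1.0,
                        help="Regularization parameter")
    parser.add_argument("--max_iter", type=int, default=100,
                        help="Maximum number of iterations")
    return parser.parse_args(argv)


# Argument list as it appears in the best run's details
args = parse_args(["--C", "0.6889751349251101", "--max_iter", "50"])
print(args.C, args.max_iter)
```

Because the keys of the sampling dictionary are passed through as argument names, they need to include the leading dashes to match flags declared this way.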



## Run Details

OPTIONAL: Write about the different models trained and their performance. Why do you think some models did better than others?

TODO: In the cell below, use the `RunDetails` widget to show the different experiments.

In [16]:
RunDetails(hyperdrive_run).show()

_HyperDriveWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO'…

In [17]:
hyperdrive_run.wait_for_completion(show_output=True)

RunId: HD_789b4851-cee5-4750-af95-a0233d5dff00
Web View: https://ml.azure.com/runs/HD_789b4851-cee5-4750-af95-a0233d5dff00?wsid=/subscriptions/48a74bb7-9950-4cc1-9caa-5d50f995cc55/resourcegroups/aml-quickstarts-147669/workspaces/quick-starts-ws-147669&tid=660b3398-b80e-49d2-bc5b-ac1dc93b5254

Streaming azureml-logs/hyperdrive.txt

"<START>[2021-06-22T08:15:04.624942][API][INFO]Experiment created<END>\n""<START>[2021-06-22T08:15:05.113045][GENERATOR][INFO]Trying to sample '4' jobs from the hyperparameter space<END>\n""<START>[2021-06-22T08:15:05.282401][GENERATOR][INFO]Successfully sampled '4' jobs, they will soon be submitted to the execution target.<END>\n"

Execution Summary
RunId: HD_789b4851-cee5-4750-af95-a0233d5dff00
Web View: https://ml.azure.com/runs/HD_789b4851-cee5-4750-af95-a0233d5dff00?wsid=/subscriptions/48a74bb7-9950-4cc1-9caa-5d50f995cc55/resourcegroups/aml-quickstarts-147669/workspaces/quick-starts-ws-147669&tid=660b3398-b80e-49d2-bc5b-ac1dc93b5254



{'runId': 'HD_789b4851-cee5-4750-af95-a0233d5dff00',
 'target': 'cpu-cluster-exp',
 'status': 'Completed',
 'startTimeUtc': '2021-06-22T08:15:04.430662Z',
 'endTimeUtc': '2021-06-22T08:26:08.91267Z',
 'properties': {'primary_metric_config': '{"name": "Accuracy", "goal": "maximize"}',
  'resume_from': 'null',
  'runTemplate': 'HyperDrive',
  'azureml.runsource': 'hyperdrive',
  'platform': 'AML',
  'ContentSnapshotId': 'ab92918d-692c-4b46-97b0-11fd7654be04',
  'score': '0.8533333333333334',
  'best_child_run_id': 'HD_789b4851-cee5-4750-af95-a0233d5dff00_1',
  'best_metric_status': 'Succeeded'},
 'inputDatasets': [],
 'outputDatasets': [],
 'logFiles': {'azureml-logs/hyperdrive.txt': 'https://mlstrg147669.blob.core.windows.net/azureml/ExperimentRun/dcid.HD_789b4851-cee5-4750-af95-a0233d5dff00/azureml-logs/hyperdrive.txt?sv=2019-02-02&sr=b&sig=lwWY2t77mKdpQpfBjBtOz2YKL9qhAOk7j3L4fBo1nGI%3D&st=2021-06-22T08%3A16%3A28Z&se=2021-06-22T16%3A26%3A28Z&sp=r'},
 'submittedBy': 'ODL_User 147669'}

## Best Model

TODO: In the cell below, get the best model from the hyperdrive experiments and display all the properties of the model.

In [18]:
best_hd_model = hyperdrive_run.get_best_run_by_primary_metric()
best_hd_model_metrics = best_hd_model.get_metrics()
best_hd_model_parameter_values = best_hd_model.get_details()['runDefinition']['arguments']
print(best_hd_model_metrics, best_hd_model_parameter_values)
print(best_hd_model.get_file_names())

{'Regularization Strength:': 0.6889751349251101, 'Max iterations:': 50, 'Accuracy': 0.8533333333333334} ['--C', '0.6889751349251101', '--max_iter', '50']
['azureml-logs/55_azureml-execution-tvmps_d8cf8e6e16cec9f426cd46893def5b54d33e2469ad683c6d0ea3d07a2f729fee_d.txt', 'azureml-logs/65_job_prep-tvmps_d8cf8e6e16cec9f426cd46893def5b54d33e2469ad683c6d0ea3d07a2f729fee_d.txt', 'azureml-logs/70_driver_log.txt', 'azureml-logs/75_job_post-tvmps_d8cf8e6e16cec9f426cd46893def5b54d33e2469ad683c6d0ea3d07a2f729fee_d.txt', 'azureml-logs/process_info.json', 'azureml-logs/process_status.json', 'logs/azureml/105_azureml.log', 'logs/azureml/job_prep_azureml.log', 'logs/azureml/job_release_azureml.log', 'outputs/hd_model.joblib']
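
The `arguments` entry printed above is a flat list that alternates flag and value. As a small convenience, it could be turned into a dictionary before further use. The sketch below hard-codes the list from the output above so that it runs without a workspace; `arguments_to_dict` is a hypothetical helper, not an SDK function.

```python
def arguments_to_dict(arguments):
    """Pair each flag with the value that follows it, dropping leading dashes."""
    flags = arguments[0::2]
    values = arguments[1::2]
    return {flag.lstrip("-"): value for flag, value in zip(flags, values)}


# Copied from the best run's output above
arguments = ["--C", "0.6889751349251101", "--max_iter", "50"]

params = arguments_to_dict(arguments)
best_C = float(params["C"])                # values arrive as strings
best_max_iter = int(params["max_iter"])
print(best_C, best_max_iter)
```

In the notebook, `best_hd_model_parameter_values` could be passed in place of the hard-coded list.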


In [19]:
#TODO: Save the best model
model = best_hd_model.register_model(
    model_name='best_hyperdrive_model', 
    model_path='outputs/hd_model.joblib', 
    properties={'Accuracy': best_hd_model_metrics['Accuracy']}, 
    tags={'Method': 'Hyperdrive'})

In [20]:
model

Model(workspace=Workspace.create(name='quick-starts-ws-147669', subscription_id='48a74bb7-9950-4cc1-9caa-5d50f995cc55', resource_group='aml-quickstarts-147669'), name=best_hyperdrive_model, id=best_hyperdrive_model:1, version=1, tags={'Method': 'Hyperdrive'}, properties={'Accuracy': '0.8533333333333334'})