# Hyperparameter Tuning using HyperDrive

```python
import logging
import os
import csv

from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
from sklearn import datasets
import pkg_resources
import azureml.core
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
from azureml.train.automl import AutoMLConfig
from azureml.core.dataset import Dataset
from azureml.core.datastore import Datastore

from azureml.pipeline.steps import AutoMLStep

# Check the core SDK version number
print("SDK version:", azureml.core.VERSION)
```

```
SDK version: 1.43.0
```


## Dataset

```python
ws = Workspace.from_config()

# Choose a name for the experiment
experiment_name = 'capstone-experiment'
experiment = Experiment(ws, experiment_name)
```

## Hyperdrive Configuration

I opted to use a simple LogisticRegression model for this experiment because it is well suited to binary classification. The tuned hyperparameters were `C` and `max_iter`. `C` is the inverse of the regularization strength, so smaller values mean stronger regularization. `max_iter` is the maximum number of solver iterations.
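For reference, scikit-learn's documentation gives the objective for L2-penalized binary logistic regression in roughly the following form. This assumes `train.py` keeps the default `penalty='l2'`, with labels $y_i \in \{-1, 1\}$, weights $w$, and intercept $b$. Because `C` scales the data-fit term rather than the penalty, a smaller `C` means stronger regularization:

$$
\min_{w,\,b}\; \frac{1}{2} w^\top w \;+\; C \sum_{i=1}^{n} \log\!\left(1 + \exp\!\big(-y_i\,(x_i^\top w + b)\big)\right)
$$

`max_iter` only caps the number of solver iterations. If it is too small, the solver can stop before converging, and scikit-learn emits a `ConvergenceWarning`.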

For parameter sampling, I used RandomParameterSampling because an exhaustive grid search would consume far more compute and would likely produce only a slightly better model. `C` was sampled uniformly between 0.001 and 100.0. `max_iter` was chosen from the discrete values 10, 50, 100, 250, 500, and 1000. Combining `uniform` sampling for `C` with `choice` sampling for `max_iter` spreads the trials across a wide variety of hyperparameter pairs, which improves the chance of finding a well-performing combination.
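The search space above can be illustrated locally with Python's `random` module. This is a minimal sketch of what the two parameter expressions describe, not Azure ML's sampler, and the helper names `sample_hyperparameters` and `sample_configurations` are hypothetical. It also shows why a plain grid search was not an option as configured: `C` is continuous, and Azure ML's `GridParameterSampling` only accepts `choice` expressions, so `C` would first have to be discretized.

```python
import random

# Local illustration only: mirrors the search space given to RandomParameterSampling,
# it is not Azure ML's own sampling implementation.
MAX_ITER_CHOICES = [10, 50, 100, 250, 500, 1000]


def sample_hyperparameters(rng: random.Random) -> dict:
    """Draw one configuration: C ~ uniform(0.001, 100.0), max_iter ~ choice(...)."""
    return {
        "C": rng.uniform(0.001, 100.0),
        "max_iter": rng.choice(MAX_ITER_CHOICES),
    }


def sample_configurations(n: int, seed: int = 0) -> list:
    """Draw n independent configurations from a seeded generator (reproducible)."""
    rng = random.Random(seed)
    return [sample_hyperparameters(rng) for _ in range(n)]


if __name__ == "__main__":
    for config in sample_configurations(5):
        print(config)
```

Calling `sample_configurations(36)` would produce one configuration for each trial allowed by `max_total_runs=36` in the HyperDrive configuration.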

For early termination, I used a BanditPolicy. It stops any run whose primary metric falls outside the allowed slack, set by `slack_factor`, relative to the best-performing run. The check is applied at every `evaluation_interval`. This saves both time and compute, because runs that are unlikely to beat the current best model are stopped early instead of running to completion.
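Here is a minimal sketch of the rule for a metric that is maximized, such as `AUC_weighted` here. Azure ML's documentation describes the cut-off as the best run's metric divided by `1 + slack_factor`. With `slack_factor=0.1`, a run is therefore stopped once its metric drops below roughly 91% of the best run's. The function names are hypothetical, and the numbers are illustrative rather than results from this experiment. Note that the policy can only act when the primary metric is reported. If `train.py` logs `AUC_weighted` just once at the end of training, there is little left to cut short.

```python
def bandit_threshold(best_metric: float, slack_factor: float) -> float:
    """Smallest metric a run may report without being stopped (MAXIMIZE goal)."""
    return best_metric / (1 + slack_factor)


def should_terminate(run_metric: float, best_metric: float, slack_factor: float = 0.1) -> bool:
    """True when the run falls outside the slack allowed relative to the best run."""
    return run_metric < bandit_threshold(best_metric, slack_factor)


if __name__ == "__main__":
    best_auc = 0.8  # illustrative best metric at the current evaluation interval
    print(f"cut-off: {bandit_threshold(best_auc, 0.1):.4f}")  # 0.8 / 1.1
    for auc in (0.70, 0.75):
        decision = "terminate" if should_terminate(auc, best_auc) else "continue"
        print(f"AUC {auc:.2f}: {decision}")
```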

```python
from azureml.core.compute import AmlCompute, ComputeTarget
from azureml.core.compute_target import ComputeTargetException

# The name of the CPU cluster to use
amlcompute_cluster_name = "rlegge-compute-cluster"

# Reuse the cluster if it already exists; otherwise provision a new one
try:
    compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)
    print(f'Found existing cluster with name: {amlcompute_cluster_name}, will use it')
except ComputeTargetException:
    print(f'Compute cluster with name: {amlcompute_cluster_name} not found, will create it')
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS3_V2', max_nodes=4)
    compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)

compute_target.wait_for_completion(show_output=True, min_node_count=1, timeout_in_minutes=10)
```

```
Found existing cluster with name: rlegge-compute-cluster, will use it
Succeeded
AmlCompute wait for completion finished

Wait timeout has been reached
Current provisioning state of AmlCompute is "Succeeded" and current node count is "0"
```


```python
from azureml.widgets import RunDetails
from azureml.train.hyperdrive.run import PrimaryMetricGoal
from azureml.train.hyperdrive.policy import BanditPolicy
from azureml.train.hyperdrive.sampling import RandomParameterSampling
from azureml.train.hyperdrive.runconfig import HyperDriveConfig
from azureml.train.hyperdrive.parameter_expressions import choice, uniform
from azureml.core import Environment, ScriptRunConfig

# Stop runs whose primary metric falls outside the slack of the best run,
# checking every time the metric is reported
early_termination_policy = BanditPolicy(evaluation_interval=1, slack_factor=0.1)

# Search space: C is continuous (uniform), max_iter is discrete (choice)
param_sampling = RandomParameterSampling(
    {
        'C': uniform(0.001, 100.0),
        'max_iter': choice(10, 50, 100, 250, 500, 1000)
    })

# Training environment and the script that every child run executes
sklearn_env = Environment.from_conda_specification(name='sklearn-env', file_path='conda_dependencies.yml')
src = ScriptRunConfig(source_directory='.',
                      script='train.py',
                      compute_target=compute_target,
                      environment=sklearn_env)

# Maximize the AUC_weighted metric logged by train.py, with up to 4 runs in parallel
hyperdrive_run_config = HyperDriveConfig(hyperparameter_sampling=param_sampling,
                                         primary_metric_name='AUC_weighted',
                                         primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                                         max_total_runs=36,
                                         max_concurrent_runs=4,
                                         policy=early_termination_policy,
                                         run_config=src)
```

```python
# Submit the HyperDrive run to the experiment
hyperdrive_run = experiment.submit(config=hyperdrive_run_config)
```

## Run Details

Unlike AutoML, which compares many algorithms, every child run here uses the same algorithm, LogisticRegression. The runs differ only in the hyperparameter values drawn by the parameter sampler. I used RandomParameterSampling because random search often finds models comparable to those from an exhaustive grid search (GridParameterSampling) while using significantly fewer resources. Each run receives randomly sampled values of `C` (inverse regularization strength) and `max_iter` (maximum number of iterations). HyperDrive then compares the runs by `AUC_weighted` to identify the best-performing model.
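`train.py` itself is not shown in this notebook. However, the best run's details show that HyperDrive passes the sampled values as `--C` and `--max_iter` arguments, and its metrics include `C`, `max_iter`, and `AUC_weighted`. The skeleton below is a hypothetical sketch of that contract, not the author's script. It parses the two arguments and logs them through `Run.get_context().log` from `azureml.core.run`, falling back to `print` when the Azure ML SDK is not installed. Model fitting is left as a comment because the dataset and preprocessing are not shown. Whatever metric it computes must be logged as `AUC_weighted` to match `primary_metric_name`.

```python
import argparse


def parse_args(argv=None):
    """Parse the hyperparameters that HyperDrive passes on the command line."""
    parser = argparse.ArgumentParser(description="Hypothetical LogisticRegression trial script")
    # Defaults mirror scikit-learn's LogisticRegression defaults
    parser.add_argument("--C", type=float, default=1.0,
                        help="Inverse of regularization strength")
    parser.add_argument("--max_iter", type=int, default=100,
                        help="Maximum number of solver iterations")
    return parser.parse_args(argv)


def get_logger():
    """Return Run.log inside Azure ML, or a print-based stand-in without the SDK."""
    try:
        from azureml.core.run import Run
    except ImportError:
        return lambda name, value: print(f"{name}: {value}")
    return Run.get_context().log


def main(argv=None):
    args = parse_args(argv)
    log = get_logger()
    log("C", args.C)
    log("max_iter", args.max_iter)
    # Model fitting is omitted because the dataset is not shown in this notebook, e.g.
    #   model = LogisticRegression(C=args.C, max_iter=args.max_iter).fit(x_train, y_train)
    # The evaluation metric must then be logged under the primary metric name:
    #   log("AUC_weighted", auc)
    return args


if __name__ == "__main__":
    main()
```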

```python
RunDetails(hyperdrive_run).show()
hyperdrive_run.wait_for_completion(show_output=True)
```

```
RunId: HD_b689b0dd-ee11-41c2-b127-23d6122a2dc7
Web View: https://ml.azure.com/runs/HD_b689b0dd-ee11-41c2-b127-23d6122a2dc7?wsid=/subscriptions/a655a150-1940-4a0c-91bb-d29b336c52aa/resourcegroups/rlegge-rg/workspaces/test&tid=db05faca-c82a-4b9d-b9c5-0f64b6755421

Streaming azureml-logs/hyperdrive.txt

[2022-08-12T06:38:45.847790][GENERATOR][INFO]Trying to sample '4' jobs from the hyperparameter space
[2022-08-12T06:38:46.4231694Z][SCHEDULER][INFO]Scheduling job, id='HD_b689b0dd-ee11-41c2-b127-23d6122a2dc7_0'
[2022-08-12T06:38:46.5700933Z][SCHEDULER][INFO]Scheduling job, id='HD_b689b0dd-ee11-41c2-b127-23d6122a2dc7_1'
[2022-08-12T06:38:46.636249][GENERATOR][INFO]Successfully sampled '4' jobs, they will soon be submitted to the execution target.
[2022-08-12T06:38:46.7195593Z][SCHEDULER][INFO]Scheduling job, id='HD_b689b0dd-ee11-41c2-b127-23d6122a2dc7_3'
[2022-08-12T06:38:46.7208487Z][SCHEDULER][INFO]Scheduling job, id='HD_b689b0dd-ee11-41c2-b127-23d6122a2dc7_2'
[2022-08-12T06:38:47.149

{'runId': 'HD_b689b0dd-ee11-41c2-b127-23d6122a2dc7',
 'target': 'rlegge-compute-cluster',
 'status': 'Completed',
 'startTimeUtc': '2022-08-12T06:38:45.298347Z',
 'endTimeUtc': '2022-08-12T06:48:19.209725Z',
 'services': {},
 'properties': {'primary_metric_config': '{"name":"AUC_weighted","goal":"maximize"}',
  'resume_from': 'null',
  'runTemplate': 'HyperDrive',
  'azureml.runsource': 'hyperdrive',
  'platform': 'AML',
  'ContentSnapshotId': 'e8536cf2-1046-46dc-bb9d-f5d92eaff426',
  'user_agent': 'python/3.8.13 (Linux-5.15.0-1014-azure-x86_64-with-glibc2.17) msrest/0.6.21 Hyperdrive.Service/1.0.0 Hyperdrive.SDK/core.1.43.0',
  'space_size': 'infinite_space_size',
  'score': '0.9157469751872698',
  'best_child_run_id': 'HD_b689b0dd-ee11-41c2-b127-23d6122a2dc7_31',
  'best_metric_status': 'Succeeded',
  'best_data_container_id': 'dcid.HD_b689b0dd-ee11-41c2-b127-23d6122a2dc7_31'},
 'inputDatasets': [],
 'outputDatasets': [],
 'runDefinition': {'configuration': None,
  'attribution': Non
```

## Best Model

```python
best_run = hyperdrive_run.get_best_run_by_primary_metric()
print(best_run.get_details())
```

```
{'runId': 'HD_b689b0dd-ee11-41c2-b127-23d6122a2dc7_31', 'target': 'rlegge-compute-cluster', 'status': 'Completed', 'startTimeUtc': '2022-08-12T06:45:51.841233Z', 'endTimeUtc': '2022-08-12T06:46:07.357754Z', 'services': {}, 'properties': {'_azureml.ComputeTargetType': 'amlctrain', 'ContentSnapshotId': 'e8536cf2-1046-46dc-bb9d-f5d92eaff426', 'ProcessInfoFile': 'azureml-logs/process_info.json', 'ProcessStatusFile': 'azureml-logs/process_status.json'}, 'inputDatasets': [], 'outputDatasets': [], 'runDefinition': {'script': 'train.py', 'command': '', 'useAbsolutePath': False, 'arguments': ['--C', '17.02299055359339', '--max_iter', '250'], 'sourceDirectoryDataStore': None, 'framework': 'Python', 'communicator': 'None', 'target': 'rlegge-compute-cluster', 'dataReferences': {}, 'data': {}, 'outputData': {}, 'datacaches': [], 'jobName': None, 'maxRunDurationSeconds': 2592000, 'nodeCount': 1, 'instanceTypes': [], 'priority': None, 'credentialPassthrough': False, 'identity': None, 'environment': {
```

```python
print(f'Best Run ID: {best_run.id}')
print(f'Metrics: {best_run.get_metrics()}')
```

```
Best Run ID: HD_b689b0dd-ee11-41c2-b127-23d6122a2dc7_31
Metrics: {'C': 17.02299055359339, 'max_iter': 250, 'AUC_weighted': 0.9157469751872698}
```
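The primary metric `AUC_weighted` is whatever value `train.py` logs under that name. Azure ML's automated ML documentation defines it as the mean of the per-class AUC, weighted by the number of true instances in each class. For a binary problem, both one-vs-rest AUCs rank the same pairs of examples, so the weighted value equals the ordinary ROC AUC. The sketch below illustrates that definition with a simple pairwise computation. It is quadratic in the number of examples, so it only suits small inputs. The function names are hypothetical, and in practice `sklearn.metrics.roc_auc_score` would be the usual choice.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive (label 1) outscores a random negative; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_weighted_binary(labels: Sequence[int], scores_class1: Sequence[float]) -> float:
    """Support-weighted mean of the one-vs-rest AUC of class 0 and class 1."""
    auc1 = roc_auc(labels, scores_class1)
    # Class 0 vs rest: flip the labels and score each example by 1 - P(class 1)
    auc0 = roc_auc([1 - y for y in labels], [1.0 - s for s in scores_class1])
    n1 = sum(labels)
    n0 = len(labels) - n1
    return (n0 * auc0 + n1 * auc1) / len(labels)


if __name__ == "__main__":
    y = [0, 0, 1, 1]
    p = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(y, p))
    print(auc_weighted_binary(y, p))
```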


## Model Deployment

Remember that you only need to deploy one of the two models you trained, but you still need to register both. Perform the steps in the rest of this notebook only if you wish to deploy this model.

TODO: In the cell below, register the model, create an inference config and deploy the model as a web service.

TODO: In the cell below, send a request to the web service you deployed to test it.

TODO: In the cell below, print the logs of the web service and delete the service

**Submission Checklist**
- I have registered the model.
- I have deployed the model with the best accuracy as a webservice.
- I have tested the webservice by sending a request to the model endpoint.
- I have deleted the webservice and shutdown all the computes that I have used.
- I have taken a screenshot showing the model endpoint as active.
- The project includes a file containing the environment details.

