# Hyperparameter Tuning using HyperDrive


## Dependencies

First, we import all the dependencies needed to complete the project.

```python
import logging
import os
import csv

from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
from sklearn import datasets
import pkg_resources
from sklearn.model_selection import train_test_split

import azureml.core
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace

from azureml.core.dataset import Dataset
from azureml.core import Datastore
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

from azureml.data.dataset_factory import TabularDatasetFactory

from azureml.widgets import RunDetails

# Check core SDK version number
print("SDK version:", azureml.core.VERSION)
```

```
SDK version: 1.27.0
```
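`azureml.core.VERSION` only reports the core SDK. The following minimal sketch checks several installed distributions without importing them, using the standard-library `importlib.metadata` module (Python 3.8 or later). On older interpreters, `pkg_resources.get_distribution(name).version` from the `pkg_resources` import above does the same job. The helper name `installed_version` is hypothetical, and the list of distributions is only an example.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Example distributions; adjust to whatever the project depends on
for dist in ("azureml-core", "azureml-widgets", "scikit-learn"):
    print(f"{dist}: {installed_version(dist) or 'not installed'}")
```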


## Workspace Configuration

In this cell, we load the workspace configuration and create the experiment that we submit runs to later.

```python
ws = Workspace.from_config()

print('Workspace name: ' + ws.name,
      'Azure region: ' + ws.location,
      'Subscription id: ' + ws.subscription_id,
      'Resource group: ' + ws.resource_group, sep='\n')

# Choose a name for the experiment
experiment_name = 'hyperdrivecovid'

experiment = Experiment(ws, experiment_name)
experiment
```

```
Workspace name: quick-starts-ws-144339
Azure region: southcentralus
Subscription id: 9b72f9e6-56c5-4c16-991b-19c652994860
Resource group: aml-quickstarts-144339
```

| Name | Workspace | Report Page | Docs Page |
|---|---|---|---|
| hyperdrivecovid | quick-starts-ws-144339 | Link to Azure Machine Learning studio | Link to Documentation |
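`Workspace.from_config()` reads the workspace details from a `config.json` file, which you can download from the workspace page in the Azure portal. The file holds `subscription_id`, `resource_group`, and `workspace_name`. By default, the SDK starts its search in the current directory. The following sketch approximates that lookup. It assumes the search also checks a `.azureml` subfolder and then walks up through the parent directories. The helper names are hypothetical, and this is not the SDK's own code.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional

REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def find_workspace_config(start: Path) -> Optional[Path]:
    """Walk from `start` up to the filesystem root, returning the first config file found."""
    for directory in [start, *start.parents]:
        for candidate in (directory / "config.json", directory / ".azureml" / "config.json"):
            if candidate.is_file():
                return candidate
    return None


def load_workspace_config(path: Path) -> dict:
    """Read the JSON file and check that the three workspace keys are present."""
    with path.open() as f:
        config = json.load(f)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"{path} is missing keys: {missing}")
    return config


# Usage: a throwaway directory tree with placeholder values (not real IDs)
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / ".azureml").mkdir()
    (root / ".azureml" / "config.json").write_text(json.dumps({
        "subscription_id": "<subscription-id>",
        "resource_group": "<resource-group>",
        "workspace_name": "<workspace-name>",
    }))
    nested = root / "notebooks" / "hyperdrive"
    nested.mkdir(parents=True)
    found = find_workspace_config(nested)
    ws_config = load_workspace_config(found)
    print(found.relative_to(root), ws_config["workspace_name"])
```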


## Compute Cluster creation
In this cell, we create a CPU cluster for running our experiments. If a compute cluster with the same name already exists, we reuse it. Otherwise, we create a new one.

If the cluster does not exist, we define its configuration. For this project we use `min_nodes=1`. In your own project this incurs extra cost, because at least one node stays allocated even when no jobs are running, so consider setting it to 0.

```python
compute_cluster_name = "cpu-cluster"

try:
    # Reuse the cluster if one with this name already exists in the workspace
    compute_target = ComputeTarget(workspace=ws, name=compute_cluster_name)
    print("Found existing compute cluster...")
except ComputeTargetException:
    # Raised when no cluster with this name exists; other errors are not swallowed
    print("Creating new compute cluster...")
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D16s_V3',
                                                           max_nodes=4,
                                                           min_nodes=1)  # set to 0 to avoid idle costs
    compute_target = ComputeTarget.create(ws, compute_cluster_name, compute_config)

compute_target.wait_for_completion(show_output=True)
print("Cluster details: ", compute_target.get_status().serialize())
```

```
Found existing compute cluster...
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned
Cluster details:  {'currentNodeCount': 4, 'targetNodeCount': 4, 'nodeStateCounts': {'preparingNodeCount': 3, 'runningNodeCount': 1, 'idleNodeCount': 0, 'unusableNodeCount': 0, 'leavingNodeCount': 0, 'preemptedNodeCount': 0}, 'allocationState': 'Steady', 'allocationStateTransitionTime': '2021-05-10T05:34:28.241000+00:00', 'errors': None, 'creationTime': '2021-05-10T01:43:20.670500+00:00', 'modifiedTime': '2021-05-10T01:43:36.013311+00:00', 'provisioningState': 'Succeeded', 'provisioningStateTransitionTime': None, 'scaleSettings': {'minNodeCount': 1, 'maxNodeCount': 4, 'nodeIdleTimeBeforeScaleDown': 'PT120S'}, 'vmPriority': 'Dedicated', 'vmSize': 'STANDARD_D16S_V3'}
```
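The get-or-create step should catch only the "cluster not found" error. A bare `except:` also hides unrelated failures, such as a typo or an authentication error, and then tries to create a cluster anyway. The following sketch shows the pattern locally. `ClusterNotFoundError` and `InMemoryWorkspace` are hypothetical stand-ins for `ComputeTargetException` and a real workspace, not Azure ML APIs.

```python
from typing import Dict


class ClusterNotFoundError(Exception):
    """Hypothetical stand-in for azureml's ComputeTargetException."""


class InMemoryWorkspace:
    """Hypothetical registry standing in for a workspace's compute targets."""

    def __init__(self) -> None:
        self.clusters: Dict[str, dict] = {}

    def get(self, name: str) -> dict:
        if name not in self.clusters:
            raise ClusterNotFoundError(name)
        return self.clusters[name]

    def create(self, name: str, vm_size: str, min_nodes: int, max_nodes: int) -> dict:
        cluster = {"name": name, "vm_size": vm_size,
                   "min_nodes": min_nodes, "max_nodes": max_nodes}
        self.clusters[name] = cluster
        return cluster


def get_or_create_cluster(workspace: InMemoryWorkspace, name: str, vm_size: str,
                          min_nodes: int = 0, max_nodes: int = 4) -> dict:
    try:
        cluster = workspace.get(name)
        print("Found existing compute cluster...")
    except ClusterNotFoundError:
        # Only the "not found" error triggers creation; anything else propagates
        print("Creating new compute cluster...")
        cluster = workspace.create(name, vm_size, min_nodes, max_nodes)
    return cluster


fake_ws = InMemoryWorkspace()
first = get_or_create_cluster(fake_ws, "cpu-cluster", "STANDARD_D16s_V3")   # creates
second = get_or_create_cluster(fake_ws, "cpu-cluster", "STANDARD_D16s_V3")  # reuses
```

The variable is named `fake_ws` so that running the sketch in the notebook does not overwrite the real `ws`.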


## Hyperdrive Configuration

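The training script `train.py` (not included in this notebook) is tuned over two hyperparameters: `--C`, the regularization parameter, and `--max_iter`, the maximum number of solver iterations. These names match scikit-learn's `LogisticRegression`, where a smaller `C` means stronger regularization. The best run is the one with the highest `Accuracy`, which is the metric that `train.py` logs.

`RandomParameterSampling` draws combinations at random from the discrete choices (3 values of `C` × 10 values of `max_iter` = 30 combinations). As a result, the 20-run budget explores the space without training every combination. The `BanditPolicy` starts checking runs at evaluation interval 5 and then checks at every interval where a metric is reported. It cancels any run whose best `Accuracy` is below 1/(1 + 0.1), about 91%, of the best run so far. Finally, `max_concurrent_runs = 4` matches the cluster's `max_nodes = 4`, so the cluster can run four child runs in parallel.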
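`RandomParameterSampling` with `choice` picks each hyperparameter independently from its list of values. The following stdlib sketch illustrates that idea on the same search space. It samples with replacement, so repeated combinations are possible; HyperDrive's own sampler may handle duplicates differently. The function name is hypothetical.

```python
import itertools
import random
from typing import Dict, List, Optional

# Same discrete choices as the RandomParameterSampling cell
search_space: Dict[str, list] = {
    "--C": [0.01, 0.1, 1],
    "--max_iter": [20, 40, 60, 80, 100, 120, 140, 160, 180, 200],
}


def sample_configurations(space: Dict[str, list], n: int,
                          seed: Optional[int] = None) -> List[dict]:
    """Draw n configurations, choosing each hyperparameter uniformly and independently."""
    rng = random.Random(seed)
    return [{name: rng.choice(values) for name, values in space.items()} for _ in range(n)]


grid = list(itertools.product(*search_space.values()))
samples = sample_configurations(search_space, n=20, seed=0)
print(len(grid))  # 30 possible combinations
for sampled in samples[:3]:
    print(sampled)
```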

```python
from azureml.train.sklearn import SKLearn
from azureml.train.hyperdrive.run import PrimaryMetricGoal
from azureml.train.hyperdrive.policy import BanditPolicy
from azureml.train.hyperdrive.sampling import RandomParameterSampling
from azureml.train.hyperdrive.runconfig import HyperDriveConfig
from azureml.train.hyperdrive.parameter_expressions import choice

# Search space: regularization parameter C and maximum number of solver iterations
param_sampling = RandomParameterSampling({
    "--C": choice(0.01, 0.1, 1),
    "--max_iter": choice(20, 40, 60, 80, 100, 120, 140, 160, 180, 200)
})

# Early termination policy (not supported with Bayesian sampling).
# From interval 5 on, cancel runs whose best Accuracy is below 1/(1 + 0.1) of the best run.
early_termination_policy = BanditPolicy(slack_factor=0.1, evaluation_interval=1, delay_evaluation=5)

# Create a ./training folder if it does not already exist
os.makedirs("./training", exist_ok=True)

# SKLearn estimator that runs train.py on the cluster
# (deprecated in favor of ScriptRunConfig, see the warning in the output)
estimator = SKLearn(source_directory="./",
                    compute_target=compute_target,
                    vm_size='STANDARD_D16s_V3',
                    entry_script="train.py")

hyperdrive_config = HyperDriveConfig(hyperparameter_sampling=param_sampling,
                                     primary_metric_name="Accuracy",
                                     primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                                     max_total_runs=20,
                                     max_concurrent_runs=4,
                                     policy=early_termination_policy,
                                     estimator=estimator)
```

```
'SKLearn' estimator is deprecated. Please use 'ScriptRunConfig' from 'azureml.core.script_run_config' with your own defined environment or the AzureML-Tutorial curated environment.
'enabled' is deprecated. Please use the azureml.core.runconfig.DockerConfiguration object with the 'use_docker' param instead.
```
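The Azure ML documentation describes the Bandit rule as follows. For a metric being maximized, a run is cancelled when its best value multiplied by (1 + `slack_factor`) is still below the best run's value. With `slack_factor=0.1`, this means below roughly 91% of the leader. The rule is checked at multiples of `evaluation_interval` from `delay_evaluation` onward, and only on metric values that the training script has reported. The following sketch mirrors that documented rule. It is not the service's code, the function names are hypothetical, and the metric values are made up.

```python
def policy_applies(interval: int, evaluation_interval: int = 1,
                   delay_evaluation: int = 5) -> bool:
    """The policy is checked on multiples of evaluation_interval that are >= delay_evaluation."""
    return interval >= delay_evaluation and interval % evaluation_interval == 0


def bandit_should_cancel(run_best: float, overall_best: float,
                         slack_factor: float = 0.1) -> bool:
    """For a maximized metric, cancel when the run falls outside the slack of the best run."""
    return run_best * (1 + slack_factor) < overall_best


# Hypothetical metric values, for illustration only
print(policy_applies(3), policy_applies(5))   # False True
print(bandit_should_cancel(0.56, 0.60))       # False: 0.56 * 1.1 = 0.616 >= 0.60
print(bandit_should_cancel(0.50, 0.60))       # True:  0.50 * 1.1 = 0.55 < 0.60
```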


## Submit the experiment and Run Details
In this cell, we submit the experiment and use the `RunDetails` widget to monitor the child runs. We then call `wait_for_completion` with `show_output=True` to stream the run logs in real time.

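```python
# Submit the HyperDrive experiment
hyperdrive_run = experiment.submit(hyperdrive_config)

# Display the run-progress widget
RunDetails(hyperdrive_run).show()

hyperdrive_run.get_status()

# Block until all child runs finish, streaming the logs
hyperdrive_run.wait_for_completion(show_output=True)
```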




```
RunId: HD_1eb00549-813a-4866-acf8-74ca59f366b0
Web View: https://ml.azure.com/runs/HD_1eb00549-813a-4866-acf8-74ca59f366b0?wsid=/subscriptions/9b72f9e6-56c5-4c16-991b-19c652994860/resourcegroups/aml-quickstarts-144339/workspaces/quick-starts-ws-144339&tid=660b3398-b80e-49d2-bc5b-ac1dc93b5254

Streaming azureml-logs/hyperdrive.txt

"<START>[2021-05-10T05:35:57.251339][API][INFO]Experiment created<END>\n""<START>[2021-05-10T05:35:57.731253][GENERATOR][INFO]Trying to sample '4' jobs from the hyperparameter space<END>\n""<START>[2021-05-10T05:35:57.899288][GENERATOR][INFO]Successfully sampled '4' jobs, they will soon be submitted to the execution target.<END>\n"

Execution Summary
RunId: HD_1eb00549-813a-4866-acf8-74ca59f366b0
Web View: https://ml.azure.com/runs/HD_1eb00549-813a-4866-acf8-74ca59f366b0?wsid=/subscriptions/9b72f9e6-56c5-4c16-991b-19c652994860/resourcegroups/aml-quickstarts-144339/workspaces/quick-starts-ws-144339&tid=660b3398-b80e-49d2-bc5b-ac1dc93b5254

{'runId': 'HD_1eb00549-813a-4866-acf8-74ca59f366b0',
 'target': 'cpu-cluster',
 'status': 'Completed',
 'startTimeUtc': '2021-05-10T05:35:57.061735Z',
 'endTimeUtc': '2021-05-10T05:43:32.092509Z',
 'properties': {'primary_metric_config': '{"name": "Accuracy", "goal": "maximize"}',
  'resume_from': 'null',
  'runTemplate': 'HyperDrive',
  'azureml.runsource': 'hyperdrive',
  'platform': 'AML',
  'ContentSnapshotId': '5322bb2e-1795-4103-864e-3ecd6b33f063',
  'score': '0.6308778321757942',
  'best_child_run_id': 'HD_1eb00549-813a-4866-acf8-74ca59f366b0_18',
  'best_metric_status': 'Succeeded'},
 'inputDatasets': [],
 'outputDatasets': [],
 'logFiles': {'azureml-logs/hyperdrive.txt': 'https://mlstrg144339.blob.core.windows.net/azureml/ExperimentRun/dcid.HD_1eb00549-813a-4866-acf8-74ca59f366b0/azureml-logs/hyperdrive.txt?sv=2019-02-02&sr=b&sig=AE9wcuXJjCEMogk%2Fw1UbsJjVcjpXAAEgsdTJpjeowmE%3D&st=2021-05-10T05%3A33%3A38Z&se=2021-05-10T13%3A43%3A38Z&sp=r'},
 'submittedBy': 'ODL_User 144339'}
```
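With `max_total_runs=20` and `max_concurrent_runs=4`, HyperDrive keeps at most four child runs executing at any moment. If every run took the same time, that would mean five waves of four runs. The following local analogy uses a thread pool capped at four workers to show that cap. The simulated runs and the `fake_child_run` helper are hypothetical, and this is not how HyperDrive schedules jobs. The log above shows that HyperDrive samples jobs four at a time.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_TOTAL_RUNS = 20
MAX_CONCURRENT_RUNS = 4

active_runs = 0
peak_runs = 0
counter_lock = threading.Lock()


def fake_child_run(run_index: int) -> int:
    """Hypothetical stand-in for one training run; records how many run at once."""
    global active_runs, peak_runs
    with counter_lock:
        active_runs += 1
        peak_runs = max(peak_runs, active_runs)
    time.sleep(0.01)  # pretend to train
    with counter_lock:
        active_runs -= 1
    return run_index


# The pool never runs more than MAX_CONCURRENT_RUNS tasks at the same time
with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_RUNS) as pool:
    finished = list(pool.map(fake_child_run, range(MAX_TOTAL_RUNS)))

print(len(finished), "runs finished; peak concurrency:", peak_runs)
```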

## Best Model

In the cell below, we retrieve the best run from the HyperDrive experiment and display its accuracy and hyperparameter values.

```python
# Retrieve the best child run by the primary metric (Accuracy) and its logged metrics
best_run = hyperdrive_run.get_best_run_by_primary_metric()
best_run_metrics = best_run.get_metrics()

print('Best Run Id: ', best_run.id)
print('\n Accuracy:', best_run_metrics['Accuracy'])
print('\n Regularization Strength:', best_run_metrics['Regularization Strength:'])
print('\n Max Iterations:', best_run_metrics['Max iterations:'])
```

```
Best Run Id:  HD_1eb00549-813a-4866-acf8-74ca59f366b0_18

 Accuracy: 0.6308778321757942

 Regularization Strength: 0.1

 Max Iterations: 20
```
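Conceptually, selecting the best run means comparing the primary metric across child runs and keeping the maximum, because the goal is `MAXIMIZE`. The following sketch works on plain dictionaries. The run IDs and metric values are made up, `best_run_by_metric` is a hypothetical helper rather than the SDK method, and skipping runs that never logged the metric is a choice made in this sketch.

```python
from typing import Dict, List


def best_run_by_metric(runs: List[Dict], metric: str = "Accuracy",
                       maximize: bool = True) -> Dict:
    """Return the run whose logged metric is best, skipping runs that never logged it."""
    candidates = [run for run in runs if metric in run["metrics"]]
    if not candidates:
        raise ValueError(f"no run logged {metric!r}")
    pick = max if maximize else min
    return pick(candidates, key=lambda run: run["metrics"][metric])


# Hypothetical child runs; IDs and values are made up for illustration
child_runs = [
    {"id": "run_0", "metrics": {"Accuracy": 0.58, "Regularization Strength:": 1.0, "Max iterations:": 100}},
    {"id": "run_1", "metrics": {"Accuracy": 0.61, "Regularization Strength:": 0.1, "Max iterations:": 20}},
    {"id": "run_2", "metrics": {}},  # e.g. a run that failed before logging anything
]
top = best_run_by_metric(child_runs)
print(top["id"], top["metrics"]["Accuracy"])  # run_1 0.61
```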
