# Hyperparameter Tuning using HyperDrive



In [1]:
# Import Dependencies

from azureml.core import Workspace, Experiment
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.core import Environment
from azureml.core import ScriptRunConfig
from azureml.widgets import RunDetails
from azureml.train.sklearn import SKLearn 
from azureml.train.hyperdrive.run import PrimaryMetricGoal
from azureml.train.hyperdrive.policy import BanditPolicy
from azureml.train.hyperdrive.sampling import RandomParameterSampling
from azureml.train.hyperdrive.runconfig import HyperDriveConfig
from azureml.train.hyperdrive.parameter_expressions import uniform, choice
import joblib
import os

## Dataset

### Overview

The dataset used in this project is the [Heart Failure Prediction](https://www.kaggle.com/andrewmvd/heart-failure-clinical-data) dataset from Kaggle.

Heart failure is a common event caused by cardiovascular diseases (CVDs), and this dataset contains 12 features that can be used to predict mortality from heart failure.

People with cardiovascular disease, or who are at high cardiovascular risk, need early detection and management, where a machine learning model can be of great help.

**12 clinical features:**

* age - Age

* anaemia - Decrease of red blood cells or hemoglobin (boolean)

* creatinine_phosphokinase - Level of the CPK enzyme in the blood (mcg/L)

* diabetes - If the patient has diabetes (boolean)

* ejection_fraction - Percentage of blood leaving the heart at each contraction (percentage)

* high_blood_pressure - If the patient has hypertension (boolean)
  
* platelets - Platelets in the blood (kiloplatelets/mL)

* serum_creatinine - Level of serum creatinine in the blood (mg/dL)

* serum_sodium - Level of serum sodium in the blood (mEq/L)
  
* sex - Woman or man (binary)
  
* smoking - If the patient smokes or not (boolean)

* time - Follow-up period (days)

In this project we use Logistic Regression and tune its hyperparameters with HyperDrive to predict the death event from the clinical features listed above.


In [2]:
ws = Workspace.from_config()

experiment_name = 'new-experiment'

experiment = Experiment(ws, experiment_name)


In [3]:
print('Workspace name: '+ ws.name,
     'Azure region: '+ ws.location,
      'Subscription id: '+ ws.subscription_id,
     'Resource group: '+ ws.resource_group, sep="\n")

run = experiment.start_logging()

Workspace name: quick-starts-ws-139012
Azure region: southcentralus
Subscription id: f5091c60-1c3c-430f-8d81-d802f6bf2414
Resource group: aml-quickstarts-139012


In [4]:
# Create compute cluster

cpu_cluster_name = "new-compute"

#Verify that the cluster does not exist already
try:
    cpu_cluster = ComputeTarget(workspace = ws, name = cpu_cluster_name)
    print("Found existing cluster. Use it")
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2', max_nodes =4)
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)
    
cpu_cluster.wait_for_completion(show_output=True)

Creating
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


## Create an environment

In [5]:
%%writefile conda_dependencies.yml

dependencies:
- python=3.6.2
- scikit-learn
- pip:
  - azureml-defaults

Writing conda_dependencies.yml


In [6]:
sklearn_env = Environment.from_conda_specification(name = 'sklearn-env', file_path = './conda_dependencies.yml')

## Hyperdrive Configuration

* `early_termination_policy` :
  *Automatically terminate poorly performing runs with an early termination policy. Early termination improves computational efficiency.*

  Bandit Policy is used here. It terminates any run whose primary metric is not within the specified slack factor or slack amount of the best-performing run, which saves compute.

* `param_sampling` :
  *Specify the parameter sampling method to use over the hyperparameter space.*

  Random sampling is used here. Hyperparameter values are selected at random from the defined search space, and the method supports early termination of low-performing runs.
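The slack-factor rule described above can be sketched in plain Python. This is an illustration only, not the Azure ML implementation: the function name `bandit_should_terminate` and the sample accuracies are hypothetical, and the comparison assumes a metric that is being maximized, as `Accuracy` is in this notebook.

```python
def bandit_should_terminate(run_metric, best_metric, slack_factor=0.1):
    """Return True if a run falls outside the slack band of the best run.

    Assumes the primary metric is maximized. A run is kept as long as
    run_metric * (1 + slack_factor) is at least the best metric so far.
    """
    return run_metric * (1 + slack_factor) < best_metric


# Hypothetical accuracies, compared against a best run at 0.80.
best = 0.80
for accuracy in (0.78, 0.74, 0.70):
    decision = "terminate" if bandit_should_terminate(accuracy, best) else "keep"
    print(f"accuracy={accuracy:.2f} -> {decision}")
```

With `slack_factor=0.1`, a run needs roughly 91% of the best run's metric (1 / 1.1) to survive an evaluation. A smaller slack factor terminates runs more aggressively.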


 

In [7]:
# Create an early termination policy. 
early_termination_policy = BanditPolicy(slack_factor=0.1, evaluation_interval = 2, delay_evaluation=5)


# Create the different params that we will be using during training
param_sampling = RandomParameterSampling({'--C' : uniform(0.01,100),
                                        '--max_iter': choice(16,32,64,128,256)})

if "training" not in os.listdir():
    os.mkdir("./training")

# Create the ScriptRunConfig (named `estimator` here) and the HyperDrive config
estimator = ScriptRunConfig(source_directory=os.path.join('./'), 
                            compute_target=cpu_cluster_name, 
                            script='train.py', 
                            environment=sklearn_env)

hyperdrive_run_config = HyperDriveConfig(hyperparameter_sampling = param_sampling,
                                         primary_metric_name = "Accuracy", 
                                         primary_metric_goal = PrimaryMetricGoal.MAXIMIZE, 
                                         max_total_runs = 25, 
                                         max_concurrent_runs=4, 
                                         policy=early_termination_policy, 
                                         run_config=estimator)
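To make the search space above more concrete, here is a minimal sketch that mimics random sampling with the standard library only. It is not how HyperDrive is implemented. The `SEARCH_SPACE` layout and the `sample_configuration` helper are hypothetical; only the ranges for `--C` and `--max_iter` come from the cell above.

```python
import random

# Mirror of the notebook's search space, written as plain data (hypothetical layout).
SEARCH_SPACE = {
    "--C": ("uniform", 0.01, 100),
    "--max_iter": ("choice", [16, 32, 64, 128, 256]),
}


def sample_configuration(space, rng):
    """Draw one hyperparameter configuration at random from the search space."""
    config = {}
    for name, spec in space.items():
        kind = spec[0]
        if kind == "uniform":
            _, low, high = spec
            config[name] = rng.uniform(low, high)  # continuous value in [low, high]
        elif kind == "choice":
            config[name] = rng.choice(spec[1])     # one of the listed discrete values
        else:
            raise ValueError(f"Unknown parameter expression: {kind}")
    return config


rng = random.Random(0)  # fixed seed so the sketch is repeatable
for _ in range(4):      # 4 matches max_concurrent_runs in the cell above
    print(sample_configuration(SEARCH_SPACE, rng))
```

Each printed dictionary corresponds to one child run. With `max_total_runs = 25`, HyperDrive would draw 25 such configurations in total, unless the early termination policy cancels some of them.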

In [8]:
#Submit your experiment

hyperdrive_run  = experiment.submit(config = hyperdrive_run_config)
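The notebook does not show `train.py`, so the following is only a sketch of how such a script might receive the sampled values. HyperDrive passes each sampled hyperparameter to the training script as a command-line argument, so the script needs to accept `--C` and `--max_iter`. The `parse_args` helper and the default values are assumptions, not the author's code.

```python
import argparse


def parse_args(argv=None):
    """Parse the hyperparameters that HyperDrive passes on the command line."""
    parser = argparse.ArgumentParser(description="Hypothetical train.py argument handling")
    # Defaults are assumptions for running the script outside HyperDrive.
    parser.add_argument("--C", type=float, default=1.0,
                        help="Inverse of regularization strength")
    parser.add_argument("--max_iter", type=int, default=100,
                        help="Maximum number of solver iterations")
    return parser.parse_args(argv)


# Simulate one sampled configuration being passed to the script.
args = parse_args(["--C", "0.5", "--max_iter", "64"])
print(args.C, args.max_iter)  # prints: 0.5 64
```

A full script would then train the model with these values and log a metric named `Accuracy` to the run, since `primary_metric_name` in `HyperDriveConfig` must match the name of the logged metric.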

## Run Details

The `RunDetails` widget shows the progress of the HyperDrive run and its child runs.

In [9]:
RunDetails(hyperdrive_run).show()
hyperdrive_run.wait_for_completion(show_output =True)

_HyperDriveWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO'…

RunId: HD_e8fbb7f3-8d1f-4b3b-af23-5e7a524bb26b
Web View: https://ml.azure.com/experiments/new-experiment/runs/HD_e8fbb7f3-8d1f-4b3b-af23-5e7a524bb26b?wsid=/subscriptions/f5091c60-1c3c-430f-8d81-d802f6bf2414/resourcegroups/aml-quickstarts-139012/workspaces/quick-starts-ws-139012

Streaming azureml-logs/hyperdrive.txt

"<START>[2021-02-16T10:14:29.861447][API][INFO]Experiment created<END>\n""<START>[2021-02-16T10:14:30.302438][GENERATOR][INFO]Trying to sample '4' jobs from the hyperparameter space<END>\n"<START>[2021-02-16T10:14:30.7444803Z][SCHEDULER][INFO]The execution environment is being prepared. Please be patient as it can take a few minutes.<END>"<START>[2021-02-16T10:14:30.459296][GENERATOR][INFO]Successfully sampled '4' jobs, they will soon be submitted to the execution target.<END>\n"

Execution Summary
RunId: HD_e8fbb7f3-8d1f-4b3b-af23-5e7a524bb26b
Web View: https://ml.azure.com/experiments/new-experiment/runs/HD_e8fbb7f3-8d1f-4b3b-af23-5e7a524bb26b?wsid=/subscriptions/f5091c6

{'runId': 'HD_e8fbb7f3-8d1f-4b3b-af23-5e7a524bb26b',
 'target': 'new-compute',
 'status': 'Completed',
 'startTimeUtc': '2021-02-16T10:14:29.703809Z',
 'endTimeUtc': '2021-02-16T10:35:21.807294Z',
 'properties': {'primary_metric_config': '{"name": "Accuracy", "goal": "maximize"}',
  'resume_from': 'null',
  'runTemplate': 'HyperDrive',
  'azureml.runsource': 'hyperdrive',
  'platform': 'AML',
  'ContentSnapshotId': '4f7b77da-59eb-4909-991f-a88c2cd88d82',
  'score': '0.8',
  'best_child_run_id': 'HD_e8fbb7f3-8d1f-4b3b-af23-5e7a524bb26b_0',
  'best_metric_status': 'Succeeded'},
 'inputDatasets': [],
 'outputDatasets': [],
 'logFiles': {'azureml-logs/hyperdrive.txt': 'https://mlstrg139012.blob.core.windows.net/azureml/ExperimentRun/dcid.HD_e8fbb7f3-8d1f-4b3b-af23-5e7a524bb26b/azureml-logs/hyperdrive.txt?sv=2019-02-02&sr=b&sig=0Qj0gFCZNk07RQA%2BLjae5dXYA2rG26Hvo8Yd1sIqM9g%3D&st=2021-02-16T10%3A25%3A37Z&se=2021-02-16T18%3A35%3A37Z&sp=r'},
 'submittedBy': 'ODL_User 139012'}

## Best Model

Retrieve the best run from the HyperDrive experiment, along with its metrics, and register its model.

In [10]:
best_hyperdrive_run = hyperdrive_run.get_best_run_by_primary_metric()
best_run_metrics = best_hyperdrive_run.get_metrics()
best_hyperdrive_run

Experiment,Id,Type,Status,Details Page,Docs Page
new-experiment,HD_e8fbb7f3-8d1f-4b3b-af23-5e7a524bb26b_0,azureml.scriptrun,Completed,Link to Azure Machine Learning studio,Link to Documentation


In [11]:
print('Best Run Id: ', best_hyperdrive_run.id)
print('\nAccuracy: ', best_run_metrics['Accuracy'])

Best Run Id:  HD_e8fbb7f3-8d1f-4b3b-af23-5e7a524bb26b_0

Accuracy:  0.8


In [16]:
# Register the best model in the workspace
model = best_hyperdrive_run.register_model(model_name='best_hyperdrive_model', model_path='outputs/model.joblib', 
                                tags = {'Training context':'Hyper Drive'},
                                properties={'Accuracy': best_run_metrics['Accuracy']})