# Hyperparameter Tuning using HyperDrive

In [1]:
from azureml.core import Workspace, Experiment
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.widgets import RunDetails
from azureml.train.sklearn import SKLearn
from azureml.train.hyperdrive.run import PrimaryMetricGoal
from azureml.train.hyperdrive.policy import BanditPolicy
from azureml.train.hyperdrive.sampling import RandomParameterSampling
from azureml.train.hyperdrive.runconfig import HyperDriveConfig
from azureml.train.hyperdrive.parameter_expressions import choice, uniform
import os
import shutil
import joblib

## Dataset

The dataset used for this project is the [Heart Failure Prediction dataset](https://www.kaggle.com/andrewmvd/heart-failure-clinical-data) taken from Kaggle.

This dataset contains 12 features that can be used to predict mortality by heart failure:
- age: Age of the patient 
- anaemia: Decrease of red blood cells or hemoglobin
- creatinine_phosphokinase: Level of the CPK enzyme in the blood (mcg/L)
- diabetes: If the patient has diabetes
- ejection_fraction: Percentage of blood leaving the heart at each contraction 
- high_blood_pressure: If the patient has hypertension
- platelets: Platelets in the blood (kiloplatelets/mL)
- serum_creatinine: Level of serum creatinine in the blood (mg/dL)
- serum_sodium: Level of serum sodium in the blood (mEq/L)
- sex: Woman or man
- smoking: If the patient smokes or not
- time: Follow-up period (days)

The target column is DEATH_EVENT, which indicates whether the patient died during the follow-up period.

In [2]:
ws = Workspace.from_config()
experiment_name = 'heart-failure-hyperdrive'

experiment=Experiment(ws, experiment_name)

print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep = '\n')

run = experiment.start_logging()

Workspace name: quick-starts-ws-138271
Azure region: southcentralus
Subscription id: cdbe0b43-92a0-4715-838a-f2648cc7ad21
Resource group: aml-quickstarts-138271


In [3]:
#Create compute cluster
compute_cluster_name= "my-compute"

#Check if compute cluster already exists
try:
    compute_cluster=ComputeTarget(workspace=ws, name=compute_cluster_name)
    print("Found existing cluster, use it...")
except ComputeTargetException:
    print("Creating new cluster...")
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',max_nodes=5)
    compute_cluster = ComputeTarget.create(ws, compute_cluster_name, compute_config)
    
compute_cluster.wait_for_completion(show_output=True)

Found existing cluster, use it...
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


## Hyperdrive Configuration

The model is a scikit-learn logistic regression trained by a custom script, train.py. The dataset is fetched from a URL as a TabularDataset. The hyperparameters tuned are the inverse of the regularization strength (C) and the maximum number of iterations (max_iter). The trained model is scored on a held-out 20% of the original dataset.
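HyperDrive passes each sampled value to the entry script as command-line arguments (`--C` and `--max_iter` in this project). The actual train.py is not reproduced in this notebook, so the following is only a minimal sketch of how such a script might read those two arguments. The function name `parse_hyperparameters`, the default values, and the help strings are assumptions made for illustration.

```python
import argparse


def parse_hyperparameters(argv=None):
    """Parse the two tuned hyperparameters from a list of command-line arguments.

    Hypothetical helper: the real train.py may be organized differently.
    """
    parser = argparse.ArgumentParser()
    # Inverse of regularization strength; smaller values mean stronger regularization.
    parser.add_argument("--C", type=float, default=1.0,
                        help="Inverse of regularization strength")
    # Upper bound on the number of solver iterations.
    parser.add_argument("--max_iter", type=int, default=100,
                        help="Maximum number of iterations")
    return parser.parse_args(argv)


# Example: the argument list in the form HyperDrive hands it to a child run.
args = parse_hyperparameters(["--C", "85.35037168577276", "--max_iter", "75"])
print("C:", args.C, "max_iter:", args.max_iter)
```

The parsed values would then typically be passed to `LogisticRegression(C=args.C, max_iter=args.max_iter)` inside the training script.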

Hyperparameter tuning with HyperDrive requires several steps: defining the parameter search space, choosing a sampling method, choosing a primary metric to optimize, and selecting an early stopping policy.

The parameter sampling method used for this project is random sampling. It draws hyperparameter values at random from the defined search space, so the entire space does not need to be searched. This saves time compared with grid sampling, which exhaustively evaluates every combination, and with Bayesian sampling; both are recommended only when there is enough budget to explore the search space thoroughly.
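To make the idea concrete, here is a plain-Python sketch of random sampling over the two hyperparameters tuned in this project (C drawn uniformly from 0.001 to 100, max_iter chosen from five discrete values). It is an illustration only and not HyperDrive's implementation; `SEARCH_SPACE` and `sample_configuration` are hypothetical names.

```python
import random

# Search space mirroring the ranges used in this project.
SEARCH_SPACE = {
    "--C": ("uniform", 0.001, 100),
    "--max_iter": ("choice", [50, 75, 100, 125, 150]),
}


def sample_configuration(space, rng):
    """Draw one hyperparameter configuration at random from the search space."""
    config = {}
    for name, spec in space.items():
        kind = spec[0]
        if kind == "uniform":
            _, low, high = spec
            config[name] = rng.uniform(low, high)  # continuous value in [low, high]
        elif kind == "choice":
            config[name] = rng.choice(spec[1])     # one of the discrete options
        else:
            raise ValueError(f"Unknown parameter kind: {kind}")
    return config


# Draw 20 configurations, matching the run budget (max_total_runs) of this project.
rng = random.Random(0)
samples = [sample_configuration(SEARCH_SPACE, rng) for _ in range(20)]
for sample in samples[:3]:
    print(sample)
```

Each configuration is drawn independently, so the number of runs is set by the budget rather than by the size of the search space.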

The early stopping policy used in this project is the Bandit policy, configured with a slack factor (0.1 in this case) and an evaluation interval (1 in this case). At each evaluation interval, the policy terminates any run whose primary metric is not within the slack factor of the best-performing run. This saves time and resources, because runs that are unlikely to produce good results are stopped early.
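The slack-factor rule can be sketched in a few lines. This assumes the cutoff described in the Azure ML documentation for a metric that is being maximized: a run is terminated when its metric falls below the best metric divided by (1 + slack factor). The function names and the example metric values are hypothetical, and this is not HyperDrive's actual code.

```python
def bandit_cutoff(best_metric, slack_factor):
    """Lowest metric a run may report without being terminated (maximize goal).

    Assumes the rule cutoff = best_metric / (1 + slack_factor).
    """
    return best_metric / (1 + slack_factor)


def should_terminate(run_metric, best_metric, slack_factor):
    """Return True if the run falls outside the allowed slack of the best run."""
    return run_metric < bandit_cutoff(best_metric, slack_factor)


# Hypothetical example: the best run so far reports an accuracy of 0.80 and the
# slack factor is 0.1, so the cutoff is 0.80 / 1.1, roughly 0.727.
print(should_terminate(0.70, best_metric=0.80, slack_factor=0.1))  # True
print(should_terminate(0.75, best_metric=0.80, slack_factor=0.1))  # False
```

A smaller slack factor makes the policy more aggressive, because the cutoff moves closer to the best metric.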

In [4]:
# Create an early termination policy.
early_termination_policy = BanditPolicy(
    evaluation_interval=1,
    slack_factor= 0.1
)

# Create the different params that will be needed during training
param_sampling = RandomParameterSampling(
    {
        "--C": uniform(0.001, 100),
        "--max_iter": choice(50, 75, 100, 125, 150)
    }
)

# Create the training folder if it does not already exist
script_folder = './training'
os.makedirs(script_folder, exist_ok=True)

shutil.copy('./train.py', script_folder)

# Create estimator and hyperdrive config
estimator = SKLearn(
    source_directory= script_folder,
    compute_target= compute_cluster,
    entry_script= "train.py",
    vm_size="Standard_D2_V2",
    vm_priority="lowpriority"
)

hyperdrive_run_config = HyperDriveConfig(
    estimator=estimator,
    hyperparameter_sampling= param_sampling,
    policy= early_termination_policy,
    primary_metric_name= "Accuracy",
    primary_metric_goal= PrimaryMetricGoal.MAXIMIZE,
    max_total_runs=20,
    max_concurrent_runs=5
)

'SKLearn' estimator is deprecated. Please use 'ScriptRunConfig' from 'azureml.core.script_run_config' with your own defined environment or the AzureML-Tutorial curated environment.


In [5]:
# Submit the experiment
hyperdrive_run=experiment.submit(config=hyperdrive_run_config)



## Run Details

In [6]:
RunDetails(hyperdrive_run).show()

_HyperDriveWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO'…

In [7]:
hyperdrive_run.wait_for_completion(show_output= True)

RunId: HD_f8e85456-bafa-4f60-9cdb-eb7694de738d
Web View: https://ml.azure.com/experiments/heart-failure-hyperdrive/runs/HD_f8e85456-bafa-4f60-9cdb-eb7694de738d?wsid=/subscriptions/cdbe0b43-92a0-4715-838a-f2648cc7ad21/resourcegroups/aml-quickstarts-138271/workspaces/quick-starts-ws-138271

Streaming azureml-logs/hyperdrive.txt

<START>[2021-02-09T17:53:43.252672][API][INFO]Experiment created<END>
<START>[2021-02-09T17:53:43.775449][GENERATOR][INFO]Trying to sample '5' jobs from the hyperparameter space<END>
<START>[2021-02-09T17:53:43.932022][GENERATOR][INFO]Successfully sampled '5' jobs, they will soon be submitted to the execution target.<END>
<START>[2021-02-09T17:53:44.3045579Z][SCHEDULER][INFO]The execution environment is being prepared. Please be patient as it can take a few minutes.<END>

Execution Summary
RunId: HD_f8e85456-bafa-4f60-9cdb-eb7694de738d
Web View: https://ml.azure.com/experiments/heart-failure-hyperdrive/runs/HD_f8e85456-bafa-4f60-9cdb-eb7694de738d?wsid=/s

{'runId': 'HD_f8e85456-bafa-4f60-9cdb-eb7694de738d',
 'target': 'my-compute',
 'status': 'Completed',
 'startTimeUtc': '2021-02-09T17:53:43.024325Z',
 'endTimeUtc': '2021-02-09T18:03:25.07958Z',
 'properties': {'primary_metric_config': '{"name": "Accuracy", "goal": "maximize"}',
  'resume_from': 'null',
  'runTemplate': 'HyperDrive',
  'azureml.runsource': 'hyperdrive',
  'platform': 'AML',
  'ContentSnapshotId': '30b1612f-8254-4eb0-85c8-1347fcef8043',
  'score': '0.8666666666666667',
  'best_child_run_id': 'HD_f8e85456-bafa-4f60-9cdb-eb7694de738d_1',
  'best_metric_status': 'Succeeded'},
 'inputDatasets': [],
 'outputDatasets': [],
 'logFiles': {'azureml-logs/hyperdrive.txt': 'https://mlstrg138271.blob.core.windows.net/azureml/ExperimentRun/dcid.HD_f8e85456-bafa-4f60-9cdb-eb7694de738d/azureml-logs/hyperdrive.txt?sv=2019-02-02&sr=b&sig=vCMv7%2Bwoa4yAedRFLFiW8cuY9k%2F%2BEQmm7Qdsqna%2B1M4%3D&st=2021-02-09T17%3A53%3A33Z&se=2021-02-10T02%3A03%3A33Z&sp=r'},
 'submittedBy': 'ODL_User 138271'

## Best Model

In [8]:
# Get the best model and display all details
best_run= hyperdrive_run.get_best_run_by_primary_metric()
best_run_metrics=best_run.get_metrics()
print(best_run.get_details()['runDefinition']['arguments'])
print(best_run.get_file_names())
print('Best Run Accuracy:',best_run_metrics['Accuracy'])

['--C', '85.35037168577276', '--max_iter', '75']
['azureml-logs/55_azureml-execution-tvmps_3e512f63b1ee6974935909182dbd586a2f30eaa5e4ea3949f5dcba554c6b186c_d.txt', 'azureml-logs/65_job_prep-tvmps_3e512f63b1ee6974935909182dbd586a2f30eaa5e4ea3949f5dcba554c6b186c_d.txt', 'azureml-logs/70_driver_log.txt', 'azureml-logs/75_job_post-tvmps_3e512f63b1ee6974935909182dbd586a2f30eaa5e4ea3949f5dcba554c6b186c_d.txt', 'azureml-logs/process_info.json', 'azureml-logs/process_status.json', 'logs/azureml/104_azureml.log', 'logs/azureml/job_prep_azureml.log', 'logs/azureml/job_release_azureml.log', 'outputs/model.joblib']
Best Run Accuracy: 0.8666666666666667


In [10]:
best_run.get_file_names()

['azureml-logs/55_azureml-execution-tvmps_3e512f63b1ee6974935909182dbd586a2f30eaa5e4ea3949f5dcba554c6b186c_d.txt',
 'azureml-logs/65_job_prep-tvmps_3e512f63b1ee6974935909182dbd586a2f30eaa5e4ea3949f5dcba554c6b186c_d.txt',
 'azureml-logs/70_driver_log.txt',
 'azureml-logs/75_job_post-tvmps_3e512f63b1ee6974935909182dbd586a2f30eaa5e4ea3949f5dcba554c6b186c_d.txt',
 'azureml-logs/process_info.json',
 'azureml-logs/process_status.json',
 'logs/azureml/104_azureml.log',
 'logs/azureml/job_prep_azureml.log',
 'logs/azureml/job_release_azureml.log',
 'outputs/model.joblib']

In [9]:
# Save the best model
model=best_run.register_model(model_name='heart-failure-sklearn', model_path='outputs/model.joblib')
# Download the model file using the same relative name reported by get_file_names()
best_run.download_file('outputs/model.joblib', 'hyperdrive_model.joblib')

In [11]:
# Clean up allocated resources
compute_cluster.delete()

Current provisioning state of AmlCompute is "Deleting"

