# Hyperparameter Tuning using HyperDrive

In the cell below, we import all the dependencies that we need to complete the project.

In [1]:
import azureml.core
from azureml.core import (
    Workspace,
    Experiment, 
    Dataset, 
    ComputeTarget,
    ScriptRunConfig,
    Environment,
)

from azureml.train.hyperdrive import (
    BanditPolicy,
    RandomParameterSampling,
    choice,
    uniform,
    loguniform,
    HyperDriveConfig,
    PrimaryMetricGoal,
)

from azureml.widgets import RunDetails
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AciWebservice

import joblib
import uuid
import pandas as pd 
import requests
import json
import os

## Set up workspace and experiment

In [2]:
ws = Workspace.from_config()
experiment_name = 'capstone-experiment'
experiment = Experiment(ws, experiment_name)

print(f'Workspace name: {ws.name} / AZ region: {ws.location} '
      f'/ Subscription ID: {ws.subscription_id} / Resource group: {ws.resource_group}')

run = experiment.start_logging()

Workspace name: quick-starts-ws-239936 / AZ region: westus2 / Subscription ID: 9b72f9e6-56c5-4c16-991b-19c652994860 / Resource group: aml-quickstarts-239936


## Compute target assignment

In [3]:
from azureml.core.compute import AmlCompute
from azureml.core.compute_target import ComputeTargetException

cluster_name = "capstone-cluster"

try:
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print(f"Found existing compute target: {compute_target}")
except ComputeTargetException as e:
    print(f"Creating a new compute target (error: {e})")
    compute_cnfg = AmlCompute.provisioning_configuration(
        vm_size = "Standard_DS3_V2",
        min_nodes = 0,
        max_nodes = 4,
    )
    compute_target = ComputeTarget.create(
        ws,
        cluster_name,
        compute_cnfg,
    )
    compute_target.wait_for_completion(
        show_output=True,
        min_node_count=None,
        timeout_in_minutes=60,
    )

# report the cluster status
print(f'compute target: {compute_target.get_status().serialize()}')

Found existing compute target: AmlCompute(workspace=Workspace.create(name='quick-starts-ws-239936', subscription_id='9b72f9e6-56c5-4c16-991b-19c652994860', resource_group='aml-quickstarts-239936'), name=capstone-cluster, id=/subscriptions/9b72f9e6-56c5-4c16-991b-19c652994860/resourceGroups/aml-quickstarts-239936/providers/Microsoft.MachineLearningServices/workspaces/quick-starts-ws-239936/computes/capstone-cluster, type=AmlCompute, provisioning_state=Succeeded, location=westus2, tags={})
compute target: {'currentNodeCount': 4, 'targetNodeCount': 3, 'nodeStateCounts': {'preparingNodeCount': 0, 'runningNodeCount': 0, 'idleNodeCount': 4, 'unusableNodeCount': 0, 'leavingNodeCount': 0, 'preemptedNodeCount': 0}, 'allocationState': 'Resizing', 'allocationStateTransitionTime': '2023-08-08T17:25:13.665000+00:00', 'errors': None, 'creationTime': '2023-08-08T16:38:00.642955+00:00', 'modifiedTime': '2023-08-08T16:38:04.594612+00:00', 'provisioningState': 'Succeeded', 'provisioningStateTransitionTi

## Dataset

For our capstone project, we use [the Kaggle heart failure dataset](https://www.kaggle.com/datasets/andrewmvd/heart-failure-clinical-data). We uploaded and registered this dataset in the workspace beforehand. The dataset stems from [a publication on heart failure prediction using machine learning](https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/s12911-020-1023-5). It contains clinical records of 299 patients and, as the binary target `DEATH_EVENT`, whether each patient survived or died during the follow-up period. Since the target is binary, this is a classification problem.

In [4]:
ds_name = 'heart_failure_kaggle_ml'
dataset = Dataset.get_by_name(workspace=ws, name=ds_name)

In [6]:
# inspect the dataframe 
hf_prediction = dataset.to_pandas_dataframe()
hf_prediction.head()

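| | age | anaemia | creatinine_phosphokinase | diabetes | ejection_fraction | high_blood_pressure | platelets | serum_creatinine | serum_sodium | sex | smoking | time | DEATH_EVENT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 75.0 | 0 | 582 | 0 | 20 | 1 | 265000.0 | 1.9 | 130 | 1 | 0 | 4 | 1 |
| 1 | 55.0 | 0 | 7861 | 0 | 38 | 0 | 263358.03 | 1.1 | 136 | 1 | 0 | 6 | 1 |
| 2 | 65.0 | 0 | 146 | 0 | 20 | 0 | 162000.0 | 1.3 | 129 | 1 | 1 | 7 | 1 |
| 3 | 50.0 | 1 | 111 | 0 | 20 | 0 | 210000.0 | 1.9 | 137 | 1 | 0 | 7 | 1 |
| 4 | 65.0 | 1 | 160 | 1 | 20 | 0 | 327000.0 | 2.7 | 116 | 0 | 0 | 8 | 1 |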


In [7]:
hf_prediction.tail()

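| | age | anaemia | creatinine_phosphokinase | diabetes | ejection_fraction | high_blood_pressure | platelets | serum_creatinine | serum_sodium | sex | smoking | time | DEATH_EVENT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 294 | 62.0 | 0 | 61 | 1 | 38 | 1 | 155000.0 | 1.1 | 143 | 1 | 1 | 270 | 0 |
| 295 | 55.0 | 0 | 1820 | 0 | 38 | 0 | 270000.0 | 1.2 | 139 | 0 | 0 | 271 | 0 |
| 296 | 45.0 | 0 | 2060 | 1 | 60 | 0 | 742000.0 | 0.8 | 138 | 0 | 0 | 278 | 0 |
| 297 | 45.0 | 0 | 2413 | 0 | 38 | 0 | 140000.0 | 1.4 | 140 | 1 | 1 | 280 | 0 |
| 298 | 50.0 | 0 | 196 | 0 | 45 | 0 | 395000.0 | 1.6 | 136 | 1 | 1 | 285 | 0 |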


## HyperDrive Configuration

We use a gradient boosting classifier because we are dealing with tabular data and a binary target. Gradient boosting is an additive modeling approach: it builds an ensemble of shallow decision trees, each one fitted to the errors of the trees before it. It often performs very well on tabular data and is flexible, and some implementations also handle categorical features and missing values natively.
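To make the additive idea concrete, here is a minimal pure-Python sketch of gradient boosting for a binary target: it starts from the log-odds of the positive class and repeatedly adds a depth-1 tree (stump) fitted to the negative gradient of the log-loss, scaled by the learning rate. It only illustrates the technique; it is not the code in `train.py`, which is not shown here. Library implementations such as scikit-learn's `GradientBoostingClassifier` use deeper trees (default `max_depth=3`) and Newton-style leaf values, and the toy data below is made up for the example.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function mapping a log-odds score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Fit a depth-1 regression tree (stump) to the residuals by least squares."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # candidate thresholds: midpoints between consecutive distinct values
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            left_value = sum(left) / len(left)
            right_value = sum(right) / len(right)
            sse = (sum((r - left_value) ** 2 for r in left)
                   + sum((r - right_value) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, left_value, right_value)
    if best is None:  # all features constant: fall back to the mean residual
        mean = sum(residuals) / len(residuals)
        return 0, math.inf, mean, mean
    _, j, t, left_value, right_value = best
    return j, t, left_value, right_value


def predict_stump(stump, row):
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value


def fit_gradient_boosting(X, y, n_estimators=100, learning_rate=0.1):
    """Binary gradient boosting with log-loss; assumes y contains both 0s and 1s."""
    p = sum(y) / len(y)
    f0 = math.log(p / (1 - p))  # initial score: log-odds of the positive class
    scores = [f0] * len(y)
    stumps = []
    for _ in range(n_estimators):
        # negative gradient of the log-loss with respect to each score
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        # additive update: F_m(x) = F_{m-1}(x) + learning_rate * h_m(x)
        scores = [s + learning_rate * predict_stump(stump, row)
                  for s, row in zip(scores, X)]
    return f0, learning_rate, stumps


def predict(model, row):
    f0, learning_rate, stumps = model
    score = f0 + learning_rate * sum(predict_stump(s, row) for s in stumps)
    return 1 if sigmoid(score) >= 0.5 else 0


# Toy one-feature data, made up for illustration (not from the heart failure dataset)
X_toy = [[1.0], [2.0], [3.0], [4.0], [6.0], [7.0], [8.0], [9.0]]
y_toy = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_gradient_boosting(X_toy, y_toy, n_estimators=50, learning_rate=0.1)
print([predict(model, row) for row in X_toy])  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Because every tree's contribution is scaled by the learning rate, a smaller `learning_rate` usually needs more trees, which is why the two hyperparameters are tuned together.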

We use HyperDrive to tune two hyperparameters of the classifier:

- `learning_rate`: scales the contribution of each tree (default: 0.1); we sample it uniformly from [0.1, 0.5]
- `n_estimators`: the number of boosting stages (decision trees); we choose from 100, 200, 300, and 350
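HyperDrive passes each sampled configuration to the training script as command-line arguments named after the keys of the sampling space (`--learning_rate`, `--n_estimators`). Since `train.py` is not shown, here is a minimal sketch, assuming it uses `argparse`, of how the script could read them; the defaults are assumptions chosen to match the values listed above.

```python
import argparse


def parse_args(argv=None):
    """Parse the hyperparameters HyperDrive appends to the script's command line."""
    parser = argparse.ArgumentParser(description="Train a gradient boosting classifier")
    parser.add_argument("--learning_rate", type=float, default=0.1,
                        help="scales the contribution of each tree")
    parser.add_argument("--n_estimators", type=int, default=100,
                        help="number of boosting stages (trees)")
    return parser.parse_args(argv)


# Equivalent to: python train.py --learning_rate 0.106 --n_estimators 100
example = parse_args(["--learning_rate", "0.106", "--n_estimators", "100"])
print(example.learning_rate, example.n_estimators)
```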

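We use a Bandit early termination policy. Every `evaluation_interval` (here 2) reports of the primary metric, it cancels any run whose metric is not within the slack factor (here 0.2) of the best run so far; for a maximization goal, runs scoring below `best / (1 + slack_factor)` are stopped. Note that the policy can only act on metrics reported while a run is in progress; since the best run below reports `accuracy` as a single final value, early termination is unlikely to trigger in this setup.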

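To explore the search space over `n_estimators` and `learning_rate` quickly, we use random parameter sampling. Unlike grid search, it does not enumerate every combination but draws each configuration at random, so it works with continuous ranges such as the learning rate and covers the space reasonably well within a fixed budget of runs (here `max_total_runs=10`).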
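Conceptually, random sampling draws each hyperparameter independently from its distribution. The stdlib-only sketch below mimics that for the space defined in the next code cell (`uniform(0.1, 0.5)` for the learning rate, `choice(100, 200, 300, 350)` for the number of trees); the function `sample_configurations` is hypothetical and is not HyperDrive's implementation.

```python
import random


def sample_configurations(n_runs: int, seed: int = 0):
    """Draw n_runs independent hyperparameter configurations."""
    rng = random.Random(seed)  # seeded so the draws are reproducible
    configs = []
    for _ in range(n_runs):
        configs.append({
            "--learning_rate": rng.uniform(0.1, 0.5),            # continuous range
            "--n_estimators": rng.choice([100, 200, 300, 350]),  # discrete set
        })
    return configs


for config in sample_configurations(n_runs=10):
    print(config)
```

Because the learning rate is continuous, the search space is infinite, which matches the `space_size: infinite_space_size` property reported for the HyperDrive run.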

We use accuracy as the primary metric and maximize it, i.e. we aim to predict as accurately as possible whether a patient survives, based on their clinical data.

In [12]:
# preliminaries
primary_metric_name = "accuracy"
os.makedirs("training", exist_ok=True)

# setup training environment
task_env = Environment.from_pip_requirements(
    name="venv", 
    file_path="requirements.txt"
)

# policy specification
early_termination_policy = BanditPolicy(
    evaluation_interval=2, 
    slack_factor=0.2
)

# parameter sampler specification
param_sampling = RandomParameterSampling(
    {
        "--learning_rate": uniform(0.1, 0.5),
        "--n_estimators": choice(100, 200, 300, 350),
    }
)

# estimator and hyperdrive config specification
estimator = ScriptRunConfig(
    source_directory="./steps",
    script="train.py",
    environment=task_env,
    compute_target=compute_target,
)

hyperdrive_run_config = HyperDriveConfig(
    run_config=estimator,
    hyperparameter_sampling=param_sampling,
    policy=early_termination_policy,
    primary_metric_name=primary_metric_name,
    primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
    max_total_runs=10,
    max_concurrent_runs=4,
)

In [13]:
hyperdrive_run = experiment.submit(hyperdrive_run_config)

## Run Details

In the cell below, we use the `RunDetails` widget to monitor the HyperDrive experiment defined above and wait for it to finish.

In [14]:
RunDetails(hyperdrive_run).show()
hyperdrive_run.wait_for_completion(show_output=True)


RunId: HD_a4a39d61-19ab-4984-afee-61b5bc108990
Web View: https://ml.azure.com/runs/HD_a4a39d61-19ab-4984-afee-61b5bc108990?wsid=/subscriptions/9b72f9e6-56c5-4c16-991b-19c652994860/resourcegroups/aml-quickstarts-239936/workspaces/quick-starts-ws-239936&tid=660b3398-b80e-49d2-bc5b-ac1dc93b5254

Streaming azureml-logs/hyperdrive.txt

[2023-08-08T17:44:44.353258][GENERATOR][INFO]Trying to sample '4' jobs from the hyperparameter space
[2023-08-08T17:44:44.8416297Z][SCHEDULER][INFO]Scheduling job, id='HD_a4a39d61-19ab-4984-afee-61b5bc108990_0' 
[2023-08-08T17:44:45.1404992Z][SCHEDULER][INFO]Scheduling job, id='HD_a4a39d61-19ab-4984-afee-61b5bc108990_1' 
[2023-08-08T17:44:45.1383271Z][SCHEDULER][INFO]Scheduling job, id='HD_a4a39d61-19ab-4984-afee-61b5bc108990_2' 
[2023-08-08T17:44:45.094455][GENERATOR][INFO]Successfully sampled '4' jobs, they will soon be submitted to the execution target.
[2023-08-08T17:44:45.2817621Z][SCHEDULER][INFO]Scheduling job, id='HD_a4a39d61-19ab-4984-afee-61b5bc1089

{'runId': 'HD_a4a39d61-19ab-4984-afee-61b5bc108990',
 'target': 'capstone-cluster',
 'status': 'Completed',
 'startTimeUtc': '2023-08-08T17:44:43.607508Z',
 'endTimeUtc': '2023-08-08T17:48:49.360919Z',
 'services': {},
 'properties': {'primary_metric_config': '{"name":"accuracy","goal":"maximize"}',
  'resume_from': 'null',
  'runTemplate': 'HyperDrive',
  'azureml.runsource': 'hyperdrive',
  'platform': 'AML',
  'ContentSnapshotId': '22ca1c56-5eff-4320-91f5-649986d889d3',
  'user_agent': 'python/3.8.5 (Linux-5.15.0-1040-azure-x86_64-with-glibc2.10) msrest/0.7.1 Hyperdrive.Service/1.0.0 Hyperdrive.SDK/core.1.51.0',
  'space_size': 'infinite_space_size',
  'score': '0.88',
  'best_child_run_id': 'HD_a4a39d61-19ab-4984-afee-61b5bc108990_1',
  'best_metric_status': 'Succeeded',
  'best_data_container_id': 'dcid.HD_a4a39d61-19ab-4984-afee-61b5bc108990_1'},
 'inputDatasets': [],
 'outputDatasets': [],
 'runDefinition': {'configuration': None,
  'attribution': None,
  'telemetryValues': {'am

## Best Model

In the cells below, we retrieve the best run of the HyperDrive experiment and display its metrics and properties.

In [15]:
brun = hyperdrive_run.get_best_run_by_primary_metric()
print(f"ID for best model run: {brun.id}")

ID for best model run: HD_a4a39d61-19ab-4984-afee-61b5bc108990_1


In [16]:
brun_metrics = brun.get_metrics()
print(f"Best metrics collected from the best run: {brun_metrics}")
print(f"Reached accuracy: {brun_metrics['accuracy']}")

Best metrics collected from the best run: {'learning_rate:': 0.10642205301112365, 'n_estimators:': 100, 'clf_report': '              precision    recall  f1-score   support\n\n           0       0.89      0.94      0.91        51\n           1       0.86      0.75      0.80        24\n\n    accuracy                           0.88        75\n   macro avg       0.87      0.85      0.86        75\nweighted avg       0.88      0.88      0.88        75\n', 'accuracy': 0.88}
Reached accuracy: 0.88
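The reported accuracy can be cross-checked against the classification report above. A recall of 0.94 on 51 survivors and 0.75 on 24 deaths corresponds to 48 and 18 correct predictions, which gives the confusion matrix below. Note that this matrix is reconstructed from the rounded report values, not a logged artifact; the short sketch recomputes the report's figures from it.

```python
# Confusion matrix inferred from the rounded classification report
# (rows: true class, columns: predicted class; class 1 = DEATH_EVENT)
cm = [[48, 3],   # true 0: 48 predicted 0, 3 predicted 1
      [6, 18]]   # true 1: 6 predicted 0, 18 predicted 1


def accuracy(cm):
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def precision(cm, k):
    predicted_k = sum(row[k] for row in cm)  # column sum
    return cm[k][k] / predicted_k


def recall(cm, k):
    return cm[k][k] / sum(cm[k])  # row sum


print(round(accuracy(cm), 2))  # 0.88  (66 / 75)
for k in (0, 1):
    print(k, round(precision(cm, k), 2), round(recall(cm, k), 2))
# 0 0.89 0.94
# 1 0.86 0.75
```

The lower recall for class 1 means that about a quarter of the deaths in the 75-patient evaluation split are missed, which the accuracy alone does not show.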


In [17]:
# display best run details
print(brun)
brun

Run(Experiment: capstone-experiment,
Id: HD_a4a39d61-19ab-4984-afee-61b5bc108990_1,
Type: azureml.scriptrun,
Status: Completed)




Next, we list the files of the best run and download its `outputs/` folder, which contains the trained model (`outputs/model.joblib`).

In [19]:
brun.get_file_names()

['outputs/model.joblib',
 'system_logs/cs_capability/cs-capability.log',
 'system_logs/hosttools_capability/hosttools-capability.log',
 'system_logs/lifecycler/execution-wrapper.log',
 'system_logs/lifecycler/lifecycler.log',
 'system_logs/metrics_capability/metrics-capability.log',
 'system_logs/snapshot_capability/snapshot-capability.log',
 'user_logs/std_log.txt']

In [20]:
# download the run's outputs/ folder to ./outputs
brun.download_files(prefix="outputs", output_directory=".")

## Model Deployment

Remember that you only have to deploy one of the two models you trained, but you still need to register both. Perform the steps in the rest of this notebook only if you wish to deploy this model.

Since this model performs as well as the AutoML model, we register it and deploy it in the cells below.

In [25]:
model = brun.register_model(
    model_name="hyperdrive_model",
    model_path="outputs/model.joblib"
)

To serve the model from an endpoint, we create an Azure Container Instances (ACI) deployment configuration and an inference configuration, and then deploy the registered model.

In [27]:
task_env = Environment.from_pip_requirements(
    name="venv", 
    file_path="requirements.txt"
)

deployment_config = AciWebservice.deploy_configuration(
    cpu_cores=1,
    memory_gb=1,
)

inference_config = InferenceConfig(
    entry_script="score.py",
    environment=task_env,
)

service = Model.deploy(
    ws,
    "capstone-service",
    [model],
    inference_config,
    deployment_config,
    overwrite=True,
)

service.wait_for_deployment(show_output=True)

azureml.core.model:
To leverage new model deployment capabilities, AzureML recommends using CLI/SDK v2 to deploy models as online endpoint, 
please refer to respective documentations 
https://docs.microsoft.com/azure/machine-learning/how-to-deploy-managed-online-endpoints /
https://docs.microsoft.com/azure/machine-learning/how-to-attach-kubernetes-anywhere 
For more information on migration, see https://aka.ms/acimoemigration 


Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2023-08-08 19:06:52+00:00 Creating Container Registry if not exists.
2023-08-08 19:06:52+00:00 Registering the environment.
2023-08-08 19:06:53+00:00 Use the existing image.
2023-08-08 19:06:53+00:00 Generating deployment configuration.
2023-08-08 19:06:54+00:00 Submitting deployment to compute.
2023-08-08 19:06:56+00:00 Checking the status of deployment capstone-service..
2023-08-08 19:15:12+00:00 Checking the status of inference endpoint capstone-service.
Succeeded
ACI service creation operation finished, operation "Succeeded"


In [28]:
print(service.get_logs())

/bin/bash: /azureml-envs/azureml_58f440cd5db21b9ab3c363f087e7b355/lib/libtinfo.so.6: no version information available (required by /bin/bash)
/bin/bash: /azureml-envs/azureml_58f440cd5db21b9ab3c363f087e7b355/lib/libtinfo.so.6: no version information available (required by /bin/bash)
/bin/bash: /azureml-envs/azureml_58f440cd5db21b9ab3c363f087e7b355/lib/libtinfo.so.6: no version information available (required by /bin/bash)
2023-08-08T19:14:12,767994423+00:00 - rsyslog/run 
2023-08-08T19:14:12,776993918+00:00 - gunicorn/run 
bash: /azureml-envs/azureml_58f440cd5db21b9ab3c363f087e7b355/lib/libtinfo.so.6: no version information available (required by bash)
2023-08-08T19:14:12,781754615+00:00 | gunicorn/run | 
2023-08-08T19:14:12,783397515+00:00 | gunicorn/run | ###############################################
2023-08-08T19:14:12,784923414+00:00 | gunicorn/run | AzureML Container Runtime Information
2023-08-08T19:14:12,790125411+00:00 | gunicorn/run | ########################################

In the cell below, we test the service with three randomly sampled rows from the dataset, with the target column removed:

In [35]:
target_var = "DEATH_EVENT"
json_payload = {
    "data": hf_prediction.drop(columns=target_var).sample(n=3).to_dict("records")
}

raw_data = json.dumps(json_payload)
print(raw_data)

{"data": [{"age": 61.0, "anaemia": 0, "creatinine_phosphokinase": 582, "diabetes": 1, "ejection_fraction": 38, "high_blood_pressure": 0, "platelets": 147000.0, "serum_creatinine": 1.2, "serum_sodium": 141, "sex": 1, "smoking": 0, "time": 237}, {"age": 66.0, "anaemia": 1, "creatinine_phosphokinase": 68, "diabetes": 1, "ejection_fraction": 38, "high_blood_pressure": 1, "platelets": 162000.0, "serum_creatinine": 1.0, "serum_sodium": 136, "sex": 0, "smoking": 0, "time": 95}, {"age": 70.0, "anaemia": 0, "creatinine_phosphokinase": 212, "diabetes": 1, "ejection_fraction": 17, "high_blood_pressure": 1, "platelets": 389000.0, "serum_creatinine": 1.0, "serum_sodium": 136, "sex": 1, "smoking": 1, "time": 188}]}


In [36]:
# we use requests lib to consume endpoint
req_headers = {"Content-Type": "application/json"}
uri = service.scoring_uri

print(f"Service scoring URI: {uri}")
response = requests.post(uri, data=raw_data, headers=req_headers)
print(response.json())

Service scoring URI: http://1295abf5-d570-47ef-975f-bd6effe28af4.westus2.azurecontainer.io/score
[0, 0, 0]


The service returns three predictions, all `0`: the model predicts no death event (`DEATH_EVENT = 0`) for any of the three patients.

In the cell below, we print the web service logs once more.

In [37]:
print(service.get_logs())


Finally, we delete the web service and the registered model, and delete the compute cluster:

In [None]:
service.delete()
model.delete()
compute_target.delete()

**Submission Checklist**
- I have registered the model.
- I have deployed the model with the best accuracy as a webservice.
- I have tested the webservice by sending a request to the model endpoint.
- I have deleted the webservice and shut down all the computes that I have used.
- I have taken a screenshot showing the model endpoint as active.
- The project includes a file containing the environment details.

