# Hyperparameter Tuning using HyperDrive

In the cell below, we import all the dependencies that we need to complete the project.

In [17]:
import joblib
import uuid

from azureml.core import (
    Workspace,
    Experiment,
    Dataset,
    ComputeTarget,
    ScriptRunConfig,
    Environment
)

from azureml.train.hyperdrive import (
    BanditPolicy, 
    RandomParameterSampling,
    choice, 
    loguniform, 
    HyperDriveConfig, 
    PrimaryMetricGoal
)

from azureml.widgets import RunDetails

## Workspace

In [2]:
subscription_id = '2c48c51c-bd47-40d4-abbe-fb8eabd19c8c'
resource_group = 'aml-quickstarts-239553'
workspace_name = 'quick-starts-ws-239553'

workspace = Workspace(subscription_id, resource_group, workspace_name)

## Experiment

In [3]:
experiment_name = 'edu_hf_hyperdrive_exp'
experiment = Experiment(workspace, experiment_name)

## Compute target

We assume a compute cluster with the given name has already been created.

In [4]:
compute_cluster_name = "edu-compute-cluster"
compute_target = workspace.compute_targets[compute_cluster_name]
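
Because the lookup above assumes the cluster already exists, a misspelled name surfaces only as a bare `KeyError`. The following is a minimal sketch of a more descriptive lookup. The helper name `require_compute_target` is hypothetical and not part of the Azure ML SDK; it works on any dict-like mapping, such as the one returned by `workspace.compute_targets`.

```python
def require_compute_target(compute_targets, name):
    """Return compute_targets[name], or raise a KeyError that lists the available names.

    `compute_targets` is any mapping from target name to target object.
    This helper is a hypothetical convenience, not an Azure ML API.
    """
    try:
        return compute_targets[name]
    except KeyError:
        available = ", ".join(sorted(compute_targets)) or "<none>"
        raise KeyError(
            f"Compute target {name!r} not found. Available targets: {available}"
        ) from None


# Stand-in mapping for illustration; in the notebook this would be workspace.compute_targets.
example_targets = {"edu-compute-cluster": "cluster-object"}
target = require_compute_target(example_targets, "edu-compute-cluster")
print(target)
```

In the notebook, the call would presumably read `require_compute_target(workspace.compute_targets, compute_cluster_name)`.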

## Dataset

We use the [heart failure dataset](https://www.kaggle.com/datasets/andrewmvd/heart-failure-clinical-data) from Kaggle.
We assume it has already been registered as an Azure ML dataset.

In [5]:
dataset_name = 'edu_heart_failure_dataset'
dataset = Dataset.get_by_name(workspace, name=dataset_name)

In [6]:
# Make a dataframe and take a look at it
patients = dataset.to_pandas_dataframe()
patients.head()

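| Unnamed: 0 | age | anaemia | creatinine_phosphokinase | diabetes | ejection_fraction | high_blood_pressure | platelets | serum_creatinine | serum_sodium | sex | smoking | time | DEATH_EVENT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 75.0 | 0 | 582 | 0 | 20 | 1 | 265000.0 | 1.9 | 130 | 1 | 0 | 4 | 1 |
| 1 | 55.0 | 0 | 7861 | 0 | 38 | 0 | 263358.03 | 1.1 | 136 | 1 | 0 | 6 | 1 |
| 2 | 65.0 | 0 | 146 | 0 | 20 | 0 | 162000.0 | 1.3 | 129 | 1 | 1 | 7 | 1 |
| 3 | 50.0 | 1 | 111 | 0 | 20 | 0 | 210000.0 | 1.9 | 137 | 1 | 0 | 7 | 1 |
| 4 | 65.0 | 1 | 160 | 1 | 20 | 0 | 327000.0 | 2.7 | 116 | 0 | 0 | 8 | 1 |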


## HyperDrive Configuration

We're using a random forest (RF) classifier because random forests tend to produce reasonable predictions across a wide range of data while requiring little configuration.

We're letting HyperDrive select the best combination of two hyperparameters: `n_estimators`, the number of trees in the forest, and `min_samples_split`, the minimum fraction of samples required to split an internal node.

We're using a Bandit early termination policy, which ends runs when the primary metric isn't within the specified slack factor of the most successful run.

Our primary metric is mean accuracy, which training should maximize.
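
To make the Bandit rule concrete, here is a minimal sketch of the slack-factor check for a metric that is being maximized: a run is cancelled when its metric falls below `best / (1 + slack_factor)`. The function names are hypothetical and only illustrate the arithmetic; HyperDrive applies the policy itself, so nothing like this needs to be added to the notebook.

```python
def bandit_cutoff(best_metric: float, slack_factor: float) -> float:
    """Lowest metric value that survives the policy when the goal is to maximize."""
    return best_metric / (1.0 + slack_factor)


def should_terminate(run_metric: float, best_metric: float, slack_factor: float = 0.2) -> bool:
    """True if a run's metric is outside the allowed slack of the best run so far."""
    return run_metric < bandit_cutoff(best_metric, slack_factor)


# Illustrative numbers only: with a best accuracy of 0.90 and slack_factor=0.2,
# the cutoff is about 0.75.
best_accuracy = 0.90
for accuracy in (0.70, 0.80):
    print(accuracy, should_terminate(accuracy, best_accuracy))
```

With `slack_factor=0.2`, a run at 0.70 accuracy would be stopped while a run at 0.80 would continue.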

In [40]:
primary_metric_name = "mean accuracy"

venv = Environment.from_pip_requirements(name="venv", file_path="requirements.txt")

train_cfg = ScriptRunConfig(
    source_directory="steps",
    script="train.py",
    environment=venv,
    compute_target=compute_target,
)

param_sampling = RandomParameterSampling({
    "n_estimators": choice(20, 50, 100, 200),
    "min_samples_split": loguniform(-6, -2),
})

early_termination_policy = BanditPolicy(slack_factor=0.2)

hyperdrive_run_config = HyperDriveConfig(
    run_config=train_cfg,
    hyperparameter_sampling=param_sampling,
    policy=early_termination_policy,
    primary_metric_name=primary_metric_name,
    primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
    max_total_runs=40,
    max_concurrent_runs=4
)
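
The search space in the cell above can be read as follows: `choice` picks one of the listed values, and `loguniform(-6, -2)` draws `exp(u)` with `u` uniform on `[-6, -2]`, so `min_samples_split` ranges from roughly 0.0025 to 0.135. The sketch below mimics that sampling with the standard library only. The helper names are hypothetical, and HyperDrive's own sampler is not reproduced here.

```python
import math
import random


def sample_loguniform(low: float, high: float, rng: random.Random) -> float:
    """Draw exp(u) with u uniform on [low, high], as loguniform(low, high) is defined."""
    return math.exp(rng.uniform(low, high))


def sample_config(rng: random.Random) -> dict:
    """Draw one hyperparameter combination from the notebook's search space."""
    return {
        "n_estimators": rng.choice([20, 50, 100, 200]),
        "min_samples_split": sample_loguniform(-6, -2, rng),
    }


rng = random.Random(0)  # fixed seed so the illustration is repeatable
for config in (sample_config(rng) for _ in range(5)):
    print(config)

# Bounds of the min_samples_split range implied by loguniform(-6, -2)
print(f"range: [{math.exp(-6):.4f}, {math.exp(-2):.4f}]")
```

Sampling on a log scale spends as many draws between 0.0025 and 0.025 as between 0.0135 and 0.135, which suits a parameter whose useful values span orders of magnitude.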

In [8]:
remote_run = experiment.submit(hyperdrive_run_config)
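
The training script `steps/train.py` is not shown in this notebook. For the submitted configuration to work, it has to accept the sampled hyperparameters as command-line arguments named after the keys of the sampling space, and log a metric whose name matches `primary_metric_name`. The following is a minimal sketch of the argument handling only; the default values are assumptions, and the actual script may differ.

```python
import argparse


def parse_args(argv=None):
    """Parse the hyperparameters that HyperDrive passes on the command line.

    Hypothetical sketch of steps/train.py; the defaults below are assumptions.
    """
    parser = argparse.ArgumentParser(description="Train a random forest classifier")
    parser.add_argument("--n_estimators", type=int, default=100,
                        help="number of trees in the forest")
    parser.add_argument("--min_samples_split", type=float, default=0.01,
                        help="minimum fraction of samples required to split a node")
    return parser.parse_args(argv)


# Simulate the arguments one child run might receive.
args = parse_args(["--n_estimators", "50", "--min_samples_split", "0.02"])
print(args.n_estimators, args.min_samples_split)

# In the real script, the model would be fitted with these values and the score
# logged under the exact primary metric name, for example:
#   Run.get_context().log("mean accuracy", accuracy)
```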

## Run Details

In the cell below, we use the `RunDetails` widget to monitor the progress of the HyperDrive child runs.

In [9]:
RunDetails(remote_run).show()


_HyperDriveWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO'…


## Best Model

In the cells below, we retrieve the best run from the HyperDrive experiment and display its primary metric, properties, and output files.

In [11]:
best_run = remote_run.get_best_run_by_primary_metric()
best_run_metrics = best_run.get_metrics()

print(f"Best run id: {best_run.id}")
print(f"Best run {primary_metric_name}: {best_run_metrics[primary_metric_name]}")

Best run id: HD_cc76c1bb-a826-450b-aa99-e566e43ad2ed_41
Best run mean accuracy: 0.9066666666666666


In [12]:
print(best_run.get_properties())

{'_azureml.ComputeTargetType': 'amlctrain', 'ContentSnapshotId': '7db49ef9-5b4b-4a37-b946-cc62e5c620b1', 'ProcessInfoFile': 'azureml-logs/process_info.json', 'ProcessStatusFile': 'azureml-logs/process_status.json'}


In [13]:
best_run.get_file_names()

['outputs/model_HD_cc76c1bb-a826-450b-aa99-e566e43ad2ed_41.joblib',
 'system_logs/cs_capability/cs-capability.log',
 'system_logs/hosttools_capability/hosttools-capability.log',
 'system_logs/lifecycler/execution-wrapper.log',
 'system_logs/lifecycler/lifecycler.log',
 'system_logs/metrics_capability/metrics-capability.log',
 'system_logs/snapshot_capability/snapshot-capability.log',
 'user_logs/std_log.txt']


In [14]:
best_run.download_files()
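
`download_files()` fetches every file of the run, including the system logs. If only the serialized model is needed, the list returned by `get_file_names()` can be filtered first. The sketch below shows that filtering on a shortened copy of the file list above; the helper name `find_model_files` is hypothetical.

```python
def find_model_files(file_names, prefix="outputs/", suffix=".joblib"):
    """Return the run files that look like serialized models under outputs/."""
    return [name for name in file_names
            if name.startswith(prefix) and name.endswith(suffix)]


# Shortened copy of the file list printed above.
file_names = [
    "outputs/model_HD_cc76c1bb-a826-450b-aa99-e566e43ad2ed_41.joblib",
    "system_logs/lifecycler/lifecycler.log",
    "user_logs/std_log.txt",
]

model_files = find_model_files(file_names)
print(model_files)
```

A single entry from `model_files` could then be passed to `best_run.download_file(...)` and the downloaded file loaded with `joblib.load`, assuming the training script saved the model with `joblib.dump`.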

## Model Deployment

Only one of the two trained models has to be deployed, but both models still need to be registered. The steps in the rest of this notebook apply only if this model is the one being deployed.

In the cells below, we register the model, create an inference config, and deploy the model as a web service.

In [19]:
model_name = f"model_{best_run.id}.joblib"
model = best_run.register_model(model_name=model_name, model_path="outputs")

In [41]:
from azureml.core.webservice import AciWebservice

# Create the deployment config, i.e. the compute resources for the ACI web service
aciconfig = AciWebservice.deploy_configuration(
    cpu_cores=1,
    memory_gb=1,
    tags={"data": "heart_failure", "method": "sklearn"},
    description="Predict heart failure risk with sklearn",
)

In [42]:
from azureml.core.model import InferenceConfig, Model

# create an inference config i.e. the scoring script and environment
inference_config = InferenceConfig(entry_script="score.py", environment=venv)

# deploy the service
service_name = "edu-heart-failure-svc-" + str(uuid.uuid4())[:4]
service = Model.deploy(
    workspace=workspace,
    name=service_name,
    models=[model],
    inference_config=inference_config,
    deployment_config=aciconfig,
)

service.wait_for_deployment(show_output=True)

azureml.core.model:
To leverage new model deployment capabilities, AzureML recommends using CLI/SDK v2 to deploy models as online endpoint, 
please refer to respective documentations 
https://docs.microsoft.com/azure/machine-learning/how-to-deploy-managed-online-endpoints /
https://docs.microsoft.com/azure/machine-learning/how-to-attach-kubernetes-anywhere 
For more information on migration, see https://aka.ms/acimoemigration 
Service deployment polling reached non-successful terminal state, current service state: Unhealthy
Operation ID: 7038aac6-c842-4f3c-9eb9-701f65f7a3db
More information can be found using '.get_logs()'
Error:
{
  "code": "ContainerGroupQuotaReached",
  "statusCode": 400,
  "message": "ACI Service request failed. Reason: Resource type 'Microsoft.ContainerInstance/containerGroups' container group quota 'StandardCores' exceeded in region 'westus2'. Limit: '10', Usage: '9.8' Requested: '1.1'.."
}



Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2023-08-01 16:23:14+00:00 Creating Container Registry if not exists.
2023-08-01 16:23:14+00:00 Registering the environment.
2023-08-01 16:23:15+00:00 Building image..
2023-08-01 16:33:29+00:00 Generating deployment configuration.
2023-08-01 16:33:29+00:00 Submitting deployment to compute.
Failed


WebserviceException: WebserviceException:
	Message: Service deployment polling reached non-successful terminal state, current service state: Unhealthy
Operation ID: 7038aac6-c842-4f3c-9eb9-701f65f7a3db
More information can be found using '.get_logs()'
Error:
{
  "code": "ContainerGroupQuotaReached",
  "statusCode": 400,
  "message": "ACI Service request failed. Reason: Resource type 'Microsoft.ContainerInstance/containerGroups' container group quota 'StandardCores' exceeded in region 'westus2'. Limit: '10', Usage: '9.8' Requested: '1.1'.."
}
	InnerException None
	ErrorResponse 
{
    "error": {
        "message": "Service deployment polling reached non-successful terminal state, current service state: Unhealthy\nOperation ID: 7038aac6-c842-4f3c-9eb9-701f65f7a3db\nMore information can be found using '.get_logs()'\nError:\n{\n  \"code\": \"ContainerGroupQuotaReached\",\n  \"statusCode\": 400,\n  \"message\": \"ACI Service request failed. Reason: Resource type 'Microsoft.ContainerInstance/containerGroups' container group quota 'StandardCores' exceeded in region 'westus2'. Limit: '10', Usage: '9.8' Requested: '1.1'..\"\n}"
    }
}
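
The error above is a quota failure, not a problem with the model or the scoring script: the container group requested 1.1 cores while only 10 - 9.8 = 0.2 cores of the `StandardCores` quota were free in `westus2`. As a small sketch, the numbers can be pulled out of the error message and checked like this. The helper name `parse_quota` is hypothetical, and the message is copied from the output above.

```python
import re

# Message copied verbatim from the deployment error above.
message = (
    "ACI Service request failed. Reason: Resource type "
    "'Microsoft.ContainerInstance/containerGroups' container group quota "
    "'StandardCores' exceeded in region 'westus2'. "
    "Limit: '10', Usage: '9.8' Requested: '1.1'.."
)


def parse_quota(text):
    """Extract the Limit, Usage and Requested values from an ACI quota message."""
    fields = re.findall(r"(Limit|Usage|Requested):\s*'([\d.]+)'", text)
    return {key.lower(): float(value) for key, value in fields}


quota = parse_quota(message)
headroom = quota["limit"] - quota["usage"]
fits = quota["requested"] <= headroom
print(f"headroom={headroom:.1f} cores, requested={quota['requested']} cores, fits={fits}")
# prints: headroom=0.2 cores, requested=1.1 cores, fits=False
```

Freeing cores by deleting unused container instances, or requesting a higher quota, would presumably be needed before the same deployment can succeed.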


In [35]:
print(service.get_logs())

None


In [18]:
# Create a unique name for the endpoint
online_endpoint_name = "edu-endpoint-" + str(uuid.uuid4())[:8]
print(online_endpoint_name)

edu-endpoint-881ca6e6


TODO: In the cell below, send a request to the web service you deployed to test it.
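
Since the deployment above did not succeed, no request could be sent. As a sketch of what the test request might look like, the code below builds a JSON payload from the first row of the dataset. The `{"data": [...]}` layout and the choice of feature columns are assumptions: the real format depends on `score.py` and on the columns `train.py` used, neither of which is shown here.

```python
import json

# Assumed feature columns: every dataset column except the index and the DEATH_EVENT label.
FEATURE_COLUMNS = [
    "age", "anaemia", "creatinine_phosphokinase", "diabetes",
    "ejection_fraction", "high_blood_pressure", "platelets",
    "serum_creatinine", "serum_sodium", "sex", "smoking", "time",
]


def build_payload(records):
    """Serialize patient records as JSON rows in a fixed column order."""
    rows = [[record[column] for column in FEATURE_COLUMNS] for record in records]
    return json.dumps({"data": rows})


# First row of the dataset shown earlier in the notebook.
patient = {
    "age": 75.0, "anaemia": 0, "creatinine_phosphokinase": 582, "diabetes": 0,
    "ejection_fraction": 20, "high_blood_pressure": 1, "platelets": 265000.0,
    "serum_creatinine": 1.9, "serum_sodium": 130, "sex": 1, "smoking": 0, "time": 4,
}

payload = build_payload([patient])
print(payload)
```

With a healthy service, the payload could be sent with `service.run(payload)`.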

TODO: In the cell below, print the logs of the web service and delete the service

**Submission Checklist**
- I have registered the model.
- I have deployed the model with the best accuracy as a webservice.
- I have tested the webservice by sending a request to the model endpoint.
- I have deleted the webservice and shutdown all the computes that I have used.
- I have taken a screenshot showing the model endpoint as active.
- The project includes a file containing the environment details.

