# Hyperparameter Tuning using HyperDrive

## Azure ML imports

In [1]:
import logging
import os
import csv
import pkg_resources
import joblib

import numpy as np
import pandas as pd

from matplotlib import pyplot as plt

from sklearn import datasets
from sklearn.metrics import confusion_matrix

import azureml.core
from azureml.core.compute import AmlCompute
from azureml.core.environment import Environment
from azureml.core.compute import ComputeTarget
from azureml.core.compute_target import ComputeTargetException
from azureml.core.dataset import Dataset
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
from azureml.train.hyperdrive.parameter_expressions import uniform, choice
from azureml.train.hyperdrive.policy import BanditPolicy
from azureml.train.hyperdrive.runconfig import HyperDriveConfig
from azureml.train.hyperdrive.run import PrimaryMetricGoal
from azureml.train.hyperdrive.sampling import RandomParameterSampling
from azureml.train.sklearn import SKLearn
from azureml.widgets import RunDetails

# Check core SDK version number
print("SDK version:", azureml.core.VERSION)

SDK version: 1.57.0


## Initialize workspace

Initialize a workspace object from persisted configuration.

In [2]:
ws = Workspace.from_config()

print(
    'Workspace name: ' + ws.name, 
    'Azure region: ' + ws.location, 
    'Subscription id: ' + ws.subscription_id, 
    'Resource group: ' + ws.resource_group, sep = '\n'
    )

Workspace name: quick-starts-ws-276055
Azure region: southcentralus
Subscription id: 48a74bb7-9950-4cc1-9caa-5d50f995cc55
Resource group: aml-quickstarts-276055


## Create an Azure HyperDrive experiment

Let's create an experiment named `heart-failure-hd-experiment` and choose a folder to hold the training scripts. The script runs are recorded under this experiment in Azure.

The best practice is to keep the scripts and dependent files for each step in a separate folder and to specify that folder as the step's `source_directory`. This reduces the size of the snapshot created for the step, because only that folder is snapshotted. Since a change to any file in `source_directory` triggers a re-upload of the snapshot, it also lets the step be reused when nothing in its `source_directory` has changed.

In [3]:
# Choose a name for the run history container in the workspace
experiment_name = 'heart-failure-hd-experiment'
project_folder = './Capstone-Project'

experiment = Experiment(ws, experiment_name)
experiment

Name,Workspace,Report Page,Docs Page
heart-failure-hd-experiment,quick-starts-ws-276055,Link to Azure Machine Learning studio,Link to Documentation


### Create or attach an AmlCompute cluster

You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for your HyperDrive run.

In [4]:
# Choose a name for your CPU cluster
compute_cluster_name = "atul-trdigi-compute"

# Reuse the cluster if it already exists; otherwise create it
try:
    compute_target = ComputeTarget(workspace=ws, name=compute_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(
        vm_size='STANDARD_D2_V2',  # for GPU, use "STANDARD_NC6"
        # vm_priority='lowpriority',  # optional
        min_nodes=0,
        max_nodes=5)
    compute_target = ComputeTarget.create(ws, compute_cluster_name, compute_config)

compute_target.wait_for_completion(show_output=True)

# For a more detailed view of current AmlCompute status, use get_status()
print(compute_target.get_status().serialize())

InProgress..
SucceededProvisioning operation finished, operation "Succeeded"
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned
{'currentNodeCount': 0, 'targetNodeCount': 0, 'nodeStateCounts': {'preparingNodeCount': 0, 'runningNodeCount': 0, 'idleNodeCount': 0, 'unusableNodeCount': 0, 'leavingNodeCount': 0, 'preemptedNodeCount': 0}, 'allocationState': 'Steady', 'allocationStateTransitionTime': '2025-03-22T13:14:04.385000+00:00', 'errors': None, 'creationTime': '2025-03-22T13:13:56.012573+00:00', 'modifiedTime': '2025-03-22T13:14:05.963117+00:00', 'provisioningState': 'Succeeded', 'provisioningStateTransitionTime': None, 'scaleSettings': {'minNodeCount': 0, 'maxNodeCount': 5, 'nodeIdleTimeBeforeScaleDown': 'PT1800S'}, 'vmPriority': 'Dedicated', 'vmSize': 'STANDARD_D2_V2'}


## Dataset

The data is loaded using `TabularDatasetFactory` in the `train.py` script.

## HyperDrive Configuration

For the HyperDrive experiment, we chose the [`LogisticRegression`](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) classifier from scikit-learn.

The script `train.py` takes care of data collection, cleansing and splitting, and model training and testing. HyperDrive performs the hyperparameter sampling and applies the early stopping policy.

### Data collection, cleansing and splitting

The dataset is loaded using `TabularDatasetFactory`. The cleansing process drops rows with empty values and performs one-hot encoding of categorical columns (our dataset does not have any). The dataset is then split into train and test sets: 70% of the data is used for training and 30% for testing.
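
The cleansing and splitting steps described above can be sketched as follows. This is a minimal illustration and not the actual `train.py`: the toy data frame, its column names (including the `DEATH_EVENT` label), the `clean_data` helper, and the `random_state` value are all assumptions made for the example.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data; the column names are assumptions, not taken from train.py.
raw = pd.DataFrame({
    "age": [60, 55, None, 70, 45, 65, 50, 80, 42, 58, 63],
    "sex": ["M", "F", "F", "M", "M", "F", "M", "F", "M", "F", "M"],
    "DEATH_EVENT": [1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1],
})


def clean_data(df, label_column):
    """Drop rows with empty values and one-hot encode categorical columns."""
    complete = df.dropna()
    y = complete[label_column]
    x = pd.get_dummies(complete.drop(columns=[label_column]))
    return x, y


x, y = clean_data(raw, label_column="DEATH_EVENT")

# Hold out 30% of the cleaned rows for testing, as described above.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=42
)
print(f"train rows: {len(x_train)}, test rows: {len(x_test)}")
```

One of the eleven toy rows has a missing value and is dropped, so the ten remaining rows are split into 7 training rows and 3 test rows.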

### Hyperparameter sampling

The project uses two hyperparameters:

- `--C`: inverse regularization strength
- `--max_iter`: maximum number of iterations the scikit-learn Logistic Regression solver may take to converge
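
HyperDrive passes each sampled value to the training script as a command-line argument. A minimal sketch of how a script could read the two arguments listed above is shown here; the `parse_args` helper, the default values, and the help texts are assumptions, and the real `train.py` may differ.

```python
import argparse


def parse_args(argv=None):
    """Parse the two hyperparameters that HyperDrive passes on the command line."""
    parser = argparse.ArgumentParser(description="Logistic regression training script")
    # The defaults below are assumptions for this sketch.
    parser.add_argument("--C", type=float, default=1.0,
                        help="Inverse of regularization strength")
    parser.add_argument("--max_iter", type=int, default=100,
                        help="Maximum number of solver iterations")
    return parser.parse_args(argv)


# Simulate the arguments of one child run.
args = parse_args(["--C", "0.5", "--max_iter", "150"])
print(args.C, args.max_iter)
```

Passing `argv` explicitly keeps the function easy to exercise in a notebook; calling `parse_args()` with no argument reads `sys.argv` as usual.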

I use [random parameter sampling](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.hyperdrive.randomparametersampling?view=azure-ml-py). Random sampling supports both discrete and continuous hyperparameters, and it supports early termination of low-performing runs. In random sampling, hyperparameter values are randomly selected from the defined search space. This makes it a good approach for an initial, exploratory search over hyperparameter combinations.
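
To make the idea concrete, the following sketch draws configurations from the same search space using only the Python standard library. It illustrates the concept of random sampling and is not HyperDrive's implementation; `SEARCH_SPACE`, `sample_configuration`, and the seed are names and values invented for the example.

```python
import random

# Same ranges as the search space used in this notebook:
# a continuous range for --C and a discrete set for --max_iter.
SEARCH_SPACE = {
    "--C": lambda rng: rng.uniform(0.001, 100),
    "--max_iter": lambda rng: rng.choice([10, 50, 100, 150, 200]),
}


def sample_configuration(rng):
    """Draw one value per hyperparameter, independently of the others."""
    return {name: draw(rng) for name, draw in SEARCH_SPACE.items()}


rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = [sample_configuration(rng) for _ in range(5)]
for sample in samples:
    print(sample)
```

Each configuration is drawn independently, so nothing learned from earlier runs influences later samples.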

### Model training and testing

Model training and testing are performed using scikit-learn's Logistic Regression model. In `train.py`, metrics are generated and logged. Accuracy is the primary metric used to compare the runs.
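
A minimal sketch of the training and scoring step for one hyperparameter combination follows. It uses a synthetic dataset from `make_classification` in place of the real data, and the `train_and_score` helper and the example values are assumptions; it does not include the metric logging that `train.py` performs.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset (an assumption for this sketch).
x, y = make_classification(n_samples=300, n_features=12, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=0
)


def train_and_score(c, max_iter):
    """Fit a logistic regression model and return its accuracy on the test set."""
    model = LogisticRegression(C=c, max_iter=max_iter)
    model.fit(x_train, y_train)
    return accuracy_score(y_test, model.predict(x_test))


accuracy = train_and_score(c=0.46, max_iter=100)
print(f"Accuracy: {accuracy:.3f}")
```

In an actual HyperDrive run, the value returned here would have to be logged under the same name as `primary_metric_name` so that the child runs can be compared.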

### Applying early stopping policy

HyperDrive stops a run early if the conditions specified by the policy are met.

The model uses [BanditPolicy](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.hyperdrive.banditpolicy?view=azure-ml-py).

Bandit policy is based on a slack factor or slack amount and an evaluation interval. It ends runs whose primary metric is not within the specified slack factor or slack amount of the most successful run.
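
As a rough illustration of the slack-factor rule for a metric that is being maximized, a run is cancelled when its metric falls below the best metric divided by `1 + slack_factor`. The helper names below are hypothetical, and the sketch ignores the evaluation interval and any delayed evaluation, so it is a simplification rather than the service's implementation.

```python
def bandit_cutoff(best_metric, slack_factor):
    """Lowest metric value that still survives when the goal is to maximize."""
    return best_metric / (1 + slack_factor)


def should_terminate(run_metric, best_metric, slack_factor=0.1):
    """Return True if the run falls outside the allowed slack of the best run."""
    return run_metric < bandit_cutoff(best_metric, slack_factor)


best_accuracy = 0.80
print(f"cutoff: {bandit_cutoff(best_accuracy, 0.1):.4f}")
print(should_terminate(0.70, best_accuracy))  # below the cutoff
print(should_terminate(0.75, best_accuracy))  # within the slack
```

With a slack factor of 0.1 and a best accuracy of 0.80, the cutoff is roughly 0.727, so a run reporting 0.70 would be stopped while a run reporting 0.75 would continue.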

See [HyperDriveConfig Class](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.hyperdrive.hyperdriveconfig?view=azure-ml-py) for a complete list of configuration parameters.


In [5]:
env = Environment.get(workspace=ws, name='Azure-ML-Tutorial')

In [6]:
# Early termination policy (not required if using Bayesian sampling)
early_termination_policy = BanditPolicy(evaluation_interval=2, slack_factor=0.1)

# Params that you will be using during training
param_sampling = RandomParameterSampling({
    "--C": uniform(0.001, 100),
    "--max_iter": choice(10, 50, 100, 150, 200)
    })

# Training directory and script
train_dir = "./hyperdrive-training-script"
train_script = "train.py"

from azureml.core import ScriptRunConfig
# ScriptRunConfig that runs train.py on the compute cluster in the chosen environment
estimator = ScriptRunConfig(
    source_directory=train_dir,
    script=train_script,
    compute_target=compute_cluster_name,
    environment=env
    )

# HyperDriveConfig using the run configuration, hyperparameter sampler, and policy
hyperdrive_run_config = HyperDriveConfig(
    run_config=estimator,
    hyperparameter_sampling=param_sampling,
    primary_metric_name='Accuracy',
    primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
    max_total_runs=25,
    max_concurrent_runs=5,
    policy=early_termination_policy,
    )

In [7]:
# Submit your experiment
hyperdrive_run = experiment.submit(hyperdrive_run_config)

## Run Details

Use the `RunDetails` widget to monitor the child runs of the HyperDrive experiment.

In [8]:
RunDetails(hyperdrive_run).show()
hyperdrive_run.wait_for_completion(show_output=True)
hyperdrive_run


_HyperDriveWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO'…

RunId: HD_835ff3f9-1054-421e-8046-ecee2859decd
Web View: https://ml.azure.com/runs/HD_835ff3f9-1054-421e-8046-ecee2859decd?wsid=/subscriptions/48a74bb7-9950-4cc1-9caa-5d50f995cc55/resourcegroups/aml-quickstarts-276055/workspaces/quick-starts-ws-276055&tid=660b3398-b80e-49d2-bc5b-ac1dc93b5254

Streaming azureml-logs/hyperdrive.txt

[2025-03-22T13:18:17.3550920Z][GENERATOR][DEBUG]Sampled 5 jobs from search space 
[2025-03-22T13:18:17.9580345Z][SCHEDULER][INFO]Scheduling job, id='HD_835ff3f9-1054-421e-8046-ecee2859decd_0' 
[2025-03-22T13:18:18.0581321Z][SCHEDULER][INFO]Scheduling job, id='HD_835ff3f9-1054-421e-8046-ecee2859decd_2' 
[2025-03-22T13:18:17.9937968Z][SCHEDULER][INFO]Scheduling job, id='HD_835ff3f9-1054-421e-8046-ecee2859decd_1' 
[2025-03-22T13:18:18.1606738Z][SCHEDULER][INFO]Scheduling job, id='HD_835ff3f9-1054-421e-8046-ecee2859decd_4' 
[2025-03-22T13:18:18.1616112Z][SCHEDULER][INFO]Scheduling job, id='HD_835ff3f9-1054-421e-8046-ecee2859decd_3' 
[2025-03-22T13:18:18.4548207Z]

Experiment,Id,Type,Status,Details Page,Docs Page
heart-failure-hd-experiment,HD_835ff3f9-1054-421e-8046-ecee2859decd,hyperdrive,Completed,Link to Azure Machine Learning studio,Link to Documentation


## Best Model

Get the best run from the HyperDrive experiment and display its arguments, metrics, and files.

In [9]:
# Get your best run
best_run = hyperdrive_run.get_best_run_by_primary_metric()
print(f"Best run arguments: {best_run.get_details()['runDefinition']['arguments']}")
print(f"Best run metrics: {best_run.get_metrics()}")
print(f"Best run file names: {best_run.get_file_names()}")

Best run arguments: ['--C', '0.4589073784297085', '--max_iter', '100']
Best run metrics: {'Regularization strength:': 0.4589073784297085, 'Max iterations:': 100, 'Accuracy': 0.8}
Best run file names: ['logs/azureml/dataprep/0/rslex.log.2025-03-22-13', 'system_logs/cs_capability/cs-capability.log', 'system_logs/hosttools_capability/hosttools-capability.log', 'system_logs/lifecycler/execution-wrapper.log', 'system_logs/lifecycler/lifecycler.log', 'system_logs/metrics_capability/metrics-capability.log', 'system_logs/snapshot_capability/snapshot-capability.log', 'user_logs/std_log.txt']


In [None]:
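# Persist the best run's ID. Note that this stores the run ID string, not a fitted model;
# train.py would need to write the model under outputs/ for it to be retrievable from the run.
os.makedirs("./outputs", exist_ok=True)
joblib.dump(value=best_run.id, filename="./outputs/hyperdrive_model.joblib")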

## Model Deployment

As AutoML produced the better model, the HyperDrive model will not be deployed.