# Hyperparameter Tuning using HyperDrive

First, we import all the required dependencies.

In [5]:
import os

from azureml.core import Workspace, Experiment
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.widgets import RunDetails
from azureml.train.sklearn import SKLearn
from azureml.train.hyperdrive.run import PrimaryMetricGoal
from azureml.train.hyperdrive.policy import BanditPolicy
from azureml.train.hyperdrive.sampling import RandomParameterSampling
from azureml.train.hyperdrive.runconfig import HyperDriveConfig
from azureml.train.hyperdrive.parameter_expressions import uniform, choice

# Dataset

This dataset is composed of a range of biomedical voice measurements from 31 people, 23 of whom had Parkinson's disease. Each column in the table corresponds to a particular voice measure, and each row corresponds to one of the 195 voice recordings from these individuals.

The main aim of the data is to distinguish healthy people from those with Parkinson's disease. The "status" column is used as the target. It is binary: "0" represents a healthy subject and "1" represents a subject with the disease.


The dataset is multivariate and stored in ASCII CSV format, with the following attributes:

## Attribute Information:

**Matrix column entries (attributes):**

- **name** - ASCII subject name and recording number
- **MDVP:Fo(Hz)** - Average vocal fundamental frequency
- **MDVP:Fhi(Hz)** - Maximum vocal fundamental frequency
- **MDVP:Flo(Hz)** - Minimum vocal fundamental frequency
- **MDVP:Jitter(%), MDVP:Jitter(Abs), MDVP:RAP, MDVP:PPQ, Jitter:DDP** - Several measures of variation in fundamental frequency
- **MDVP:Shimmer, MDVP:Shimmer(dB), Shimmer:APQ3, Shimmer:APQ5, MDVP:APQ, Shimmer:DDA** - Several measures of variation in amplitude
- **NHR, HNR** - Two measures of ratio of noise to tonal components in the voice
- **status** - Health status of the subject: one (1) for Parkinson's, zero (0) for healthy
- **RPDE, D2** - Two nonlinear dynamical complexity measures
- **DFA** - Signal fractal scaling exponent
- **spread1, spread2, PPE** - Three nonlinear measures of fundamental frequency variation
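
As a rough illustration of how these attributes are typically used, the sketch below separates the "status" target from the voice measures and drops the "name" identifier. The helper name `split_features_target` is hypothetical, only a few of the columns are shown, and the two sample rows contain made-up placeholder values rather than real records from the dataset.

```python
import csv
import io

# Made-up placeholder rows with a subset of the columns; NOT real dataset values.
SAMPLE_CSV = """name,MDVP:Fo(Hz),HNR,status,PPE
subject_a_1,120.0,21.0,1,0.28
subject_b_1,200.0,26.0,0,0.09
"""


def split_features_target(rows):
    """Return (features, targets): numeric feature dicts and integer status labels."""
    features, targets = [], []
    for row in rows:
        row = dict(row)                          # copy so the caller's row is untouched
        row.pop("name")                          # identifier, not a voice measure
        targets.append(int(row.pop("status")))   # 1 = Parkinson's, 0 = healthy
        features.append({key: float(value) for key, value in row.items()})
    return features, targets


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
X, y = split_features_target(rows)
print(y)
```

With the real CSV file, the same split would be applied to all 195 rows before training.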


In [6]:
ws = Workspace.from_config()  
exp = Experiment(workspace=ws, name="Parkinson-hyperdrive") 

print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep = '\n')

run = exp.start_logging()

Workspace name: quick-starts-ws-133204
Azure region: southcentralus
Subscription id: 2c48c51c-bd47-40d4-abbe-fb8eabd19c8c
Resource group: aml-quickstarts-133204


# HyperDrive Configuration

In [4]:
cluster_name = "MyNewCluster"
try:
    compute = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_configuration = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS3_V2',
                                                           max_nodes=4)
    compute = ComputeTarget.create(ws, cluster_name, compute_configuration)

Found existing cluster, use it.


# Explanation of the model we are using and the reasons for choosing the hyperparameters, termination policy and config settings:

We use Azure ML's HyperDrive here. First, we define the hyperparameters that will be tuned during training: "--C" and "--max_iter", which train.py logs as "Regularization Strength" and "Max iterations".
Both are sampled with RandomParameterSampling. "uniform" specifies the uniform distribution from which samples of "--C" are drawn (between 0.01 and 0.05), and "choice" picks "--max_iter" from a discrete set of values (100, 150, 200, 250 or 300).
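
To make the sampling step concrete, here is a minimal sketch in plain Python of what drawing random configurations from this search space looks like. It only mimics the idea with the standard `random` module and is not how HyperDrive is implemented internally; the function name `sample_hyperparameters` and the fixed seed are assumptions made for illustration.

```python
import random

C_RANGE = (0.01, 0.05)                         # bounds passed to uniform(...)
MAX_ITER_CHOICES = [100, 150, 200, 250, 300]   # values passed to choice(...)


def sample_hyperparameters(rng):
    """Draw one random configuration from the search space."""
    return {
        "--C": rng.uniform(*C_RANGE),
        "--max_iter": rng.choice(MAX_ITER_CHOICES),
    }


rng = random.Random(0)  # seeded only so the illustration is repeatable
configs = [sample_hyperparameters(rng) for _ in range(4)]
for config in configs:
    print(config)
```

Each printed dictionary corresponds to the arguments one child run would receive.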

Next, we define the early termination policy using the BanditPolicy class with evaluation_interval=1, slack_factor=0.02, slack_amount=None and delay_evaluation=0. The policy terminates runs that are not performing well compared with the best run so far, which saves compute time.
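
The effect of slack_factor can be sketched with a little arithmetic. The snippet below assumes the documented behaviour for a metric that is being maximized: a run is cancelled when its metric is below the best metric divided by (1 + slack_factor). It is a simplification that ignores evaluation_interval and delay_evaluation, and the function names and the 0.90 accuracy are hypothetical.

```python
def bandit_threshold(best_metric, slack_factor):
    """Lowest metric a run may report without being cancelled (maximize goal)."""
    return best_metric / (1 + slack_factor)


def should_terminate(run_metric, best_metric, slack_factor=0.02):
    """True when the run falls outside the allowed slack of the best run."""
    return run_metric < bandit_threshold(best_metric, slack_factor)


best_so_far = 0.90  # hypothetical best accuracy at some evaluation interval
print(bandit_threshold(best_so_far, 0.02))
print(should_terminate(0.85, best_so_far))  # well below the threshold
print(should_terminate(0.89, best_so_far))  # within the 2% slack
```

With a slack factor of 0.02 the tolerance is tight, so only runs very close to the current best are allowed to continue.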

Then, we create the estimator and the HyperDrive configuration. The training script train.py fits a Logistic Regression model. We chose Logistic Regression because the output we predict is binary, i.e. "0" for healthy or "1" for those with the disease.
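
The contents of train.py are not shown in this notebook. As a hedged sketch of the part that receives the sampled values, the script would need to accept "--C" and "--max_iter" on the command line, for example with `argparse`. The function name `parse_args`, the default values and the help texts below are assumptions, not taken from the actual script.

```python
import argparse


def parse_args(argv=None):
    """Parse the two hyperparameters that HyperDrive passes to the script."""
    parser = argparse.ArgumentParser()
    # Defaults are hypothetical; HyperDrive overrides them for every child run.
    parser.add_argument("--C", type=float, default=1.0,
                        help="value forwarded to the model's C parameter")
    parser.add_argument("--max_iter", type=int, default=100,
                        help="maximum number of solver iterations")
    return parser.parse_args(argv)


# Simulate the argument list of one child run.
args = parse_args(["--C", "0.03", "--max_iter", "200"])
print(args.C, args.max_iter)
```

The parsed values would then be handed to the Logistic Regression model and logged together with the resulting accuracy.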

Finally, we define the HyperDrive configuration. We set max_concurrent_runs to 4, so at most four runs execute in parallel, which matches the max_nodes=4 limit of the compute cluster. We set max_total_runs to 22, since we only have 195 rows to evaluate.

In [5]:
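# Search space: continuous range for --C, discrete set for --max_iter.
ps = RandomParameterSampling(
    {
        "--C": uniform(0.01, 0.05),
        "--max_iter": choice(100, 150, 200, 250, 300)
    }
)

# Early termination: cancel runs that fall outside the slack of the best run.
policy = BanditPolicy(evaluation_interval=1, slack_factor=0.02, slack_amount=None, delay_evaluation=0)

if "training" not in os.listdir():
    os.mkdir("./training")

# train.py in the current directory is the entry script for every child run.
est = SKLearn(source_directory='./', compute_target=compute, entry_script='train.py', vm_priority=None)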
hyperdrive_config = HyperDriveConfig(estimator=est,
                                     hyperparameter_sampling=ps,
                                     policy=policy,
                                     max_concurrent_runs=4,
                                     primary_metric_name="Accuracy",
                                     primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                                     max_total_runs=22)

'SKLearn' estimator is deprecated. Please use 'ScriptRunConfig' from 'azureml.core.script_run_config' with your own defined environment or the AzureML-Tutorial curated environment.


## Run Details

In [6]:
experiment_run = exp.submit(config=hyperdrive_config)
RunDetails(experiment_run).show()
experiment_run.wait_for_completion(show_output=True)




RunId: HD_24a1c38c-6175-4b4b-b05d-b18fa916d652
Web View: https://ml.azure.com/experiments/Parkinson-hyperdrive/runs/HD_24a1c38c-6175-4b4b-b05d-b18fa916d652?wsid=/subscriptions/2c48c51c-bd47-40d4-abbe-fb8eabd19c8c/resourcegroups/aml-quickstarts-133204/workspaces/quick-starts-ws-133204

Streaming azureml-logs/hyperdrive.txt

"<START>[2021-01-03T22:30:39.530778][GENERATOR][INFO]Trying to sample '4' jobs from the hyperparameter space<END>\n""<START>[2021-01-03T22:30:39.036731][API][INFO]Experiment created<END>\n""<START>[2021-01-03T22:30:39.848475][GENERATOR][INFO]Successfully sampled '4' jobs, they will soon be submitted to the execution target.<END>\n"<START>[2021-01-03T22:30:41.1190129Z][SCHEDULER][INFO]The execution environment is being prepared. Please be patient as it can take a few minutes.<END>

Execution Summary
RunId: HD_24a1c38c-6175-4b4b-b05d-b18fa916d652
Web View: https://ml.azure.com/experiments/Parkinson-hyperdrive/runs/HD_24a1c38c-6175-4b4b-b05d-b18fa916d652?wsid=/subscript

{'runId': 'HD_24a1c38c-6175-4b4b-b05d-b18fa916d652',
 'target': 'MyNewCluster',
 'status': 'Completed',
 'startTimeUtc': '2021-01-03T22:30:38.732577Z',
 'endTimeUtc': '2021-01-03T22:40:44.031413Z',
 'properties': {'primary_metric_config': '{"name": "Accuracy", "goal": "maximize"}',
  'resume_from': 'null',
  'runTemplate': 'HyperDrive',
  'azureml.runsource': 'hyperdrive',
  'platform': 'AML',
  'ContentSnapshotId': '94dbb81b-150f-41ab-ae14-5e9780b6afe9',
  'score': '0.9056603773584906',
  'best_child_run_id': 'HD_24a1c38c-6175-4b4b-b05d-b18fa916d652_19',
  'best_metric_status': 'Succeeded'},
 'inputDatasets': [],
 'outputDatasets': [],
 'logFiles': {'azureml-logs/hyperdrive.txt': 'https://mlstrg133204.blob.core.windows.net/azureml/ExperimentRun/dcid.HD_24a1c38c-6175-4b4b-b05d-b18fa916d652/azureml-logs/hyperdrive.txt?sv=2019-02-02&sr=b&sig=BQUmC8tPBpHuBsNJVLGDESGbuzGIW0VFkTjGztVpbSA%3D&st=2021-01-03T22%3A30%3A52Z&se=2021-01-04T06%3A40%3A52Z&sp=r'}}

## Best Model

In [7]:
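# Retrieve the child run with the highest primary metric (Accuracy).
best = experiment_run.get_best_run_by_primary_metric()
print(best.get_metrics())
print(best.get_file_names())
# The arguments show which hyperparameter values the best run received.
print(best.get_details()['runDefinition']['arguments'])
# Register the best run's files as a model in the workspace.
best_model = best.register_model(model_name='Parkinson_Disease_HYPERDRIVE', model_path='./')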

{'Regularization Strength:': 0.04411012133409599, 'Max iterations:': 200, 'Accuracy': 0.9056603773584906}
['azureml-logs/55_azureml-execution-tvmps_dbb62b67411047b7b4af0ed2729599ede67583a41e4a4b40cd3471120d4dff9f_d.txt', 'azureml-logs/65_job_prep-tvmps_dbb62b67411047b7b4af0ed2729599ede67583a41e4a4b40cd3471120d4dff9f_d.txt', 'azureml-logs/70_driver_log.txt', 'azureml-logs/75_job_post-tvmps_dbb62b67411047b7b4af0ed2729599ede67583a41e4a4b40cd3471120d4dff9f_d.txt', 'azureml-logs/process_info.json', 'azureml-logs/process_status.json', 'logs/azureml/96_azureml.log', 'logs/azureml/dataprep/backgroundProcess.log', 'logs/azureml/dataprep/backgroundProcess_Telemetry.log', 'logs/azureml/dataprep/engine_spans_l_f050c829-8873-4d0f-a08a-6afab3178661.jsonl', 'logs/azureml/dataprep/python_span_l_f050c829-8873-4d0f-a08a-6afab3178661.jsonl', 'logs/azureml/job_prep_azureml.log', 'logs/azureml/job_release_azureml.log']
['--C', '0.04411012133409599', '--max_iter', '200']
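
The argument list printed above alternates between a flag and its value. A small helper, sketched here under the assumption that every flag is followed by exactly one value, turns it into a dictionary that is easier to read or reuse. The name `arguments_to_dict` is hypothetical and not part of the Azure ML SDK.

```python
def arguments_to_dict(arguments):
    """Pair each '--flag' with the value that follows it (values stay strings)."""
    if len(arguments) % 2 != 0:
        raise ValueError("expected alternating flag/value entries")
    return dict(zip(arguments[0::2], arguments[1::2]))


# Argument list copied from the output of the best run above.
arguments = ['--C', '0.04411012133409599', '--max_iter', '200']
best_params = arguments_to_dict(arguments)
print(best_params)
```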
