# Hyperparameter Tuning using HyperDrive

The cell below imports all the dependencies needed to complete the project.

In [1]:
from azureml.core import Dataset, Environment, Experiment, Workspace

from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

from azureml.train.sklearn import SKLearn
from azureml.train.hyperdrive.run import PrimaryMetricGoal
from azureml.train.hyperdrive.sampling import BayesianParameterSampling
from azureml.train.hyperdrive.runconfig import HyperDriveConfig
from azureml.train.hyperdrive.parameter_expressions import choice

from azureml.widgets import RunDetails

## Dataset

The project uses an external dataset of heart failure clinical records, registered in the workspace under the name `heart_failure`. The cells below connect to the workspace, create the experiment and load the dataset.

In [2]:
ws = Workspace.from_config()
experiment_name = 'hyperdrive-capstone'

experiment = Experiment(ws, experiment_name)

In [4]:
# retrieve the registered TabularDataset by name and load it into a pandas DataFrame
dataset = Dataset.get_by_name(ws, name='heart_failure')
clinical_records = dataset.to_pandas_dataframe()

# preview the first 3 rows of the dataset
clinical_records.head(3)

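| Unnamed: 0 | age | anaemia | creatinine_phosphokinase | diabetes | ejection_fraction | high_blood_pressure | platelets | serum_creatinine | serum_sodium | sex | smoking | time | DEATH_EVENT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 75.0 | 0 | 582 | 0 | 20 | 1 | 265000.0 | 1.9 | 130 | 1 | 0 | 4 | 1 |
| 1 | 55.0 | 0 | 7861 | 0 | 38 | 0 | 263358.03 | 1.1 | 136 | 1 | 0 | 6 | 1 |
| 2 | 65.0 | 0 | 146 | 0 | 20 | 0 | 162000.0 | 1.3 | 129 | 1 | 1 | 7 | 1 |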
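Before training, the `DEATH_EVENT` label has to be separated from the twelve clinical features. The following is a minimal plain-Python sketch of that split applied to the three preview rows above, assuming the leading `Unnamed: 0` column is only a row index and should be dropped. The helper name `split_features_and_label` is hypothetical; on the DataFrame itself, `clinical_records.pop('DEATH_EVENT')` would do the same job.

```python
import csv
import io

# The three preview rows shown above, copied verbatim.
PREVIEW = """Unnamed: 0,age,anaemia,creatinine_phosphokinase,diabetes,ejection_fraction,high_blood_pressure,platelets,serum_creatinine,serum_sodium,sex,smoking,time,DEATH_EVENT
0,75.0,0,582,0,20,1,265000.0,1.9,130,1,0,4,1
1,55.0,0,7861,0,38,0,263358.03,1.1,136,1,0,6,1
2,65.0,0,146,0,20,0,162000.0,1.3,129,1,1,7,1
"""

LABEL = "DEATH_EVENT"
INDEX_COLUMN = "Unnamed: 0"  # assumption: a row index, not a clinical feature


def split_features_and_label(text):
    """Return (feature_names, feature_rows, labels) parsed from CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    # Every column except the label and the index column is a feature.
    feature_names = [
        name for name in reader.fieldnames if name not in (LABEL, INDEX_COLUMN)
    ]
    features, labels = [], []
    for row in reader:
        features.append([float(row[name]) for name in feature_names])
        labels.append(int(row[LABEL]))
    return feature_names, features, labels


names, X, y = split_features_and_label(PREVIEW)
print(len(names), y)  # 12 [1, 1, 1]
```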


## HyperDrive Configuration

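The model is a scikit-learn random forest classifier, trained by `train.py` to predict the `DEATH_EVENT` label. Two hyperparameters are tuned: `--n-estimators`, the number of trees in the forest, and `--max-depth`, the maximum depth of each tree. Values are chosen with Bayesian sampling, which picks each new configuration based on the results of the runs completed so far rather than at random. No early termination policy is configured, because HyperDrive does not support early termination with Bayesian sampling. The HyperDrive run maximizes `accuracy`, tries at most 20 configurations and runs at most 4 of them at a time, matching the compute cluster's maximum of 4 nodes.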
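Both hyperparameters are discrete `choice` sets (`--n-estimators` from `range(50, 300, 20)`, `--max-depth` from `range(1, 20)`), so the size of the search space can be counted exactly. That helps judge the budget of `max_total_runs=20` set in the configuration below. A short plain-Python sketch of the arithmetic:

```python
import math

# Same value sets as the choice(...) expressions in the configuration cell.
n_estimators_values = list(range(50, 300, 20))  # 50, 70, ..., 290
max_depth_values = list(range(1, 20))           # 1, 2, ..., 19

grid_size = len(n_estimators_values) * len(max_depth_values)
max_total_runs = 20
max_concurrent_runs = 4

print(len(n_estimators_values), len(max_depth_values), grid_size)  # 13 19 247
print(f"{max_total_runs / grid_size:.1%} of the grid is evaluated")  # 8.1% of the grid is evaluated

# With at most 4 runs in parallel, 20 runs need at least 5 sequential slots.
print(math.ceil(max_total_runs / max_concurrent_runs))  # 5
```

Since only about 8% of the combinations are tried, the sampling strategy matters: Bayesian sampling uses completed runs to decide where to look next, and Azure ML's documentation notes that fewer concurrent runs can improve its convergence, because more runs then benefit from earlier results.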

In [5]:
# Create compute cluster to run HyperDrive
cluster_name = 'cpu-cluster-5'
try:
    compute_target = ComputeTarget(ws, cluster_name)
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D3_V2', min_nodes=1, max_nodes=4)
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)

compute_target.wait_for_completion(show_output=True)

Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


In [8]:
# Hyperparameter search space. The two parameters being tuned are:
# --n-estimators: the number of trees in the forest; more trees usually improve
#   performance, with diminishing returns and a higher training cost
# --max-depth: the maximum depth of each tree; deeper trees fit the training data
#   more closely, so the right depth balances underfitting against overfitting
param_sampling = BayesianParameterSampling(
    parameter_space={
        "--n-estimators": choice(range(50, 300, 20)),
        "--max-depth": choice(range(1, 20)),
    }
)

# Estimator that runs train.py on the cluster, and the HyperDrive configuration
# (no termination policy: Bayesian sampling does not support early termination)
estimator = SKLearn(
    source_directory="./",
    entry_script="train.py",
    script_params={'--input-data': dataset.id},
    compute_target=compute_target
)

hyperdrive_run_config = HyperDriveConfig(
    primary_metric_name="accuracy",
    primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
    max_total_runs=20,
    max_concurrent_runs=4,
    hyperparameter_sampling=param_sampling,
    estimator=estimator
)

'SKLearn' estimator is deprecated. Please use 'ScriptRunConfig' from 'azureml.core.script_run_config' with your own defined environment or the AzureML-Tutorial curated environment.
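The warning recommends replacing the deprecated `SKLearn` estimator with `ScriptRunConfig`. One mechanical difference is that `ScriptRunConfig` takes its command-line `arguments` as a flat list of flags and values instead of the `script_params` dictionary, and `HyperDriveConfig` then receives it through `run_config=` instead of `estimator=`. A minimal sketch of that conversion follows; the helper name `script_params_to_arguments` is hypothetical, and choosing the environment (for example a curated one, as the warning suggests) is left out.

```python
def script_params_to_arguments(script_params):
    """Flatten an estimator-style {flag: value} dict into a
    ScriptRunConfig-style [flag, value, flag, value, ...] list.

    Hypothetical helper: dicts keep insertion order (Python 3.7+),
    so flags appear in the order they were declared.
    """
    arguments = []
    for flag, value in script_params.items():
        arguments.extend([flag, value])
    return arguments


# In the notebook the value would be dataset.id; a placeholder string is used here.
print(script_params_to_arguments({"--input-data": "<dataset-id>"}))
# ['--input-data', '<dataset-id>']
```

The resulting list would then be passed as `ScriptRunConfig(source_directory="./", script="train.py", arguments=..., compute_target=compute_target, environment=...)`; HyperDrive should still append the sampled `--n-estimators` and `--max-depth` values to these arguments for each child run.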


In [9]:
# Submit the HyperDrive run to the experiment
hyperdrive_run = experiment.submit(hyperdrive_run_config)
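Each child run executes `train.py` with the sampled values appended as command-line flags, so the script has to accept `--n-estimators` and `--max-depth` alongside `--input-data`. `train.py` itself is not shown in this notebook; the following is a hypothetical sketch of only its argument parsing, and the default values in it are illustrative assumptions rather than the project's.

```python
import argparse


def parse_args(argv=None):
    """Parse the flags HyperDrive passes to train.py (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        description="Train a random forest on the heart failure data"
    )
    parser.add_argument("--input-data", type=str, required=True,
                        help="ID of the registered TabularDataset")
    # Defaults are illustrative assumptions; HyperDrive always supplies both values.
    parser.add_argument("--n-estimators", type=int, default=100,
                        help="number of trees in the forest")
    parser.add_argument("--max-depth", type=int, default=None,
                        help="maximum depth of each tree")
    return parser.parse_args(argv)


# Simulate the command line of one child run.
args = parse_args(["--input-data", "<dataset-id>",
                   "--n-estimators", "230", "--max-depth", "19"])
print(args.n_estimators, args.max_depth)  # 230 19
```

argparse converts the dashed flag names to `args.input_data`, `args.n_estimators` and `args.max_depth`. Inside the real script, the dataset could then be fetched with `Dataset.get_by_id(workspace, id=args.input_data)` and the two values passed on to the random forest.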



## Run Details

The cell below displays the `RunDetails` widget to monitor the child runs of the HyperDrive run, then waits for the run to complete. The best child run (`HD_b8279c50-f6e3-4562-a40c-31bec81c26c4_1`) reached an accuracy of 0.8, which HyperDrive records as the parent run's `score` property.

In [10]:
RunDetails(hyperdrive_run).show()
hyperdrive_run.wait_for_completion(show_output=False)


{'runId': 'HD_b8279c50-f6e3-4562-a40c-31bec81c26c4',
 'target': 'cpu-cluster-5',
 'status': 'Completed',
 'startTimeUtc': '2021-01-03T12:44:08.754929Z',
 'endTimeUtc': '2021-01-03T12:56:23.663241Z',
 'properties': {'primary_metric_config': '{"name": "accuracy", "goal": "maximize"}',
  'resume_from': 'null',
  'runTemplate': 'HyperDrive',
  'azureml.runsource': 'hyperdrive',
  'platform': 'AML',
  'ContentSnapshotId': '1a8438d5-9ffa-47b1-a828-5974a0a62b7a',
  'score': '0.8',
  'best_child_run_id': 'HD_b8279c50-f6e3-4562-a40c-31bec81c26c4_1',
  'best_metric_status': 'Succeeded'},
 'inputDatasets': [],
 'outputDatasets': [],
 'logFiles': {'azureml-logs/hyperdrive.txt': 'https://udacityml3129747833.blob.core.windows.net/azureml/ExperimentRun/dcid.HD_b8279c50-f6e3-4562-a40c-31bec81c26c4/azureml-logs/hyperdrive.txt?sv=2019-02-02&sr=b&sig=sCPFAQwdkAISlbZwn49xdemibqjjwt4HrK240z49Gx8%3D&st=2021-01-03T12%3A46%3A24Z&se=2021-01-03T20%3A56%3A24Z&sp=r'}}

## Best Model

The cell below retrieves the best child run by the primary metric and displays the metrics it logged: the two hyperparameter values, the accuracy and a link to the confusion matrix artifact.

In [13]:
best_run = hyperdrive_run.get_best_run_by_primary_metric()
best_run.get_metrics()

{'Num estimators:': 230,
 'Max depth:': 19,
 'accuracy': 0.8,
 'Confusion Matrix': 'aml://artifactId/ExperimentRun/dcid.HD_b8279c50-f6e3-4562-a40c-31bec81c26c4_1/Confusion Matrix'}
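The best run logged its hyperparameters under the keys `'Num estimators:'` and `'Max depth:'` (the trailing colons are part of the key names). The following small sketch turns the metrics shown above into the flags needed to rerun that configuration; the helper `metrics_to_arguments` and the key-to-flag mapping are hypothetical. It also checks that the chosen depth of 19 is the largest value in `range(1, 20)`, which hints that the best depth may lie outside the searched range.

```python
# Metrics returned by best_run.get_metrics() above
# (the confusion-matrix artifact link is omitted).
best_metrics = {
    "Num estimators:": 230,
    "Max depth:": 19,
    "accuracy": 0.8,
}

# Logged key names mapped to train.py flags; the trailing colons are intentional.
HYPERPARAMETER_KEYS = {
    "Num estimators:": "--n-estimators",
    "Max depth:": "--max-depth",
}


def metrics_to_arguments(metrics):
    """Rebuild the command-line flags of a run from its logged metrics."""
    arguments = []
    for key, flag in HYPERPARAMETER_KEYS.items():
        arguments.extend([flag, str(metrics[key])])
    return arguments


print(metrics_to_arguments(best_metrics))
# ['--n-estimators', '230', '--max-depth', '19']

# 19 is the upper edge of the searched max-depth values, range(1, 20).
print(best_metrics["Max depth:"] == max(range(1, 20)))  # True
```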

In [14]:
# Register the best run's model file in the workspace model registry
model = best_run.register_model(model_name='model', model_path='outputs/model.pkl')
print(f"Model {model.name}.v{model.version} correctly saved")

Model model.v2 correctly saved


In [15]:
# Deprovision and delete the AmlCompute target. 
compute_target.delete()
print('Cluster Deleted')

Cluster Deleted
Current provisioning state of AmlCompute is "Deleting"

