# Hyperparameter Tuning using HyperDrive

TODO: Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

## Dataset

TODO: Get data. In the cell below, write code to access the data you will be using in this project. Remember that the dataset needs to be external.

The data are downloaded inside the `hyperdrive.py` script, so this notebook contains no data-loading code.
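`hyperdrive.py` itself is not reproduced in this notebook. Judging from the refit command further down, it accepts `--n_estimators`, `--max_depth`, `--model_dir` and `--model_name`. The sketch below shows only how such a command-line interface could be parsed with `argparse`. The `parse_args` wrapper and all default values are hypothetical, not taken from the actual script.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Hypothetical CLI for hyperdrive.py; the defaults are illustrative only."""
    parser = argparse.ArgumentParser(description="Random forest RUL training")
    # HyperDrive passes each sampled hyperparameter as "--name value"
    parser.add_argument("--n_estimators", type=int, default=100,
                        help="Number of trees in the forest")
    parser.add_argument("--max_depth", type=int, default=None,
                        help="Maximum depth of each tree")
    # Where to write the pickled model when refitting locally (hypothetical defaults)
    parser.add_argument("--model_dir", type=str, default="outputs")
    parser.add_argument("--model_name", type=str, default="model")
    return parser.parse_args(argv)
```

For example, `parse_args(["--n_estimators", "500", "--max_depth", "20"])` returns a namespace with `n_estimators=500` and `max_depth=20` as integers, because `type=int` converts the string values that arrive on the command line.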

In [1]:
from azureml.core import Workspace, Experiment

# Connect to the workspace described in config.json
ws = Workspace.from_config()

print('Workspace name: ' + ws.name,
      'Azure region: ' + ws.location,
      'Subscription id: ' + ws.subscription_id,
      'Resource group: ' + ws.resource_group, sep='\n')

Workspace name: quick-starts-ws-138882
Azure region: southcentralus
Subscription id: 61c5c3f0-6dc7-4ed9-a7f3-c704b20e3b30
Resource group: aml-quickstarts-138882
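`Workspace.from_config()` reads the workspace coordinates from a `config.json` file containing `subscription_id`, `resource_group` and `workspace_name`. As a rough, simplified sketch of that lookup (not the SDK's actual implementation), the hypothetical `find_workspace_config` below walks up from a starting directory and checks both `config.json` and `.azureml/config.json` in each folder.

```python
import json
from pathlib import Path
from typing import Dict, Optional

REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def find_workspace_config(start: Optional[str] = None) -> Dict[str, str]:
    """Simplified stand-in for the lookup done by Workspace.from_config()."""
    directory = Path(start or Path.cwd()).resolve()
    # Check the start directory first, then each parent up to the filesystem root
    for folder in [directory, *directory.parents]:
        for candidate in (folder / "config.json", folder / ".azureml" / "config.json"):
            if candidate.is_file():
                config = json.loads(candidate.read_text())
                missing = [key for key in REQUIRED_KEYS if key not in config]
                if missing:
                    raise KeyError(f"{candidate} is missing {missing}")
                return {key: config[key] for key in REQUIRED_KEYS}
    raise FileNotFoundError(f"No config.json found from {directory} upward")
```

Searching parent directories is what lets notebooks in subfolders of a project share one `config.json`.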


## Hyperdrive Configuration

TODO: Explain the model you are using and the reason for choosing the different hyperparameters, termination policy and config settings.
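The sampler in the configuration cell below draws `n_estimators` from 4 values and `max_depth` from 5, so the search space holds only 4 × 5 = 20 distinct configurations, fewer than `max_total_runs=100`. The stdlib sketch below enumerates that space and imitates independent uniform draws from it; it does not call HyperDrive, and the helper names are hypothetical. If every combination should be tried exactly once, `GridParameterSampling` (which accepts only `choice` parameters) would be the natural alternative.

```python
import itertools
import random
from typing import Dict, List

# Same discrete values as the RandomParameterSampling search space
SEARCH_SPACE: Dict[str, List[int]] = {
    "n_estimators": [10, 100, 250, 500],
    "max_depth": [1, 5, 10, 15, 20],
}


def full_grid(space: Dict[str, List[int]]) -> List[Dict[str, int]]:
    """Every combination of the discrete values (what a grid search would run)."""
    names = list(space)
    return [dict(zip(names, values)) for values in itertools.product(*space.values())]


def random_draws(space: Dict[str, List[int]], n: int, seed: int = 0) -> List[Dict[str, int]]:
    """Independent uniform draws per parameter, imitating random sampling."""
    rng = random.Random(seed)
    return [{name: rng.choice(values) for name, values in space.items()} for _ in range(n)]


grid = full_grid(SEARCH_SPACE)
draws = random_draws(SEARCH_SPACE, n=100)
distinct = {tuple(draw.items()) for draw in draws}
print(f"{len(grid)} grid points, {len(distinct)} distinct among {len(draws)} draws")
```

With 100 draws over 20 points, repeats are unavoidable, which is worth keeping in mind when reading the run count in the widget.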

In [11]:
from azureml.core.environment import Environment
from azureml.core.script_run_config import ScriptRunConfig
from azureml.train.hyperdrive import choice
from azureml.train.hyperdrive.policy import BanditPolicy
from azureml.train.hyperdrive.run import PrimaryMetricGoal
from azureml.train.hyperdrive.runconfig import HyperDriveConfig
from azureml.train.hyperdrive.sampling import RandomParameterSampling

# Search space for the random forest: random sampling over discrete values
params_sampling = RandomParameterSampling({
    "n_estimators": choice([10, 100, 250, 500]),
    "max_depth": choice([1, 5, 10, 15, 20]),
})

# Name of an existing compute cluster in the workspace
cpu_cluster_name = "compute-cluster"

# Early-termination policy: cancel runs whose rmse falls outside the slack allowed
# relative to the best run. The policy is checked every `evaluation_interval` times a
# run logs the primary metric, so if hyperdrive.py logs rmse only once per run,
# an interval of 100 never triggers it.
policy = BanditPolicy(evaluation_interval=100, slack_factor=0.2)

# Training script configuration. HyperDrive passes each sampled hyperparameter to
# hyperdrive.py as a command-line argument (e.g. `--max_depth 20`), so `arguments`
# can stay None. See
# https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.scriptrunconfig?view=azure-ml-py
run_config = ScriptRunConfig(
    source_directory="./",
    script="hyperdrive.py",
    arguments=None,
    compute_target=cpu_cluster_name,
    environment=Environment.get(workspace=ws, name='AzureML-Scikit-learn-0.20.3'),
)

# Combine the script config, hyperparameter sampler and policy;
# HyperDrive minimizes the `rmse` metric logged by hyperdrive.py.
hyperdrive_config = HyperDriveConfig(
    hyperparameter_sampling=params_sampling,
    primary_metric_name="rmse",
    primary_metric_goal=PrimaryMetricGoal.MINIMIZE,
    max_total_runs=100,
    policy=policy,
    run_config=run_config,
)
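The Bandit policy compares each run with the best run so far. As we read the Azure ML documentation, for a minimized metric such as `rmse` a run is cancelled when its metric exceeds `best * (1 + slack_factor)` (for a maximized metric, when it falls below `best / (1 + slack_factor)`), and the check happens only at multiples of `evaluation_interval` that are at least `delay_evaluation`, where each logged value of the primary metric counts as one interval. The plain-Python sketch below encodes that reading; the function names are hypothetical and this is not the service's implementation.

```python
def is_evaluation_point(interval: int, evaluation_interval: int = 1,
                        delay_evaluation: int = 0) -> bool:
    """True if the policy would be checked at this (1-based) metric-logging interval."""
    return interval >= delay_evaluation and interval % evaluation_interval == 0


def bandit_terminates(metric: float, best: float, slack_factor: float,
                      minimize: bool = True) -> bool:
    """True if a run falls outside the slack allowed relative to the best run."""
    if minimize:
        # e.g. best rmse 8.0 with slack 0.2 -> runs above 9.6 are cancelled
        return metric > best * (1 + slack_factor)
    return metric < best / (1 + slack_factor)


# If each run logs rmse once, interval 1 is the only one; with evaluation_interval=100
# it is never an evaluation point, so no run is ever cancelled.
checked = [i for i in range(1, 2) if is_evaluation_point(i, evaluation_interval=100)]
print(checked)  # []
print(bandit_terminates(10.0, best=8.0, slack_factor=0.2))  # True
print(bandit_terminates(9.0, best=8.0, slack_factor=0.2))   # False
```

Under this reading, `evaluation_interval=1` would be needed for the policy to act on a script that reports `rmse` a single time.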

## Submit hyperdrive run

In [12]:
experiment_name = 'hyperDrive-RUL-prediction'
experiment = Experiment(ws, experiment_name)

hyperdrive_run = experiment.submit(hyperdrive_config)
hyperdrive_run.get_status()

'Running'
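`get_status()` only returns a snapshot (`'Running'` here). The SDK's `hyperdrive_run.wait_for_completion(show_output=True)` blocks until the run finishes; if custom handling is needed instead, a generic polling loop such as the hypothetical `wait_for_terminal_status` below could wrap any status callable, e.g. `hyperdrive_run.get_status`. The set of terminal status strings is an assumption based on the statuses the studio displays.

```python
import time
from typing import Callable, Iterable

# Assumed terminal states; adjust to the statuses your runs actually report
TERMINAL_STATUSES = frozenset({"Completed", "Failed", "Canceled"})


def wait_for_terminal_status(get_status: Callable[[], str],
                             poll_seconds: float = 30.0,
                             timeout_seconds: float = 3600.0,
                             terminal: Iterable[str] = TERMINAL_STATUSES) -> str:
    """Poll get_status() until it returns a terminal status or the timeout expires."""
    terminal = set(terminal)
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Run still '{status}' after {timeout_seconds} s")
        time.sleep(poll_seconds)
```

Usage might look like `wait_for_terminal_status(hyperdrive_run.get_status, poll_seconds=60)`.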

## Run Details

OPTIONAL: Write about the different models trained and their performance. Why do you think some models did better than others?

TODO: In the cell below, use the `RunDetails` widget to show the different experiments.

In [13]:
from azureml.widgets import RunDetails

run_details = RunDetails(run_instance=hyperdrive_run)
run_details.show()

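To compare the child runs outside the widget, `hyperdrive_run.get_metrics()` returns, as we understand it, the logged metrics of all child runs keyed by run ID. The hypothetical `rank_runs` below sorts a dictionary of that shape; it assumes each metric is either a scalar or a list of logged values (in which case the last value is used).

```python
from typing import Any, Dict, List, Tuple


def rank_runs(metrics_by_run: Dict[str, Dict[str, Any]], metric: str = "rmse",
              minimize: bool = True) -> List[Tuple[str, float]]:
    """Sort child runs by a metric; runs that never logged it are skipped."""
    ranked = []
    for run_id, metrics in metrics_by_run.items():
        if metric not in metrics:
            continue
        value = metrics[metric]
        # A metric logged several times is assumed to come back as a list; keep the last value
        if isinstance(value, list):
            if not value:
                continue
            value = value[-1]
        ranked.append((run_id, float(value)))
    return sorted(ranked, key=lambda item: item[1], reverse=not minimize)
```

Usage might look like `rank_runs(hyperdrive_run.get_metrics(), "rmse")[:5]` to list the five runs with the lowest error.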

## Best Model

TODO: In the cell below, get the best model from the hyperdrive experiments and display all the properties of the model.

The `run_id` of the HyperDrive parent run was copied from the Azure ML studio UI:

In [14]:
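from azureml.train.hyperdrive.run import HyperDriveRun

# ID of the HyperDrive parent run, copied from the studio UI
run_id = "HD_bd9e0812-fce4-4f37-9478-a457f46f8e1c"

# Reattach to the existing HyperDrive run
remote_run = HyperDriveRun(
    experiment=Experiment(ws, experiment_name),
    run_id=run_id
)

# Child run with the lowest rmse (the primary metric)
best_run = remote_run.get_best_run_by_primary_metric()

# The sampled hyperparameters are stored as a flat argument list,
# e.g. ['--max_depth', '20', '--n_estimators', '500']; pair each flag with its value
parameter_values = best_run.get_details()['runDefinition']['arguments']
best_parameters = dict(zip(parameter_values[::2], parameter_values[1::2]))
print(best_parameters)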

{'--max_depth': '20', '--n_estimators': '500'}


### Fit best model and save artifacts
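Instead of retyping the best values, the refit command can be assembled from the `best_parameters` dictionary printed above. This is a minimal sketch: the helper name `build_refit_command` is hypothetical, and the `--model_dir`/`--model_name` values are simply the ones used in the cell below.

```python
import shlex
import sys
from typing import Dict, List


def build_refit_command(best_parameters: Dict[str, str], model_dir: str,
                        model_name: str, script: str = "hyperdrive.py") -> List[str]:
    """Turn {'--flag': 'value'} pairs into an argv list for the training script."""
    argv = [sys.executable, script]  # use the current Python interpreter
    for flag, value in best_parameters.items():
        argv += [flag, str(value)]
    argv += ["--model_dir", model_dir, "--model_name", model_name]
    return argv


best_parameters = {"--max_depth": "20", "--n_estimators": "500"}
cmd = build_refit_command(best_parameters, "model_artifacts",
                          "best_random_forest_model_hyperdrive")
print(shlex.join(cmd))
# subprocess.run(cmd, check=True) would execute the refit
```

Passing a list (rather than a shell string) to `subprocess.run` avoids quoting problems if a value ever contains spaces.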

In [18]:
!python hyperdrive.py --n_estimators 500 --max_depth 20 --model_dir model_artifacts --model_name best_random_forest_model_hyperdrive

Attempted to log scalar metric Number of estimators::
500.0
Attempted to log scalar metric Maximum depth of tree::
20
rmse = 8.423969034347044
Attempted to log scalar metric rmse:
8.423969034347044
Saving model to as model_artifacts/best_random_forest_model_hyperdrive.pkl
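The primary metric `rmse` is the root-mean-square error of the RUL predictions, `sqrt(mean((y_true - y_pred)^2))`, expressed in the same units as the target. The exact computation inside `hyperdrive.py` is not shown here; the stdlib version below only illustrates the formula.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error: sqrt(mean((y_true - y_pred) ** 2))."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


print(rmse([0.0, 0.0], [3.0, 4.0]))  # sqrt((9 + 16) / 2) ≈ 3.5355
```

Because errors are squared before averaging, a few large misses raise the RMSE more than many small ones.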


In [19]:
!ls model_artifacts

autoML_RUL_prediction_model.pkl  best_random_forest_model_hyperdrive.pkl
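To reuse the saved model, the `.pkl` file can be loaded back. The notebook does not show whether `hyperdrive.py` writes it with `pickle` or `joblib`; the sketch below assumes plain `pickle` (if `joblib.dump` was used, `joblib.load(path)` is the counterpart). Unpickling a random forest also requires scikit-learn to be installed, and only files from trusted sources should be unpickled, since loading a pickle can execute arbitrary code. The helper name is hypothetical.

```python
import pickle
from pathlib import Path
from typing import Any


def load_pickled_model(model_dir: str, model_name: str) -> Any:
    """Load <model_dir>/<model_name>.pkl, assuming it was written with pickle."""
    path = Path(model_dir) / f"{model_name}.pkl"
    with path.open("rb") as handle:
        return pickle.load(handle)
```

Usage might look like `model = load_pickled_model("model_artifacts", "best_random_forest_model_hyperdrive")`.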
