<i>Copyright (c) Microsoft Corporation. All rights reserved.</i>

<i>Licensed under the MIT License.</i>

# Testing different Hyperparameters and Benchmarking

In this notebook, we'll cover how to test different hyperparameters for a particular dataset and how to benchmark different parameters across a group of datasets using AzureML.

Similar to [11_exploring_hyperparameters.ipynb](https://github.com/microsoft/ComputerVision/blob/master/classification/notebooks/11_exploring_hyperparameters.ipynb), we will learn more about how different learning rates and image sizes affect our model's accuracy when training is restricted to 10 epochs. To do so, we build an AzureML experiment that tests these hyperparameters.

We will use a ResNet50 model to classify a set of images into 4 categories: 'can', 'carton', 'milk_bottle', and 'water_bottle'. We will then tune hyperparameters to find the best set of parameters for this model. For this,
we present the overall process of using AzureML, specifically its [Hyperdrive](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.hyperdrive?view=azure-ml-py) component, to run the tuning in parallel rather than sequentially. We demonstrate the following key steps:  
* Configure AzureML Workspace
* Create Remote Compute Target (GPU cluster)
* Prepare Data
* Prepare Training Script
* Setup and Run Hyperdrive Experiment
* Download and Test the Model

In [1]:
import os
import sys
sys.path.append("../../")

import fastai
from fastai.vision import *

import azureml.core
from azureml.core import Workspace, Experiment
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
import azureml.data
from azureml.train.hyperdrive import RandomParameterSampling, BanditPolicy, HyperDriveConfig, PrimaryMetricGoal, choice
from azureml.train.estimator import Estimator

import azureml.widgets as widgets

from utils_cv.classification.data import Urls
from utils_cv.common.data import unzip_url

print("SDK version:", azureml.core.VERSION)

SDK version: 1.0.48


Ensure edits to libraries are loaded and plotting is shown in the notebook.

In [2]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

### 1. Configure AzureML workspace
Below we set up the AzureML workspace and print its details:

In [3]:
# Azure resources
subscription_id = "YOUR_SUBSCRIPTION_ID"
resource_group = "YOUR_RESOURCE_GROUP_NAME"  
workspace_name = "YOUR_WORKSPACE_NAME"  
workspace_region = "YOUR_WORKSPACE_REGION"  # Possible values: eastus, eastus2, and so on.

max_total_runs=50


In [4]:
from utils_cv.common.azureml import get_or_create_workspace

ws = get_or_create_workspace(
        subscription_id,
        resource_group,
        workspace_name,
        workspace_region)

# Print the workspace attributes
print('Workspace name: ' + ws.name, 
      'Workspace region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep = '\n')

If you run your code in unattended mode, i.e., where you can't give a user input, then we recommend to use ServicePrincipalAuthentication or MsiAuthentication.
Please refer to aka.ms/aml-notebook-auth for different authentication mechanisms in azureml-sdk.


Workspace name: smoketestwsnew
Workspace region: eastus2
Subscription id: 0ca618d2-22a8-413a-96d0-0f1b531129c3
Resource group: smoketestnew11


### 2. Create Remote Target
We create a GPU cluster as our remote compute target. If a cluster with the same name already exists in our workspace, the script will load it instead. See [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets#compute-targets-for-training) to learn more about setting up compute targets in different locations.

This notebook selects a STANDARD_NC6 virtual machine (VM) and sets its priority to 'lowpriority' to reduce costs.

In [5]:
# choose a name for our cluster
cluster_name = "gpu-cluster-nc6"
# Remote compute (cluster) configuration. To reduce costs further, choose a smaller VM size,
# for example Standard_DS1_v2 instead of STANDARD_NC6 (note that Standard_DS1_v2 has no GPU).
VM_SIZE = 'STANDARD_NC6'
VM_PRIORITY = 'lowpriority'

# Cluster nodes
MIN_NODES = 0
MAX_NODES = 4

try:
    # Retrieve if a compute target with the same cluster_name already exists
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing compute target.')
except ComputeTargetException:
    # If it doesn't already exist, we create a new one with the name provided
    print('Creating a new compute target...')
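    # vm_priority must be passed explicitly, otherwise the cluster defaults to dedicated VMs
    compute_config = AmlCompute.provisioning_configuration(
        vm_size=VM_SIZE,
        vm_priority=VM_PRIORITY,
        min_nodes=MIN_NODES,
        max_nodes=MAX_NODES)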

    # create the cluster
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)

    compute_target.wait_for_completion(show_output=True)

# we can use get_status() to get a detailed status for the current cluster. 
print(compute_target.get_status().serialize())

Found existing compute target.
{'currentNodeCount': 1, 'targetNodeCount': 0, 'nodeStateCounts': {'preparingNodeCount': 0, 'runningNodeCount': 0, 'idleNodeCount': 0, 'unusableNodeCount': 0, 'leavingNodeCount': 1, 'preemptedNodeCount': 0}, 'allocationState': 'Resizing', 'allocationStateTransitionTime': '2019-07-22T04:40:41.047000+00:00', 'errors': None, 'creationTime': '2019-07-22T02:26:37.808395+00:00', 'modifiedTime': '2019-07-22T02:26:53.969636+00:00', 'provisioningState': 'Succeeded', 'provisioningStateTransitionTime': None, 'scaleSettings': {'minNodeCount': 0, 'maxNodeCount': 4, 'nodeIdleTimeBeforeScaleDown': 'PT120S'}, 'vmPriority': 'Dedicated', 'vmSize': 'STANDARD_NC6'}


### 3. Prepare data
In this notebook, we'll use the Fridge Objects dataset, which is already stored in the correct format. We then upload our data to the AzureML workspace.
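
The "correct format" here appears to be one sub-folder per class, with the folder name serving as the label. As a rough, standalone sketch of that layout (the helper `count_images_per_class` and the temporary folder are illustrative assumptions, not part of `utils_cv`), the following builds a tiny mock dataset and counts the images in each class folder:

```python
import tempfile
from pathlib import Path


def count_images_per_class(root):
    """Return {class_name: number_of_jpg_files} for a folder-per-class dataset."""
    root = Path(root)
    return {
        class_dir.name: sum(1 for f in class_dir.iterdir() if f.suffix.lower() == ".jpg")
        for class_dir in sorted(root.iterdir())
        if class_dir.is_dir()
    }


# Build a mock dataset with two empty placeholder images per class
with tempfile.TemporaryDirectory() as tmp:
    for label in ["can", "carton", "milk_bottle", "water_bottle"]:
        class_dir = Path(tmp) / label
        class_dir.mkdir()
        for i in range(2):
            (class_dir / f"{i}.jpg").touch()
    counts = count_images_per_class(tmp)

print(counts)
```

Running the same helper on the unzipped dataset folder should be a quick way to confirm that every class has images before uploading.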


In [6]:
# Note: all the files under the parent folder of DATA will be uploaded to the datastore
DATA = unzip_url(Urls.fridge_objects_path, exist_ok=True)
REPS = 3

# Retrieving default datastore that got automatically created when we setup a workspace
ds = ws.get_default_datastore()

# We now upload the data to the 'data' folder of the datastore
ds.upload(
    src_dir=os.path.dirname(DATA),
    target_path='data',
    overwrite=True, # with "overwrite=True", if this data already exists on the Azure blob storage, it will be overwritten
    show_progress=True
)

Uploading an estimated of 138 files
Uploading /Users/richinjain/projects/ComputerVision/data/cvbp_milk_bottle.jpg
Uploading /Users/richinjain/projects/ComputerVision/data/cvbp_water_bottle.jpg
Uploading /Users/richinjain/projects/ComputerVision/data/example.jpg
Uploading /Users/richinjain/projects/ComputerVision/data/fridgeObjects.zip
Uploading /Users/richinjain/projects/ComputerVision/data/fridgeObjects/can/1.jpg
Uploading /Users/richinjain/projects/ComputerVision/data/fridgeObjects/can/10.jpg
Uploading /Users/richinjain/projects/ComputerVision/data/fridgeObjects/can/11.jpg
Uploading /Users/richinjain/projects/ComputerVision/data/fridgeObjects/can/12.jpg
Uploading /Users/richinjain/projects/ComputerVision/data/fridgeObjects/can/13.jpg
Uploading /Users/richinjain/projects/ComputerVision/data/fridgeObjects/can/14.jpg
Uploading /Users/richinjain/projects/ComputerVision/data/fridgeObjects/can/15.jpg
Uploading /Users/richinjain/projects/ComputerVision/data/fridgeObjects/can/16.jpg
Uploadin

Uploaded /Users/richinjain/projects/ComputerVision/data/fridgeObjects/carton/56.jpg, 30 files out of an estimated total of 138
Uploading /Users/richinjain/projects/ComputerVision/data/fridgeObjects/carton/58.jpg
Uploaded /Users/richinjain/projects/ComputerVision/data/fridgeObjects/carton/43.jpg, 31 files out of an estimated total of 138
Uploading /Users/richinjain/projects/ComputerVision/data/fridgeObjects/carton/59.jpg
Uploaded /Users/richinjain/projects/ComputerVision/data/fridgeObjects/carton/54.jpg, 32 files out of an estimated total of 138
Uploading /Users/richinjain/projects/ComputerVision/data/fridgeObjects/carton/60.jpg
Uploaded /Users/richinjain/projects/ComputerVision/data/fridgeObjects/can/8.jpg, 33 files out of an estimated total of 138
Uploading /Users/richinjain/projects/ComputerVision/data/fridgeObjects/carton/61.jpg
Uploaded /Users/richinjain/projects/ComputerVision/data/fridgeObjects/carton/55.jpg, 34 files out of an estimated total of 138
Uploading /Users/richinjain/p

Uploaded /Users/richinjain/projects/ComputerVision/data/fridgeObjects/milk_bottle/84.jpg, 72 files out of an estimated total of 138
Uploaded /Users/richinjain/projects/ComputerVision/data/fridgeObjects/carton/35.jpg, 73 files out of an estimated total of 138
Uploading /Users/richinjain/projects/ComputerVision/data/fridgeObjects/milk_bottle/98.jpg
Uploading /Users/richinjain/projects/ComputerVision/data/fridgeObjects/milk_bottle/99.jpg
Uploaded /Users/richinjain/projects/ComputerVision/data/fridgeObjects/milk_bottle/71.jpg, 74 files out of an estimated total of 138
Uploading /Users/richinjain/projects/ComputerVision/data/fridgeObjects/water_bottle/102.jpg
Uploaded /Users/richinjain/projects/ComputerVision/data/fridgeObjects/milk_bottle/85.jpg, 75 files out of an estimated total of 138
Uploading /Users/richinjain/projects/ComputerVision/data/fridgeObjects/water_bottle/103.jpg
Uploaded /Users/richinjain/projects/ComputerVision/data/fridgeObjects/can/17.jpg, 76 files out of an estimated to

Uploaded /Users/richinjain/projects/ComputerVision/data/fridgeObjects/water_bottle/118.jpg, 111 files out of an estimated total of 138
Uploaded /Users/richinjain/projects/ComputerVision/data/fridgeObjects/water_bottle/127.jpg, 112 files out of an estimated total of 138
Uploaded /Users/richinjain/projects/ComputerVision/data/fridgeObjects/water_bottle/126.jpg, 113 files out of an estimated total of 138
Uploaded /Users/richinjain/projects/ComputerVision/data/fridgeObjects/carton/38.jpg, 114 files out of an estimated total of 138
Uploaded /Users/richinjain/projects/ComputerVision/data/fridgeObjects/water_bottle/133.jpg, 115 files out of an estimated total of 138
Uploaded /Users/richinjain/projects/ComputerVision/data/fridgeObjects/water_bottle/134.jpg, 116 files out of an estimated total of 138
Uploaded /Users/richinjain/projects/ComputerVision/data/fridgeObjects/water_bottle/124.jpg, 117 files out of an estimated total of 138
Uploaded /Users/richinjain/projects/ComputerVision/data/fridge

$AZUREML_DATAREFERENCE_f63fbd85fa17436fa173eb6034cd9eb5


Here's where you can see the data in your portal: 
<img src="media/datastore.jpg" width="800" alt="Datastore screenshot for Hyperdrive notebook run">

### 4. Prepare training script

The next step is to prepare the script that AzureML Hyperdrive will use to train and evaluate models with the selected hyperparameters.
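
Each child run receives its hyperparameters as command-line arguments. As a minimal, standalone sketch of that hand-off (the function name `parse_hyperparameters`, the example mount path, and the use of `required=True` are assumptions for illustration, not what the notebook's script does), the arguments could be parsed like this:

```python
import argparse


def parse_hyperparameters(argv):
    """Parse the arguments a child run would receive on its command line."""
    parser = argparse.ArgumentParser()
    # required=True makes argparse exit with a usage error if an argument is missing
    parser.add_argument("--data-folder", type=str, dest="DATA_DIR", required=True)
    parser.add_argument("--im_size", type=int, dest="IM_SIZE", required=True)
    parser.add_argument("--learning_rate", type=float, dest="LEARNING_RATE", required=True)
    return vars(parser.parse_args(argv))


# Hypothetical command line for one child run (the mount path is made up)
example_argv = ["--data-folder", "/mnt/datastore", "--im_size", "299", "--learning_rate", "1e-4"]
params = parse_hyperparameters(example_argv)
print(params)
```

Passing an explicit list to `parse_args` makes the parsing easy to try locally, without submitting a run.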

In [7]:
# creating a folder for the training script here
script_folder = os.path.join(os.getcwd(), "hyperparameter")
os.makedirs(script_folder, exist_ok=True)

In [8]:
%%writefile $script_folder/train.py

import argparse
import numpy as np
import os
import sys

import fastai
from fastai.vision import *
from fastai.vision.data import *

from azureml.core import Run

run = Run.get_context()

# Define parameters that we are going to use for training
ARCHITECTURE  = models.resnet50

# Parse arguments passed by Hyperdrive
parser = argparse.ArgumentParser()


# Data path
parser.add_argument('--data-folder', type=str, dest='DATA_DIR', help="Datastore path")
parser.add_argument('--im_size', type=int, dest='IM_SIZE')
parser.add_argument('--learning_rate', type=float, dest='LEARNING_RATE')

args = parser.parse_args()
params = vars(args)

if params['IM_SIZE'] is None:
    raise ValueError("Image Size empty")
        
if params['LEARNING_RATE'] is None:
    raise ValueError("Learning Rate empty")

if params['DATA_DIR'] is None:
    raise ValueError("Data folder empty")
    

path = params['DATA_DIR'] + '/data/fridgeObjects'

# Getting training and validation data and training the CNN as done in 01_training_introduction.ipynb
data = (ImageList.from_folder(path)
        .split_by_rand_pct(valid_pct=0.2, seed=10)
        .label_from_folder() 
        .transform(size=params['IM_SIZE']) 
        .databunch(bs=16) 
        .normalize(imagenet_stats))

learn = cnn_learner(
    data,
    ARCHITECTURE,
    metrics=[accuracy]
)

epochs=1 # Change the value to 10 to see multiple runs, defaulting to 1 for quick run of notebook.
learn.unfreeze()
learn.fit(epochs, params['LEARNING_RATE'])

training_losses = [x.numpy().ravel()[0] for x in learn.recorder.losses]
# Validation accuracy after the last epoch (named to avoid shadowing fastai's `accuracy` metric)
final_accuracy = [x[0].numpy().ravel()[0] for x in learn.recorder.metrics][-1]

#run.log_list('training_loss', training_losses)
#run.log_list('validation_loss', learn.recorder.val_losses)
#run.log_list('error_rate', error_rate)
run.log('data_dir', params['DATA_DIR'])
run.log('im_size', params['IM_SIZE'])
run.log('learning_rate', params['LEARNING_RATE'])
run.log('accuracy', float(final_accuracy))  # Logging our primary metric 'accuracy'

current_directory = os.getcwd()
output_folder = os.path.join(current_directory, 'outputs')
MODEL_NAME = 'im_classif_resnet50'  # Name we will give our model both locally and on Azure
PICKLED_MODEL_NAME = MODEL_NAME + '.pkl'
os.makedirs(output_folder, exist_ok=True)

learn.export(os.path.join(output_folder, PICKLED_MODEL_NAME))

Overwriting /Users/richinjain/projects/ComputerVision/classification/notebooks/hyperparameter/train.py


### 5. Setup and run Hyperdrive experiment

With the training script in place, we can set up the Hyperdrive experiment. The entry script `train.py` parses the hyperparameter arguments, trains the model with them, and records the results in the AzureML Run logs.

#### 5.1 Create Experiment  
An Experiment is the main entry point for experimenting with AzureML. To create a new Experiment or get an existing one, we pass our experiment name, 'hyperparameter-tuning'.


In [9]:
experiment_name = 'hyperparameter-tuning'
exp = Experiment(workspace=ws, name=experiment_name)

#### 5.2. Define search space

Now we define the search space of hyperparameters. For example, if you want to test different batch sizes of {64, 128, 256}, you can use `azureml.train.hyperdrive.choice(64, 128, 256)`. To search from a continuous space, use `uniform(start, end)`. For more options, see [Hyperdrive parameter expressions](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.hyperdrive.parameter_expressions?view=azure-ml-py).

In this notebook we use the ResNet50 architecture and a fixed number of epochs (the training script above defaults to 1 for a quick run; set it to 10 for the full comparison).
In the search space, we set different learning rates and image sizes. Details about the hyperparameters can be found in [11_exploring_hyperparameters.ipynb notebook](https://github.com/microsoft/ComputerVision/blob/master/classification/notebooks/11_exploring_hyperparameters.ipynb).

Hyperdrive provides three different parameter sampling methods: 'RandomParameterSampling', 'GridParameterSampling', and 'BayesianParameterSampling'. Details about each method can be found [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-tune-hyperparameters). Here, we use the 'RandomParameterSampling'.
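
To get a feel for the difference between grid and random sampling over discrete choices, here is a small standard-library sketch using the image sizes and learning rates this notebook searches over. It only mimics the idea and makes no claim about how Hyperdrive samples internally; the seed and the number of draws are arbitrary assumptions.

```python
import itertools
import random

im_sizes = [299, 499]
learning_rates = [1e-3, 1e-4, 1e-5]

# Grid sampling: every (learning_rate, im_size) combination exactly once
grid = list(itertools.product(learning_rates, im_sizes))
print("Grid size:", len(grid))

# Random sampling: each run draws its values independently, so repeats are possible
rng = random.Random(0)  # arbitrary seed, for reproducibility of the sketch only
random_draws = [(rng.choice(learning_rates), rng.choice(im_sizes)) for _ in range(10)]
print("Distinct combinations among 10 random draws:", len(set(random_draws)))
```

With only 2 x 3 = 6 distinct combinations, a small budget of runs is enough to cover this particular space; random sampling pays off more when the space is large or continuous.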

In [10]:
IM_SIZES = [299, 499]
LEARNING_RATES = [1e-3, 1e-4, 1e-5]

# Hyperparameter search space
param_sampling = RandomParameterSampling( {
        '--learning_rate': choice(LEARNING_RATES),
        '--im_size': choice(IM_SIZES)
    }
)

primary_metric_name = 'accuracy'
primary_metric_goal = PrimaryMetricGoal.MAXIMIZE
max_concurrent_runs=4

early_termination_policy = BanditPolicy(slack_factor=0.15, evaluation_interval=1, delay_evaluation=20)

<b>AzureML Estimator</b> is the building block for training. An Estimator encapsulates the training code and parameters, the compute resources and runtime environment for a particular training scenario.
We create one for our experimentation with the dependencies our model requires as follows:

```python
pip_packages=['fastai']
conda_packages=['scikit-learn']
```

In [11]:
script_params = {
    '--data-folder': ds.as_mount()
}

est = Estimator(source_directory=script_folder,
                script_params=script_params,
                compute_target=compute_target,
                entry_script='train.py',
                pip_packages=['fastai'],
                conda_packages=['scikit-learn'])

We now create a HyperDriveConfig object, which includes information about the parameter space sampling, termination policy, primary metric, estimator, and the compute target on which to execute the experiment runs. We pass the following parameters to it:

- our estimator object that we created in the above cell
- hyperparameter sampling method, in this case it is Random Parameter Sampling
- early termination policy, in this case we use [Bandit Policy](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-tune-hyperparameters#bandit-policy)
- primary metric name reported by our runs, in this case it is accuracy 
- the goal, which determines whether the primary metric has to be maximized/minimized, in this case it is to maximize our accuracy 
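- the maximum number of total child-runs (`max_total_runs`, set to 50 at the top of this notebook) and the maximum number of concurrent child-runs (`max_concurrent_runs`, in this case 4)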

The larger the search space, the more child-runs are needed to explore it well.
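
As a rough illustration of what the Bandit policy's `slack_factor` means when the goal is to maximize the metric: a run is a candidate for early termination if its metric falls below the best metric so far divided by `(1 + slack_factor)`. The helper names and the example accuracies below are hypothetical; this is a sketch of the rule, not AzureML's implementation.

```python
def bandit_threshold(best_metric, slack_factor):
    """Lowest metric value a run may have without being terminated (maximize goal)."""
    return best_metric / (1 + slack_factor)


def should_terminate(run_metric, best_metric, slack_factor=0.15):
    """Return True if the run falls outside the allowed slack of the best run."""
    return run_metric < bandit_threshold(best_metric, slack_factor)


best_so_far = 0.90  # hypothetical best accuracy among all runs
for acc in (0.80, 0.75):  # hypothetical accuracies of two other runs
    print(acc, "terminate" if should_terminate(acc, best_so_far) else "continue")
```

Assuming the policy is only evaluated when the primary metric is logged, and given that the training script logs accuracy once per run, `delay_evaluation=20` presumably means the policy never stops a run in this notebook; it would matter if the script logged accuracy after every epoch over many epochs.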

In [12]:
hyperdrive_run_config = HyperDriveConfig(estimator=est,
                                         hyperparameter_sampling=param_sampling,
                                         policy=early_termination_policy,
                                         primary_metric_name=primary_metric_name,
                                         primary_metric_goal=primary_metric_goal,
                                         max_total_runs=max_total_runs,
                                         max_concurrent_runs= max_concurrent_runs)

#### 5.3 Run Experiment

In [13]:
# Now we submit the Run to our experiment. 
hyperdrive_run = exp.submit(config=hyperdrive_run_config)
# We can see the experiment progress from this notebook by using 
widgets.RunDetails(hyperdrive_run).show()

_HyperDriveWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO'…

_UserRunWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', '…

In [14]:
hyperdrive_run.wait_for_completion()


{'runId': 'hyperparameter-tuning_1563770544897',
 'target': 'gpu-cluster-nc6',
 'status': 'Completed',
 'startTimeUtc': '2019-07-22T04:42:25.393015Z',
 'endTimeUtc': '2019-07-22T04:49:58.250673Z',
 'properties': {'primary_metric_config': '{"name": "accuracy", "goal": "maximize"}',
  'runTemplate': 'HyperDrive',
  'azureml.runsource': 'hyperdrive',
  'platform': 'AML',
  'baggage': 'eyJvaWQiOiAiNmY1Yjc5M2UtZjhiOS00NGY0LTk0N2YtNTg3N2ZjMDFjZmFjIiwgInRpZCI6ICI3MmY5ODhiZi04NmYxLTQxYWYtOTFhYi0yZDdjZDAxMWRiNDciLCAidW5hbWUiOiAiMDRiMDc3OTUtOGRkYi00NjFhLWJiZWUtMDJmOWUxYmY3YjQ2In0',
  'ContentSnapshotId': 'a63feca7-742e-49c3-b568-9cf6a53b34c3'},
 'logFiles': {'azureml-logs/hyperdrive.txt': 'https://smoketesstorage0231aa20c.blob.core.windows.net/azureml/ExperimentRun/dcid.hyperparameter-tuning_1563770544897/azureml-logs/hyperdrive.txt?sv=2018-03-28&sr=b&sig=LL8Fx6UZhJ9jddaqS1xeR%2BHi98wUHPZ%2FYuAxGH3Y39I%3D&st=2019-07-22T04%3A39%3A59Z&se=2019-07-22T12%3A49%3A59Z&sp=r'}}

Alternatively, we can follow the run in the Azure portal, using the URL returned by
```python
hyperdrive_run.get_portal_url()
```

To load an existing Hyperdrive Run instead of starting a new one, we can use
```python
hyperdrive_run = azureml.train.hyperdrive.HyperDriveRun(exp, <your-run-id>, hyperdrive_run_config=hyperdrive_run_config)
```
We can also cancel the Run with
```python
hyperdrive_run.cancel()
```

Once all the child-runs are finished, we can get the best run and the metrics.

In [15]:
# Get best run and print out metrics
best_run = hyperdrive_run.get_best_run_by_primary_metric()
best_run_metrics = best_run.get_metrics()
parameter_values = best_run.get_details()['runDefinition']['arguments']
best_parameters = dict(zip(parameter_values[::2], parameter_values[1::2]))

print(f"* Best Run Id:{best_run.id}")
print(best_run)
print("\n* Best hyperparameters:")
print(best_parameters)
print(f"Accuracy = {best_run_metrics['accuracy']}")
#print("Learning Rate =", best_run_metrics['learning_rate'])

* Best Run Id:hyperparameter-tuning_1563770544897_0
Run(Experiment: hyperparameter-tuning,
Id: hyperparameter-tuning_1563770544897_0,
Type: azureml.scriptrun,
Status: Completed)

* Best hyperparameters:
{'--data-folder': '$AZUREML_DATAREFERENCE_workspaceblobstore', '--im_size': '299', '--learning_rate': '0.001'}
Accuracy = 0.26923078298568726
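
The run's arguments come back as a flat list of alternating flags and string values, which is why the cell above pairs them up with `dict(zip(...))`. A small standalone sketch of that idiom, using the values printed above (the typed variables `im_size` and `learning_rate` are added here for illustration):

```python
# Flat list of alternating flags and values, as returned in the run details
arguments = [
    "--data-folder", "$AZUREML_DATAREFERENCE_workspaceblobstore",
    "--im_size", "299",
    "--learning_rate", "0.001",
]

# Even positions are the flags, odd positions are their values
best_parameters = dict(zip(arguments[::2], arguments[1::2]))

# The values are strings, so convert them before reusing them for training
im_size = int(best_parameters["--im_size"])
learning_rate = float(best_parameters["--learning_rate"])
print(im_size, learning_rate)
```

Converting the values back to numbers is useful if the best configuration is reused to retrain a model locally.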


### 6. Download and test the model

We can download the best model from the outputs/ folder and inspect it.

In [16]:
current_directory = os.getcwd()
output_folder = os.path.join(current_directory, 'outputs')
os.makedirs(output_folder, exist_ok=True)

model_file = 'im_classif_resnet50.pkl'
for f in best_run.get_file_names():
    if f.startswith('outputs/im_classif_resnet50'):
        print("Downloading {}..".format(f))
        # Save the file into the local outputs/ folder
        best_run.download_file(name=f, output_file_path=os.path.join(output_folder, model_file))

# learn.export() saves the Learner with torch.save, so we load it with fastai's load_learner
# rather than joblib.load
saved_model = load_learner(output_folder, model_file)
print(saved_model)

Downloading outputs/im_classif_resnet50.pkl..


We can now use the retrieved best model to get predictions on unseen images as done in [03_training_accuracy_vs_speed.ipynb](https://github.com/microsoft/ComputerVision/blob/master/classification/notebooks/03_training_accuracy_vs_speed.ipynb) notebook using
```python
saved_model.predict(image)
```