<i>Copyright (c) Microsoft Corporation. All rights reserved.</i>

<i>Licensed under the MIT License.</i>

TODO (make changes also in image classification notebook to stay in sync):
- Test if image classification notebook also uploads all images in the data dir, and not just here the odFridgeObjects
- Training code hard-codes what foldername is. Replace with just a generic name, e.g. img_data.
- Try setting env via the "conda_dependencies_file" parameter [link](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.estimator.estimator?view=azure-ml-py)

TODO image classification only:
- Fix display of _STANDARD_DS6_ etc machines
- Change output dir from "outputs" to: output_folder = os.path.join(current_directory, 'hyperdrive_outputs')
- Rename DATA to DATA_PATH
- Rename script_folder to hyperdrive from hyperparameters
- Add "use_gpu=True"

# Testing different Hyperparameters and Benchmarking

In this notebook, we'll cover how to test different hyperparameters for a particular dataset and how to benchmark different parameters across a group of datasets using AzureML. We assume familiarity with the basic concepts and parameters, which are discussed in the [01_training_introduction.ipynb](01_training_introduction.ipynb), [02_mask_rcnn.ipynb](02_mask_rcnn.ipynb) and [03_training_accuracy_vs_speed.ipynb](03_training_accuracy_vs_speed.ipynb) notebooks. 

Similar to the image classification notebook [11_exploring_hyperparameters.ipynb](../../classification/notebooks/11_exploring_hyperparameters.ipynb), we will learn more about how different learning rates and different image sizes affect our model's accuracy when restricted to 16 epochs, and we want to build an AzureML experiment to test out these hyperparameters. 

We will be using a Faster R-CNN model with a ResNet-50 backbone to find all objects in an image belonging to 4 categories: 'can', 'carton', 'milk_bottle', 'water_bottle'. We will then conduct hyperparameter tuning to find the best set of parameters for this model. For this, we present an overall process of utilizing AzureML, specifically its [Hyperdrive](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.hyperdrive?view=azure-ml-py) component, to run this tuning in parallel (and not successively). We demonstrate the following key steps:  
* Configure AzureML Workspace
* Create Remote Compute Target (GPU cluster)
* Prepare Data
* Prepare Training Script
* Setup and Run Hyperdrive Experiment
* Model Import, Re-train and Test

This notebook is very similar to the [24_exploring_hyperparameters_on_azureml.ipynb](../../classification/notebooks/24_exploring_hyperparameters_on_azureml.ipynb) hyperdrive notebook used for image classification.

In [1]:
import os
import sys
from distutils.dir_util import copy_tree

import azureml.core
from azureml.core import Workspace, Experiment
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
import azureml.data
from azureml.train.estimator import Estimator
from azureml.train.hyperdrive import (
    RandomParameterSampling, BanditPolicy, HyperDriveConfig, PrimaryMetricGoal, choice, uniform
)
import azureml.widgets as widgets

sys.path.append("../../")
from utils_cv.common.data import unzip_url
from utils_cv.detection.data import Urls

Ensure edits to libraries are loaded and plotting is shown in the notebook.

In [2]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

We now define some parameters which will be used in this notebook:

In [3]:
# Azure resources
# subscription_id = "YOUR_SUBSCRIPTION_ID"
# resource_group = "YOUR_RESOURCE_GROUP_NAME"  
# workspace_name = "YOUR_WORKSPACE_NAME"  
# workspace_region = "YOUR_WORKSPACE_REGION" #Possible values eastus, eastus2, etc.
subscription_id = "2ad17db4-e26d-4c9e-999e-adae9182530c"  #Sharat 
resource_group = "pabuehle_deleteme_hyperdrive"  
workspace_name = "pabuehle_ws"  
workspace_region = "eastus" #Possible values eastus, eastus2, etc.

# Choose a size for our cluster and the maximum number of nodes
VM_SIZE = "STANDARD_NC6"  # Alternative: "STANDARD_NC6S_V3"
MAX_NODES = 2 #12

# Hyperparameter search space
IM_SIZES = [50] #[150, 300]
LEARNING_RATE_MAX = 1e-3
LEARNING_RATE_MIN = 1e-5
MAX_TOTAL_RUNS = 1 #Set to higher value to test more parameter combinations

# Image data
#DATA = unzip_url(Urls.fridge_objects_path, exist_ok=True)
DATA_PATH = "C:/Users/pabuehle/Desktop/ComputerVision/data/odFridgeObjectsTiny"

### 1. Config AzureML workspace
Below we set up (or load an existing) AzureML workspace and print its details. Note that the resource group and workspace will be created if they do not yet exist. For more information regarding the AzureML workspace see also the [20_azure_workspace_setup.ipynb](../../classification/notebooks/20_azure_workspace_setup.ipynb) notebook in the image classification folder.

To simplify clean-up (see end of this notebook), we recommend creating a new resource group to run this notebook.

In [4]:
from utils_cv.common.azureml import get_or_create_workspace

ws = get_or_create_workspace(
        subscription_id,
        resource_group,
        workspace_name,
        workspace_region)

# Print the workspace attributes
print('Workspace name: ' + ws.name, 
      'Workspace region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep = '\n')

Workspace name: pabuehle_ws
Workspace region: eastus
Subscription id: 2ad17db4-e26d-4c9e-999e-adae9182530c
Resource group: pabuehle_deleteme_hyperdrive


### 2. Create Remote Target
We create a GPU cluster as our remote compute target. If a cluster with the same name already exists in our workspace, the script will load it instead. This [link](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets#compute-targets-for-training) provides more information about how to set up a compute target on different locations.

By default, the VM size is set to use STANDARD\_NC6 machines. However, if quota is available, our recommendation is to use STANDARD\_NC6S\_V3 machines which come with the much faster V100 GPU.

In [5]:
CLUSTER_NAME = "gpu-cluster"

try:
    # Retrieve if a compute target with the same cluster name already exists
    compute_target = ComputeTarget(workspace=ws, name=CLUSTER_NAME)
    print('Found existing compute target.')
    
except ComputeTargetException:
    # If it doesn't already exist, we create a new one with the name provided
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size=VM_SIZE,
                                                           min_nodes=0,
                                                           max_nodes=MAX_NODES)

    # create the cluster
    compute_target = ComputeTarget.create(ws, CLUSTER_NAME, compute_config)
    compute_target.wait_for_completion(show_output=True)

# we can use get_status() to get a detailed status for the current cluster. 
print(compute_target.get_status().serialize())

Found existing compute target.
{'currentNodeCount': 0, 'targetNodeCount': 0, 'nodeStateCounts': {'preparingNodeCount': 0, 'runningNodeCount': 0, 'idleNodeCount': 0, 'unusableNodeCount': 0, 'leavingNodeCount': 0, 'preemptedNodeCount': 0}, 'allocationState': 'Steady', 'allocationStateTransitionTime': '2019-08-29T19:47:46.981000+00:00', 'errors': None, 'creationTime': '2019-08-29T19:47:17.250148+00:00', 'modifiedTime': '2019-08-29T19:47:53.458252+00:00', 'provisioningState': 'Succeeded', 'provisioningStateTransitionTime': None, 'scaleSettings': {'minNodeCount': 0, 'maxNodeCount': 2, 'nodeIdleTimeBeforeScaleDown': 'PT120S'}, 'vmPriority': 'Dedicated', 'vmSize': 'STANDARD_NC6'}


### 3. Prepare data
In this notebook, we'll use the Fridge Objects dataset, which is already stored in the correct format. We then upload our data to the AzureML workspace.
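
Note that the upload cell below passes `os.path.dirname(DATA_PATH)` as `src_dir`, that is, the parent of the dataset folder rather than the dataset folder itself. The following minimal sketch, which uses a hypothetical relative path instead of the real `DATA_PATH`, illustrates what `os.path.dirname` returns and why every dataset stored next to the chosen one would be included in the upload:

```python
import os

# Hypothetical layout: several datasets share one parent "data" folder.
dataset_path = "data/odFridgeObjectsTiny"

# os.path.dirname() drops the last path component and returns the parent folder.
parent_dir = os.path.dirname(dataset_path)
dataset_name = os.path.basename(dataset_path)

print(f"Parent folder: {parent_dir}")
print(f"Dataset folder name: {dataset_name}")
```

Passing the dataset folder itself as the source directory would restrict the upload to that dataset, at the cost of changing the folder layout that the training script expects on the datastore.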


In [6]:
# Retrieving default datastore that got automatically created when we setup a workspace
ds = ws.get_default_datastore()

# We now upload the data to the 'data' folder on the Azure portal
ds.upload(
    src_dir=os.path.dirname(DATA_PATH),
    target_path='data',
    overwrite=True, # with "overwrite=True", if this data already exists on the Azure blob storage, it will be overwritten
    show_progress=True
)

Uploading an estimated of 479 files
Uploading C:/Users/pabuehle/Desktop/ComputerVision/data\example.jpg
Uploading C:/Users/pabuehle/Desktop/ComputerVision/data\fridgeObjectsTiny\can\1.jpg
Uploading C:/Users/pabuehle/Desktop/ComputerVision/data\fridgeObjectsTiny\can\10.jpg
Uploading C:/Users/pabuehle/Desktop/ComputerVision/data\fridgeObjectsTiny\can\12.jpg
Uploading C:/Users/pabuehle/Desktop/ComputerVision/data\fridgeObjectsTiny\can\13.jpg
Uploading C:/Users/pabuehle/Desktop/ComputerVision/data\fridgeObjectsTiny\can\14.jpg
Uploading C:/Users/pabuehle/Desktop/ComputerVision/data\fridgeObjectsTiny\can\15.jpg
Uploading C:/Users/pabuehle/Desktop/ComputerVision/data\fridgeObjectsTiny\can\18.jpg
Uploading C:/Users/pabuehle/Desktop/ComputerVision/data\fridgeObjectsTiny\can\19.jpg
Uploading C:/Users/pabuehle/Desktop/ComputerVision/data\fridgeObjectsTiny\can\20.jpg
Uploading C:/Users/pabuehle/Desktop/ComputerVision/data\fridgeObjectsTiny\can\23.jpg
Uploading C:/Users/pabuehle/Desktop/ComputerVis

Uploading C:/Users/pabuehle/Desktop/ComputerVision/data\fridgeObjects\can\14.jpg
Uploaded C:/Users/pabuehle/Desktop/ComputerVision/data\fridgeObjectsTiny\can\6.jpg, 43 files out of an estimated total of 479
Uploaded C:/Users/pabuehle/Desktop/ComputerVision/data\fridgeObjectsTiny\can\1.jpg, 44 files out of an estimated total of 479
Uploading C:/Users/pabuehle/Desktop/ComputerVision/data\fridgeObjects\can\15.jpg
Uploading C:/Users/pabuehle/Desktop/ComputerVision/data\fridgeObjects\can\16.jpg
Uploaded C:/Users/pabuehle/Desktop/ComputerVision/data\fridgeObjectsTiny\can\7.jpg, 45 files out of an estimated total of 479
Uploaded C:/Users/pabuehle/Desktop/ComputerVision/data\fridgeObjectsTiny\water_bottle\106.jpg, 46 files out of an estimated total of 479
Uploading C:/Users/pabuehle/Desktop/ComputerVision/data\fridgeObjects\can\17.jpg
Uploading C:/Users/pabuehle/Desktop/ComputerVision/data\fridgeObjects\can\18.jpg
Uploaded C:/Users/pabuehle/Desktop/ComputerVision/data\fridgeObjectsTiny\water_b

Uploading C:/Users/pabuehle/Desktop/ComputerVision/data\fridgeObjects\milk_bottle\72.jpg
Uploaded C:/Users/pabuehle/Desktop/ComputerVision/data\fridgeObjects\carton\45.jpg, 113 files out of an estimated total of 479
Uploading C:/Users/pabuehle/Desktop/ComputerVision/data\fridgeObjects\milk_bottle\73.jpg
Uploaded C:/Users/pabuehle/Desktop/ComputerVision/data\fridgeObjects\milk_bottle\67.jpg, 114 files out of an estimated total of 479
Uploading C:/Users/pabuehle/Desktop/ComputerVision/data\fridgeObjects\milk_bottle\74.jpg
Uploaded C:/Users/pabuehle/Desktop/ComputerVision/data\fridgeObjects\carton\52.jpg, 115 files out of an estimated total of 479
Uploading C:/Users/pabuehle/Desktop/ComputerVision/data\fridgeObjects\milk_bottle\75.jpg
Uploaded C:/Users/pabuehle/Desktop/ComputerVision/data\fridgeObjects\carton\55.jpg, 116 files out of an estimated total of 479
Uploading C:/Users/pabuehle/Desktop/ComputerVision/data\fridgeObjects\milk_bottle\76.jpg
Uploaded C:/Users/pabuehle/Desktop/Compute

Uploaded C:/Users/pabuehle/Desktop/ComputerVision/data\fridgeObjects\water_bottle\107.jpg, 173 files out of an estimated total of 479
Uploading C:/Users/pabuehle/Desktop/ComputerVision/data\odFridgeObjectsTiny\annotations\35.xml
Uploaded C:/Users/pabuehle/Desktop/ComputerVision/data\fridgeObjects\water_bottle\110.jpg, 174 files out of an estimated total of 479
Uploading C:/Users/pabuehle/Desktop/ComputerVision/data\odFridgeObjectsTiny\annotations\40.xml
Uploaded C:/Users/pabuehle/Desktop/ComputerVision/data\fridgeObjects\water_bottle\108.jpg, 175 files out of an estimated total of 479
Uploading C:/Users/pabuehle/Desktop/ComputerVision/data\odFridgeObjectsTiny\annotations\45.xml
Uploaded C:/Users/pabuehle/Desktop/ComputerVision/data\fridgeObjects\water_bottle\111.jpg, 176 files out of an estimated total of 479
Uploading C:/Users/pabuehle/Desktop/ComputerVision/data\odFridgeObjectsTiny\annotations\50.xml
Uploaded C:/Users/pabuehle/Desktop/ComputerVision/data\fridgeObjects\water_bottle\10

Uploading C:/Users/pabuehle/Desktop/ComputerVision/data\odFridgeObjects\annotations\48.xml
Uploaded C:/Users/pabuehle/Desktop/ComputerVision/data\odFridgeObjects\annotations\33.xml, 256 files out of an estimated total of 479
Uploading C:/Users/pabuehle/Desktop/ComputerVision/data\odFridgeObjects\annotations\49.xml
Uploaded C:/Users/pabuehle/Desktop/ComputerVision/data\odFridgeObjects\annotations\17.xml, 257 files out of an estimated total of 479
Uploading C:/Users/pabuehle/Desktop/ComputerVision/data\odFridgeObjects\annotations\5.xml
Uploading C:/Users/pabuehle/Desktop/ComputerVision/data\odFridgeObjects\annotations\50.xml
Uploaded C:/Users/pabuehle/Desktop/ComputerVision/data\odFridgeObjects\annotations\13.xml, 258 files out of an estimated total of 479
Uploading C:/Users/pabuehle/Desktop/ComputerVision/data\odFridgeObjects\annotations\51.xml
Uploading C:/Users/pabuehle/Desktop/ComputerVision/data\odFridgeObjects\annotations\52.xml
Uploaded C:/Users/pabuehle/Desktop/ComputerVision/dat

Uploading C:/Users/pabuehle/Desktop/ComputerVision/data\odFridgeObjects\annotations\80.xml
Uploading C:/Users/pabuehle/Desktop/ComputerVision/data\odFridgeObjects\annotations\81.xml
Uploaded C:/Users/pabuehle/Desktop/ComputerVision/data\odFridgeObjects\annotations\54.xml, 293 files out of an estimated total of 479
Uploading C:/Users/pabuehle/Desktop/ComputerVision/data\odFridgeObjects\annotations\82.xml
Uploading C:/Users/pabuehle/Desktop/ComputerVision/data\odFridgeObjects\annotations\83.xml
Uploaded C:/Users/pabuehle/Desktop/ComputerVision/data\odFridgeObjects\annotations\61.xml, 294 files out of an estimated total of 479
Uploading C:/Users/pabuehle/Desktop/ComputerVision/data\odFridgeObjects\annotations\84.xml
Uploading C:/Users/pabuehle/Desktop/ComputerVision/data\odFridgeObjects\annotations\85.xml
Uploaded C:/Users/pabuehle/Desktop/ComputerVision/data\odFridgeObjects\annotations\43.xml, 295 files out of an estimated total of 479
Uploaded C:/Users/pabuehle/Desktop/ComputerVision/da

Uploaded C:/Users/pabuehle/Desktop/ComputerVision/data\odFridgeObjects\annotations\86.xml, 334 files out of an estimated total of 479
Uploaded C:/Users/pabuehle/Desktop/ComputerVision/data\odFridgeObjects\annotations\97.xml, 335 files out of an estimated total of 479
Uploading C:/Users/pabuehle/Desktop/ComputerVision/data\odFridgeObjects\images\120.jpg
Uploaded C:/Users/pabuehle/Desktop/ComputerVision/data\odFridgeObjects\annotations\98.xml, 336 files out of an estimated total of 479
Uploading C:/Users/pabuehle/Desktop/ComputerVision/data\odFridgeObjects\images\121.jpg
Uploading C:/Users/pabuehle/Desktop/ComputerVision/data\odFridgeObjects\images\122.jpg
Uploaded C:/Users/pabuehle/Desktop/ComputerVision/data\odFridgeObjects\annotations\92.xml, 337 files out of an estimated total of 479
Uploaded C:/Users/pabuehle/Desktop/ComputerVision/data\odFridgeObjects\images\10.jpg, 338 files out of an estimated total of 479
Uploading C:/Users/pabuehle/Desktop/ComputerVision/data\odFridgeObjects\im

Uploaded C:/Users/pabuehle/Desktop/ComputerVision/data\odFridgeObjects\images\32.jpg, 395 files out of an estimated total of 479
Uploaded C:/Users/pabuehle/Desktop/ComputerVision/data\odFridgeObjects\images\34.jpg, 396 files out of an estimated total of 479
Uploading C:/Users/pabuehle/Desktop/ComputerVision/data\odFridgeObjects\images\60.jpg
Uploading C:/Users/pabuehle/Desktop/ComputerVision/data\odFridgeObjects\images\61.jpg
Uploaded C:/Users/pabuehle/Desktop/ComputerVision/data\odFridgeObjects\images\39.jpg, 397 files out of an estimated total of 479
Uploaded C:/Users/pabuehle/Desktop/ComputerVision/data\odFridgeObjects\images\35.jpg, 398 files out of an estimated total of 479
Uploading C:/Users/pabuehle/Desktop/ComputerVision/data\odFridgeObjects\images\62.jpg
Uploaded C:/Users/pabuehle/Desktop/ComputerVision/data\odFridgeObjects\images\53.jpg, 399 files out of an estimated total of 479
Uploading C:/Users/pabuehle/Desktop/ComputerVision/data\odFridgeObjects\images\63.jpg
Uploading C

Uploaded C:/Users/pabuehle/Desktop/ComputerVision/data\odFridgeObjects\images\88.jpg, 455 files out of an estimated total of 479
Uploaded C:/Users/pabuehle/Desktop/ComputerVision/data\odFridgeObjects\images\92.jpg, 456 files out of an estimated total of 479
Uploaded C:/Users/pabuehle/Desktop/ComputerVision/data\odFridgeObjects\images\94.jpg, 457 files out of an estimated total of 479
Uploaded C:/Users/pabuehle/Desktop/ComputerVision/data\odFridgeObjects\images\89.jpg, 458 files out of an estimated total of 479
Uploaded C:/Users/pabuehle/Desktop/ComputerVision/data\odFridgeObjects\images\90.jpg, 459 files out of an estimated total of 479
Uploaded C:/Users/pabuehle/Desktop/ComputerVision/data\tiny\can\7.jpg, 460 files out of an estimated total of 479
Uploaded C:/Users/pabuehle/Desktop/ComputerVision/data\tiny\can\5.jpg, 461 files out of an estimated total of 479
Uploaded C:/Users/pabuehle/Desktop/ComputerVision/data\tiny\carton\35.jpg, 462 files out of an estimated total of 479
Uploaded 

$AZUREML_DATAREFERENCE_e3cb1a684267425ba976c73cafeec314


Here's where you can see the data in your portal: 
<img src="media/datastore.jpg" width="800" alt="Datastore screenshot for Hyperdrive notebook run">

### 4. Prepare training script

The next step is to prepare the script that AzureML Hyperdrive will use to train and evaluate models with the selected hyperparameters.
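
Each child run calls the training script with its hyperparameters as command-line arguments. As a rough sketch of how the argument parsing in `train.py` behaves, the snippet below reuses the same three argument definitions and parses a hypothetical command line; the mount path and the values are made up for illustration:

```python
import argparse

# Same argument definitions as in train.py; `dest` sets the key name in the parsed result.
parser = argparse.ArgumentParser()
parser.add_argument("--data-folder", type=str, dest="DATA_DIR", help="Datastore path")
parser.add_argument("--im_size", type=int, dest="IM_SIZE")
parser.add_argument("--learning_rate", type=float, dest="LEARNING_RATE")

# Hypothetical command line, similar to what a child run might receive.
example_argv = [
    "--data-folder", "/mnt/datastore",
    "--im_size", "50",
    "--learning_rate", "0.0001",
]

# vars() turns the parsed namespace into a plain dictionary.
params = vars(parser.parse_args(example_argv))
print(params)
```

Arguments that are not passed are set to `None`, which is why the script checks each parameter for `None` before training.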

In [7]:
# Create a folder for the training script and the utils_cv library
script_folder = os.path.join(os.getcwd(), "hyperdrive")
os.makedirs(script_folder, exist_ok=True)

In [8]:
# Copy utils_cv library to script folder
_ = copy_tree(os.path.join('..', '..', 'utils_cv'), os.path.join(script_folder, 'utils_cv'))

In [35]:
%%writefile $script_folder/train.py

import argparse
import numpy as np
import os
#from sklearn.externals import joblib
import sys

import matplotlib 
matplotlib.use("Agg") 
import matplotlib.pyplot as plt

print(sys.version)
import torch
print(f"Loaded torch version: {torch.__version__}")
print("CUDA available: " + str(torch.cuda.is_available()))
import fastai
print(f"Loaded fastai version: {fastai.__version__}")
import torchvision
print(f"Loaded torchvision version: {torchvision.__version__}")


#sys.path.append("../../")
from utils_cv.detection.dataset import DetectionDataset
from utils_cv.detection.model import DetectionLearner 
from utils_cv.common.gpu import which_processor

from azureml.core import Run

run = Run.get_context()

#import cudatoolkit
#print(f"Loaded cudatoolkit version: {cudatoolkit.__version__}")


#------------------------------------------------------------------
# Define parameters that we are going to use for training
EPOCHS = 1 #16
BATCH_SIZE = 10
#------------------------------------------------------------------


# Parse arguments passed by Hyperdrive
if True:
    parser = argparse.ArgumentParser()

    # Data path
    parser.add_argument('--data-folder', type=str, dest='DATA_DIR', help="Datastore path")
    parser.add_argument('--im_size', type=int, dest='IM_SIZE')
    parser.add_argument('--learning_rate', type=float, dest='LEARNING_RATE')
    args = parser.parse_args()
    params = vars(args)

else:
    params = {}
    params['DATA_DIR'] = "C:/Users/pabuehle/Desktop/ComputerVision/"
    params['IM_SIZE'] = -1
    params['LEARNING_RATE'] = 0.005
    
# Check if required parameters are specified
if params['IM_SIZE'] is None:
    raise ValueError("Image size not specified.")
if params['LEARNING_RATE'] is None:
    raise ValueError("Learning rate not specified.")
if params['DATA_DIR'] is None:
    raise ValueError("Data folder not specified.")


which_processor()

# Getting training and validation data
path = params['DATA_DIR'] + '/data/odFridgeObjects' #Tiny'
data = DetectionDataset(path, train_pct=0.25, batch_size = BATCH_SIZE)
print(
    f"Training dataset: {len(data.train_ds)} | Training DataLoader: {data.train_dl} \nTesting dataset: {len(data.test_ds)} | Testing DataLoader: {data.test_dl}"
)
# data = (ImageList.from_folder(path)
#         .split_by_rand_pct(valid_pct=0.5, seed=10)
#         .label_from_folder() 
#         .transform(size=params['IM_SIZE']) 
#         .databunch(bs=BATCH_SIZE) 
#         .normalize(imagenet_stats))

# Get model and run training
print("Initializing DetectionLearner")
detector = DetectionLearner(data)

print("Running detector.fit()")
detector.fit(EPOCHS, lr=params['LEARNING_RATE'], print_freq=30)

print("Printing aps:")
print(detector.ap)

# learn = cnn_learner(
#     data,
#     ARCHITECTURE,
#     metrics=[accuracy]
# )
# learn.fit_one_cycle(EPOCHS_HEAD, params['LEARNING_RATE'])
# learn.unfreeze()
# learn.fit_one_cycle(EPOCHS_BODY, params['LEARNING_RATE'])

# Add log entries
print("Adding log entries")
#training_losses = detector.losses # [x.numpy().ravel()[0] for x in learn.recorder.losses]
accuracy = detector.ap[-1] #[100*x[0].numpy().ravel()[0] for x in learn.recorder.metrics][-1]

print(f"accuracy={accuracy}")
run.log('data_dir',params['DATA_DIR'])
run.log('im_size', params['IM_SIZE'])
run.log('learning_rate', params['LEARNING_RATE'])
run.log('accuracy', float(accuracy))  # Logging our primary metric 'accuracy'

# # Save trained model
# print("Saving model")
# current_directory = os.getcwd()
# output_folder = os.path.join(current_directory, 'hyperdrive_outputs')
# model_name = 'im_classif_resnet'  # Name we will give our model both locally and on Azure
# os.makedirs(output_folder, exist_ok=True)
#learn.export(os.path.join(output_folder, model_name + ".pkl"))

Overwriting C:\Users\pabuehle\Desktop\ComputerVision\detection\notebooks\hyperdrive/train.py


### 5. Setup and run Hyperdrive experiment

#### 5.1 Create Experiment  
An Experiment is the main entry point for experimenting with AzureML. To create a new Experiment or get an existing one, we pass our experiment name 'hyperparameter-tuning'.


In [36]:
experiment_name = 'hyperparameter-tuning'
exp = Experiment(workspace=ws, name=experiment_name)

#### 5.2. Define search space

Now we define the search space of hyperparameters. As shown below, to test discrete parameter values use 'choice()', and for uniform sampling use 'uniform()'. For more options, see [Hyperdrive parameter expressions](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.hyperdrive.parameter_expressions?view=azure-ml-py).

Hyperdrive provides three different parameter sampling methods: 'RandomParameterSampling', 'GridParameterSampling', and 'BayesianParameterSampling'. Details about each method can be found [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-tune-hyperparameters). Here, we use the 'RandomParameterSampling'.
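
To build intuition for what random sampling over such a search space produces, here is a plain-Python sketch. It does not use Hyperdrive and is not its implementation; the helper `sample_configuration` is hypothetical and simply mimics drawing a learning rate uniformly from a range and an image size from a list of choices:

```python
import random


def sample_configuration(rng, lr_min, lr_max, im_sizes):
    """Draw one hyperparameter configuration at random (illustrative only)."""
    return {
        "--learning_rate": rng.uniform(lr_min, lr_max),  # continuous, uniform in [lr_min, lr_max]
        "--im_size": rng.choice(im_sizes),               # one of the discrete values
    }


rng = random.Random(0)  # fixed seed so that the sketch is repeatable
configs = [sample_configuration(rng, 1e-5, 1e-3, [150, 300]) for _ in range(4)]

for config in configs:
    print(config)
```

Each sampled configuration corresponds to one child run. Note that uniform sampling over a range spanning several orders of magnitude draws most values from the upper end of the range.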

In [37]:
# Hyperparameter search space
param_sampling = RandomParameterSampling( {
        '--learning_rate': uniform(LEARNING_RATE_MIN, LEARNING_RATE_MAX),
        '--im_size': choice(IM_SIZES)
    }
)

early_termination_policy = BanditPolicy(slack_factor=0.15, evaluation_interval=1, delay_evaluation=20)

<b>AzureML Estimator</b> is the building block for training. An Estimator encapsulates the training code and parameters, the compute resources and the runtime environment for a particular training scenario.
We create one for our experiment with the dependencies our model requires, for example:

```python
pip_packages=['nvidia-ml-py3', 'fastai']
conda_packages=['scikit-learn', 'pycocotools>=2.0', 'torchvision==0.3', 'cudatoolkit==9.0']
```

In [38]:
script_params = {
    '--data-folder': ds.as_mount()
}

est = Estimator(source_directory=script_folder,
                script_params=script_params,
                compute_target=compute_target,
                entry_script='train.py',
                use_gpu=True,
                pip_packages=['nvidia-ml-py3', 'fastai'],
                conda_packages=['scikit-learn', 'pycocotools>=2.0', 'torchvision==0.3', 'cudatoolkit==9.0'])

We now create a HyperDriveConfig object which includes information about the parameter space sampling, termination policy, primary metric, estimator and the compute target to execute the experiment runs on. We feed the following parameters to it:

- our estimator object that we created in the above cell
- hyperparameter sampling method, in this case it is Random Parameter Sampling
- early termination policy, in this case we use [Bandit Policy](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-tune-hyperparameters#bandit-policy)
- primary metric name reported by our runs, in this case it is accuracy 
- the goal, which determines whether the primary metric has to be maximized/minimized, in this case it is to maximize our accuracy 
- number of total child-runs

The larger the search space, the more child-runs are typically needed to find a good set of parameters.
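
As an illustration of the early termination rule, the sketch below shows the slack-factor check that a Bandit policy applies when the goal is to maximize the primary metric: a run is stopped if its metric falls below the best metric divided by (1 + slack factor). This is a simplified stand-alone version with a hypothetical helper name and made-up metric values; it ignores the `evaluation_interval` and `delay_evaluation` settings:

```python
def should_terminate(run_metric, best_metric, slack_factor):
    """Return True if a run falls outside the allowed slack (maximization goal)."""
    threshold = best_metric / (1.0 + slack_factor)
    return run_metric < threshold


best_metric = 0.80   # made-up best value reported so far
slack_factor = 0.15  # same slack factor as used in this notebook

for run_metric in (0.75, 0.60):
    decision = "terminate" if should_terminate(run_metric, best_metric, slack_factor) else "keep"
    print(f"metric={run_metric:.2f} -> {decision}")
```

A smaller slack factor stops poorly performing runs more aggressively, which saves compute but risks cancelling runs that would have improved later.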

In [39]:
hyperdrive_run_config = HyperDriveConfig(estimator=est,
                                         hyperparameter_sampling=param_sampling,
                                         policy=early_termination_policy,
                                         primary_metric_name='accuracy',
                                         primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                                         max_total_runs=MAX_TOTAL_RUNS,
                                         max_concurrent_runs=MAX_NODES)

#### 5.3 Run Experiment

In [40]:
# Now we submit the Run to our experiment. 
hyperdrive_run = exp.submit(config=hyperdrive_run_config)

# We can see the experiment progress from this notebook by using 
widgets.RunDetails(hyperdrive_run).show()

_HyperDriveWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO'…

In [None]:
hyperdrive_run.wait_for_completion()

Alternatively, we can check the progress in the Azure portal, using the URL returned by
```python
hyperdrive_run.get_portal_url()
```

To load an existing Hyperdrive run instead of starting a new one, we can use
```python
hyperdrive_run = azureml.train.hyperdrive.HyperDriveRun(exp, <your-run-id>, hyperdrive_run_config=hyperdrive_run_config)
```
We can also cancel the run with
```python
hyperdrive_run.cancel()
```

Once all the child-runs are finished, we can get the best run and the metrics.
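
The run details store the script arguments as a flat list in which flags and values alternate. The following minimal sketch, using made-up argument values, shows how slicing and `zip` pair them into a dictionary, which is the same idiom used in the next cell:

```python
# Made-up argument list with alternating flags and values.
arguments = [
    "--data-folder", "/mnt/datastore",
    "--im_size", "300",
    "--learning_rate", "0.0005",
]

# Even positions hold the flags, odd positions hold the corresponding values.
flags = arguments[::2]
values = arguments[1::2]
best_parameters = dict(zip(flags, values))

print(best_parameters)
```

The values remain strings after pairing, so they need to be converted (for example with `float()`) before being used numerically.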

In [None]:
# Get best run and print out metrics
best_run = hyperdrive_run.get_best_run_by_primary_metric()
best_run_metrics = best_run.get_metrics()
parameter_values = best_run.get_details()['runDefinition']['arguments']
best_parameters = dict(zip(parameter_values[::2], parameter_values[1::2]))

print(f"* Best Run Id:{best_run.id}")
print(best_run)
print("\n* Best hyperparameters:")
print(best_parameters)
print(f"Accuracy = {best_run_metrics['accuracy']}")
#print("Learning Rate =", best_run_metrics['learning_rate'])

### 6. Download and test the model

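We can download the best model from the outputs/ folder and inspect it. This assumes that the training script saved the model to that folder, a step which is currently commented out in `train.py` above.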

In [None]:
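import joblib
current_directory = os.getcwd()
output_folder = os.path.join(current_directory, 'outputs')
os.makedirs(output_folder, exist_ok=True)

# Local path to store the downloaded model
model_path = os.path.join(output_folder, 'im_classif_resnet.pkl')

for f in best_run.get_file_names():
    if f.startswith('outputs/im_classif_resnet'):
        print("Downloading {}..".format(f))
        # Download into the local output folder and load the model from the same path
        best_run.download_file('outputs/im_classif_resnet.pkl', output_file_path=model_path)
saved_model = joblib.load(model_path)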

We can now use the retrieved best model to get predictions on unseen images, as done in the [03_training_accuracy_vs_speed.ipynb](https://github.com/microsoft/ComputerVision/blob/master/classification/notebooks/03_training_accuracy_vs_speed.ipynb) notebook, using
```python
saved_model.predict(image)
```

### 7. Clean up

To avoid unnecessary expenses, all resources which were created in this notebook should be deleted once the parameter search is concluded. To simplify this clean-up step, we recommend creating a new resource group to run this notebook. This resource group can then be deleted, e.g. using the Azure Portal, which will remove all created resources.