# Hyperparameter Tuning of QRNN Models with AML SDK and HyperDrive

This notebook tunes the hyperparameters of QRNN models with the AML SDK and HyperDrive. It selects the best model by cross validation using the training data in the 6 forecast rounds. Specifically, it splits the training data into sub-training data and validation data. Then, it trains QRNN models with different sets of hyperparameters on the sub-training data and evaluates the pinball loss of each model on the validation data. The set of hyperparameters that yields the lowest cross-validation pinball loss will be used to train models and forecast energy load across all 6 forecast rounds.
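
For readers unfamiliar with the metric, the following is a minimal sketch of the pinball (quantile) loss in plain Python. The function names, the toy numbers, and the choice to average over all observations and quantile levels are assumptions made for illustration; the exact aggregation used by `aml_estimator.py` is not shown in this notebook.

```python
from statistics import mean


def pinball_loss(y_true, y_pred, quantile):
    """Pinball (quantile) loss for one observation and one quantile level."""
    diff = y_true - y_pred
    # Under-prediction is weighted by q, over-prediction by (1 - q).
    return quantile * diff if diff >= 0 else (quantile - 1) * diff


def average_pinball_loss(y_true, quantile_preds):
    """Average the loss over all observations and quantile levels.

    quantile_preds maps each quantile level to a list of predictions
    aligned with y_true.
    """
    losses = [
        pinball_loss(y, y_hat, q)
        for q, preds in quantile_preds.items()
        for y, y_hat in zip(y_true, preds)
    ]
    return mean(losses)


# Toy numbers for illustration only (not taken from the energy dataset).
y_true = [100.0, 120.0]
quantile_preds = {0.1: [90.0, 125.0], 0.9: [110.0, 115.0]}
print(average_pinball_loss(y_true, quantile_preds))
```

A lower value means the predicted quantiles track the observed load more closely, which is why the tuning later in this notebook minimizes this metric.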

## Prerequisites
To run this notebook, install the AML SDK and its widget extension in your environment by running the following commands in a terminal. Before running them, activate your environment with `activate <your env>`, or with `source activate <your env>` in a Linux VM.  
`pip3 install --upgrade azureml-sdk[notebooks,automl]`  
`jupyter nbextension install --py --user azureml.train.widgets`  
`jupyter nbextension enable --py --user azureml.train.widgets`  

To add the environment to your Jupyter kernels, run `python3 -m ipykernel install --name <your env>`. In addition, you need to create an Azure ML workspace and its configuration file (`config.json`) by following the [00.configuration.ipynb](https://github.com/Azure/MachineLearningNotebook/blob/master/configuration.ipynb) notebook.

In [1]:
import azureml
from azureml.core import Workspace, Run

# Check core SDK version number
print("Azure ML SDK Version: ", azureml.core.VERSION)

Azure ML SDK Version:  1.0.8


In [2]:
from azureml.telemetry import set_diagnostics_collection

# Opt-in diagnostics for better experience of future releases
set_diagnostics_collection(send_diagnostics=True)

Turning diagnostics collection on. 


## Initialize Workspace & Create an Azure ML Experiment

Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` below creates a workspace object from the details stored in `config.json`.
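
As a rough illustration of what `config.json` holds, the sketch below builds a dictionary with the three keys the file is expected to contain and checks that none are missing. The placeholder values and the `missing_keys` helper are hypothetical and are not part of the AML SDK.

```python
import json

# Placeholder values; replace them with the details of your own workspace.
config = {
    "subscription_id": "<your subscription id>",
    "resource_group": "<your resource group>",
    "workspace_name": "<your workspace name>",
}

REQUIRED_KEYS = {"subscription_id", "resource_group", "workspace_name"}


def missing_keys(cfg):
    """Return the required keys that are absent from a config dictionary."""
    return sorted(REQUIRED_KEYS - set(cfg))


print(json.dumps(config, indent=4))
print("Missing keys:", missing_keys(config))
```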

In [3]:
from azureml.core.workspace import Workspace

ws = Workspace.from_config()
print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Resource group: ' + ws.resource_group, sep = '\n')

Found the config file in: /data/home/tsperfadmin/Projects/zhouf/energy_forecast_fnn_model_v1/TSPerf/energy_load/GEFCom2017_D_Prob_MT_hourly/submissions/fnn/config.json
Workspace name: tsperfwszhouf
Azure region: eastus
Resource group: tsperf03


In [6]:
from azureml.core import Experiment

exp = Experiment(workspace=ws, name='tune_qrnn')

## Run Script on AML Compute Target

### Create AML Compute as compute target

In [7]:
from azureml.core.compute import AmlCompute
from azureml.core.compute import ComputeTarget

# choose a name for your cluster
compute_name =  "cpucompute"
compute_min_nodes = 0
compute_max_nodes = 16

vm_size = "STANDARD_D3_V2"


if compute_name in ws.compute_targets:
    compute_target = ws.compute_targets[compute_name]
    if compute_target and type(compute_target) is AmlCompute:
        print('found compute target. just use it. ' + compute_name)
else:
    print('creating a new compute target...')
    provisioning_config = AmlCompute.provisioning_configuration(vm_size = vm_size,
                                                                min_nodes = compute_min_nodes, 
                                                                max_nodes = compute_max_nodes)

    # create the cluster
    compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)

    # can poll for a minimum number of nodes and for a specific timeout. 
    # if no min node count is provided it will use the scale settings for the cluster
    compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)

    # For a more detailed view of current AmlCompute status, use the 'status' property
    print(compute_target.status.serialize())

found compute target. just use it. cpucompute


### Configure Docker environment

In [9]:
from azureml.core.runconfig import EnvironmentDefinition
from azureml.core.conda_dependencies import CondaDependencies

env = EnvironmentDefinition()

env.python.user_managed_dependencies = False
env.python.conda_dependencies = CondaDependencies.create(conda_packages=['pandas', 'r-base', 'r-data.table', 'r-rjson', 'r-optparse', 'r-doparallel'],
                                                         pip_packages=['azure-cli-core<2.0.55'], 
                                                         python_version='3.6.2')
env.python.conda_dependencies.add_channel('conda-forge')
env.docker.enabled=True

### Upload data to default datastore

Upload the training data for the 6 forecast rounds of the energy dataset to the workspace's default datastore, which will later be mounted on an AML Compute target for training.

In [10]:
ds = ws.get_default_datastore()
print(ds.datastore_type, ds.account_name, ds.container_name)

AzureBlob tsperfwszhouf2444925548 azureml-blobstore-43291b4b-78d3-4c1a-94db-cdd97cf840c4


In [11]:
path_on_datastore = 'data'
ds.upload(src_dir='./data/features/train', target_path=path_on_datastore, overwrite=True, show_progress=True)

Uploading ./data/features/train/train_round_1.csv
Uploading ./data/features/train/train_round_2.csv
Uploading ./data/features/train/train_round_3.csv
Uploading ./data/features/train/train_round_4.csv
Uploading ./data/features/train/train_round_5.csv
Uploading ./data/features/train/train_round_6.csv
Uploaded ./data/features/train/train_round_1.csv, 1 files out of an estimated total of 6
Uploaded ./data/features/train/train_round_3.csv, 2 files out of an estimated total of 6
Uploaded ./data/features/train/train_round_6.csv, 3 files out of an estimated total of 6
Uploaded ./data/features/train/train_round_2.csv, 4 files out of an estimated total of 6
Uploaded ./data/features/train/train_round_4.csv, 5 files out of an estimated total of 6
Uploaded ./data/features/train/train_round_5.csv, 6 files out of an estimated total of 6


$AZUREML_DATAREFERENCE_229fc20c559646f08551875da8fb29f6

In [12]:
# Get data reference object for the data path
ds_data = ds.path(path_on_datastore)
print(ds_data)

$AZUREML_DATAREFERENCE_5b3ca2f04f09428d976a5c17de811623


### Create estimator

In [15]:
from azureml.core.runconfig import EnvironmentDefinition
from azureml.train.estimator import Estimator

script_folder = './'

script_params = {
    '--path': ds_data.as_mount(),
    '--cv_path': './',
    '--n_hidden_1': 5, 
    '--n_hidden_2': 5,
    '--iter_max': 3,
    '--penalty': 0
}

est = Estimator(source_directory=script_folder,
                script_params=script_params,
                compute_target=compute_target,
                use_docker=True,
                entry_script='aml_estimator.py',
                environment_definition=env)
# Note: the estimator above, defined with environment_definition, has a problem.
# The next cell defines it with conda_packages and pip_packages instead.

In [38]:
from azureml.train.estimator import Estimator

script_folder = './'

script_params = {
    '--path': ds_data.as_mount(),
    '--cv_path': './',
    '--n_hidden_1': 5, 
    '--n_hidden_2': 5,
    '--iter_max': 3,
    '--penalty': 0
}

est = Estimator(source_directory=script_folder,
                script_params=script_params,
                compute_target=compute_target,
                use_docker=True,
                entry_script='aml_estimator.py',
                conda_packages=['pandas', 'r-base', 'r-data.table', 'r-rjson', 'r-doparallel'],
                pip_packages=['azure-cli-core<2.0.55'])

### Submit job

In [39]:
# Submit job to the AML Compute cluster
run_batchai = exp.submit(config=est)

### Check job status

In [1]:
# run_batchai.get_details()

### Load job and get metrics

In [14]:
from azureml.core import Run
run_batchai = Run(exp, "tune_qrnn_1547986964549")

In [15]:
run_batchai.get_metrics()

{'average pinball loss': 82.9833541445754}

## Tune Hyperparameters using HyperDrive

To tune hyperparameters with HyperDrive, we reuse the compute target, Docker environment, datastore, and estimator defined above, and additionally specify a parameter sampling technique. HyperDrive then runs the training script once for each sampled set of hyperparameters.
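
A grid sampler evaluates every combination of the listed values, so the number of runs is the product of the number of choices per hyperparameter. The sketch below enumerates the grid used by the working configuration in this section with plain `itertools` rather than HyperDrive; it is only meant to show where the run count comes from.

```python
from itertools import product

# Candidate values copied from the working grid defined in this section.
grid = {
    "--n_hidden_1": [4, 8],
    "--n_hidden_2": [2, 4, 8],
    "--iter_max": [1, 2, 4, 6, 8],
    "--penalty": [0, 0.001],
}

# One dictionary per combination, in the spirit of an exhaustive grid search.
combinations = [dict(zip(grid, values)) for values in product(*grid.values())]

print(len(combinations))
print(combinations[0])
```

The grid has 2 x 3 x 5 x 2 = 60 combinations, which is why `max_total_runs` is set to 60 for that configuration.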

In [16]:
from azureml.core.runconfig import EnvironmentDefinition
from azureml.train.estimator import Estimator
from azureml.train.hyperdrive import (GridParameterSampling, HyperDriveRunConfig,
                                      PrimaryMetricGoal, choice)

script_folder = './'

script_params = {
    '--path': ds_data.as_mount(),
    '--cv_path': './',
    '--n_hidden_1': 5, 
    '--n_hidden_2': 5,
    '--iter_max': 3,
    '--penalty': 0
}


est = Estimator(source_directory=script_folder,
                script_params=script_params,
                compute_target=compute_target,
                use_docker=True,
                entry_script='aml_estimator.py',
                environment_definition=env)

ps = GridParameterSampling({
    '--n_hidden_1': choice(4, 8), 
    '--n_hidden_2': choice(4, 8),
    '--iter_max': choice(1, 2, 4, 6, 8, 10),
    '--penalty': choice(0, 0.001),
})

htc = HyperDriveRunConfig(estimator=est, 
                          hyperparameter_sampling=ps, 
                          primary_metric_name='average pinball loss', 
                          primary_metric_goal=PrimaryMetricGoal.MINIMIZE, 
                          max_total_runs=48,
                          max_concurrent_runs=16)
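# Note: the estimator above, defined with environment_definition, has a problem.
# The next cell defines it with conda_packages and pip_packages instead.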

In [17]:
from azureml.train.estimator import Estimator
from azureml.train.hyperdrive import (GridParameterSampling, HyperDriveRunConfig,
                                      PrimaryMetricGoal, choice)

script_folder = './'

script_params = {
    '--path': ds_data.as_mount(),
    '--cv_path': './',
    '--n_hidden_1': 5, 
    '--n_hidden_2': 5,
    '--iter_max': 3,
    '--penalty': 0
}


est = Estimator(source_directory=script_folder,
                script_params=script_params,
                compute_target=compute_target,
                use_docker=True,
                entry_script='aml_estimator.py',
                conda_packages=['pandas', 'r-base', 'r-data.table', 'r-rjson', 'r-doparallel'],
                pip_packages=['azure-cli-core<2.0.55'])

ps = GridParameterSampling({
    '--n_hidden_1': choice(4, 8), 
    '--n_hidden_2': choice(2, 4, 8),
    '--iter_max': choice(1, 2, 4, 6, 8),
    '--penalty': choice(0, 0.001),
})

htc = HyperDriveRunConfig(estimator=est, 
                          hyperparameter_sampling=ps, 
                          primary_metric_name='average pinball loss', 
                          primary_metric_goal=PrimaryMetricGoal.MINIMIZE, 
                          max_total_runs=60,
                          max_concurrent_runs=16)

### Submit job

In [18]:
htr = exp.submit(config=htc)

The same input parameter(s) are specified in estimator script params and HyperDrive parameter space. HyperDrive parameter space definition will override duplicate entries in estimator. ['--n_hidden_1', '--n_hidden_2', '--iter_max', '--penalty'] is the list of overridden parameter(s).
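
The message above means that the values in `script_params` act only as defaults for the four tuned arguments. The sketch below mimics that behavior with an ordinary dictionary merge; it is an illustration of the override rule, not HyperDrive's actual implementation, and the sampled point is hypothetical.

```python
# Fixed defaults passed to the estimator (copied from script_params above,
# minus the datastore mount, which needs a live workspace).
script_params = {"--cv_path": "./", "--n_hidden_1": 5, "--n_hidden_2": 5,
                 "--iter_max": 3, "--penalty": 0}

# One hypothetical point sampled from the grid.
sampled = {"--n_hidden_1": 8, "--n_hidden_2": 2, "--iter_max": 1, "--penalty": 0.001}

# Later keys win, so sampled values replace the duplicated defaults.
effective = {**script_params, **sampled}
overridden = [key for key in script_params if key in sampled]

print(effective)
print(overridden)
```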


In [20]:
# RunDetails(htr).show()

### Check job status

In [25]:
htr.get_details()

{'runId': 'tune_qrnn_1548046929030',
 'target': 'cpucompute',
 'status': 'Completed',
 'endTimeUtc': '2019-01-21T19:28:09.000Z',
 'properties': {'primary_metric_config': '{"name": "average pinball loss", "goal": "minimize"}',
  'runTemplate': 'HyperDrive',
  'azureml.runsource': 'hyperdrive'},
 'logFiles': {'azureml-logs/hyperdrive.txt': 'https://tsperfwszhouf2444925548.blob.core.windows.net/azureml/ExperimentRun/dcid.tune_qrnn_1548046929030/azureml-logs/hyperdrive.txt?sv=2018-03-28&sr=b&sig=eCKR04sWQJv1EmDl%2BhpO0PwZBEVAhjV15lo2RoRt4Pk%3D&st=2019-01-22T03%3A22%3A01Z&se=2019-01-22T11%3A32%3A01Z&sp=r'}}

### Get best run

In [None]:
best_run = htr.get_best_run_by_primary_metric()
best_parameter_values = best_run.get_details()['runDefinition']['Arguments']
print(best_parameter_values)

In [None]:
best_run.get_metrics()

### Load job and get metrics

In [26]:
from azureml.core import Run
htr = Run(exp, "tune_qrnn_1548046929030")

In [31]:
import pandas as pd

results = htr.get_children()

results_dict = {'pinball_loss': [], 'n_hidden_1': [], 'n_hidden_2': [], 'iter_max': [], 'penalty': []} 
for child_run in results:
    if child_run.get_status() == "Completed":
        arguments = child_run.get_details()['runDefinition']['Arguments']
        results_dict['pinball_loss'].append(child_run.get_metrics()['average pinball loss'])
        results_dict['n_hidden_1'].append(int(arguments[5]))
        results_dict['n_hidden_2'].append(int(arguments[7]))
        results_dict['iter_max'].append(int(arguments[9]))
        results_dict['penalty'].append(float(arguments[11]))

results_df = pd.DataFrame.from_dict(results_dict)
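
The loop above reads hyperparameter values by position (`arguments[5]`, `arguments[7]`, and so on), which breaks silently if the argument order changes. A less fragile alternative is to look values up by flag name. The sketch below assumes the `Arguments` list alternates flags and values, which is consistent with the indices used above; the `parse_arguments` helper and the example list are hypothetical.

```python
def parse_arguments(arguments):
    """Map each '--flag' in the list to the value that follows it."""
    parsed = {}
    for position, token in enumerate(arguments[:-1]):
        if isinstance(token, str) and token.startswith("--"):
            parsed[token[2:]] = arguments[position + 1]
    return parsed


# Hypothetical argument list with the same layout the loop above relies on.
example_arguments = [
    "--path", "$AZUREML_DATAREFERENCE_example",
    "--cv_path", "./",
    "--n_hidden_1", "8",
    "--n_hidden_2", "2",
    "--iter_max", "1",
    "--penalty", "0.001",
]

params = parse_arguments(example_arguments)
print(int(params["n_hidden_1"]), int(params["n_hidden_2"]),
      int(params["iter_max"]), float(params["penalty"]))
```

Inside the loop, `parse_arguments(arguments)` could replace the four positional lookups without changing the resulting data frame.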

In [33]:
results_df.sort_values('pinball_loss')

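|    | pinball_loss | n_hidden_1 | n_hidden_2 | iter_max | penalty |
|----|--------------|------------|------------|----------|---------|
| 28 | 81.208704    | 8          | 2          | 1        | 0.001   |
| 47 | 81.224931    | 8          | 2          | 1        | 0.0     |
| 26 | 81.269753    | 8          | 4          | 1        | 0.001   |
| 59 | 81.323023    | 8          | 4          | 1        | 0.0     |
| 27 | 81.376903    | 4          | 4          | 1        | 0.001   |
| 56 | 81.38957     | 4          | 8          | 1        | 0.0     |
| 25 | 81.396291    | 4          | 8          | 1        | 0.001   |
| 52 | 81.411275    | 4          | 4          | 1        | 0.0     |
| 29 | 81.53119     | 4          | 2          | 1        | 0.001   |
| 46 | 81.550677    | 4          | 2          | 1        | 0.0     |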


In [30]:
results_df.to_csv('cv_results.csv', index=False)
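
Once the results are saved, the best hyperparameter set can be recovered from the CSV without re-querying the workspace. The sketch below uses the standard `csv` module on an in-memory copy of three rows taken from the sorted results above; reading from `cv_results.csv` on disk would work the same way with `open('cv_results.csv')` in place of the `StringIO` object.

```python
import csv
import io

# Three rows from the results table above, in cv_results.csv layout.
csv_text = """pinball_loss,n_hidden_1,n_hidden_2,iter_max,penalty
81.269753,8,4,1,0.001
81.208704,8,2,1,0.001
81.224931,8,2,1,0.0
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))

# The row with the smallest pinball loss holds the selected hyperparameters.
best = min(rows, key=lambda row: float(row["pinball_loss"]))

best_params = {
    "n_hidden_1": int(best["n_hidden_1"]),
    "n_hidden_2": int(best["n_hidden_2"]),
    "iter_max": int(best["iter_max"]),
    "penalty": float(best["penalty"]),
}
print(best_params)
```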