# Tuning Hyperparameters of LightGBM Model with AML SDK and HyperDrive

This notebook tunes the hyperparameters of a LightGBM model with the AML SDK and HyperDrive. It selects the best model by cross validation using the training data in the first forecast round. Specifically, it splits the training data into sub-training data and validation data. It then trains LightGBM models with different sets of hyperparameters on the sub-training data and evaluates the accuracy of each model on the validation data. The set of hyperparameters that yields the best validation accuracy will be used to train models and forecast sales across all 12 forecast rounds.
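
The selection procedure described above can be sketched in a few lines. The snippet below is a minimal illustration and not the code in `train_validate.py`: the toy weekly series, the split point, the `mape` helper, and the moving-average stand-in for a LightGBM model are all assumptions made for the example.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def moving_average_forecast(history, window, horizon):
    """Stand-in model: forecast every future step with the mean of the last `window` values."""
    level = sum(history[-window:]) / window
    return [level] * horizon


# Toy weekly sales series (hypothetical data)
sales = [100, 102, 98, 105, 110, 108, 112, 115, 111, 118, 120, 119]

# Time-ordered split: the last 3 weeks are held out for validation
sub_train, valid = sales[:-3], sales[-3:]

# Candidate "hyperparameter" values for the stand-in model
candidates = [2, 4, 8]
scores = {w: mape(valid, moving_average_forecast(sub_train, w, len(valid))) for w in candidates}

# The candidate with the lowest validation MAPE is selected
best_window = min(scores, key=scores.get)
print(scores, best_window)
```

In the notebook itself, the candidates are sets of LightGBM hyperparameters proposed by HyperDrive, and each candidate is scored by one run of the training script.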

## Prerequisites
To run this notebook, install the AML SDK and its widget extension in your environment by running the following commands in a terminal. Before running them, activate your environment with `source activate <your env>` on a Linux VM.  
`pip3 install --upgrade azureml-sdk[notebooks,automl]`  
`jupyter nbextension install --py --user azureml.widgets`  
`jupyter nbextension enable --py --user azureml.widgets`  

To add the environment to your Jupyter kernels, run `python3 -m ipykernel install --name <your env>`. You also need to create an Azure ML workspace and download its configuration file (`config.json`) by following the [configuration.ipynb](https://github.com/Azure/MachineLearningNotebooks/blob/master/configuration.ipynb) notebook.

In [None]:
import azureml
from azureml.core import Workspace, Run

# Check core SDK version number
print("Azure ML SDK Version: ", azureml.core.VERSION)

In [None]:
from azureml.telemetry import set_diagnostics_collection

# Opt-in diagnostics for better experience of future releases
set_diagnostics_collection(send_diagnostics=True)

## Initialize Workspace & Create an Azure ML Experiment

Initialize a [Machine Learning Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the workspace you created in the Prerequisites step. `Workspace.from_config()` below creates a workspace object from the details stored in `config.json` that you have downloaded.

In [None]:
from azureml.core.workspace import Workspace

ws = Workspace.from_config()
print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Resource group: ' + ws.resource_group, sep = '\n')
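
If `Workspace.from_config()` fails, a quick check of the configuration file can help. The sketch below assumes that `config.json` is a JSON object containing the keys `subscription_id`, `resource_group`, and `workspace_name`; `missing_config_keys` is a hypothetical helper written for this example, and the file content shown uses placeholder values.

```python
import json

# Keys assumed to be required in config.json
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def missing_config_keys(config_text):
    """Return the required keys that are absent or empty in a config.json string."""
    config = json.loads(config_text)
    return [key for key in REQUIRED_KEYS if not config.get(key)]


# Hypothetical file content with placeholder values
example = '{"subscription_id": "<subscription-id>", "resource_group": "<resource-group>"}'
print(missing_config_keys(example))
```

To check a real file, read its text with `open('config.json').read()` and pass it to the helper.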

In [None]:
from azureml.core import Experiment

exp = Experiment(workspace=ws, name='tune_lgbm')

## Validate Script Locally

In [None]:
from azureml.core.runconfig import RunConfiguration

# Configure a local, user-managed environment.
# Update interpreter_path to point to the Python interpreter of your environment.
run_config_user_managed = RunConfiguration()
run_config_user_managed.environment.python.user_managed_dependencies = True
run_config_user_managed.environment.python.interpreter_path = '/usr/bin/python3.5'

In [None]:
from azureml.core import ScriptRunConfig

# Please update data-folder argument before submitting the job
src = ScriptRunConfig(source_directory='./', 
                      script='train_validate.py', 
                      arguments=['--data-folder', 
                                 '/home/chenhui/TSPerf/retail_sales/OrangeJuice_Pt_3Weeks_Weekly/data/', 
                                 '--bagging-fraction', '0.8'],
                      run_config=run_config_user_managed)
run_local = exp.submit(src)

In [None]:
# Check job status
run_local.get_status()

In [None]:
# Wait for the run to finish (avoids a busy loop that never exits if the run fails), then check results
run_local.wait_for_completion(show_output=True)
print(run_local.get_details())
run_local.get_metrics()
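
When finer control over waiting is needed, for example a timeout, the status can be polled explicitly. A run can end in a state other than `Completed`, so a wait should stop on any terminal state. The sketch below is one way to do this: `wait_until_done` is a hypothetical helper, and the set of terminal status strings is an assumption that should be checked against the SDK version in use. Its `get_status` argument is any zero-argument callable, such as `run_local.get_status`. A simulated status sequence stands in for a real run here.

```python
import time

# Assumed terminal states; verify against your SDK version
TERMINAL_STATES = {"Completed", "Failed", "Canceled"}


def wait_until_done(get_status, poll_seconds=10.0, timeout_seconds=3600.0):
    """Poll get_status() until a terminal state is reached or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("Run still in state {!r} after {} s".format(status, timeout_seconds))
        # Sleep between polls instead of spinning the CPU
        time.sleep(poll_seconds)


# Simulated status sequence standing in for run_local.get_status
statuses = iter(["Queued", "Running", "Running", "Failed"])
final_status = wait_until_done(lambda: next(statuses), poll_seconds=0.01)
print(final_status)
```

With a real run, the call would look like `wait_until_done(run_local.get_status)`.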

## Run Script on Remote Compute Target

### Create a CPU cluster as compute target

In [None]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Choose a name for your cluster
cluster_name = "cpucluster"

try:
    # Look for the existing cluster by name
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    if isinstance(compute_target, AmlCompute):
        print('Found existing compute target {}.'.format(cluster_name))
    else:
        print('{} exists but it is not an AML Compute target. Please choose a different name.'.format(cluster_name))
except ComputeTargetException:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size="STANDARD_D14_v2", # CPU-based VM
                                                            #vm_priority='lowpriority', # optional
                                                            min_nodes=0, 
                                                            max_nodes=4,
                                                            idle_seconds_before_scaledown=3600)
    # Create the cluster
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)
    # Wait until provisioning completes or the timeout expires.
    # If no minimum node count is given, the cluster's scale settings are used.
    compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)
    # Get a detailed status for the current cluster. 
    print(compute_target.serialize())

In [None]:
# If you have created the compute target, you should see one entry named 'cpucluster' of type AmlCompute 
# in the workspace's compute_targets property.
compute_targets = ws.compute_targets
for name, ct in compute_targets.items():
    print(name, ct.type, ct.provisioning_state)

### Configure Docker environment

In [None]:
from azureml.core.runconfig import EnvironmentDefinition
from azureml.core.conda_dependencies import CondaDependencies

env = EnvironmentDefinition()
env.python.user_managed_dependencies = False
env.python.conda_dependencies = CondaDependencies.create(conda_packages=['pandas', 'numpy', 'scipy', 'scikit-learn', 'lightgbm', 'joblib'],
                                                         python_version='3.6.2')
env.python.conda_dependencies.add_channel('conda-forge')
env.docker.enabled = True

### Upload data to default datastore

Upload the Orange Juice dataset to the workspace's default datastore, which will later be mounted on the cluster for model training and validation. 

In [None]:
ds = ws.get_default_datastore()
print(ds.datastore_type, ds.account_name, ds.container_name)

In [None]:
path_on_datastore = 'data'
ds.upload(src_dir='../../data', target_path=path_on_datastore, overwrite=True, show_progress=True)

In [None]:
# Get data reference object for the data path
ds_data = ds.path(path_on_datastore)
print(ds_data)

### Create estimator
Next, we check that the remote compute target was created successfully by submitting a job to it. HyperDrive will use this compute target to tune the hyperparameters later. You may skip this section and go directly to [Tune Hyperparameters using HyperDrive](#tune-hyperparameters-using-hyperdrive).

In [None]:
from azureml.train.estimator import Estimator

script_folder = './'
script_params = {
    '--data-folder': ds_data.as_mount(),
    '--bagging-fraction': 0.8
}
est = Estimator(source_directory=script_folder,
                script_params=script_params,
                compute_target=compute_target,
                use_docker=True,
                entry_script='train_validate.py',
                environment_definition=env)

### Submit job

In [None]:
# Submit job to compute target
run_remote = exp.submit(config=est)

### Check job status

In [None]:
from azureml.widgets import RunDetails

RunDetails(run_remote).show()

In [None]:
run_remote.get_details()

In [None]:
# Wait for the job to finish, then get the metric values
run_remote.wait_for_completion(show_output=True)
run_remote.get_metrics()

<a id='tune-hyperparameters-using-hyperdrive'></a>
## Tune Hyperparameters using HyperDrive

In [None]:
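# Import Estimator here as well, in case the optional section above was skipped
from azureml.train.estimator import Estimator
from azureml.train.hyperdrive import (BayesianParameterSampling, HyperDriveRunConfig,
                                      PrimaryMetricGoal, choice, quniform, uniform)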

script_folder = './'
script_params = {
    '--data-folder': ds_data.as_mount()
}
est = Estimator(source_directory=script_folder,
                script_params=script_params,
                compute_target=compute_target,
                use_docker=True,
                entry_script='train_validate.py',
                environment_definition=env)
ps = BayesianParameterSampling({
    '--num-leaves': quniform(8, 128, 1),
    '--min-data-in-leaf': quniform(20, 500, 10),
    '--learning-rate': choice(1e-4, 1e-3, 5e-3, 1e-2, 1.5e-2, 2e-2, 3e-2, 5e-2, 1e-1),
    '--feature-fraction': uniform(0.2, 1), 
    '--bagging-fraction': uniform(0.1, 1), 
    '--bagging-freq': quniform(1, 20, 1), 
    '--max-rounds': quniform(50, 2000, 10),
    '--max-lag': quniform(3, 40, 1), 
    '--window-size': quniform(3, 40, 1), 
})
htc = HyperDriveRunConfig(estimator=est, 
                          hyperparameter_sampling=ps, 
                          primary_metric_name='MAPE', 
                          primary_metric_goal=PrimaryMetricGoal.MINIMIZE, 
                          max_total_runs=200,
                          max_concurrent_runs=4)
htr = exp.submit(config=htc)
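
To make the search space concrete: `uniform(low, high)` draws a real value between `low` and `high`, `quniform(low, high, q)` draws such a value and rounds it to a multiple of `q`, and `choice(...)` picks one of the listed options. The sketch below imitates these expressions with Python's `random` module to show what a single configuration might look like. It draws each value independently at random, so it does not reproduce Bayesian sampling, and the `sample_*` helper names are hypothetical.

```python
import random


def sample_uniform(rng, low, high):
    """Real value drawn uniformly between low and high."""
    return rng.uniform(low, high)


def sample_quniform(rng, low, high, q):
    """Uniform draw rounded to a multiple of q: round(uniform(low, high) / q) * q."""
    return round(rng.uniform(low, high) / q) * q


def sample_choice(rng, *options):
    """One of the listed options."""
    return rng.choice(options)


rng = random.Random(0)  # fixed seed so the example is repeatable

# A subset of the search space defined in the cell above
config = {
    '--num-leaves': sample_quniform(rng, 8, 128, 1),
    '--min-data-in-leaf': sample_quniform(rng, 20, 500, 10),
    '--learning-rate': sample_choice(rng, 1e-4, 1e-3, 5e-3, 1e-2, 1.5e-2, 2e-2, 3e-2, 5e-2, 1e-1),
    '--feature-fraction': sample_uniform(rng, 0.2, 1),
}
print(config)
```

Each HyperDrive child run receives one such configuration as command-line arguments to `train_validate.py`, in addition to the fixed `--data-folder` argument.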

In [None]:
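# Import RunDetails here as well, in case the optional section above was skipped
from azureml.widgets import RunDetails

RunDetails(htr).show()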

In [None]:
# Wait for the HyperDrive run to finish, then get the metrics of all child runs
htr.wait_for_completion(show_output=True)
htr.get_metrics()

In [None]:
best_run = htr.get_best_run_by_primary_metric()
parameter_values = best_run.get_details()['runDefinition']['Arguments']
print(parameter_values)
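
To reuse the best hyperparameters in later forecast rounds, it can be convenient to turn the argument list into a dictionary. The sketch below assumes the list is flat and alternates between `--flag` names and their values, as in `['--num-leaves', '64', ...]`; if the actual list has a different shape, the helper raises an error. `arguments_to_dict` is a hypothetical helper, and the example list uses made-up values rather than real tuning results.

```python
def arguments_to_dict(arguments):
    """Convert a flat ['--flag', 'value', ...] list into {'flag': 'value', ...}."""
    if len(arguments) % 2 != 0:
        raise ValueError("Expected an even number of items (flag/value pairs)")
    params = {}
    for flag, value in zip(arguments[0::2], arguments[1::2]):
        if not str(flag).startswith("--"):
            raise ValueError("Expected a flag starting with '--', got {!r}".format(flag))
        # Drop the leading dashes and use underscores so keys are valid Python names
        params[str(flag)[2:].replace("-", "_")] = value
    return params


# Hypothetical argument list in the assumed shape (placeholder values)
example_args = ["--data-folder", "<mounted-path>", "--num-leaves", "64", "--learning-rate", "0.01"]
best_params = arguments_to_dict(example_args)
print(best_params)
```

Values are kept as they appear in the list, so numeric parameters may need to be converted with `int` or `float` before they are passed to LightGBM.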