# Time Series Forecasting Backtesting using HyperDrive

## Prerequisites
To run this notebook, install the Azure Machine Learning (AML) SDK and its widget extension by running the following commands in a command prompt or terminal.  
First, activate your environment by running `activate <your env>` or `source activate <your env>` (on Linux).  
`pip install --upgrade azureml-sdk[notebooks,automl]`  
`jupyter nbextension install --py --user azureml.train.widgets`  
`jupyter nbextension enable --py --user azureml.train.widgets`

To add the environment to your Jupyter kernels, run `python3 -m ipykernel install --name <your env>`. You also need to create an Azure ML workspace and download its configuration file (`config.json`) by following the [configuration.ipynb](https://github.com/Azure/MachineLearningNotebooks/blob/master/configuration.ipynb) notebook.

## Set up workspace and experiment

In [3]:
from azureml.core import Workspace, Experiment
ws = Workspace.from_config()
exp = Experiment(workspace=ws, name='tsbacktest')

Found the config file in: C:\Users\hlu\TSPerf\prototypes\cross_validation\config.json


## Validate script locally
Configure a local, user-managed environment so that the script runs in an existing Python interpreter instead of one built by Azure ML.

In [12]:
from azureml.core.runconfig import RunConfiguration
run_config_user_managed = RunConfiguration()
run_config_user_managed.environment.python.user_managed_dependencies = True
run_config_user_managed.environment.python.interpreter_path = 'C:/Anaconda/envs/tsperf/python.exe'

In [13]:
from azureml.core import ScriptRunConfig
src = ScriptRunConfig(source_directory='./', 
                      script='train_validation.py', 
                      arguments=['--data-folder', 'C:/Users/hlu/TSPerf/prototypes/cross_validation/data/', '--n-estimators', '10', '--min-samples-split', '10'],
                      run_config=run_config_user_managed)
run_local = exp.submit(src)
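
The `arguments` list above is passed to `train_validation.py` as command-line flags. The script itself is not shown in this notebook, so the following is only a minimal sketch of how it might read those flags with `argparse`. The function name `parse_args`, the `dest` names, and the help text are assumptions made for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse the three flags that the notebook passes to the training script.

    Hypothetical helper: the real train_validation.py may parse its
    arguments differently.
    """
    parser = argparse.ArgumentParser(description="Sketch of a CLI for train_validation.py")
    parser.add_argument("--data-folder", dest="data_folder", type=str, required=True,
                        help="folder that contains the training data")
    parser.add_argument("--n-estimators", dest="n_estimators", type=int, required=True,
                        help="number of trees in the forest")
    parser.add_argument("--min-samples-split", dest="min_samples_split", type=int, required=True,
                        help="minimum number of samples required to split a node")
    return parser.parse_args(argv)


# Same flag layout as the `arguments` list above, with a placeholder data path.
args = parse_args(["--data-folder", "./data/",
                   "--n-estimators", "10",
                   "--min-samples-split", "10"])
print(args.data_folder, args.n_estimators, args.min_samples_split)
```

Declaring the numeric flags with `type=int` matters because both `ScriptRunConfig` and HyperDrive hand every value to the script as a string.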

In [73]:
run_local.get_details()
run_local.get_metrics()

{'average pinball loss': 193.81733289262013}
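
The logged metric is an average pinball (quantile) loss. How `train_validation.py` aggregates it is not shown here, so the sketch below only illustrates the standard definition, averaged over all quantiles and observations. The function names and the sample numbers are made up for illustration.

```python
def pinball_loss(y_true, y_pred, quantile):
    """Pinball loss of one quantile forecast against one observed value."""
    diff = y_true - y_pred
    # Under-prediction is weighted by `quantile`, over-prediction by `1 - quantile`.
    return max(quantile * diff, (quantile - 1) * diff)


def average_pinball_loss(y_true, quantile_preds):
    """Average the loss over every (quantile, observation) pair.

    quantile_preds maps a quantile in (0, 1) to a list of predictions,
    one per entry in y_true. This aggregation scheme is an assumption.
    """
    losses = [
        pinball_loss(y, p, q)
        for q, preds in quantile_preds.items()
        for y, p in zip(y_true, preds)
    ]
    return sum(losses) / len(losses)


# Illustrative values only, not taken from the experiment data.
y_true = [100.0, 120.0]
quantile_preds = {0.1: [90.0, 125.0], 0.9: [110.0, 115.0]}
avg = average_pinball_loss(y_true, quantile_preds)
print(round(avg, 4))
```

Lower values are better, which is why the HyperDrive configuration later in this notebook minimizes this metric.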

## Submit a single job to Batch AI

### Configure Batch AI cluster

In [4]:
from azureml.core.compute import ComputeTarget, BatchAiCompute
from azureml.core.compute_target import ComputeTargetException

batchai_cluster_name = "hlutsperfnew"
try:
    compute_target = ComputeTarget(workspace=ws, name = batchai_cluster_name)
    if isinstance(compute_target, BatchAiCompute):
        print('found compute target {}, just use it.'.format(batchai_cluster_name))
    else:
        print('{} exists but it is not a Batch AI cluster. Please choose a different name.'.format(batchai_cluster_name))
except ComputeTargetException:
    print('creating a new compute target...')
    compute_config = BatchAiCompute.provisioning_configuration(vm_size="STANDARD_D2_V2",
                                                                autoscale_enabled=True,
                                                                cluster_min_nodes=0, 
                                                                cluster_max_nodes=4)

    # create the cluster
    compute_target = ComputeTarget.create(ws, batchai_cluster_name, compute_config)
    
    # can poll for a minimum number of nodes and for a specific timeout. 
    # if no min node count is provided it uses the scale settings for the cluster
    compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)
    
    # Use the 'status' property to get a detailed status for the current cluster. 
    print(compute_target.status.serialize()) 
    

found compute target hlutsperfnew, just use it.


### Configure Docker environment

In [74]:
from azureml.core.runconfig import EnvironmentDefinition
from azureml.core.conda_dependencies import CondaDependencies

env = EnvironmentDefinition()

env.python.user_managed_dependencies = False
env.python.conda_dependencies = CondaDependencies.create(conda_packages=['pandas', 'numpy', 'scikit-garden', 'joblib'],
                                                         python_version='3.6.2')
env.python.conda_dependencies.add_channel('conda-forge')
env.docker.enabled = True

### Create Estimator

In [75]:
from azureml.train.estimator import Estimator

script_folder = './'

script_params = {
    '--data-folder': ws.get_default_datastore().as_mount(),
    '--n-estimators': 10,
    '--min-samples-split': 10
}

est = Estimator(source_directory=script_folder,
                script_params=script_params,
                compute_target=compute_target,
                use_docker=True,
                entry_script='train_validation.py',
                environment_definition=env)

### Submit job

In [76]:
run_batchai = exp.submit(config=est)

### Check job status

In [None]:
from azureml.train.widgets import RunDetails
RunDetails(run_batchai).show()

In [83]:
run_batchai.get_details()
run_batchai.get_metrics()

{'average pinball loss': 193.81733289262013}

## Tune hyperparameters using HyperDrive

In [None]:
from azureml.train.hyperdrive import (RandomParameterSampling, HyperDriveRunConfig,
                                      PrimaryMetricGoal, choice)
ps = RandomParameterSampling({
    '--min-samples-split': choice(5, 10),
    '--n-estimators': choice(10, 100)
})
htc = HyperDriveRunConfig(estimator=est, 
                          hyperparameter_sampling=ps, 
                          primary_metric_name='average pinball loss', 
                          primary_metric_goal=PrimaryMetricGoal.MINIMIZE, 
                          max_total_runs=8,
                          max_concurrent_runs=4)
htr = exp.submit(config=htc)
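
To make the search space concrete, the sketch below imitates random sampling over the two `choice` lists in plain Python. It is not HyperDrive's implementation; `sample_configs` and the fixed seed are assumptions made for illustration.

```python
import random

# Same discrete choices as the RandomParameterSampling definition above.
search_space = {
    "--min-samples-split": [5, 10],
    "--n-estimators": [10, 100],
}


def sample_configs(space, n_runs, seed=0):
    """Draw n_runs configurations, picking each value uniformly at random."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in space.items()}
        for _ in range(n_runs)
    ]


configs = sample_configs(search_space, n_runs=8)
for config in configs:
    print(config)
```

The space contains only 2 x 2 = 4 distinct combinations, so in this sketch 8 independent draws necessarily repeat some of them. Whether HyperDrive itself avoids duplicates is not covered here.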

In [None]:
from azureml.train.widgets import RunDetails
RunDetails(htr).show()

In [84]:
best_run = htr.get_best_run_by_primary_metric()
parameter_values = best_run.get_details()['runDefinition']['Arguments']
print(parameter_values)

['--data-folder', '$AZUREML_DATAREFERENCE_workspacefilestore', '--min-samples-split', '5', '--n-estimators', '10']
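
The best run's arguments come back as a flat list of alternating flags and values. A small helper can turn that list into a dictionary that is easier to read. The name `arguments_to_dict` is hypothetical, and the sketch assumes every flag is followed by exactly one value, as in the output above.

```python
def arguments_to_dict(arguments):
    """Pair each '--flag' with the value that follows it."""
    if len(arguments) % 2 != 0:
        raise ValueError("expected alternating flags and values")
    flags, values = arguments[0::2], arguments[1::2]
    # Strip the leading dashes so the keys read like parameter names.
    return {flag.lstrip("-"): value for flag, value in zip(flags, values)}


# The list printed by the cell above.
parameter_values = ['--data-folder', '$AZUREML_DATAREFERENCE_workspacefilestore',
                    '--min-samples-split', '5', '--n-estimators', '10']
best_params = arguments_to_dict(parameter_values)
print(best_params)
```

The values stay strings, exactly as they appear in the run definition, so convert them with `int(...)` before reusing them as hyperparameters.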
