# Creating experiments using the runtool
This notebook is meant to help you get started with the runtool.

## Defining a configuration file
We start by defining the configuration file we want to use. This example assumes that you have a local Docker image containing gluon-ts. The notebook has been tested on gluon-ts release v0.6.4 but should work with earlier releases as well.

In [1]:
# Define the image and dataset files to use
# Replace these with your own values
image='gluonts_cpu:v0.6.4'
train_dataset='my s3 path'
test_dataset='my s3 path'

In [2]:
# Generate a config. This would normally live in a file, but for the purpose
# of this notebook defining it as a string will suffice.
# Please refer to writing_configs.md for further details on the structure of a config file.
config_yaml = f'''
simple:
    image: {image}
    instance: ml.m5.xlarge
    hyperparameters:
        forecaster_name: gluonts.model.simple_feedforward.SimpleFeedForwardEstimator    
        freq: 
            $eval: $trial.dataset.meta.freq
        prediction_length:
            $eval: 2 * $trial.dataset.meta.prediction_length

deepar:
    image: {image}
    instance: ml.m5.xlarge
    hyperparameters:
        forecaster_name: gluonts.model.deepar.DeepAREstimator  
        freq: 
            $eval: $trial.dataset.meta.freq
        prediction_length:
            $eval: 2 * $trial.dataset.meta.prediction_length

electricity_dataset:
    meta:
        freq: H
        prediction_length: 24
    path:
        train: {train_dataset}
        test: {test_dataset}
'''

The runtool can load a config from a file path, which is the preferred way, or from a dictionary. Since the config above is defined as a string, we convert the YAML into a dict.

In [3]:
import yaml
config_data = yaml.safe_load(config_yaml)

## Writing the run script
Now that we have a configuration defined, we can implement a script which uses the runtool to define and dispatch experiments.

In [4]:
import runtool
import boto3

Before the configuration stored in `config_data` can be used, any `$xxx` expressions such as `$eval` need to be resolved. This is done by calling `transform_config`.

In [5]:
config = runtool.runtool.transform_config(config_data)
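To give an intuition for what resolving an `$eval` expression means, here is a toy sketch. It is not the runtool's implementation, and it says nothing about when the runtool performs the substitution. The helper `resolve_eval` and the `trial` dictionary are hypothetical names introduced for this illustration; the sketch assumes that `$trial.dataset` refers to the dataset an algorithm is paired with in an experiment.

```python
import re


def resolve_eval(expression, trial):
    """Toy resolver: substitute `$trial.<dotted.path>` references, then evaluate.

    Hypothetical helper for illustration only; the runtool's own logic may differ.
    """

    def lookup(match):
        # Walk the dotted path (e.g. "dataset.meta.freq") through nested dicts.
        value = trial
        for key in match.group(1).split("."):
            value = value[key]
        # repr() keeps strings quoted so the result is a valid Python expression.
        return repr(value)

    substituted = re.sub(r"\$trial\.([A-Za-z_][\w.]*)", lookup, expression)
    # eval() without builtins is acceptable for a toy, not for untrusted input.
    return eval(substituted, {"__builtins__": {}}, {})


# Assumed shape of a trial: the dataset paired with the algorithm.
trial = {"dataset": {"meta": {"freq": "H", "prediction_length": 24}}}

freq = resolve_eval("$trial.dataset.meta.freq", trial)
prediction_length = resolve_eval("2 * $trial.dataset.meta.prediction_length", trial)
print(freq, prediction_length)
```

Under these assumptions, pairing either algorithm with `electricity_dataset` would give `freq` the value `H` and `prediction_length` the value 48, twice the 24 declared in the dataset's `meta`.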

Now we want to use `config` to define the experiments to run. This is done using the `*` and `+` operators.

`+` concatenates either a set of algorithms or a set of datasets together.

`*` takes a set of algorithms and a set of datasets and generates an experiment from them. 

In this example we want the `deepar` and `simple` algorithms to train on the `electricity_dataset`.

In [6]:
experiment = config.electricity_dataset * config.simple + config.electricity_dataset * config.deepar
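To make the two operators concrete, here is a toy model of the `+` and `*` semantics described above, written with plain operator overloading. The classes `Group` and `Experiments` are hypothetical stand-ins, not runtool types, and the sketch only tracks names rather than full configurations.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Experiments:
    """Toy container of (algorithm, dataset) pairs."""

    pairs: tuple

    def __add__(self, other):
        # Adding two experiment sets concatenates their pairs.
        if not isinstance(other, Experiments):
            return NotImplemented
        return Experiments(self.pairs + other.pairs)


@dataclass(frozen=True)
class Group:
    """Toy set of algorithms or of datasets, identified by name only."""

    kind: str  # "algorithm" or "dataset"
    names: tuple

    def __add__(self, other):
        # `+` concatenates groups of the same kind.
        if not isinstance(other, Group) or other.kind != self.kind:
            return NotImplemented
        return Group(self.kind, self.names + other.names)

    def __mul__(self, other):
        # `*` pairs every algorithm with every dataset.
        if not isinstance(other, Group) or other.kind == self.kind:
            return NotImplemented
        algos, datasets = (self, other) if self.kind == "algorithm" else (other, self)
        return Experiments(tuple(product(algos.names, datasets.names)))


simple = Group("algorithm", ("simple",))
deepar = Group("algorithm", ("deepar",))
electricity = Group("dataset", ("electricity_dataset",))

# Mirrors the expression used in the notebook cell.
explicit = electricity * simple + electricity * deepar
# In this toy model, grouping the algorithms first yields the same pairs.
grouped = electricity * (simple + deepar)
print(explicit.pairs)
```

In this toy model both expressions produce the same two pairs, one per algorithm. Whether the runtool treats the two forms identically is not verified here.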

Next we need to provide the runtool with the session it should use when running, as well as a SageMaker role with the proper permissions. We also need to provide a bucket where SageMaker will store the output data of the training jobs.

In [11]:
tool = runtool.Client(
    role="my role arn",
    bucket="my s3 bucket for sagemaker artifacts",
    session=boto3.Session(),
)

At this point it may be useful to inspect which jobs will be created.
Calling `dry_run` produces a summary table of the jobs that would be generated.

The printed output of `dry_run` does not render well in Jupyter, so we convert its result to a `pandas.DataFrame` in order to take a look.

In [9]:
import pandas as pd
table = tool.dry_run(experiment, print_data=False)
pd.DataFrame(table).head()

| | image | hyperparameters | output_path | instance | job_name | tags | run | datasets |
|---|---|---|---|---|---|---|---|---|
| 0 | gluonts_cpu:v0.6.4 | {'forecaster_name': 'gluonts.model.simple_feed... | s3:///default experiment name/default experime... | ml.m5.xlarge | config-7b755f82-date-2021-03-22-09-38-35-runid... | {'run_configuration_id': 'default experiment n... | 0 | [my s3 path, my s3 path] |
| 1 | gluonts_cpu:v0.6.4 | {'forecaster_name': 'gluonts.model.deepar.Deep... | s3:///default experiment name/default experime... | ml.m5.xlarge | config-48164356-date-2021-03-22-09-38-35-runid... | {'run_configuration_id': 'default experiment n... | 0 | [my s3 path, my s3 path] |


When we are satisfied with the jobs that are to be run, it is time to execute them using the runtool.

In [12]:
# Note that a valid AWS execution role, ECR image and S3 bucket are required for this to work
runs = tool.run(experiment)

total jobs to run: 2
0/2 jobs submitted, submitting job: config-48164356-date-2021-03-22-09-40-42-runid-d5103991-run-0

ParamValidationError: Parameter validation failed:
Invalid length for parameter RoleArn, value: 11, valid range: 20-inf
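The error above is expected in this notebook: the placeholder `"my role arn"` is 11 characters long, while the request requires a `RoleArn` of at least 20 characters. A small pre-flight check can catch such placeholders before any job is submitted. The sketch below is a hypothetical helper, not part of the runtool or boto3; the regular expression is a loose approximation of the IAM role ARN shape, and the example ARN uses a made-up account ID and role name.

```python
import re

# Loose approximation of an IAM role ARN (assumption for illustration,
# not an official validator).
ROLE_ARN_PATTERN = re.compile(r"^arn:aws[a-z-]*:iam::\d{12}:role/.+$")

# Minimum length taken from the error message above ("valid range: 20-inf").
MIN_ROLE_ARN_LENGTH = 20


def looks_like_role_arn(role):
    """Return True if `role` is long enough and shaped like an IAM role ARN."""
    return (
        len(role) >= MIN_ROLE_ARN_LENGTH
        and ROLE_ARN_PATTERN.match(role) is not None
    )


placeholder_ok = looks_like_role_arn("my role arn")
# Hypothetical account ID and role name.
example_ok = looks_like_role_arn("arn:aws:iam::123456789012:role/MySageMakerRole")
print(placeholder_ok, example_ok)
```

Running such a check on the `role` value before constructing the client turns a failure in the middle of job submission into an early, readable error.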

In [None]:
print(runs)