# Distributed DeepRacer RL training with SageMaker and RoboMaker

---
## Introduction


In this notebook, we will train a fully autonomous 1/18th scale race car with reinforcement learning, using Amazon SageMaker RL and AWS RoboMaker's 3D driving simulator. [AWS RoboMaker](https://console.aws.amazon.com/robomaker/home#welcome) is a service that makes it easy for developers to develop, test, and deploy robotics applications.

This notebook provides a jailbreak experience of [AWS DeepRacer](https://console.aws.amazon.com/deepracer/home#welcome), giving us more control over the training/simulation process and RL algorithm tuning.

![Training in Action](./deepracer-reinvent-track.jpg)


---
## How does it work?

![How training works](./training.png)

The reinforcement learning agent (i.e. our autonomous car) learns to drive by interacting with its environment, e.g., the track, by taking an action in a given state to maximize the expected reward. The agent learns the optimal plan of actions in training by trial-and-error through repeated episodes.  
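
To make the trial-and-error loop concrete, here is a minimal sketch of a single episode. It does not use the RoboMaker simulator: `ToyTrack`, `run_episode`, the three-way steering action and the reward values are all hypothetical stand-ins, chosen only to show how an agent collects (state, action, reward, next state, done) tuples while interacting with an environment.

```python
import random


class ToyTrack:
    """Hypothetical 1-D stand-in for the simulator.

    The state is the car's lateral offset from the center line.
    """

    def __init__(self, half_width=3, max_steps=20):
        self.half_width = half_width
        self.max_steps = max_steps
        self.offset = 0
        self.steps = 0

    def reset(self):
        self.offset = 0
        self.steps = 0
        return self.offset

    def step(self, action):
        # action is -1 (steer left), 0 (straight) or +1 (steer right)
        self.offset += action
        self.steps += 1
        off_track = abs(self.offset) > self.half_width
        # Assumed reward shape: 1.0 on the center line, 0.0 at the edge, -10.0 off-track
        reward = -10.0 if off_track else 1.0 - abs(self.offset) / self.half_width
        done = off_track or self.steps >= self.max_steps
        return self.offset, reward, done


def run_episode(env, policy):
    """Run one episode and return its list of experience tuples."""
    transitions = []
    state = env.reset()
    done = False
    while not done:
        action = policy(state)
        next_state, reward, done = env.step(action)
        transitions.append((state, action, reward, next_state, done))
        state = next_state
    return transitions


random.seed(0)
experience = run_episode(ToyTrack(), lambda state: random.choice([-1, 0, 1]))
print(len(experience), "transitions, total reward", sum(t[2] for t in experience))
```

A learning algorithm would replace the random policy with one that is updated from the collected tuples, so that later episodes earn a higher total reward.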
  
The figure above shows an example of distributed RL training across SageMaker and two RoboMaker simulation environments that perform the **rollouts**: they execute a fixed number of episodes using the current model or policy. The rollouts collect agent experiences (state-transition tuples) and share this data with SageMaker for training. SageMaker updates the model policy, which is then used to execute the next sequence of rollouts. This training loop continues until the model converges, i.e. the car learns to drive and stops going off-track. More formally, we can define the problem in terms of the following:

1. **Objective**: Learn to drive autonomously by staying close to the center of the track.
2. **Environment**: A 3D driving simulator hosted on AWS RoboMaker.
3. **State**: The driving POV image captured by the car's head camera, as shown in the illustration above.
4. **Action**: Six discrete steering wheel positions at different angles (configurable).
5. **Reward**: Positive reward for staying close to the center line; high penalty for going off-track. This is configurable and can be made more complex (e.g. a steering penalty can be added).
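
As an illustration of the reward described in the list above, the following sketch returns a higher value the closer the car is to the center line and a large negative value once it leaves the track. The function name, its arguments and the penalty of `-10.0` are assumptions made for this example; the reward actually used by the simulation environment may be shaped differently.

```python
def reward_function(distance_from_center, track_width, on_track):
    """Hypothetical reward: highest on the center line, large penalty off-track."""
    if not on_track:
        return -10.0  # assumed penalty value, not taken from the notebook

    # Normalised distance: 0.0 on the center line, 1.0 at the track edge.
    normalised = min(abs(distance_from_center) / (track_width / 2.0), 1.0)
    return 1.0 - normalised


print(reward_function(0.0, 1.0, True))    # on the center line
print(reward_function(0.25, 1.0, True))   # halfway to the edge
print(reward_function(0.6, 1.0, False))   # off the track
```

A steering penalty, as mentioned above, could be added by subtracting a small term that grows with the absolute steering angle.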

### Imports

To get started, we'll import the Python libraries we need and set up the environment with a few prerequisites for permissions and configurations.

You can run this notebook from your local machine or from a SageMaker notebook instance. In both of these scenarios, you can run the following to launch a training job on SageMaker and a simulation job on RoboMaker.

In [1]:
from src.core.DeepRacerEngine import DeepRacerEngine

### Scenario 1: Single Model Run and Evaluation

For this scenario, we are going to train and evaluate a single model on a chosen track.

### Set Up Parameters
There are two sets of parameters we can configure:
 - The simulation parameters
 - The model hyperparameters

#### View Default Simulation Parameters

In [2]:
!pygmentize common/constant.py

# Estimator Pamrs
entry_point = "training_worker.py"
source_dir = 'src'

#Training Params
default_instance_type = "ml.c4.2xlarge" #For GPU use 'ml.p3.2xlarge'
default_instance_pool = 1
default_job_duration = 3600
default_hyperparam_preset = 'src/markov/presets/preset_hyperparams.json'
tmp_hyperparam_preset = 'src/markov/presets/preset_hyperparams_tmp.py'

#Track Details:
default_track_name = 'reinvent_base'
track_name = ['reinvent_base', 'reinvent_carpet', 'reinvent_concrete

#### View Default Hyperparameters

In [3]:
!pygmentize src/markov/presets/preset_hyperparams.json

{
    "learning_rate": 0.0003,
    "batch_size" : 64,
    "optimizer_epsilon" : 0.00001,
    "adam_optimizer_beta2" : 0.999,
    "clip_likelihood_ratio_using_epsilon" : 0.2,
    "beta_entropy" : 0.01,
    "gae_lambda" : 0.95,
    "discount" : 0.999,
    "optimization_epochs" : 10
}


#### Set Parameters

In [3]:
#here we can set the params we want to use for our model runs
params = {
    'job_name': 'dr-test-abc',
    'track_name':'reinvent_base',
    'job_duration': 3600,
    'batch_size':128,
    'evaluation_trials':5
}
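
The `params` dictionary mixes simulation settings (such as `track_name`) with a hyperparameter (`batch_size`). The internals of `DeepRacerEngine` are not shown in this notebook, so the following is only a sketch of one plausible way such overrides could be combined with the defaults from `preset_hyperparams.json`. The helper `apply_overrides` is hypothetical and is not part of `DeepRacerEngine`.

```python
# Default hyperparameters, copied from preset_hyperparams.json as printed above.
default_hyperparams = {
    "learning_rate": 0.0003,
    "batch_size": 64,
    "optimizer_epsilon": 0.00001,
    "adam_optimizer_beta2": 0.999,
    "clip_likelihood_ratio_using_epsilon": 0.2,
    "beta_entropy": 0.01,
    "gae_lambda": 0.95,
    "discount": 0.999,
    "optimization_epochs": 10,
}


def apply_overrides(defaults, params):
    """Hypothetical helper: copy the defaults, then override keys that also appear in params."""
    merged = dict(defaults)
    merged.update({key: value for key, value in params.items() if key in defaults})
    return merged


example_params = {
    "job_name": "dr-test-abc",
    "track_name": "reinvent_base",
    "job_duration": 3600,
    "batch_size": 128,
    "evaluation_trials": 5,
}

effective = apply_overrides(default_hyperparams, example_params)
print(effective["batch_size"])     # 128, overridden by example_params
print(effective["learning_rate"])  # 0.0003, left at its default
```

Keys that are not hyperparameters, such as `job_name`, are ignored by this helper and would be handled separately as simulation settings.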

#### Instantiate the DeepRacerEngine class, and provide the params

In [6]:
deepRacer = DeepRacerEngine(params)

***Deep Racer Engine Backend***


#### Start Simulation Training 

In [None]:
deepRacer.start_training_testing_process()

#### Plot Training Process

In [None]:
deepRacer.plot_training_output()

Create local folder tmp/dr-test-abc-2019-12-24-05-51-39-591


#### Start Evaluation Process


In [None]:
deepRacer.start_evaluation_process()

#### Plot Evaluation Process

In [None]:
deepRacer.plot_evaluation_output()

====================

### Scenario 2: Multi-Model Training with Different Hyperparameters and Evaluation

For this scenario, we are going to train multiple models, with different hyperparameters, and view the training and evaluation in parallel.

#### Hyperparameter Generation

In this example, we generate several variations of the hyperparameters. In a controlled experiment, only one variable is adjusted at a time so that any change in the outcome can be attributed to it. If multiple variables are changed at once, it is difficult to determine which of them caused the effect.
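
The one-variable-at-a-time idea can be sketched generically as follows. The helper `one_at_a_time` is hypothetical, and the example varies `learning_rate` purely for illustration; whether `DeepRacerEngine` accepts that key in its `params` is an assumption that this notebook does not confirm.

```python
def one_at_a_time(baseline, name, values):
    """Hypothetical helper: yield copies of baseline in which only `name` is changed."""
    for value in values:
        variant = dict(baseline)  # copy, so the baseline itself is never modified
        variant[name] = value
        yield variant


# Baseline values taken from the default hyperparameter preset.
baseline = {"learning_rate": 0.0003, "batch_size": 64, "discount": 0.999}

variants = list(one_at_a_time(baseline, "learning_rate", [0.0001, 0.0003, 0.001]))
for variant in variants:
    # List which keys differ from the baseline; at most one should appear.
    changed = [key for key in baseline if variant[key] != baseline[key]]
    print(variant["learning_rate"], "changed keys:", changed)
```

Because every variant shares all other settings with the baseline, a difference in training results between two runs can be traced back to the single changed value.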

In [None]:
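def param_gen_batch_sizes(job_name_prefix, min_batch=64, max_batch=512):
    """Build one params dict per batch size, doubling from min_batch up to max_batch."""
    # Parameters without defaults must come first, so job_name_prefix leads the signature.
    batches = []
    btch = min_batch
    while btch <= max_batch:
        batches.append(btch)
        btch *= 2
    print(batches)

    model_params = []
    job_name = job_name_prefix + '-batchsize-'
    for batch_size in batches:
        params = {
            'job_name': job_name + '{}'.format(batch_size),
            'track_name': 'reinvent_base',
            'job_duration': 3600,
            'batch_size': batch_size,
            'evaluation_trials': 5
        }
        # Append inside the loop so that every batch size is kept, not only the last one.
        model_params.append(params)
    return model_params

# 'dr-test' is an example prefix; replace it with your own.
model_params = param_gen_batch_sizes('dr-test')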