## Training DeepRacer models using SageMaker and RoboMaker
---

This notebook borrows heavily from AWS's [DeepRacer 400L workshop](https://catalog.us-east-1.prod.workshops.aws/workshops/66473261-de66-42a1-b280-3e0ec87aee26/en-US).


### How can I use this notebook? 

Training new DeepRacer models directly with SageMaker and RoboMaker offers the flexibility to: 

* Change the environment (modify the DeepRacer world's lighting conditions)
* Automate the training process
* Train on multiple tracks in parallel
* Experiment! 

**Let's get started!**

## Setup 

We start by importing the libraries and helper functions that will be needed later on.

In [None]:
import boto3
import sagemaker
import sys
import os
import re
import numpy as np
import subprocess
import yaml

sys.path.append("common")
sys.path.append("./src")
from misc import get_execution_role, wait_for_s3_object
from docker_utils import build_docker_image
from docker_utils import push as docker_push
from sagemaker.rl import RLEstimator, RLToolkit, RLFramework
from time import gmtime, strftime
import time
from IPython.display import Markdown
from markdown_helper import *

### Get SageMaker execution role

This notebook is designed to be run from a SageMaker notebook, so our first step is to get the SageMaker notebook's IAM role, so that we can start making calls against other AWS services.

In [None]:
try:
    sagemaker_role = sagemaker.get_execution_role()
except Exception:
    # Without a role the rest of the notebook cannot run, so report and re-raise
    print('Unable to get role! Are you running this notebook locally?')
    raise

print("Using Sagemaker IAM role arn: \n{}".format(sagemaker_role))

> Please note that this notebook cannot be run in `SageMaker local mode` because the simulator is based on the AWS RoboMaker service.

## Download a copy of the DeepRacer simapp

This is the Docker container image of the simulation application that gets loaded into RoboMaker. We download a copy and then extract some files from it.

In [None]:
# This is the name of the simapp that is locally created and pushed to your account ECR
local_simapp_ecr_docker_image_name = "deepracer-sim-local-notebook"
public_ecr_alias = "k1d3r4z1"

# Clean up existing docker images
!if [ -n "$(docker ps -a -q)" ]; then docker rm -f $(docker ps -a -q); fi
!if [ -n "$(docker images -q)" ]; then docker rmi -f $(docker images -q); fi

!aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws

!docker pull public.ecr.aws/{public_ecr_alias}/deepracer-sim-public

!docker tag public.ecr.aws/{public_ecr_alias}/deepracer-sim-public {local_simapp_ecr_docker_image_name}

# Get docker id and container id
simapp_docker_ids = !docker images | grep deepracer-sim-public | tr -s ' '| cut -d ' ' -f 3 | head -n 1
simapp_docker_id = simapp_docker_ids[0]
simapp_container_ids = !docker run -d -t {simapp_docker_id}
simapp_container_id = simapp_container_ids[0]

# Copy all the required training related code and update the code base
!docker cp {simapp_container_id}:/opt/amazon/markov ./src/
!docker cp {simapp_container_id}:/opt/amazon/rl_coach.patch ./src/
!docker cp {simapp_container_id}:/opt/ml/code/. ./src/lib/
!rm ./src/lib/credentials #Bug in Jupyter can't handle symlinks to files w/o read perms
    
# Copy out the RoboMaker 3D environment
!docker cp {simapp_container_id}:/opt/amazon/install/deepracer_simulation_environment ./src/

### Initializing basic parameters

In [None]:
# Select the instance type
instance_type = "ml.c4.2xlarge"
# instance_type = "ml.p2.xlarge"
# instance_type = "ml.c5.4xlarge"

# Starting SageMaker session
sage_session = sagemaker.session.Session()

# Create unique job name.
job_name_prefix = "deepracer-notebook"

# Duration of job in seconds (1200 seconds = 20 minutes)
job_duration_in_seconds = 1200

# AWS RoboMaker (which we need for our simulation) is only supported in certain regions
supported_regions = [
    "us-east-1",
    "us-east-2",
    "us-west-2",
    "ap-southeast-1",
    "ap-northeast-1",
    "eu-central-1",
    "eu-west-1",
    "us-gov-west-1"
]

# Check that we are in a region that supports RoboMaker
aws_region = sage_session.boto_region_name
if aws_region not in supported_regions:
    raise Exception(
        "This notebook uses RoboMaker which is available only in certain "
        "regions. Please switch to one of these regions."
    )

### Build Sagemaker docker image

The file `./Dockerfile` lists all the packages that should be included in the Docker container image. We use this custom image instead of the default SageMaker container. It is a different image from the one created earlier for RoboMaker.

In [None]:
%%time
from copy_to_sagemaker_container import (
    get_sagemaker_docker,
    copy_to_sagemaker_container,
    get_custom_image_name,
)

cpu_or_gpu = "gpu" if instance_type.startswith("ml.p") else "cpu"
repository_short_name = "sagemaker-docker-%s" % cpu_or_gpu
custom_image_name = get_custom_image_name(repository_short_name)
try:
    print("Copying files from your notebook to existing sagemaker container")
    sagemaker_docker_id = get_sagemaker_docker(repository_short_name)
    copy_to_sagemaker_container(sagemaker_docker_id, repository_short_name)
except Exception as e:
    print("Creating sagemaker container")
    docker_build_args = {
        "CPU_OR_GPU": cpu_or_gpu,
        "AWS_REGION": boto3.Session().region_name,
    }
    build_docker_image(
        repository_short_name, build_args=docker_build_args
    )


### Run these commands if you wish to modify the SageMaker and Robomaker code
<span style="color:red">Note: Make sure you have at least 25 GB of free space if you plan to modify the SageMaker and RoboMaker code</span>

Note that the code below will copy any changes you made to `deepracer_simulation_environment` back into the container created during the build, so these changes will end up in your docker container image. This allows you to customize the DeepRacer simulation environment. 

**Pro tip: if you are planning to make simple changes to the environment such as adding additional lights or changing the intensity or color of lights, you can do this in real time inside Gazebo, by launching the simulation job's Gazebo viewer from inside the AWS RoboMaker console!** This is typically faster and easier than changing the container image.

#### Accessing Gazebo from RoboMaker

Find the RoboMaker simulation job, and look for a "gazebo" tool under "Simulation application tools": 

![Gazebo Launcher](./gazebo_launcher.png)

After clicking connect, a window should open where you can interact with the Gazebo environment. Try changing or adding some lights! 

![Gazebo Viewer](./gazebo_viewer.png)

In [None]:
# Get docker id and container id
simapp_docker_ids = !docker images | grep deepracer-sim-local-notebook | tr -s ' '| cut -d ' ' -f 3 | head -n 1
simapp_docker_id = simapp_docker_ids[0]
simapp_container_ids = !docker run -d -t {simapp_docker_id} /bin/sh
simapp_container_id = simapp_container_ids[0]

!docker cp ./src/markov {simapp_container_id}:/opt/amazon/
!docker cp ./src/rl_coach.patch {simapp_container_id}:/opt/amazon/
# #Restore symlink removed earlier due to bug in Jupyter
!rm -f ./src/lib/credentials
!ln -s /root/.aws/credentials ./src/lib/credentials
!docker cp ./src/lib/. {simapp_container_id}:/opt/ml/code/.

# #This is the Robomaker 3d environment
!docker cp ./src/deepracer_simulation_environment {simapp_container_id}:/opt/amazon/install/

# #Only needed if one modifies the ROS packages/libraries/etc
#!docker exec {simapp_container_id} /opt/ml/code/scripts/build_deepracer_ros_packages.sh

!docker exec {simapp_container_id} /opt/ml/code/scripts/clean_up_local.sh

!docker stop {simapp_container_id}

!docker commit {simapp_container_id} deepracer-sim-local-notebook

## Upload container images to ECR

The Robot Application and Simulation Application containers need to be pushed to ECR before we can create a Simulation Job in RoboMaker.  

In [None]:
# Push the simapp docker image to your ECR account 
docker_push(local_simapp_ecr_docker_image_name)

In [None]:
# Push the SageMaker docker image to your ECR account 
custom_image_name = docker_push(repository_short_name)
print("Using ECR image %s" % custom_image_name)

# Already built the container images? Start here! 

Unless you need to change the simulation environment (which requires rebuilding the container images above), you should always restart your notebook from here. 

### Setup an S3 bucket

We will need an S3 bucket to store model checkpoint data and logs. 

In [None]:
# S3 bucket
s3_bucket = sage_session.default_bucket()

# SDK appends the job name and output folder
s3_output_path = "s3://{}/".format(s3_bucket)

# Ensure that the S3 prefix contains the keyword 'sagemaker'
s3_prefix = job_name_prefix + "-sagemaker-" + strftime("%y%m%d-%H%M%S", gmtime())

# Get the AWS account id of this account
sts = boto3.client("sts")
account_id = sts.get_caller_identity()["Account"]

print("Using s3 bucket {}".format(s3_bucket))
print(
    "Model checkpoints and other metadata will be stored at: \ns3://{}/{}".format(
        s3_bucket, s3_prefix
    )
)
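
The prefix naming used above can be sketched as a small helper. `build_s3_prefix` is a hypothetical name (it is not part of the notebook's helper modules); it only mirrors the string formatting in the cell, with an optional fixed time so the result is reproducible.

```python
from time import gmtime, strftime


def build_s3_prefix(job_name_prefix, now=None):
    """Return '<job_name_prefix>-sagemaker-<yymmdd-HHMMSS>' based on UTC time.

    Hypothetical helper: `now` is an optional time.struct_time, used here
    only to make the example deterministic.
    """
    timestamp = strftime("%y%m%d-%H%M%S", now if now is not None else gmtime())
    return "{}-sagemaker-{}".format(job_name_prefix, timestamp)


# gmtime(0) is the Unix epoch (1970-01-01 00:00:00 UTC)
example_prefix = build_s3_prefix("deepracer-notebook", gmtime(0))
print(example_prefix)  # deepracer-notebook-sagemaker-700101-000000
```

Because the timestamp has one-second resolution, two prefixes generated within the same second would collide; in this notebook the prefix is generated once per session, so that is unlikely to matter.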

### Query VPC configuration

We need our containers to be able to communicate over the network, so we need to determine our VPC network configuration.

**Note: This code assumes that the *default* VPC is being used.**

In [None]:
ec2 = boto3.client("ec2")

print("Using the default VPC")
deepracer_vpc = [vpc["VpcId"] for vpc in ec2.describe_vpcs()["Vpcs"] if vpc["IsDefault"] == True][0]

deepracer_security_groups = [
    group["GroupId"]
    for group in ec2.describe_security_groups()["SecurityGroups"]
    if "VpcId" in group and group["GroupName"] == "default" and group["VpcId"] == deepracer_vpc
]

deepracer_subnets = [
    subnet["SubnetId"]
    for subnet in ec2.describe_subnets()["Subnets"]
    if subnet["VpcId"] == deepracer_vpc and subnet["DefaultForAz"] == True
]

print("Using VPC:", deepracer_vpc)
print("Using security group:", deepracer_security_groups)
print("Using subnets:", deepracer_subnets)
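
The list comprehension above raises an `IndexError` when the account has no default VPC. As a rough sketch, the same selection can be written as functions that return `None` or an empty list instead, which makes it easier to print a helpful message. The function names and the sample responses below are made up for illustration; only the `Vpcs`, `VpcId`, `IsDefault`, `Subnets`, `SubnetId`, and `DefaultForAz` keys come from the cell above.

```python
def pick_default_vpc(describe_vpcs_response):
    """Return the VpcId of the default VPC, or None if there is none."""
    for vpc in describe_vpcs_response.get("Vpcs", []):
        if vpc.get("IsDefault"):
            return vpc["VpcId"]
    return None


def pick_default_subnets(describe_subnets_response, vpc_id):
    """Return the IDs of the default-for-AZ subnets that belong to vpc_id."""
    return [
        subnet["SubnetId"]
        for subnet in describe_subnets_response.get("Subnets", [])
        if subnet.get("VpcId") == vpc_id and subnet.get("DefaultForAz")
    ]


# Made-up responses shaped like the parts of the EC2 output used above
sample_vpcs = {
    "Vpcs": [
        {"VpcId": "vpc-aaa", "IsDefault": False},
        {"VpcId": "vpc-bbb", "IsDefault": True},
    ]
}
sample_subnets = {
    "Subnets": [
        {"SubnetId": "subnet-1", "VpcId": "vpc-bbb", "DefaultForAz": True},
        {"SubnetId": "subnet-2", "VpcId": "vpc-bbb", "DefaultForAz": False},
        {"SubnetId": "subnet-3", "VpcId": "vpc-aaa", "DefaultForAz": True},
    ]
}

default_vpc = pick_default_vpc(sample_vpcs)
default_subnets = pick_default_subnets(sample_subnets, default_vpc)
print(default_vpc, default_subnets)  # vpc-bbb ['subnet-1']
```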

### Create an S3 Endpoint inside the VPC

We want to access S3 over the VPC to avoid network egress charges (and improve speed and security).

The default VPC should already have an S3 endpoint, but just in case it does not, we create one here. 

In [None]:
def create_vpc_endpoint_table():
    print("Creating S3 VPC endpoint")
    try:
        route_tables = [
            route_table["RouteTableId"]
            for route_table in ec2.describe_route_tables()["RouteTables"]
            if route_table["VpcId"] == deepracer_vpc
        ]
    except Exception as e:
        if "UnauthorizedOperation" in str(e):
            display(Markdown(generate_help_for_s3_endpoint_permissions(sagemaker_role)))
        else:
            display(Markdown(create_s3_endpoint_manually(aws_region, deepracer_vpc)))
        raise e

    print("Trying to attach S3 endpoints to the following route tables:", route_tables)

    if not route_tables:
        raise Exception(
            (
                "No route tables were found. Please follow the VPC S3 endpoint creation "
                "guide by clicking the above link."
            )
        )
    try:
        ec2.create_vpc_endpoint(
            DryRun=False,
            VpcEndpointType="Gateway",
            VpcId=deepracer_vpc,
            ServiceName="com.amazonaws.{}.s3".format(aws_region),
            RouteTableIds=route_tables,
        )
        print("S3 endpoint created successfully!")
    except Exception as e:
        if "RouteAlreadyExists" in str(e):
            print("S3 endpoint already exists.")
        elif "UnauthorizedOperation" in str(e):
            display(Markdown(generate_help_for_s3_endpoint_permissions(sagemaker_role)))
            raise e
        else:
            display(Markdown(create_s3_endpoint_manually(aws_region, deepracer_vpc)))
            raise e

create_vpc_endpoint_table()

# Training

### Copy configuration files to S3

We need to copy our reward function and model settings to S3 so that our simulation job can see them. 

In [None]:
s3_location = "s3://%s/%s" % (s3_bucket, s3_prefix)
print(s3_location)

# Clean up the previously uploaded files
!aws s3 rm --recursive {s3_location}

!aws s3 cp ./src/artifacts/rewards/default.py {s3_location}/customer_reward_function.py

!aws s3 cp ./src/artifacts/actions/default.json {s3_location}/model/model_metadata.json

#!aws s3 cp src/markov/presets/default.py {s3_location}/presets/preset.py
#!aws s3 cp src/markov/presets/preset_attention_layer.py {s3_location}/presets/preset.py

### Train the RL model using the Python SDK Script mode

Next, we define the algorithm metrics that we want to capture from the CloudWatch logs to monitor training progress. These metrics are algorithm-specific and may differ for other algorithms. We use Clipped PPO by default.

In [None]:
metric_definitions = [
    # Training> Name=main_level/agent, Worker=0, Episode=19, Total reward=-102.88, Steps=19019, Training iteration=1
    {"Name": "reward-training", "Regex": "^Training>.*Total reward=(.*?),"},
    # Policy training> Surrogate loss=-0.32664725184440613, KL divergence=7.255815035023261e-06, Entropy=2.83156156539917, training epoch=0, learning_rate=0.00025
    {"Name": "ppo-surrogate-loss", "Regex": "^Policy training>.*Surrogate loss=(.*?),"},
    {"Name": "ppo-entropy", "Regex": "^Policy training>.*Entropy=(.*?),"},
    # Testing> Name=main_level/agent, Worker=0, Episode=19, Total reward=1359.12, Steps=20015, Training iteration=2
    {"Name": "reward-testing", "Regex": "^Testing>.*Total reward=(.*?),"},
]
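
To check what these expressions capture, the sketch below applies them locally to the sample log lines quoted in the comments above. It uses Python's `re.search`, which is only an approximation of how SageMaker evaluates metric regexes against the logs; `extract_metrics` and the `_demo` names are hypothetical.

```python
import re

metric_definitions_demo = [
    {"Name": "reward-training", "Regex": "^Training>.*Total reward=(.*?),"},
    {"Name": "ppo-surrogate-loss", "Regex": "^Policy training>.*Surrogate loss=(.*?),"},
    {"Name": "ppo-entropy", "Regex": "^Policy training>.*Entropy=(.*?),"},
    {"Name": "reward-testing", "Regex": "^Testing>.*Total reward=(.*?),"},
]

# Sample lines copied from the comments in the cell above
sample_log_lines = [
    "Training> Name=main_level/agent, Worker=0, Episode=19, Total reward=-102.88, Steps=19019, Training iteration=1",
    "Policy training> Surrogate loss=-0.32664725184440613, KL divergence=7.255815035023261e-06, Entropy=2.83156156539917, training epoch=0, learning_rate=0.00025",
    "Testing> Name=main_level/agent, Worker=0, Episode=19, Total reward=1359.12, Steps=20015, Training iteration=2",
]


def extract_metrics(lines, definitions):
    """Return (metric name, value) pairs for every regex that matches a line."""
    found = []
    for line in lines:
        for definition in definitions:
            match = re.search(definition["Regex"], line)
            if match:
                found.append((definition["Name"], float(match.group(1))))
    return found


extracted = extract_metrics(sample_log_lines, metric_definitions_demo)
for name, value in extracted:
    print(name, value)
```

Each capture group stops at the first comma after the value, so a metric that appears last on a log line (with no trailing comma) would not be matched by these patterns.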

In [None]:
custom_hyperparameter = {
    "s3_bucket": s3_bucket,
    "s3_prefix": s3_prefix,
    "aws_region": aws_region,
    "model_metadata_s3_key": "%s/model/model_metadata.json" % s3_prefix,
    "reward_function_s3_source": "%s/customer_reward_function.py" % s3_prefix,
    "batch_size": "64",
    "num_epochs": "10",
    "stack_size": "1",
    "lr": "0.0003",
    "exploration_type": "Categorical",
    "e_greedy_value": "1",
    "epsilon_steps": "10000",
    "beta_entropy": "0.01",
    "discount_factor": "0.95", # It's a good idea to reduce this from the default value of 0.999
    "loss_type": "Huber",
    "num_episodes_between_training": "20",
    "max_sample_count": "0",
    "sampling_frequency": "1"
    #     ,"pretrained_s3_bucket": "sagemaker-us-east-1-259455987231"
    #     ,"pretrained_s3_prefix": "deepracer-notebook-sagemaker-200729-202318"
}
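
All values in the dictionary above are strings because the `HyperParameters` argument of `create_training_job` is a string-to-string map. If you prefer to keep numeric values while experimenting, a small conversion step just before the API call is one option. This is a minimal sketch; `stringify_hyperparameters` is a hypothetical helper, not part of the notebook's code.

```python
def stringify_hyperparameters(params):
    """Return a copy of params with every key and value converted to str."""
    return {str(key): str(value) for key, value in params.items()}


# Hypothetical subset of the hyperparameters, written with native types
numeric_hyperparameters = {"batch_size": 64, "lr": 0.0003, "loss_type": "Huber"}
converted = stringify_hyperparameters(numeric_hyperparameters)
print(converted)  # {'batch_size': '64', 'lr': '0.0003', 'loss_type': 'Huber'}
```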

In [None]:
# Connect to SageMaker so we can create our training job
b_sagemaker = boto3.client("sagemaker", region_name=aws_region)

In [None]:
# Determine if a training job was already created in this session (if so, we do not need to create one)
try:
    job_arn = training_job['TrainingJobArn']
except (NameError, KeyError):
    job_arn = 'none'

if job_arn == 'none':
    training_job = b_sagemaker.create_training_job(
        TrainingJobName=s3_prefix,
        HyperParameters=custom_hyperparameter,
        AlgorithmSpecification={
            "TrainingImage": "{}:latest".format(custom_image_name),
            "TrainingInputMode": "File"
        },
        RoleArn=sagemaker_role,
        OutputDataConfig={
            "S3OutputPath": "s3://{}/{}/train-output/".format(s3_bucket, s3_prefix)
        },
        ResourceConfig={
            'InstanceType': instance_type,
            'InstanceCount': 1,
            'VolumeSizeInGB': 32
        },
        VpcConfig={
            'SecurityGroupIds': deepracer_security_groups,
            'Subnets': deepracer_subnets
        },
        StoppingCondition={
            'MaxRuntimeInSeconds': job_duration_in_seconds
        },
    )
    
job_name = s3_prefix
training_job_arn = training_job['TrainingJobArn']
print("Training job: %s" % job_name)
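
Once the job is created you may want to track its status from the notebook rather than the console. The sketch below polls `describe_training_job` and reads `TrainingJobStatus` until it reaches a terminal value. `wait_for_training_job` and `FakeSageMakerClient` are hypothetical names; the fake client only exists so the example runs without AWS access.

```python
import time

# Terminal values of TrainingJobStatus; "InProgress" and "Stopping" are still in flight
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_training_job(sagemaker_client, training_job_name, poll_seconds=30, max_polls=120):
    """Poll describe_training_job until a terminal status or the poll budget runs out.

    Returns the last status seen (None if max_polls is 0).
    """
    status = None
    for _ in range(max_polls):
        description = sagemaker_client.describe_training_job(TrainingJobName=training_job_name)
        status = description["TrainingJobStatus"]
        if status in TERMINAL_STATUSES:
            break
        time.sleep(poll_seconds)
    return status


class FakeSageMakerClient:
    """Stand-in for boto3.client("sagemaker") so the sketch runs offline."""

    def __init__(self, statuses):
        self._statuses = iter(statuses)

    def describe_training_job(self, TrainingJobName):
        return {"TrainingJobStatus": next(self._statuses)}


fake_client = FakeSageMakerClient(["InProgress", "InProgress", "Completed"])
final_status = wait_for_training_job(fake_client, "demo-job", poll_seconds=0)
print(final_status)  # Completed
```

In this notebook you would pass `b_sagemaker` and `job_name`, and only after the RoboMaker simulation jobs have been launched, since the call blocks until the training job finishes.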

### Configure the Robomaker Job

In [None]:
# Configure a new RoboMaker job and an associated Kinesis video stream, so we can watch the car go! 
robomaker = boto3.client("robomaker")
kinesisvideo = boto3.client("kinesisvideo")

#### Create Simulation Application

In [None]:
robomaker_environment = {"uri": get_custom_image_name(local_simapp_ecr_docker_image_name)+":latest"}
simulation_software_suite = {"name": "SimulationRuntime"}
robot_software_suite = {"name": "General"}

In [None]:
app_name = "deepracer-notebook-application" + strftime("%y%m%d-%H%M%S", gmtime())

print(app_name)
try:
    response = robomaker.create_simulation_application(
        name=app_name,
        environment=robomaker_environment,
        simulationSoftwareSuite=simulation_software_suite,
        robotSoftwareSuite=robot_software_suite
    )
    simulation_app_arn = response["arn"]
    print("Created a new simulation app with ARN:", simulation_app_arn)
except Exception as e:
    if "AccessDeniedException" in str(e):
        display(Markdown(generate_help_for_robomaker_all_permissions(sagemaker_role)))
        raise e
    else:
        raise e

#### Set the number of simulation jobs

In [None]:
# Change this for multiple rollouts. This will invoke the specified number of robomaker jobs to collect experience
num_simulation_workers = 1  # Set this to 2 or more to run several jobs at once

#### Create the Kinesis video stream(s)

In [None]:
kvs_stream_name=[]
kvs_stream_arns=[]
for job_no in range(num_simulation_workers):
    kvs_stream_name.append("dr-kvs-{}-{}".format(job_name,job_no))
    try:
        response=kinesisvideo.create_stream(StreamName=kvs_stream_name[job_no],MediaType="video/h264",DataRetentionInHours=24)
    except Exception as err:
        if err.__class__.__name__ == 'ResourceInUseException':
            response=kinesisvideo.describe_stream(StreamName=kvs_stream_name[job_no])["StreamInfo"]
        else:
            raise err
    print("Created kinesis video stream {}".format(kvs_stream_name[job_no]))
    kvs_stream_arns.append(response["StreamARN"])

### Launch the Simulation job(s) on RoboMaker

We create [AWS RoboMaker](https://console.aws.amazon.com/robomaker/home#welcome) Simulation Jobs that simulate the environment and share this data with SageMaker for training. 

In [None]:
s3_yaml_name = "training_params.yaml"
world_name = "2022_reinvent_champ"

with open("./src/artifacts/yaml/training_yaml_template.yaml", "r") as filepointer:
    yaml_config = yaml.safe_load(filepointer)

yaml_config["WORLD_NAME"] = world_name
yaml_config["SAGEMAKER_SHARED_S3_BUCKET"] = s3_bucket
yaml_config["SAGEMAKER_SHARED_S3_PREFIX"] = s3_prefix
yaml_config["TRAINING_JOB_ARN"] = training_job_arn
yaml_config["METRICS_S3_BUCKET"] = s3_bucket
yaml_config["METRICS_S3_OBJECT_KEY"] = "{}/training_metrics.json".format(s3_prefix)
yaml_config["SIMTRACE_S3_BUCKET"] = s3_bucket
yaml_config["SIMTRACE_S3_PREFIX"] = "{}/iteration-data/training".format(s3_prefix)
yaml_config["AWS_REGION"] = aws_region
yaml_config["ROBOMAKER_SIMULATION_JOB_ACCOUNT_ID"] = account_id
yaml_config["KINESIS_VIDEO_STREAM_NAME"] = kvs_stream_name[job_no]
yaml_config["REWARD_FILE_S3_KEY"] = "{}/customer_reward_function.py".format(s3_prefix)
yaml_config["MODEL_METADATA_FILE_S3_KEY"] = "{}/model/model_metadata.json".format(s3_prefix)
yaml_config["NUM_WORKERS"] = num_simulation_workers
yaml_config["MP4_S3_BUCKET"] = s3_bucket
yaml_config["MP4_S3_OBJECT_PREFIX"] = "{}/iteration-data/training".format(s3_prefix)

# Race-type supported for training are TIME_TRIAL, OBJECT_AVOIDANCE, HEAD_TO_BOT
# If you need to modify more attributes look at the template yaml file
race_type = "TIME_TRIAL"

if race_type == "OBJECT_AVOIDANCE":
    yaml_config["NUMBER_OF_OBSTACLES"] = "6"
    yaml_config["RACE_TYPE"] = "OBJECT_AVOIDANCE"

elif race_type == "HEAD_TO_BOT":
    yaml_config["NUMBER_OF_BOT_CARS"] = "6"
    yaml_config["RACE_TYPE"] = "HEAD_TO_BOT"

# Printing the modified yaml parameter
for key, value in yaml_config.items():
    print("{}: {}".format(key.ljust(40, " "), value))

# Uploading the modified yaml parameter
with open("./training_params.yaml", "w") as filepointer:
    yaml.dump(yaml_config, filepointer)

!aws s3 cp ./training_params.yaml {s3_location}/training_params.yaml
!rm training_params.yaml

In [None]:
responses = list()
for job_no in range(num_simulation_workers):
    response = robomaker.create_simulation_job(
        clientRequestToken=strftime("%Y-%m-%d-%H-%M-%S", gmtime()),
        outputLocation={
            "s3Bucket": s3_bucket,
            "s3Prefix": s3_prefix
        },
        maxJobDurationInSeconds=job_duration_in_seconds,
        iamRole=sagemaker_role,
        failureBehavior="Fail",
        simulationApplications=[{
            "application": simulation_app_arn,
            "applicationVersion": "$LATEST",
            "launchConfig": {
                "command": ["roslaunch", "deepracer_simulation_environment", "distributed_training.launch"],
                "environmentVariables": {
                    "S3_YAML_NAME": s3_yaml_name,
                    "SAGEMAKER_SHARED_S3_PREFIX": s3_prefix,
                    "SAGEMAKER_SHARED_S3_BUCKET": s3_bucket,
                    "WORLD_NAME": world_name,
                    "KINESIS_VIDEO_STREAM_NAME": kvs_stream_name[job_no],
                    "APP_REGION": aws_region,
                    "MODEL_METADATA_FILE_S3_KEY": "%s/model/model_metadata.json" % s3_prefix,
                    "ROLLOUT_IDX": str(job_no),
                    "DEEPRACER_JOB_TYPE_ENV": "SAGEONLY"
                },
                "streamUI": True
            },
            "uploadConfigurations": [{
                    "name": "gazebo-logs",
                    "path": "/root/.gazebo/server*/*.log",
                    "uploadBehavior": "UPLOAD_ON_TERMINATE"
                },
                {
                    "name": "ros-logs",
                    "path": "/root/.ros/log/**",
                    "uploadBehavior": "UPLOAD_ON_TERMINATE"
                }
            ],
            "useDefaultUploadConfigurations": False,
            "tools": [{
                "streamUI": True,
                "name": "rviz",
                "command": "source /opt/ros/melodic/setup.bash;source /opt/amazon/install/setup.bash; rviz",
                "streamOutputToCloudWatch": True,
                "exitBehavior": "RESTART"
              },
              {
                "streamUI": True,
                "name": "terminal",
                "command": "source /opt/ros/melodic/setup.bash;source /opt/amazon/install/setup.bash; xfce4-terminal",
                "streamOutputToCloudWatch": True,
                "exitBehavior": "RESTART"
              },
              {
                "streamUI": True,
                "name": "gazebo",
                "command": "source /opt/ml/code/scripts/gzclient_source.sh; export GAZEBO_MODEL_PATH=/opt/amazon/install/deepracer_simulation_environment/share/deepracer_simulation_environment/; gzclient",
                "streamOutputToCloudWatch": True,
                "exitBehavior": "RESTART"
            }],
        }],
        vpcConfig={
            "subnets": deepracer_subnets,
            "securityGroups": deepracer_security_groups,
            "assignPublicIp": True
        }
    )
    responses.append(response)
    time.sleep(5)
    
print("Created the following jobs:")
job_arns = [response["arn"] for response in responses]
for job_arn in job_arns:
    print("Job ARN", job_arn)

### Visualizing the simulations in RoboMaker
You can visit the RoboMaker console to visualize the simulations or run the following cell to generate the hyperlinks.

In [None]:
display(Markdown(generate_robomaker_links(job_arns, aws_region)))

for job_no in range(num_simulation_workers):
    display(Markdown("View the Kinesis video stream <a target=_blank href=\"https://%s.console.aws.amazon.com/kinesisvideo/home?region=%s#/streams/streamName/%s\">here.</a> (Expand 'Media Playback')"%(aws_region, aws_region, kvs_stream_name[job_no])))

### Create a folder to hold training metrics

In [None]:
tmp_dir = "/tmp/{}".format(job_name)
# exist_ok=True lets this cell be re-run without an error
os.makedirs(tmp_dir, exist_ok=True)
print("Created local folder {}".format(tmp_dir))

# Explore the output

Take a look at the metrics from the training job and examine the raw output logs and videos. 

### Plot metrics for training job

In [None]:
%matplotlib inline
import pandas as pd
import json

training_metrics_file = "training_metrics.json"
training_metrics_path = "{}/{}".format(s3_prefix, training_metrics_file)
wait_for_s3_object(s3_bucket, training_metrics_path, tmp_dir)

json_file = "{}/{}".format(tmp_dir, training_metrics_file)
with open(json_file) as fp:
    data = json.load(fp)

df = pd.DataFrame(data["metrics"])
x_axis = "episode"
y_axis = "reward_score"

# DataFrame.plot returns a matplotlib Axes object, so name it accordingly
ax = df.plot(x=x_axis, y=y_axis, figsize=(12, 5), legend=True, style="b-")
ax.set_ylabel(y_axis)
ax.set_xlabel(x_axis);
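
Per-episode rewards tend to be noisy, so a smoothed curve is often easier to read. The sketch below computes a trailing moving average over the `reward_score` values in plain Python. The sample data is made up and only mimics the `metrics` list with the `episode` and `reward_score` fields used in the cell above; `moving_average` is a hypothetical helper.

```python
def moving_average(values, window):
    """Trailing moving average; early entries average over the values seen so far."""
    averages = []
    running_total = 0.0
    for index, value in enumerate(values):
        running_total += value
        if index >= window:
            # Drop the value that just left the window
            running_total -= values[index - window]
        averages.append(running_total / min(index + 1, window))
    return averages


# Made-up data shaped like training_metrics.json
sample_metrics = {
    "metrics": [
        {"episode": 1, "reward_score": 10},
        {"episode": 2, "reward_score": 20},
        {"episode": 3, "reward_score": 60},
        {"episode": 4, "reward_score": 40},
    ]
}

rewards = [entry["reward_score"] for entry in sample_metrics["metrics"]]
smoothed = moving_average(rewards, window=2)
print(smoothed)  # [10.0, 15.0, 40.0, 50.0]
```

With pandas already imported, `df["reward_score"].rolling(window, min_periods=1).mean()` should give the same series and can be plotted alongside the raw values.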

### Explore output logs and videos

In [None]:
display(Markdown("Visit <a target=_blank href=\"https://s3.console.aws.amazon.com/s3/buckets/%s?region=%s&prefix=%s/&showversions=false\">the output logs, videos, and other artifacts in S3.</a>"%(s3_bucket,aws_region,s3_prefix)))

### Upload Your Model into the DeepRacer console

When training is complete, import the trained model into the DeepRacer console so that you can clone and train it further, evaluate it, or submit it to the virtual league. Visit <a href="https://us-east-1.console.aws.amazon.com/deepracer/home?region=us-east-1#models">"Your Models"</a> in the DeepRacer console, click the 'Import model' button, and follow the directions. Use the following S3 path as your import path:

In [None]:
display(Markdown("Copy and paste this S3 path: <a target=_blank href=\"s3://%s/%s\">s3://%s/%s</a>"%(s3_bucket,s3_prefix,s3_bucket,s3_prefix)))

# Evaluate your model

Start an evaluation job to see how your model performs! 

# Evaluation (Time Trial)

In [None]:
s3_yaml_name = "evaluation_params.yaml"
world_name = "2022_reinvent_champ"

with open("./src/artifacts/yaml/evaluation_yaml_template.yaml", "r") as filepointer:
    yaml_config = yaml.safe_load(filepointer)

yaml_config["WORLD_NAME"] = world_name
yaml_config["MODEL_S3_BUCKET"] = s3_bucket
yaml_config["MODEL_S3_PREFIX"] = s3_prefix
yaml_config["AWS_REGION"] = aws_region
yaml_config["METRICS_S3_BUCKET"] = s3_bucket
yaml_config["METRICS_S3_OBJECT_KEY"] = "{}/evaluation_metrics.json".format(s3_prefix)
yaml_config["SIMTRACE_S3_BUCKET"] = s3_bucket
yaml_config["SIMTRACE_S3_PREFIX"] = "{}/iteration-data/evaluation".format(s3_prefix)
yaml_config["ROBOMAKER_SIMULATION_JOB_ACCOUNT_ID"] = account_id
yaml_config["NUMBER_OF_TRIALS"] = "5"
yaml_config["MP4_S3_BUCKET"] = s3_bucket
yaml_config["MP4_S3_OBJECT_PREFIX"] = "{}/iteration-data/evaluation".format(s3_prefix)

# Race types supported for evaluation are TIME_TRIAL, OBJECT_AVOIDANCE, HEAD_TO_BOT
# If you need to modify more attributes look at the template yaml file
race_type = "TIME_TRIAL"

if race_type == "OBJECT_AVOIDANCE":
    yaml_config["NUMBER_OF_OBSTACLES"] = "6"
    yaml_config["RACE_TYPE"] = "OBJECT_AVOIDANCE"

elif race_type == "HEAD_TO_BOT":
    yaml_config["NUMBER_OF_BOT_CARS"] = "6"
    yaml_config["RACE_TYPE"] = "HEAD_TO_BOT"

# Printing the modified yaml parameter
for key, value in yaml_config.items():
    print("{}: {}".format(key.ljust(40, " "), value))

# Uploading the modified yaml parameter
with open("./evaluation_params.yaml", "w") as filepointer:
    yaml.dump(yaml_config, filepointer)

!aws s3 cp ./evaluation_params.yaml {s3_location}/evaluation_params.yaml
!rm evaluation_params.yaml

#### Create the Kinesis video stream

In [None]:
# Set the number of simultaneous evaluations to carry out
num_evaluation_workers = 1

In [None]:
kvs_stream_name=[]
kvs_stream_arns=[]
for job_no in range(num_evaluation_workers):
    kvs_stream_name.append("dr-kvs-{}-{}".format(job_name,job_no))
    try:
        response=kinesisvideo.create_stream(StreamName=kvs_stream_name[job_no],MediaType="video/h264",DataRetentionInHours=24)
    except Exception as err:
        if err.__class__.__name__ == 'ResourceInUseException':
            response=kinesisvideo.describe_stream(StreamName=kvs_stream_name[job_no])["StreamInfo"]
        else:
            raise err
    print("Created kinesis video stream {}".format(kvs_stream_name[job_no]))
    kvs_stream_arns.append(response["StreamARN"])

In [None]:
responses = list()
for job_no in range(num_evaluation_workers):
    response = robomaker.create_simulation_job(
        clientRequestToken=strftime("%Y-%m-%d-%H-%M-%S", gmtime()),
        outputLocation={
            "s3Bucket": s3_bucket,
            "s3Prefix": s3_prefix
        },
        maxJobDurationInSeconds=job_duration_in_seconds,
        iamRole=sagemaker_role,
        failureBehavior="Fail",
        simulationApplications=[{
            "application": simulation_app_arn,
            "applicationVersion": "$LATEST",
            "launchConfig": {
                "command": ["roslaunch", "deepracer_simulation_environment", "evaluation.launch"],
                "environmentVariables": {
                    "S3_YAML_NAME": s3_yaml_name,
                    "MODEL_S3_PREFIX": s3_prefix,
                    "MODEL_S3_BUCKET": s3_bucket,
                    "WORLD_NAME": world_name,
                    "KINESIS_VIDEO_STREAM_NAME": kvs_stream_name[job_no],
                    "APP_REGION": aws_region,
                    "MODEL_METADATA_FILE_S3_KEY": "%s/model/model_metadata.json" % s3_prefix,
                },
                "streamUI": True
            },
            "uploadConfigurations": [{
                    "name": "gazebo-logs",
                    "path": "/root/.gazebo/server*/*.log",
                    "uploadBehavior": "UPLOAD_ON_TERMINATE"
                },
                {
                    "name": "ros-logs",
                    "path": "/root/.ros/log/**",
                    "uploadBehavior": "UPLOAD_ON_TERMINATE"
                }
            ],
            "useDefaultUploadConfigurations": False,
            "tools": [{
                "streamUI": True,
                "name": "rviz",
                "command": "source /opt/ros/melodic/setup.bash;source /opt/amazon/install/setup.bash; rviz",
                "streamOutputToCloudWatch": True,
                "exitBehavior": "RESTART"
              },
              {
                "streamUI": True,
                "name": "terminal",
                "command": "source /opt/ros/melodic/setup.bash;source /opt/amazon/install/setup.bash; xfce4-terminal",
                "streamOutputToCloudWatch": True,
                "exitBehavior": "RESTART"
              },
              {
                "streamUI": True,
                "name": "gazebo",
                "command": "source /opt/ml/code/scripts/gzclient_source.sh; export GAZEBO_MODEL_PATH=/opt/amazon/install/deepracer_simulation_environment/share/deepracer_simulation_environment/; gzclient",
                "streamOutputToCloudWatch": True,
                "exitBehavior": "RESTART"
            }],
        }],
        vpcConfig={
            "subnets": deepracer_subnets,
            "securityGroups": deepracer_security_groups,
            "assignPublicIp": True
        }
    )
    responses.append(response)
    time.sleep(5)
    
print("Created the following jobs:")
job_arns = [response["arn"] for response in responses]
for job_arn in job_arns:
    print("Job ARN", job_arn)

### Visualizing the simulations in RoboMaker

You can visit the RoboMaker console directly to watch the simulations, or run the cell below to generate Kinesis video stream URLs. 

In [None]:
display(Markdown(generate_robomaker_links(job_arns, aws_region)))

for job_no in range(num_evaluation_workers):
    display(Markdown("View the Kinesis video stream <a target=_blank href=\"https://%s.console.aws.amazon.com/kinesisvideo/home?region=%s#/streams/streamName/%s\">here.</a> (Expand 'Media Playback')"%(aws_region, aws_region, kvs_stream_name[job_no])))

### Load and display evaluation metrics

In [None]:
evaluation_metrics_file = "evaluation_metrics.json"
evaluation_metrics_path = "{}/{}".format(s3_prefix, evaluation_metrics_file)
wait_for_s3_object(s3_bucket, evaluation_metrics_path, tmp_dir)

json_file = "{}/{}".format(tmp_dir, evaluation_metrics_file)
with open(json_file) as fp:
    data = json.load(fp)

df = pd.DataFrame(data["metrics"])
# Converting milliseconds to seconds
df["elapsed_time"] = df["elapsed_time_in_milliseconds"] / 1000
df = df[["trial", "completion_percentage", "reset_count", "elapsed_time"]]

display(df)
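
A short summary can complement the table, for example the number of completed trials and the best completed lap time. The sketch below works on a list of dictionaries with the same fields as the table above (`trial`, `completion_percentage`, `reset_count`, `elapsed_time_in_milliseconds`). The sample values are made up, and treating a trial as completed when `completion_percentage` reaches 100 is an assumption.

```python
def summarize_evaluation(metrics):
    """Summarize evaluation trials: counts and the best completed lap in seconds."""
    # Assumption: a trial counts as completed at 100 percent completion
    completed = [m for m in metrics if m["completion_percentage"] >= 100]
    best_ms = min((m["elapsed_time_in_milliseconds"] for m in completed), default=None)
    return {
        "trials": len(metrics),
        "completed_trials": len(completed),
        "best_lap_seconds": None if best_ms is None else best_ms / 1000,
    }


# Made-up data shaped like the "metrics" list in evaluation_metrics.json
sample_evaluation = [
    {"trial": 1, "completion_percentage": 100, "reset_count": 0, "elapsed_time_in_milliseconds": 12500},
    {"trial": 2, "completion_percentage": 64, "reset_count": 1, "elapsed_time_in_milliseconds": 9000},
    {"trial": 3, "completion_percentage": 100, "reset_count": 0, "elapsed_time_in_milliseconds": 11750},
]

summary = summarize_evaluation(sample_evaluation)
print(summary)  # {'trials': 3, 'completed_trials': 2, 'best_lap_seconds': 11.75}
```

In the notebook, `summarize_evaluation(data["metrics"])` would apply it to the downloaded file.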

### Explore output logs and videos

In [None]:
display(Markdown("Visit <a target=_blank href=\"https://s3.console.aws.amazon.com/s3/buckets/%s?region=%s&prefix=%s/&showversions=false\">the output logs, videos, and other artifacts in S3.</a>"%(s3_bucket,aws_region,s3_prefix)))

# Clean Up the Environment

### Clean up RoboMaker and SageMaker training jobs

Get rid of any outstanding RoboMaker and SageMaker training jobs.

In [None]:
# Cancelling robomaker job
for job_arn in job_arns:
    try:
        robomaker.cancel_simulation_job(job=job_arn)
    except Exception as err:
        print("Could not cancel simulation job {}: {}".format(job_arn, err))

# Stopping sagemaker training job
try:
    sage_session.sagemaker_client.stop_training_job(TrainingJobName=job_name)
except Exception as err:
    print("Could not stop training job; already stopped?",err)

### Clean Up Simulation Application Resource

In [None]:
robomaker.delete_simulation_application(application=simulation_app_arn)

### Remove Kinesis Video Streams

In [None]:
for streamarn in kvs_stream_arns:
    try:
        kinesisvideo.delete_stream(StreamARN=streamarn)
        print("Deleted",streamarn)
    except Exception as err:
        print("Could not delete stream {}: {}".format(streamarn, err))

### Clean your S3 bucket

**Note:** this section is left commented out to avoid accidentally deleting trained models you might want to keep! Please import your models into DeepRacer or move them to another S3 bucket if you want to keep them. 

In [None]:
# Uncomment the lines below only if you want to clean the S3 bucket
# sagemaker_s3_folder = "s3://{}/{}".format(s3_bucket, s3_prefix)
# !aws s3 rm --recursive {sagemaker_s3_folder}

# robomaker_s3_folder = "s3://{}/{}".format(s3_bucket, job_name)
# !aws s3 rm --recursive {robomaker_s3_folder}

# robomaker_sim_app = "s3://{}/{}".format(s3_bucket, 'robomaker')
# !aws s3 rm --recursive {robomaker_sim_app}

# model_output = "s3://{}/{}".format(s3_bucket, s3_bucket)
# !aws s3 rm --recursive {model_output}

#### Remove the docker images from Elastic Container Repository

In [None]:
ecr = boto3.client('ecr')

ecr.delete_repository(repositoryName=local_simapp_ecr_docker_image_name,force=True)
ecr.delete_repository(repositoryName=repository_short_name,force=True)

### Clean the docker images
Uncomment and run this only when you want to completely remove the docker containers or clean up the space of the sagemaker instance on this notebook.

In [None]:
# !if [ -n "$(docker ps -a -q)" ]; then docker rm -f $(docker ps -a -q); fi
# !if [ -n "$(docker images -q)" ]; then docker rmi -f $(docker images -q); fi