# Cart-pole Balancing Model with Amazon SageMaker and Ray

---
## Introduction

This notebook starts from the cart-pole balancing problem, in which a pole is attached by an un-actuated joint to a cart that moves along a frictionless track. Instead of applying control theory, this example solves the problem with reinforcement learning, using Amazon SageMaker and Ray RLlib. You can choose either TensorFlow or PyTorch as the underlying deep learning framework.

(For a similar example that uses the Coach library, see this [link](../rl_cartpole_coach/rl_cartpole_coach_gymEnv.ipynb). Another cart-pole example that uses the Coach library with offline data can be found [here](../rl_cartpole_batch_coach/rl_cartpole_batch_coach.ipynb).)

1. *Objective*: Prevent the pole from falling over
2. *Environment*: The environment used in this example is part of OpenAI Gym, corresponding to the version of the cart-pole problem described by Barto, Sutton, and Anderson [1]
3. *State*: Cart position, cart velocity, pole angle, pole velocity at tip
4. *Action*: Push cart to the left, push cart to the right
5. *Reward*: Reward is 1 for every step taken, including the termination step

References

1. AG Barto, RS Sutton and CW Anderson, "Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems", IEEE Transactions on Systems, Man, and Cybernetics, 1983.
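
The state, action, and reward definitions listed in the introduction can be made concrete with a small, dependency-free simulation. The sketch below is not the Gym implementation used for training; the physical constants and termination limits are assumptions that mirror values commonly used for this task, and `step` and `run_episode` are hypothetical helper names.

```python
import math
import random

# Assumed constants, mirroring values commonly used for the cart-pole task.
GRAVITY = 9.8
CART_MASS = 1.0
POLE_MASS = 0.1
POLE_HALF_LENGTH = 0.5
FORCE_MAG = 10.0
TIME_STEP = 0.02                    # seconds between state updates
ANGLE_LIMIT = 12 * math.pi / 180    # pole angle limit in radians
POSITION_LIMIT = 2.4                # cart position limit
MAX_STEPS = 500                     # assumed episode cap


def step(state, action):
    """Advance the state by one time step using Euler integration.

    state  = (cart position, cart velocity, pole angle, pole angular velocity)
    action = 0 pushes the cart to the left, 1 pushes it to the right
    """
    x, x_dot, theta, theta_dot = state
    force = FORCE_MAG if action == 1 else -FORCE_MAG
    total_mass = CART_MASS + POLE_MASS
    pole_mass_length = POLE_MASS * POLE_HALF_LENGTH

    temp = (force + pole_mass_length * theta_dot ** 2 * math.sin(theta)) / total_mass
    theta_acc = (GRAVITY * math.sin(theta) - math.cos(theta) * temp) / (
        POLE_HALF_LENGTH * (4.0 / 3.0 - POLE_MASS * math.cos(theta) ** 2 / total_mass)
    )
    x_acc = temp - pole_mass_length * theta_acc * math.cos(theta) / total_mass

    x += TIME_STEP * x_dot
    x_dot += TIME_STEP * x_acc
    theta += TIME_STEP * theta_dot
    theta_dot += TIME_STEP * theta_acc

    done = abs(x) > POSITION_LIMIT or abs(theta) > ANGLE_LIMIT
    # Reward is 1 for every step taken, including the termination step.
    return (x, x_dot, theta, theta_dot), 1.0, done


def run_episode(policy, seed=0):
    """Run one episode and return (total reward, number of steps)."""
    rng = random.Random(seed)
    state = tuple(rng.uniform(-0.05, 0.05) for _ in range(4))
    total_reward, steps, done = 0.0, 0, False
    while not done and steps < MAX_STEPS:
        state, reward, done = step(state, policy(state, rng))
        total_reward += reward
        steps += 1
    return total_reward, steps


if __name__ == "__main__":
    random_policy = lambda state, rng: rng.randrange(2)
    print(run_episode(random_policy))
```

Because every step earns a reward of 1, the episode return equals the episode length, which is why the mean episode reward is a direct measure of how long the agent keeps the pole upright.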

## Pre-requisites 

### Imports

To get started, we'll import the Python libraries we need and set up the environment with a few prerequisites for permissions and configurations.

In [1]:
import sagemaker
import boto3
import sys
import os
import glob
import re
import subprocess
import numpy as np
from IPython.display import HTML
import time
from time import gmtime, strftime
sys.path.append("common")
from misc import get_execution_role, wait_for_s3_object
from docker_utils import build_and_push_docker_image
from sagemaker.rl import RLEstimator, RLToolkit, RLFramework

### Setup S3 bucket

Set up the connection and authentication to the S3 bucket that you want to use for checkpoints and metadata.

In [2]:
sage_session = sagemaker.session.Session()
s3_bucket = sage_session.default_bucket()  
s3_output_path = 's3://{}/'.format(s3_bucket)
print("S3 bucket path: {}".format(s3_output_path))

S3 bucket path: s3://sagemaker-us-west-2-775004277940/


### Define Variables 

We define variables such as the job prefix for the training jobs *and the image path for the container (only when this is BYOC).*

In [3]:
# create a descriptive job name 
job_name_prefix = 'rl-cartpole-ray'

### Configure where training happens

You can run RL training jobs from a SageMaker notebook instance or from a local notebook instance. In both cases, training can run in either local or SageMaker mode. Local mode uses the SageMaker Python SDK to run your code in a local container before you deploy to SageMaker, which can speed up iterative testing and debugging while keeping the same familiar Python SDK interface. To use it, set `local_mode = True`.

In [4]:
# run in local_mode on this machine, or as a SageMaker TrainingJob?
local_mode = False

if local_mode:
    instance_type = 'local'
else:
    # If on SageMaker, pick the instance type
    # instance_type = "ml.c5.2xlarge"  # CPU alternative
    instance_type = "ml.p3.2xlarge"

### Create an IAM role

When running from a SageMaker notebook instance, get the execution role with `role = sagemaker.get_execution_role()`. When running from a local notebook instance, use the utils method `role = get_execution_role()` to create an execution role.

In [5]:
try:
    role = sagemaker.get_execution_role()
except Exception:
    # Outside a SageMaker notebook instance, fall back to the helper in common/misc.py
    role = get_execution_role()

print("Using IAM role arn: {}".format(role))

Using IAM role arn: arn:aws:iam::775004277940:role/test-distributed-rl-annal-NotebookInstanceExecutio-1WIJ7RST9BA0J


### Install docker for `local` mode

To work in `local` mode, you need to have Docker installed. When running from your local machine, make sure that you have docker and docker-compose (for local CPU machines) and nvidia-docker (for local GPU machines) installed. Alternatively, when running from a SageMaker notebook instance, you can simply run the following script to install the dependencies.

Note that you can only run one local notebook at a time.

In [6]:
# only run from SageMaker notebook instance
if local_mode:
    !/bin/bash ./common/setup.sh

## Write the Training Code

The training code is written in the file `train-rl-cartpole-ray.py`, which is uploaded in the `/src` directory.
First import the environment files and the preset files, and then define the main() function.

**Note**: If PyTorch is used, please update the above training code and set `use_pytorch` to `True` in the config.

In [7]:
!pygmentize src/train-rl-cartpole-ray.py

[34mimport[39;49;00m [04m[36mjson[39;49;00m
[34mimport[39;49;00m [04m[36mos[39;49;00m

[34mimport[39;49;00m [04m[36mgym[39;49;00m
[34mimport[39;49;00m [04m[36mray[39;49;00m
[34mfrom[39;49;00m [04m[36mray[39;49;00m[04m[36m.[39;49;00m[04m[36mtune[39;49;00m [34mimport[39;49;00m run_experiments
[34mfrom[39;49;00m [04m[36mray[39;49;00m[04m[36m.[39;49;00m[04m[36mtune[39;49;00m[04m[36m.[39;49;00m[04m[36mregistry[39;49;00m [34mimport[39;49;00m register_env

[34mfrom[39;49;00m [04m[36msagemaker_rl[39;49;00m[04m[36m.[39;49;00m[04m[36mray_launcher[39;49;00m [34mimport[39;49;00m SageMakerRayLauncher


[34mdef[39;49;00m [32mcreate_environment[39;49;00m(env_config):
    [34mreturn[39;49;00m gym.make([33m'[39;49;00m[33mCartPole-v1[39;49;00m[33m'[39;49;00m) [37m#, render_mode="rgb_array")[39;49;00m

[34mclass[39;49;00m [04m[32mMyLauncher[39;49;00m(SageMakerRayLauncher):

    [34mdef[39;49;00m [32mregister_env_creato

## Configure the framework you want to use

Set `framework` to `"tf"` for TensorFlow or `"torch"` for PyTorch.

You will also have to edit your entry point, i.e., [`train-rl-cartpole-ray.py`](./src/train-rl-cartpole-ray.py), so that its `framework` configuration parameter matches the framework that you have selected.

In [8]:
#framework = "tf"
framework = "torch"

## Build docker container

We must build a custom docker container with Roboschool installed.  This takes care of everything:

1. Fetching base container image
2. Installing Roboschool and its dependencies
3. Uploading the new container image to ECR

This step can take a long time if you are running on a machine with a slow internet connection.  If your notebook instance is in SageMaker or EC2 it should take 3-10 minutes depending on the instance type.


In [9]:
!docker stop $(docker ps -aq)

"docker stop" requires at least 1 argument.
See 'docker stop --help'.

Usage:  docker stop [OPTIONS] CONTAINER [CONTAINER...]

Stop one or more running containers


In [10]:
!docker rm $(docker ps -aq)

"docker rm" requires at least 1 argument.
See 'docker rm --help'.

Usage:  docker rm [OPTIONS] CONTAINER [CONTAINER...]

Remove one or more containers


In [11]:
!docker rmi -f $(docker images -a -q)

Untagged: 775004277940.dkr.ecr.us-west-2.amazonaws.com/ray-1_1_cartpole-gpu-torch-1.7.1:latest
Untagged: 775004277940.dkr.ecr.us-west-2.amazonaws.com/ray-1_1_cartpole-gpu-torch-1.7.1@sha256:b0c9b12cbc81e01e394ec53ffc49d729efd714724c6a3012fc7f17a0b2a48d39
Untagged: ray-1_1_cartpole-gpu-torch-1.7.1:latest
Deleted: sha256:692022756fc7ab5ecd0c9d83dd67d3c2fe391df96f629b45bf08c423cb249997
Deleted: sha256:60d9ba77d106c50b1db5d795d4be52a7ec99f4d81aeef214eef38d88af5ab021
Deleted: sha256:07852ed60b10e9ed91b124350044048a7f55dc847154a365d268edf9b27333a0
Deleted: sha256:c5733ee4124c36d9fbec0f88325f41081516ddd962e9325141f5606222b4b70d
Deleted: sha256:eec69f2bb23e0655c29358c38b4794ddfd066c90c560585c48386e54ba199d92
Deleted: sha256:93b56c6151eca0e05150174d29fbfeb73b306aed409abafccf1d6c31f7a0805f
Deleted: sha256:2ae10e714b0b7074c0f58f39a8272bad14994e19c1d3f2839b8ce810fa0436c4
Deleted: sha256:accd1af2d89fe4946ab7c8765652b688b0e679c1562bcca88d1209b97bbda42e
Deleted: sha256:0cb4f31c71b0a9200169fe24e71f126

In [12]:
# default as tensorflow
if framework == 'tf':
    framework_fullname = 'tensorflow'
    framework_version = "1.15.5" # TF "1.15.5" or "2.3.1" PyTorch "1.7.1"
    python_version = "py37"
elif framework == 'torch':
    framework_fullname = 'pytorch'
    framework_version = "1.7.1" # PyTorch "1.7.1"
    python_version = "py36"


aws_region = boto3.Session().region_name
suffix = python_version

# TODO: add support for ml.g instances or Inferentia
if 'ml.p' in instance_type:
    CPU_OR_GPU = "gpu"
    if framework == "tf" and framework_version.startswith("1.15"):
        suffix += "-cu100-ubuntu18.04"
    if framework == "tf" and framework_version.startswith("2.3"):
        suffix += "-cu102-ubuntu18.04"
    if framework == "torch" and framework_version.startswith("1.7"):
        suffix += "-cu110-ubuntu18.04"
elif 'ml.c' in instance_type:
    CPU_OR_GPU = "cpu"


    
repository_short_name = "ray-1_1_cartpole-{}-{}-{}".format(CPU_OR_GPU, framework, framework_version)

docker_build_args = {
    'CPU_OR_GPU': CPU_OR_GPU, 
    'AWS_REGION': aws_region,
    'FRAMEWORK': framework_fullname,
    'VERSION': framework_version,     
    'SUFFIX': suffix
}
image_name = build_and_push_docker_image(repository_short_name, build_args=docker_build_args)


https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
Logged into ECR
Building docker image ray-1_1_cartpole-gpu-torch-1.7.1 from Dockerfile
$ docker build -t ray-1_1_cartpole-gpu-torch-1.7.1 -f Dockerfile . --build-arg CPU_OR_GPU=gpu --build-arg AWS_REGION=us-west-2 --build-arg FRAMEWORK=pytorch --build-arg VERSION=1.7.1 --build-arg SUFFIX=py36-cu110-ubuntu18.04
Sending build context to Docker daemon  606.7kB
Step 1/24 : ARG AWS_REGION
Step 2/24 : ARG CPU_OR_GPU
Step 3/24 : ARG SUFFIX
Step 4/24 : ARG VERSION
Step 5/24 : ARG FRAMEWORK
Step 6/24 : FROM 763104351884.dkr.ecr.${AWS_REGION}.amazonaws.com/${FRAMEWORK}-training:${VERSION}-${CPU_OR_GPU}-${SUFFIX}
1.7.1-gpu-py36-cu110-ubuntu18.04: Pulling from pytorch-training
171857c49d0f: Pulling fs layer
419640447d26: Pulling fs layer
61e52f862619: Pulling fs layer
2a93278deddf: Pulling fs layer
c9f080049843: Pulling fs layer
8189556b2329: Pulling fs layer
c306a0c97a55: Pulling fs layer
4a9478bd0b24: 

Confirm that the image name refers to the expected version of Ray and of the TensorFlow or PyTorch library.

In [13]:
print("Using ECR image %s" % image_name)

Using ECR image 775004277940.dkr.ecr.us-west-2.amazonaws.com/ray-1_1_cartpole-gpu-torch-1.7.1
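
To see how the build arguments from the earlier cell combine into the base image pulled by the Dockerfile's `FROM` line (as shown in the build log), the substitution can be sketched in plain Python. The helper names `image_suffix` and `base_image_uri` are hypothetical and exist only for this illustration; the CUDA and OS tags are copied from the notebook cell, and the registry account ID is the one that appears in the build log.

```python
# CUDA/OS tags copied from the suffix logic in the build cell.
GPU_TAGS = {
    ("tf", "1.15"): "-cu100-ubuntu18.04",
    ("tf", "2.3"): "-cu102-ubuntu18.04",
    ("torch", "1.7"): "-cu110-ubuntu18.04",
}


def image_suffix(framework, framework_version, python_version, cpu_or_gpu):
    """Return the SUFFIX build argument for the chosen framework and device."""
    suffix = python_version
    if cpu_or_gpu == "gpu":
        for (name, version_prefix), tag in GPU_TAGS.items():
            if framework == name and framework_version.startswith(version_prefix):
                suffix += tag
    return suffix


def base_image_uri(region, framework_fullname, version, cpu_or_gpu, suffix):
    """Mirror the substitution performed by the Dockerfile FROM line."""
    return "763104351884.dkr.ecr.{}.amazonaws.com/{}-training:{}-{}-{}".format(
        region, framework_fullname, version, cpu_or_gpu, suffix
    )


if __name__ == "__main__":
    suffix = image_suffix("torch", "1.7.1", "py36", "gpu")
    print(base_image_uri("us-west-2", "pytorch", "1.7.1", "gpu", suffix))
    # prints 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-training:1.7.1-gpu-py36-cu110-ubuntu18.04
```

The printed tag matches the `1.7.1-gpu-py36-cu110-ubuntu18.04: Pulling from pytorch-training` line in the build output, which is a quick way to check that the framework, version, and device settings are consistent before a long build.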


### Metric Definitions

In [14]:
#metric_definitions = RLEstimator.default_metric_definitions(RLToolkit.RAY)

metric_definitions = [{'Name': 'episode_reward_mean',
  'Regex': 'episode_reward_mean: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)'},
 {'Name': 'episode_reward_max',
  'Regex': 'episode_reward_max: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)'},
 {'Name': 'episode_len_mean',
  'Regex': 'episode_len_mean: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)'},
 {'Name': 'entropy',
  'Regex': 'entropy: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)'},
 {'Name': 'episode_reward_min',
  'Regex': 'episode_reward_min: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)'},                                           
]
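
Each metric definition pairs a metric name with a regular expression whose first capture group holds the numeric value. The sketch below applies the same numeric pattern to a few log lines to show what it captures. The sample lines are made up for illustration, `extract_metric` is a hypothetical helper, and search semantics (a match anywhere in the line) are assumed.

```python
import re

# Same numeric pattern as in metric_definitions, written as a raw string.
NUMBER = r"([-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?)"


def extract_metric(name, log_line):
    """Return the metric value as a float, or None if the line does not match."""
    match = re.search(name + ": " + NUMBER, log_line)
    return float(match.group(1)) if match else None


# Hypothetical log lines, made up for this example.
sample_lines = [
    "  episode_reward_mean: 23.5",
    "  episode_len_mean: 1.2e+02",
    "  episode_reward_min: nan",
]

if __name__ == "__main__":
    print(extract_metric("episode_reward_mean", sample_lines[0]))  # 23.5
    print(extract_metric("episode_len_mean", sample_lines[1]))     # 120.0
    print(extract_metric("episode_reward_min", sample_lines[2]))   # None
```

Note that non-numeric values such as `nan` do not match the pattern, so those log lines would produce no data point for that metric.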

## Train the RL model using the Python SDK Script mode

If you are using local mode, the training will run on the notebook instance. When using SageMaker for training, you can select a GPU or CPU instance. The RLEstimator is used for training RL jobs. 

1. Specify the source directory where the environment, presets and training code are uploaded.
2. Specify the entry point as the training code.
3. Specify the custom image to be used for the training environment.
4. Define the training parameters such as the instance count, job name, and S3 path for output.
5. Define the metric definitions that you are interested in capturing in your logs. These can also be visualized in CloudWatch and SageMaker Notebooks.

Configure the number of training instances; note that this is different from the number of workers. Make sure the entry point is located in the source directory, i.e., the `src` folder in this notebook. Set the training duration in seconds.

In [15]:
train_instance_count = 1
train_entry_point = "train-rl-cartpole-ray.py"
#train_entry_point = "train-rl-cartpole-ray-customEnv.py"

train_job_max_duration_in_seconds = 60 * 10

estimator = RLEstimator(entry_point= train_entry_point,
                        source_dir="src",
                        dependencies=["common/sagemaker_rl"],
                        image_uri=image_name,
                        role=role,
                        instance_type=instance_type,
                        instance_count=train_instance_count,
                        output_path=s3_output_path,
                        base_job_name=job_name_prefix,
                        metric_definitions=metric_definitions,
                        max_run=train_job_max_duration_in_seconds,
                        debugger_hook_config=False,
                        hyperparameters={}
                       )

estimator.fit(wait=True)
job_name = estimator.latest_training_job.job_name
print("Training job: %s" % job_name)

2021-03-10 05:18:58 Starting - Starting the training job...
2021-03-10 05:19:20 Starting - Launching requested ML instancesProfilerReport-1615353537: InProgress
......
2021-03-10 05:20:21 Starting - Preparing the instances for training......
2021-03-10 05:21:22 Downloading - Downloading input data...
2021-03-10 05:21:45 Training - Downloading the training image...........................
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2021-03-10 05:26:16,572 sagemaker-containers INFO     Imported framework sagemaker_pytorch_container.training
2021-03-10 05:26:16,596 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2021-03-10 05:26:18,006 sagemaker_pytorch_container.training INFO     Invoking user training script.
2021-03-10 05:26:18,223 sagemaker-training-toolkit INFO     Invoking user script

Training Env:

{
    "add

UnexpectedStatusException: Error for Training job rl-cartpole-ray-2021-03-10-05-18-57-494: Failed. Reason: AlgorithmError: framework error: 
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/sagemaker_containers/_trainer.py", line 84, in train
    entrypoint()
  File "/opt/conda/lib/python3.6/site-packages/sagemaker_pytorch_container/training.py", line 121, in main
    train(environment.Environment())
  File "/opt/conda/lib/python3.6/site-packages/sagemaker_pytorch_container/training.py", line 80, in train
    six.reraise(info[0], err, info[2])
  File "/opt/conda/lib/python3.6/site-packages/six.py", line 703, in reraise
    raise value
  File "/opt/conda/lib/python3.6/site-packages/sagemaker_pytorch_container/training.py", line 73, in train
    runner_type=runner_type)
  File "/opt/conda/lib/python3.6/site-packages/sagemaker_training/entry_point.py", line 100, in run
    wait, capture_error
  File "/opt/conda/lib/python3.6/site-packages/sagemaker_training/process.py", line 164, in run
    cwd=environment.code_dir,
  File "/opt/conda/lib/python3.6/site-packages/sagemaker_training/pro

## Visualization

RL training can take a long time, so there are several ways to track the progress of a training job while it is running. Some intermediate output gets saved to S3 during training, so we'll set up to capture that.

In [None]:
print("Job name: {}".format(job_name))

s3_url = "s3://{}/{}".format(s3_bucket,job_name)

intermediate_folder_key = "{}/output/intermediate/".format(job_name)
intermediate_url = "s3://{}/{}".format(s3_bucket, intermediate_folder_key)

print("S3 job path: {}".format(s3_url))
print("Intermediate folder path: {}".format(intermediate_url))
    
tmp_dir = "/tmp/{}".format(job_name)
os.makedirs(tmp_dir, exist_ok=True)  # does not fail if the folder already exists
print("Created local folder {}".format(tmp_dir))

### Fetch videos of training rollouts
Videos of certain rollouts get written to S3 during training.  Here we fetch the last 10 videos from S3, and render the last one.

In [None]:
recent_videos = wait_for_s3_object(
            s3_bucket, intermediate_folder_key, tmp_dir, 
            fetch_only=(lambda obj: obj.key.endswith(".mp4") and obj.size>0), 
            limit=10, training_job_name=job_name)

In [None]:
last_video = sorted(recent_videos)[-1]  # Pick which video to watch
os.system("mkdir -p ./src/tmp_render/ && cp {} ./src/tmp_render/last_video.mp4".format(last_video))
HTML('<video src="./src/tmp_render/last_video.mp4" controls autoplay></video>')

### Plot metrics for training job
We can see the reward metric of the training as it's running, using algorithm metrics that are recorded in CloudWatch metrics.  We can plot this to see the performance of the model over time.

In [None]:
%matplotlib inline
from sagemaker.analytics import TrainingJobAnalytics

if not local_mode:
    df = TrainingJobAnalytics(job_name, ['episode_reward_mean']).dataframe()
    num_metrics = len(df)
    if num_metrics == 0:
        print("No algorithm metrics found in CloudWatch")
    else:
        # df.plot returns a matplotlib Axes object, so name it `ax` rather than `plt`
        ax = df.plot(x='timestamp', y='value', figsize=(12,5), legend=True, style='b-')
        ax.set_ylabel('Mean reward per episode')
        ax.set_xlabel('Training time (s)')
else:
    print("Can't plot metrics in local mode.")

### Monitor training progress
You can repeatedly run the visualization cells to get the latest videos or see the latest metrics as the training job proceeds.

## Evaluation of RL models

We use the last checkpointed model to run evaluation for the RL Agent. 

### Load checkpointed model

Checkpointed data from the previously trained models will be passed on for evaluation / inference in the checkpoint channel. In local mode, we can simply use the local directory, whereas in the SageMaker mode, it needs to be moved to S3 first.

In [None]:
if local_mode:
    model_tar_key = "{}/model.tar.gz".format(job_name)
else:
    model_tar_key = "{}/output/model.tar.gz".format(job_name)
    
local_checkpoint_dir = "{}/model".format(tmp_dir)

wait_for_s3_object(s3_bucket, model_tar_key, tmp_dir, training_job_name=job_name)  

if not os.path.isfile("{}/model.tar.gz".format(tmp_dir)):
    raise FileNotFoundError("File model.tar.gz not found")
    
os.system("mkdir -p {}".format(local_checkpoint_dir))
os.system("tar -xvzf {}/model.tar.gz -C {}".format(tmp_dir, local_checkpoint_dir))

print("Checkpoint directory {}".format(local_checkpoint_dir))
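
The `mkdir` and `tar` shell calls in the cell above can also be expressed with the Python standard library, which avoids depending on a shell and raises an exception if extraction fails. This is a minimal sketch rather than part of the original notebook; `extract_model_archive` is a hypothetical helper name.

```python
import os
import tarfile


def extract_model_archive(archive_path, target_dir):
    """Extract a .tar.gz archive into target_dir and return the member names."""
    if not os.path.isfile(archive_path):
        raise FileNotFoundError("File {} not found".format(archive_path))
    os.makedirs(target_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        # Only extract archives you trust, such as your own training output.
        archive.extractall(path=target_dir)
    return names
```

In the notebook this could be called as `extract_model_archive("{}/model.tar.gz".format(tmp_dir), local_checkpoint_dir)`, assuming `tmp_dir` and `local_checkpoint_dir` are defined as in the cells above.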

In [None]:
if local_mode:
    checkpoint_path = 'file://{}'.format(local_checkpoint_dir)
    print("Local checkpoint file path: {}".format(local_checkpoint_dir))
else:
    checkpoint_path = "s3://{}/{}/checkpoint/".format(s3_bucket, job_name)
    if not os.listdir(local_checkpoint_dir):
        raise FileNotFoundError("Checkpoint files not found under the path")
    os.system("aws s3 cp --recursive {} {}".format(local_checkpoint_dir, checkpoint_path))
    print("S3 checkpoint file path: {}".format(checkpoint_path))

In [None]:
%%time
    
estimator_eval = RLEstimator(entry_point="evaluate-ray.py",
                        source_dir='src',
                        dependencies=["common/sagemaker_rl"],
                        image_uri=image_name,
                        role=role,
                        instance_type=instance_type,
                        instance_count=1,
                        base_job_name=job_name_prefix + "-evaluation",
                        hyperparameters={
                            "evaluate_episodes": 10,
                            "algorithm": "PPO",
                            "env": "CartPole-v1"
                        }
                    )

estimator_eval.fit({'model': checkpoint_path})
job_name = estimator_eval.latest_training_job.job_name
print("Evaluation job: %s" % job_name)

## Model deployment

Now let us deploy the RL policy so that we can get the optimal action, given an environment observation.

**Note**: Model deployment is currently supported for TensorFlow only.

**Stop here if you are using PyTorch.**

In [None]:
from sagemaker.tensorflow.model import TensorFlowModel

model = TensorFlowModel(model_data=estimator.model_data,
              framework_version='2.3.1',
              role=role)

predictor = model.deploy(initial_instance_count=1, 
                         instance_type=instance_type)

In [None]:
# ray 0.8.5 requires all the following inputs
# 'prev_action', 'is_training', 'prev_reward' and 'seq_lens' are placeholders for this example
# they won't affect prediction results

# Number of values in the CartPole state (observation) vector.
CARTPOLE_STATE_VALUES = 4

# Named `payload` rather than `input` to avoid shadowing the Python built-in.
payload = {"inputs": {'observations': np.ones(shape=(1, CARTPOLE_STATE_VALUES)).tolist(),
                      'prev_action': [0, 0],
                      'is_training': False,
                      'prev_reward': -1,
                      'seq_lens': -1
                     }
          }

In [None]:
print(payload)

In [None]:
result = predictor.predict(payload)

#result['outputs']['actions_0']
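
The request above uses an all-ones observation as a placeholder. To query the endpoint with a real environment state, the same structure can be built from any four-value observation. The sketch below assumes the request layout shown in the cell above; `build_payload` is a hypothetical helper, and the placeholder fields are copied unchanged from the notebook.

```python
CARTPOLE_STATE_VALUES = 4


def build_payload(observation):
    """Wrap a single CartPole observation in the request structure used above."""
    values = [float(v) for v in observation]
    if len(values) != CARTPOLE_STATE_VALUES:
        raise ValueError(
            "expected {} state values, got {}".format(CARTPOLE_STATE_VALUES, len(values))
        )
    return {
        "inputs": {
            "observations": [values],   # batch containing one observation
            "prev_action": [0, 0],      # placeholder, as in the notebook
            "is_training": False,
            "prev_reward": -1,          # placeholder
            "seq_lens": -1,             # placeholder
        }
    }


if __name__ == "__main__":
    # Cart position, cart velocity, pole angle, pole velocity at tip
    print(build_payload([0.0, 0.1, -0.02, 0.3]))
```

Validating the observation length before sending the request gives a clear local error instead of a less readable failure from the endpoint.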

### Clean up endpoint

In [None]:
predictor.delete_endpoint()