# PyBullet Humanoid with Amazon SageMaker

---
## Introduction

PyBullet is an [open source](https://pybullet.org/wordpress/) physics simulator that is commonly used to train RL policies for simulated robotic systems. It provides 3D visualization of physical systems with multiple joints in contact with each other and their environment.

In this notebook we install PyBullet into the SageMaker RL container (see the Dockerfile) and launch training jobs that use RL algorithms from the RLlib library.

In [1]:
env = "HumanoidBulletEnv-v0"

## Imports

To get started, we import the Python libraries we need and set up the environment with a few prerequisites for permissions and configuration.

In [4]:
import sagemaker
import boto3
import sys
import os
import glob
import re
import subprocess
import numpy as np
from IPython.display import HTML
import time
from time import gmtime, strftime

sys.path.append("common")
from misc import (
    get_execution_role,
    wait_for_s3_object
)
from common.docker_utils import (
    build_and_push_docker_image
)
from sagemaker.rl import (
    RLEstimator,
    RLToolkit,
    RLFramework
)

IMPORTANT: before continuing, set up your AWS credentials by following https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/setup-credentials.html

In [10]:
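# Placeholders: replace each value with your own credentials and region
os.environ["AWS_ACCESS_KEY_ID"] = "<your-access-key-id>"
os.environ["AWS_SECRET_ACCESS_KEY"] = "<your-secret-access-key>"
os.environ["AWS_REGION"] = "<your-region>"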

### Setup S3 bucket

Set up the linkage and authentication to the S3 bucket that you want to use for checkpoints and metadata.

In [12]:
sage_session = sagemaker.session.Session()
s3_bucket = sage_session.default_bucket()
s3_output_path = "s3://{}/".format(s3_bucket)
print("S3 bucket path: {}".format(s3_output_path))

S3 bucket path: s3://sagemaker-us-west-2-565870655207/


### Define Variables 

We define variables such as the job name prefix for the training jobs.

In [22]:
job_name_prefix = "rl-humanoid-bullet"

### Configure instance type

SageMaker offers several instance types for running RL training jobs. See https://aws.amazon.com/sagemaker/pricing/?nc1=h_ls for the available types and their pricing.

In [14]:
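# Set local_mode to True to train in a local container instead of on a SageMaker instance.
# Later cells read this flag.
local_mode = False
instance_type = "local" if local_mode else "ml.c5.9xlarge"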

### Create an IAM role

When running from a SageMaker notebook instance, get the execution role with `role = sagemaker.get_execution_role()`. When running from a local notebook, use the utility method `role = get_execution_role()` to create an execution role.

In [15]:
try:
    role = sagemaker.get_execution_role()
except Exception:  # not running on a SageMaker notebook instance
    role = get_execution_role()

print("Using IAM role arn: {}".format(role))

Couldn't call 'get_role' to get Role ARN from role name miquelescobar to get Role path.


Created new sagemaker execution role: sagemaker
Using IAM role arn: arn:aws:iam::565870655207:role/sagemaker


### Install docker for `local` mode

In order to work in `local` mode, you need to have Docker installed. When running from your local machine, make sure that you have docker and docker-compose (for local CPU machines) and nvidia-docker (for local GPU machines) installed. Alternatively, when running from a SageMaker notebook instance, you can simply run the following script to install the dependencies.

Note that you can only run a single local notebook at a time.

In [16]:
# only run from SageMaker notebook instance
if local_mode:
    !/bin/bash ./common/setup.sh

## Build docker container

We must build a custom Docker container with Ray, TensorFlow and PyBullet installed. The cell below takes care of everything:

1. Fetching base container image
2. Installing PyBullet and its dependencies
3. Uploading the new container image to ECR

This step can take a long time.


In [14]:
%%time

cpu_or_gpu = "gpu" if instance_type.startswith("ml.p") else "cpu"
repository_short_name = "sagemaker-bullet-ray-%s" % cpu_or_gpu
docker_build_args = {
    "CPU_OR_GPU": cpu_or_gpu,
    "AWS_REGION": boto3.Session().region_name,
}
custom_image_name = build_and_push_docker_image(repository_short_name, build_args=docker_build_args)
print("Using ECR image %s" % custom_image_name)

https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
Logged into ECR
Building docker image sagemaker-bullet-ray-cpu from Dockerfile
$ docker build -t sagemaker-bullet-ray-cpu -f Dockerfile . --build-arg CPU_OR_GPU=cpu --build-arg AWS_REGION=us-west-2
Sending build context to Docker daemon    576kB
Step 1/14 : ARG CPU_OR_GPU
Step 2/14 : ARG AWS_REGION
Step 3/14 : FROM 462105765813.dkr.ecr.${AWS_REGION}.amazonaws.com/sagemaker-rl-ray-container:ray-0.8.2-tf-${CPU_OR_GPU}-py36
 ---> 24e9d976c4b2
Step 4/14 : WORKDIR /opt/ml
 ---> Using cache
 ---> e757479553a9
Step 5/14 : RUN apt-get update && apt-get install -y       git cmake ffmpeg pkg-config       qtbase5-dev libqt5opengl5-dev libassimp-dev       libtinyxml-dev       libgl1-mesa-dev     && cd /opt     && apt-get clean && rm -rf /var/cache/apt/archives/* /var/lib/apt/lists/*
 ---> Using cache
 ---> 3e406268b352
Step 6/14 : RUN apt-get update &&     apt-get install -y libboost-python-dev
 ---> Usi
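
The naming logic in the build cell can be isolated into a small function, which makes it easy to check which repository and build arguments a given instance type produces. This is a minimal sketch: `build_settings` is a hypothetical helper, not part of the notebook's `common` utilities, and it mirrors the cell's `startswith("ml.p")` check, so other GPU families would be treated as CPU here just as they are in the cell.

```python
def build_settings(instance_type, region):
    """Return (repository name, docker build args) for an instance type.

    Hypothetical helper that mirrors the build cell: only instance types
    starting with "ml.p" are treated as GPU instances.
    """
    cpu_or_gpu = "gpu" if instance_type.startswith("ml.p") else "cpu"
    repository_short_name = "sagemaker-bullet-ray-%s" % cpu_or_gpu
    docker_build_args = {
        "CPU_OR_GPU": cpu_or_gpu,
        "AWS_REGION": region,
    }
    return repository_short_name, docker_build_args


repo, args = build_settings("ml.c5.9xlarge", "us-west-2")
print(repo, args)
```

For `ml.c5.9xlarge` in `us-west-2` this yields the same repository name and build arguments that appear in the `docker build` command logged above.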

## Training

The training code is in the /src directory.

In [21]:
!pygmentize src/train-humanoid-ppo.py

import json
import os

import gym
import ray
import pybullet_envs
from ray.tune import run_experiments
from ray.tune.registry import register_env
from sagemaker_rl.ray_launcher import SageMakerRayLauncher


def create_environment(env_config):
    # This import must happen inside the method so that worker processes import this code
    import pybullet_envs  # registers HumanoidBulletEnv-v0 with gym

    return gym.make("HumanoidBulletEnv-v0")


class MyLauncher(SageMakerRayLauncher):
    def register_env_creator(self):
        register_env("HumanoidBulletEnv-v0", create_environment)

    def get_experiment_config(self):
        return {
            "training": {
                "env": "HumanoidBulletEnv-v0",
                "run": "PPO",
                "stop": {
                    "episode_reward_mean": 2500,
                },
                "config": {
                    "gamma": 0.99,
                    "kl_coeff": 1.0,
                    "lr": 0.0001,
                    "monitor": True, 
                    
            

## Train the RL model using the Python SDK Script mode

When using SageMaker for training, you can select a GPU or CPU instance. The RLEstimator is used for training RL jobs. The following steps are required:

1. Specify the source directory where the environment, presets and training code are uploaded.
2. Specify the entry point as the training code.
3. Specify the choice of RL toolkit and framework. This automatically resolves to the ECR path for the RL container.
4. Define the training parameters such as the instance count, job name and S3 path for output.
5. Specify the hyperparameters for the RL agent algorithm. The RLCOACH_PRESET or the RLRAY_PRESET can be used to specify the RL agent algorithm you want to use.
6. Define the metrics definitions that you are interested in capturing in your logs. These can also be visualized in CloudWatch and SageMaker Notebooks. 

In [19]:
%%time

ALG = 'ppo'

metric_definitions = RLEstimator.default_metric_definitions(RLToolkit.RAY)
int_regex = '[0-9]+'
float_regex = "[-+]?[0-9]*[.]?[0-9]+([eE][-+]?[0-9]+)?"  # noqa: W605, E501
metric_definitions.append( {"Name": "timesteps_total", "Regex": "timesteps_total: (%s)" % int_regex} )
metric_definitions.append( {"Name": "time_this_iter_s", "Regex": "time_this_iter_s: (%s)" % float_regex} )

custom_image_name = "565870655207.dkr.ecr.us-west-2.amazonaws.com/sagemaker-bullet-ray-cpu:latest"



estimator = RLEstimator(
    entry_point=f"train-humanoid-{ALG}.py",
    source_dir="src",
    dependencies=["common/sagemaker_rl"],
    image_uri=custom_image_name,
    role=role,
    instance_type=instance_type,
    instance_count=1,
    output_path=s3_output_path,
    base_job_name=job_name_prefix+'-'+ALG,
    metric_definitions=metric_definitions,
    hyperparameters={
    },
)

estimator.fit(wait=local_mode)
job_name = estimator.latest_training_job.job_name
print("Training job: %s" % job_name)

Training job: rl-humanoid-bullet-ppo
Wall time: 3.34 s
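
Each metric definition pairs a name with a regular expression whose first capture group holds the value to record. Before launching a job it can help to check the two custom regexes against a sample log line. The sketch below does that locally; the log text is a made-up example and the real RLlib output may be formatted differently, and `extract` is a hypothetical helper.

```python
import re

int_regex = "[0-9]+"
float_regex = "[-+]?[0-9]*[.]?[0-9]+([eE][-+]?[0-9]+)?"

# Hypothetical log excerpt; the actual training log may look different.
sample_log = """
  timesteps_total: 40000
  time_this_iter_s: 12.5
"""


def extract(name, value_regex, text):
    """Return the first captured value for a metric, or None if absent."""
    match = re.search("%s: (%s)" % (name, value_regex), text)
    return match.group(1) if match else None


timesteps = extract("timesteps_total", int_regex, sample_log)
iter_time = extract("time_this_iter_s", float_regex, sample_log)
print(timesteps, iter_time)
```

Note that `float_regex` contains its own inner group for the exponent, so only group 1 (the outer parentheses added by the metric definition) carries the full number.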


## Visualization

RL training can take a long time, so while the job is running we can track its progress in several ways. Some intermediate output is saved to S3 during training, so we set up the paths needed to capture it.

In [None]:
print("Job name: {}".format(job_name))

s3_url = "s3://{}/{}".format(s3_bucket, job_name)

intermediate_folder_key = "{}/output/intermediate/".format(job_name)
intermediate_url = "s3://{}/{}".format(s3_bucket, intermediate_folder_key)

print("S3 job path: {}".format(s3_url))
print("Intermediate folder path: {}".format(intermediate_url))

tmp_dir = "/tmp/{}".format(job_name)
os.makedirs(tmp_dir, exist_ok=True)  # does not fail if the folder already exists
print("Created local folder {}".format(tmp_dir))
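
The S3 locations used in this notebook all follow from the bucket and the job name. As a minimal sketch, the helper below collects them in one place; `job_s3_layout` and the bucket and job names in the example are hypothetical, and the key patterns are the ones used in the cells of this notebook for a non-local job.

```python
def job_s3_layout(bucket, job_name):
    """Build the S3 keys and URLs this notebook uses for a training job."""
    intermediate_key = "{}/output/intermediate/".format(job_name)
    return {
        "job_url": "s3://{}/{}".format(bucket, job_name),
        "intermediate_key": intermediate_key,
        "intermediate_url": "s3://{}/{}".format(bucket, intermediate_key),
        # Location of the model archive when the job runs on SageMaker
        "model_key": "{}/output/model.tar.gz".format(job_name),
    }


# Placeholder bucket and job names, for illustration only
layout = job_s3_layout("example-bucket", "rl-humanoid-bullet-ppo-example")
for name, value in layout.items():
    print("{}: {}".format(name, value))
```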

### Fetch videos of training 
Videos of certain rollouts get written to S3 during training.  Here we fetch the last 10 videos from S3, and render the last one.

In [None]:
recent_videos = wait_for_s3_object(
    s3_bucket,
    intermediate_folder_key,
    tmp_dir,
    fetch_only=(lambda obj: obj.key.endswith(".mp4") and obj.size > 0),
    limit=10,
    training_job_name=job_name,
)

In [None]:
last_video = sorted(recent_videos)[-1]  # Pick which video to watch
os.system("mkdir -p ./src/tmp_render/ && cp {} ./src/tmp_render/last_video.mp4".format(last_video))
HTML('<video src="./src/tmp_render/last_video.mp4" controls autoplay></video>')

### Plot metrics for training job
We can see the reward metric of the training as it's running, using algorithm metrics that are recorded in CloudWatch metrics.  We can plot this to see the performance of the model over time.

In [None]:
%matplotlib inline
from sagemaker.analytics import TrainingJobAnalytics

if not local_mode:
    df = TrainingJobAnalytics(job_name, ["episode_reward_mean"]).dataframe()
    num_metrics = len(df)
    if num_metrics == 0:
        print("No algorithm metrics found in CloudWatch")
    else:
        # DataFrame.plot returns a matplotlib Axes object
        ax = df.plot(x="timestamp", y="value", figsize=(12, 5), legend=True, style="b-")
        ax.set_ylabel("Mean reward per episode")
        ax.set_xlabel("Training time (s)")
else:
    print("Can't plot metrics in local mode.")
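
The mean episode reward is often noisy from one iteration to the next, so a smoothed curve can be easier to read. The following is a minimal sketch of a trailing moving average in plain Python; `moving_average` is a hypothetical helper and the reward values are made up. It could be applied to the `value` column of the dataframe above, for example via `df["value"].tolist()`.

```python
def moving_average(values, window):
    """Trailing moving average; early points average over fewer samples."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the sample that just left the window
            running -= values[i - window]
        smoothed.append(running / min(i + 1, window))
    return smoothed


rewards = [10.0, 20.0, 60.0, 30.0]  # made-up reward values
smoothed_rewards = moving_average(rewards, 2)
print(smoothed_rewards)
```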

### Monitor training progress
You can repeatedly run the visualization cells to get the latest videos or see the latest metrics as the training job proceeds.

## Evaluation of RL models

We use the last checkpointed model to run evaluation for the RL Agent. 

### Load checkpointed model

Checkpointed data from the previously trained models will be passed on for evaluation / inference in the checkpoint channel. In local mode, we can simply use the local directory, whereas in the SageMaker mode, it needs to be moved to S3 first.

In [None]:
if local_mode:
    model_tar_key = "{}/model.tar.gz".format(job_name)
else:
    model_tar_key = "{}/output/model.tar.gz".format(job_name)

local_checkpoint_dir = "{}/model".format(tmp_dir)

wait_for_s3_object(s3_bucket, model_tar_key, tmp_dir, training_job_name=job_name)

if not os.path.isfile("{}/model.tar.gz".format(tmp_dir)):
    raise FileNotFoundError("File model.tar.gz not found")

os.system("mkdir -p {}".format(local_checkpoint_dir))
os.system("tar -xvzf {}/model.tar.gz -C {}".format(tmp_dir, local_checkpoint_dir))

print("Checkpoint directory {}".format(local_checkpoint_dir))
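
The `mkdir` and `tar` shell calls above can also be done with the Python standard library, which avoids depending on those commands being available and raises an exception if extraction fails. This is a sketch of that alternative, not the notebook's own method; `extract_model` is a hypothetical helper, and the demo builds a throwaway archive so that it runs without a real `model.tar.gz`.

```python
import os
import tarfile
import tempfile


def extract_model(tar_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and list what was extracted."""
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(tar_path, "r:gz") as archive:
        # Only extract archives you trust, such as your own training output
        archive.extractall(dest_dir)
    return sorted(os.listdir(dest_dir))


# Demo with a throwaway archive standing in for model.tar.gz
with tempfile.TemporaryDirectory() as work_dir:
    payload = os.path.join(work_dir, "checkpoint.txt")
    with open(payload, "w") as handle:
        handle.write("placeholder")
    tar_path = os.path.join(work_dir, "model.tar.gz")
    with tarfile.open(tar_path, "w:gz") as archive:
        archive.add(payload, arcname="checkpoint.txt")
    extracted = extract_model(tar_path, os.path.join(work_dir, "model"))

print(extracted)
```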

In [None]:
if local_mode:
    checkpoint_path = "file://{}".format(local_checkpoint_dir)
    print("Local checkpoint file path: {}".format(local_checkpoint_dir))
else:
    checkpoint_path = "s3://{}/{}/checkpoint/".format(s3_bucket, job_name)
    if not os.listdir(local_checkpoint_dir):
        raise FileNotFoundError("Checkpoint files not found under the path")
    os.system("aws s3 cp --recursive {} {}".format(local_checkpoint_dir, checkpoint_path))
    print("S3 checkpoint file path: {}".format(checkpoint_path))

In [None]:
%%time

estimator_eval = RLEstimator(
    entry_point="evaluate-ray.py",
    source_dir="src",
    dependencies=["common/sagemaker_rl"],
    image_uri=custom_image_name,
    role=role,
    instance_type=instance_type,
    instance_count=1,
    base_job_name=job_name_prefix + "-evaluation",
    hyperparameters={
        "evaluate_episodes": 5,
        "algorithm": "PPO",
        "env":env,
    },
)

estimator_eval.fit({"model": checkpoint_path})
job_name = estimator_eval.latest_training_job.job_name
print("Evaluation job: %s" % job_name)

### Visualize the output 

Optionally, you can run the steps defined earlier to visualize the output.

## Model deployment

Now let us deploy the RL policy so that we can get the optimal action, given an environment observation.

In [None]:
from sagemaker.tensorflow.model import TensorFlowModel

model = TensorFlowModel(model_data=estimator.model_data, framework_version="2.1.0", role=role)

predictor = model.deploy(initial_instance_count=1, instance_type=instance_type)

In [None]:
# Mapping of environments to the size of their observation vector
observation_space_mapping = {"reacher": 9, "hopper": 15, "humanoid": 44}

Now let us predict the actions using a dummy observation.

In [None]:
# ray 0.8.2 requires all the following inputs
# 'prev_action', 'is_training', 'prev_reward' and 'seq_lens' are placeholders for this example
# they won't affect prediction results

input = {
    "inputs": {
        # This notebook trains the humanoid environment, so use its observation size
        "observations": np.ones(shape=(1, observation_space_mapping["humanoid"])).tolist(),
        "prev_action": [0, 0],
        "is_training": False,
        "prev_reward": -1,
        "seq_lens": -1,
    }
}
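
The request body has the same shape for every environment; only the observation size changes. As a minimal sketch, the helper below builds it from plain Python lists. `build_request` is a hypothetical helper, and the default `action_dim=2` simply reproduces the `[0, 0]` placeholder used above rather than the real action size of any environment.

```python
def build_request(obs_dim, action_dim=2):
    """Build a dummy prediction request with an all-ones observation."""
    return {
        "inputs": {
            # One observation (batch size 1) filled with ones
            "observations": [[1.0] * obs_dim],
            # The remaining fields are placeholders, as in the cell above
            "prev_action": [0] * action_dim,
            "is_training": False,
            "prev_reward": -1,
            "seq_lens": -1,
        }
    }


request = build_request(44)  # 44 is the humanoid entry in the mapping above
print(len(request["inputs"]["observations"][0]))
```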

In [None]:
result = predictor.predict(input)

result["outputs"]["actions"]

### Clean up endpoint

In [None]:
predictor.delete_endpoint()