# Reinforcement Learning Training on Amazon SageMaker with Stable Baselines

## Overview


<img src="https://stable-baselines.readthedocs.io/en/master/_static/logo.png" width="300">

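[OpenAI Gym](https://gym.openai.com) is an open-source reinforcement learning toolkit. It provides a standard interface and a collection of environments that let us run reinforcement learning experiments quickly.

[Stable Baselines](https://stable-baselines.readthedocs.io/en/master/) is an open-source project of reinforcement learning algorithms that improves on the reference implementations in OpenAI Baselines.

In this lab we use an algorithm that ships with Stable Baselines to train an agent on the Atari game Ms. Pac-Man, [**MsPacman-v0**](https://gym.openai.com/envs/MsPacman-v0/), which is included in OpenAI Gym.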
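To make the "standard interface" concrete, here is a minimal sketch of the classic Gym interaction loop: `reset()` starts an episode, and each `step(action)` returns an observation, a reward, a done flag and an info dictionary. `ToyCorridorEnv`, `always_right` and `run_episode` are hypothetical stand-ins written for illustration so the snippet runs without Gym or Atari installed; a real experiment would obtain its environment from `gym.make('MsPacman-v0')` instead.

```python
class ToyCorridorEnv:
    """Hypothetical stand-in that mimics the classic Gym reset()/step() contract."""

    def __init__(self, goal=3, max_steps=10):
        self.goal = goal
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def reset(self):
        # Start a new episode and return the first observation.
        self.position = 0
        self.steps = 0
        return self.position

    def step(self, action):
        # Action 1 moves one cell to the right; any other action stays put.
        if action == 1:
            self.position += 1
        self.steps += 1
        reached_goal = self.position >= self.goal
        reward = 1.0 if reached_goal else 0.0
        done = reached_goal or self.steps >= self.max_steps
        return self.position, reward, done, {}


def always_right(observation):
    # Trivial hand-written policy; a trained agent would replace this.
    return 1


def run_episode(env, policy):
    """Run one episode and return (total_reward, number_of_steps)."""
    observation = env.reset()
    total_reward, steps, done = 0.0, 0, False
    while not done:
        observation, reward, done, _info = env.step(policy(observation))
        total_reward += reward
        steps += 1
    return total_reward, steps


print(run_episode(ToyCorridorEnv(), always_right))
```

The training code later in this notebook drives the same kind of loop, only with a learned policy choosing the actions.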





In [12]:
rl_problem = 'pacman'

## Prerequisites

### Imports

Import the Python libraries we need, along with helper functions such as `get_execution_role` and `wait_for_s3_object`.

In [13]:
import sagemaker
import boto3
import sys
import os
import subprocess
from IPython.display import HTML
import time
from time import gmtime, strftime
sys.path.append("common")
from misc import wait_for_s3_object
from docker_utils import build_and_push_docker_image
from sagemaker.rl import RLEstimator

### Set up the S3 bucket

Get the default S3 bucket through the SageMaker SDK. This bucket stores the model, checkpoints and other metadata.

In [14]:
sage_session = sagemaker.session.Session()
s3_bucket = sage_session.default_bucket()  
s3_output_path = 's3://{}/'.format(s3_bucket)
print("S3 bucket path: {}".format(s3_output_path))

S3 bucket path: s3://sagemaker-us-east-1-596030579944/


### Define the job/image name variable

We define a prefix variable, `job_name_prefix`, for the training job and the image.

In [15]:
# create a descriptive job name 
job_name_prefix = 'rl-stabebaselines-'+rl_problem
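The prefix matters because SageMaker derives the full training job name from it. The sketch below approximates that by appending a UTC timestamp and checking the result against the training job naming rule as I understand it (at most 63 characters, letters, digits and hyphens, no leading or trailing hyphen). `make_job_name` and `is_valid_job_name` are hypothetical helpers, not SDK functions; the SDK generates its own suffix when you pass `base_job_name`.

```python
import re
from time import gmtime, strftime

# Assumed naming rule for training jobs: 1-63 characters, letters, digits
# and hyphens only, starting and ending with a letter or digit.
JOB_NAME_PATTERN = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}$")


def make_job_name(prefix, now=None):
    """Append a UTC timestamp to the prefix (hypothetical helper)."""
    moment = gmtime() if now is None else now
    return "{}-{}".format(prefix, strftime("%Y-%m-%d-%H-%M-%S", moment))


def is_valid_job_name(name):
    """Check a candidate name against the assumed naming rule."""
    return len(name) <= 63 and JOB_NAME_PATTERN.match(name) is not None


candidate = make_job_name("rl-stabebaselines-pacman")
print(candidate, is_valid_job_name(candidate))
```

Underscores are rejected by this rule, which is why the prefix uses hyphens throughout.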

### Get the IAM role

Use `get_execution_role()` from the SageMaker SDK to obtain the role of the SageMaker notebook: `role = sagemaker.get_execution_role()`.

In [16]:
role = sagemaker.get_execution_role()
print("Using IAM role arn: {}".format(role))

Using IAM role arn: arn:aws:iam::596030579944:role/service-role/AmazonSageMaker-ExecutionRole-20191130T110013


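## Build the Docker image

We need to build our own Docker image. The build takes care of the following:

1. Pull the base image
2. Install build tools such as g++ and cmake
3. Install stable-baselines and the libraries it depends on, such as OpenMPI
4. Push the image to Amazon ECR

This step usually takes 3 to 10 minutes, depending on your network speed and the notebook instance type.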
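The build cell that follows picks a CPU or GPU image from the instance type. As a minimal sketch of that decision, the helpers below are hypothetical and go one step beyond the notebook: the notebook only checks the `ml.p` prefix, while this version also treats the `ml.g` family as GPU, which is an assumption you may want to adjust.

```python
# Assumption: instance families whose names start with these prefixes have GPUs.
GPU_PREFIXES = ("ml.p", "ml.g")


def image_flavor(instance_type):
    """Return 'gpu' or 'cpu' for a SageMaker instance type string."""
    return "gpu" if instance_type.startswith(GPU_PREFIXES) else "cpu"


def repository_name(instance_type, base="sagemaker-roboschool-stablebaselines"):
    """Build the ECR repository short name used for the custom image."""
    return "{}-{}".format(base, image_flavor(instance_type))


print(repository_name("ml.c5.xlarge"))
print(repository_name("ml.p3.2xlarge"))
```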



In [7]:
%%time

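# instance_type must exist before this cell runs; it matches the value used for the training job below
instance_type = "ml.c5.xlarge"
cpu_or_gpu = 'gpu' if instance_type.startswith('ml.p') else 'cpu'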
repository_short_name = "sagemaker-roboschool-stablebaselines-%s" % cpu_or_gpu
docker_build_args = { 
    'AWS_REGION': boto3.Session().region_name,
}
custom_image_name = build_and_push_docker_image(repository_short_name, build_args=docker_build_args)
print("Using ECR image %s" % custom_image_name)

Logged into ECR
Building docker image sagemaker-roboschool-stablebaselines-cpu from Dockerfile
$ docker build -t sagemaker-roboschool-stablebaselines-cpu -f Dockerfile . --build-arg AWS_REGION=us-east-1
Sending build context to Docker daemon  1.466MB
Step 1/42 : ARG AWS_REGION
Step 2/42 : FROM 520713654638.dkr.ecr.${AWS_REGION}.amazonaws.com/sagemaker-rl-tensorflow:coach0.11.0-cpu-py3
 ---> 69468aab742c
Step 3/42 : RUN apt update
 ---> Using cache
 ---> d6132b0bb34b
Step 4/42 : RUN apt-get install -y gcc-4.9 cmake
 ---> Using cache
 ---> 228854c556ef
Step 5/42 : RUN update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.9 50
 ---> Using cache
 ---> 27a9a8ccde4d
Step 6/42 : RUN apt-get install -y g++-4.9
 ---> Using cache
 ---> e2cb24dbaf4c
Step 7/42 : RUN update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.9 50
 ---> Using cache
 ---> aceaa03dc247
Step 8/42 : RUN buildDeps="         wget         build-essential     "     && apt-get update && apt-get install -y --no-ins

(Reading database ... 19119 files and directories currently installed.)
Preparing to unpack .../openssh-client_1%3a7.2p2-4ubuntu2.10_amd64.deb ...
Unpacking openssh-client (1:7.2p2-4ubuntu2.10) ...
Selecting previously unselected package openssh-sftp-server.
Preparing to unpack .../openssh-sftp-server_1%3a7.2p2-4ubuntu2.10_amd64.deb ...
Unpacking openssh-sftp-server (1:7.2p2-4ubuntu2.10) ...
Selecting previously unselected package openssh-server.
Preparing to unpack .../openssh-server_1%3a7.2p2-4ubuntu2.10_amd64.deb ...
Unpacking openssh-server (1:7.2p2-4ubuntu2.10) ...
Processing triggers for systemd (229-4ubuntu21.8) ...
Setting up openssh-client (1:7.2p2-4ubuntu2.10) ...
Setting up openssh-sftp-server (1:7.2p2-4ubuntu2.10) ...
Setting up openssh-server (1:7.2p2-4ubuntu2.10) ...
debconf: unable to initialize frontend: Dialog
debconf: (TERM is not set, so the dialog frontend is not usable.)
debconf: falling back to frontend: Readline
Creating SSH2 RSA key; this may take some time ...


4 upgraded, 120 newly installed, 0 to remove and 143 not upgraded.
Need to get 28.7 MB of archives.
After this operation, 140 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu xenial/main amd64 libpopt0 amd64 1.16-10 [26.0 kB]
Get:2 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 fontconfig amd64 2.11.94-0ubuntu1.1 [178 kB]
Get:3 http://archive.ubuntu.com/ubuntu xenial/main amd64 libmtdev1 amd64 1.1.5-1ubuntu2 [13.8 kB]
Get:4 http://archive.ubuntu.com/ubuntu xenial/main amd64 libpcre16-3 amd64 2:8.38-3.1 [144 kB]
Get:5 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 zlib1g amd64 1:1.2.8.dfsg-2ubuntu4.3 [51.2 kB]
Get:6 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 libqt5core5a amd64 5.5.1+dfsg-16ubuntu7.7 [1817 kB]
Get:7 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 libwayland-server0 amd64 1.12.0-1~ubuntu16.04.3 [28.0 kB]
Get:8 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 libgbm1 amd64 18.0.5-0

Get:79 http://archive.ubuntu.com/ubuntu xenial/main amd64 libxcb-randr0-dev amd64 1.11.1-1ubuntu1 [18.2 kB]
Get:80 http://archive.ubuntu.com/ubuntu xenial/main amd64 libxcb-shape0-dev amd64 1.11.1-1ubuntu1 [6900 B]
Get:81 http://archive.ubuntu.com/ubuntu xenial/main amd64 libxcb-xfixes0-dev amd64 1.11.1-1ubuntu1 [11.2 kB]
Get:82 http://archive.ubuntu.com/ubuntu xenial/main amd64 libxcb-sync-dev amd64 1.11.1-1ubuntu1 [10.1 kB]
Get:83 http://archive.ubuntu.com/ubuntu xenial/main amd64 libxcb-present-dev amd64 1.11.1-1ubuntu1 [6618 B]
Get:84 http://archive.ubuntu.com/ubuntu xenial/main amd64 libxshmfence-dev amd64 1.2-1 [3676 B]
Get:85 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 libx11-xcb-dev amd64 2:1.6.3-1ubuntu2.2 [9684 B]
Get:86 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 libwayland-cursor0 amd64 1.12.0-1~ubuntu16.04.3 [10.1 kB]
Get:87 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 libwayland-bin amd64 1.12.0-1~ubuntu16.04.3 [18.4 kB]
Get:88 

Selecting previously unselected package libxkbcommon0:amd64.
Preparing to unpack .../libxkbcommon0_0.5.0-1ubuntu2.1_amd64.deb ...
Unpacking libxkbcommon0:amd64 (0.5.0-1ubuntu2.1) ...
Selecting previously unselected package libmirclient9:amd64.
Preparing to unpack .../libmirclient9_0.26.3+16.04.20170605-0ubuntu1.1_amd64.deb ...
Unpacking libmirclient9:amd64 (0.26.3+16.04.20170605-0ubuntu1.1) ...
Selecting previously unselected package libwayland-client0:amd64.
Preparing to unpack .../libwayland-client0_1.12.0-1~ubuntu16.04.3_amd64.deb ...
Unpacking libwayland-client0:amd64 (1.12.0-1~ubuntu16.04.3) ...
Preparing to unpack .../libx11-xcb1_2%3a1.6.3-1ubuntu2.2_amd64.deb ...
Unpacking libx11-xcb1:amd64 (2:1.6.3-1ubuntu2.2) over (2:1.6.3-1ubuntu2.1) ...
Selecting previously unselected package libegl1-mesa:amd64.
Preparing to unpack .../libegl1-mesa_18.0.5-0ubuntu0~16.04.1_amd64.deb ...
Unpacking libegl1-mesa:amd64 (18.0.5-0ubuntu0~16.04.1) ...
Selecting previously unselected package libevdev

Unpacking xtrans-dev (1.3.5-1) ...
Selecting previously unselected package libpthread-stubs0-dev:amd64.
Preparing to unpack .../libpthread-stubs0-dev_0.3-4_amd64.deb ...
Unpacking libpthread-stubs0-dev:amd64 (0.3-4) ...
Selecting previously unselected package libxcb1-dev:amd64.
Preparing to unpack .../libxcb1-dev_1.11.1-1ubuntu1_amd64.deb ...
Unpacking libxcb1-dev:amd64 (1.11.1-1ubuntu1) ...
Selecting previously unselected package libx11-dev:amd64.
Preparing to unpack .../libx11-dev_2%3a1.6.3-1ubuntu2.2_amd64.deb ...
Unpacking libx11-dev:amd64 (2:1.6.3-1ubuntu2.2) ...
Selecting previously unselected package x11proto-xext-dev.
Preparing to unpack .../x11proto-xext-dev_7.3.0-1_all.deb ...
Unpacking x11proto-xext-dev (7.3.0-1) ...
Selecting previously unselected package libxext-dev:amd64.
Preparing to unpack .../libxext-dev_2%3a1.3.3-1_amd64.deb ...
Unpacking libxext-dev:amd64 (2:1.3.3-1) ...
Selecting previously unselected package x11proto-xf86vidmode-dev.
Preparing to unpack .../x11prot

Selecting previously unselected package libqt5xml5:amd64.
Preparing to unpack .../libqt5xml5_5.5.1+dfsg-16ubuntu7.7_amd64.deb ...
Unpacking libqt5xml5:amd64 (5.5.1+dfsg-16ubuntu7.7) ...
Selecting previously unselected package qtchooser.
Preparing to unpack .../qtchooser_52-gae5eeef-2build1~gcc5.2_amd64.deb ...
Unpacking qtchooser (52-gae5eeef-2build1~gcc5.2) ...
Selecting previously unselected package qt5-qmake:amd64.
Preparing to unpack .../qt5-qmake_5.5.1+dfsg-16ubuntu7.7_amd64.deb ...
Unpacking qt5-qmake:amd64 (5.5.1+dfsg-16ubuntu7.7) ...
Selecting previously unselected package qtbase5-dev-tools.
Preparing to unpack .../qtbase5-dev-tools_5.5.1+dfsg-16ubuntu7.7_amd64.deb ...
Unpacking qtbase5-dev-tools (5.5.1+dfsg-16ubuntu7.7) ...
Selecting previously unselected package qtbase5-dev:amd64.
Preparing to unpack .../qtbase5-dev_5.5.1+dfsg-16ubuntu7.7_amd64.deb ...
Unpacking qtbase5-dev:amd64 (5.5.1+dfsg-16ubuntu7.7) ...
Selecting previously unselected package libqt5opengl5-dev:amd64.
Pre

Setting up libqt5sql5:amd64 (5.5.1+dfsg-16ubuntu7.7) ...
Setting up libqt5test5:amd64 (5.5.1+dfsg-16ubuntu7.7) ...
Setting up libqt5xml5:amd64 (5.5.1+dfsg-16ubuntu7.7) ...
Setting up qtchooser (52-gae5eeef-2build1~gcc5.2) ...
Setting up qt5-qmake:amd64 (5.5.1+dfsg-16ubuntu7.7) ...
Setting up qtbase5-dev-tools (5.5.1+dfsg-16ubuntu7.7) ...
Setting up qtbase5-dev:amd64 (5.5.1+dfsg-16ubuntu7.7) ...
Setting up libqt5opengl5-dev:amd64 (5.5.1+dfsg-16ubuntu7.7) ...
Setting up libqt5sql5-sqlite:amd64 (5.5.1+dfsg-16ubuntu7.7) ...
Setting up libtinyxml2.6.2v5:amd64 (2.6.2-3) ...
Setting up libtinyxml-dev:amd64 (2.6.2-3) ...
Setting up libwacom-bin (0.22-1~ubuntu16.04.1) ...
Setting up libx11-doc (2:1.6.3-1ubuntu2.2) ...
Setting up pkg-config (0.29.1-0ubuntu1) ...
Setting up qttranslations5-l10n (5.5.1-2build1) ...
Setting up libassimp3v5 (3.2~dfsg-3) ...
Setting up libassimp-dev (3.2~dfsg-3) ...
Processing triggers for libc-bin (2.23-0ubuntu10) ...
Processing triggers for systemd (229-4ubuntu21.8

Selecting previously unselected package libboost-python1.58.0.
Preparing to unpack .../libboost-python1.58.0_1.58.0+dfsg-5ubuntu3.1_amd64.deb ...
Unpacking libboost-python1.58.0 (1.58.0+dfsg-5ubuntu3.1) ...
Selecting previously unselected package libpython2.7:amd64.
Preparing to unpack .../libpython2.7_2.7.12-1ubuntu0~16.04.18_amd64.deb ...
Unpacking libpython2.7:amd64 (2.7.12-1ubuntu0~16.04.18) ...
Selecting previously unselected package libpython2.7-dev:amd64.
Preparing to unpack .../libpython2.7-dev_2.7.12-1ubuntu0~16.04.18_amd64.deb ...
Unpacking libpython2.7-dev:amd64 (2.7.12-1ubuntu0~16.04.18) ...
Selecting previously unselected package libpython-dev:amd64.
Preparing to unpack .../libpython-dev_2.7.12-1~16.04_amd64.deb ...
Unpacking libpython-dev:amd64 (2.7.12-1~16.04) ...
Selecting previously unselected package python2.7-dev.
Preparing to unpack .../python2.7-dev_2.7.12-1ubuntu0~16.04.18_amd64.deb ...
Unpacking python2.7-dev (2.7.12-1ubuntu0~16.04.18) ...
Selecting previously un

Installing collected packages: roboschool
Successfully installed roboschool-1.0.48
Removing intermediate container 449c881e8cff
 ---> e4b8c4e2dfa4
Step 36/42 : ENV PYTHONUNBUFFERED 1
 ---> Running in b785b97db76f
Removing intermediate container b785b97db76f
 ---> 39708f87a2d0
Step 37/42 : RUN apt-get update && apt-get install -y cmake libopenmpi-dev python3-dev zlib1g-dev wget
 ---> Running in 82244dc26f6e
Get:1 http://ppa.launchpad.net/deadsnakes/ppa/ubuntu xenial InRelease [18.1 kB]
Get:2 http://security.ubuntu.com/ubuntu xenial-security InRelease [109 kB]
Get:3 http://archive.ubuntu.com/ubuntu xenial InRelease [247 kB]
Get:4 http://ppa.launchpad.net/deadsnakes/ppa/ubuntu xenial/main amd64 Packages [41.5 kB]
Get:5 http://security.ubuntu.com/ubuntu xenial-security/main amd64 Packages [2002 kB]
Get:6 http://archive.ubuntu.com/ubuntu xenial-updates InRelease [109 kB]
Get:7 http://archive.ubuntu.com/ubuntu xenial-backports InRelease [107 kB]
Get:8 http://archive.ubuntu.com/ubuntu xenial/

Selecting previously unselected package libpython3.5-dev:amd64.
Preparing to unpack .../libpython3.5-dev_3.5.2-2ubuntu0~16.04.13_amd64.deb ...
Unpacking libpython3.5-dev:amd64 (3.5.2-2ubuntu0~16.04.13) ...
Selecting previously unselected package libpython3-dev:amd64.
Preparing to unpack .../libpython3-dev_3.5.1-3_amd64.deb ...
Unpacking libpython3-dev:amd64 (3.5.1-3) ...
Selecting previously unselected package libtool.
Preparing to unpack .../libtool_2.4.6-0.1_all.deb ...
Unpacking libtool (2.4.6-0.1) ...
Selecting previously unselected package libhwloc5:amd64.
Preparing to unpack .../libhwloc5_1.11.2-3_amd64.deb ...
Unpacking libhwloc5:amd64 (1.11.2-3) ...
Selecting previously unselected package libibverbs1.
Preparing to unpack .../libibverbs1_1.1.8-1.1ubuntu2_amd64.deb ...
Unpacking libibverbs1 (1.1.8-1.1ubuntu2) ...
Selecting previously unselected package libopenmpi1.10.
Preparing to unpack .../libopenmpi1.10_1.10.2-8ubuntu1_amd64.deb ...
Unpacking libopenmpi1.10 (1.10.2-8ubuntu1) .

Installing collected packages: atari-py, stable-baselines
Successfully installed atari-py-0.2.6 stable-baselines-2.10.2
Removing intermediate container e3a6342422c9
 ---> 646bf5c715d8
Step 42/42 : RUN python -c "import gym; import roboschool;"
 ---> Running in 0e4823ad5c6b
Removing intermediate container 0e4823ad5c6b
 ---> abfe67e008bd
Successfully built abfe67e008bd
Successfully tagged sagemaker-roboschool-stablebaselines-cpu:latest
Done building docker image sagemaker-roboschool-stablebaselines-cpu
Created new ECR repository: sagemaker-roboschool-stablebaselines-cpu
https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
Logged into ECR
$ docker tag sagemaker-roboschool-stablebaselines-cpu 596030579944.dkr.ecr.us-east-1.amazonaws.com/sagemaker-roboschool-stablebaselines-cpu
Pushing docker image to ECR repository 596030579944.dkr.ecr.us-east-1.amazonaws.com/sagemaker-roboschool-stablebaselines-cpu

$ docker push 596030579944.dkr.ecr.us-east-1.ama

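## Write the training code

#### Configure the RL algorithm hyperparameters

The preset that configures the RL training job is defined in `preset-pacman.py` in the `./src` directory. In the preset file you can define agent parameters to select a specific agent algorithm. You can also set environment parameters, define schedules and visualization parameters, and define the graph manager. The preset contains the following hyperparameters required for PPO1 training: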

* `num_timesteps` – (int) number of training steps - Preset: 1e4
* `timesteps_per_actorbatch` – (int) timesteps per actor per update - Preset: 2048
* `clip_param` – (float) clipping parameter epsilon - Preset: 0.2
* `entcoeff` – (float) the entropy loss weight - Preset: 0.0
* `optim_epochs` – (float) the optimizer’s number of epochs - Preset: 10
* `optim_stepsize` – (float) the optimizer’s stepsize - Preset: 3e-4
* `optim_batchsize` – (int) the optimizer’s batch size - Preset: 64
* `gamma` – (float) discount factor - Preset: 0.99
* `lam` – (float) advantage estimation - Preset: 0.95
* `schedule` – (str) The type of scheduler for the learning rate update (‘linear’, ‘constant’, ‘double_linear_con’, ‘middle_drop’ or ‘double_middle_drop’) - Preset: linear
* `verbose` – (int) the verbosity level: 0 none, 1 training information, 2 tensorflow debug - Preset: 1
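To illustrate what `clip_param` controls, the sketch below evaluates the PPO clipped surrogate objective for a single sample in plain Python. It is not Stable Baselines code, and `clipped_surrogate` is a hypothetical name; `ratio` stands for the probability ratio between the new and old policy, and `advantage` for the estimated advantage of the action taken.

```python
def clipped_surrogate(ratio, advantage, clip_param=0.2):
    """Per-sample PPO objective: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped_ratio = max(1.0 - clip_param, min(ratio, 1.0 + clip_param))
    return min(ratio * advantage, clipped_ratio * advantage)


# Large ratio with a positive advantage: the gain is capped at (1 + eps) * A.
print(clipped_surrogate(1.5, 2.0))
# Small ratio with a negative advantage: the more pessimistic clipped term is used.
print(clipped_surrogate(0.5, -1.0))
# Ratio inside the clipping range: the objective is simply r * A.
print(clipped_surrogate(1.1, 2.0))
```

A smaller `clip_param` keeps each policy update closer to the previous policy, at the cost of slower progress per update.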

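The full list of PPO1 hyperparameters and their documentation is available at https://stable-baselines.readthedocs.io/en/master/modules/ppo1.html


The `RLSTABLEBASELINES_PRESET` hyperparameter selects the preset file that defines the hyperparameters. Here we use `"RLSTABLEBASELINES_PRESET":"preset-{}.py".format(rl_problem)`.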
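The sketch below shows how preset defaults and overrides can interact, assuming hyperparameters reach the script as `--name value` command-line arguments. `build_parser` and `to_cli_args` are hypothetical helpers that reproduce only a few of the preset's arguments; the real definitions live in `src/preset-pacman.py`.

```python
import argparse


def build_parser():
    """Declare a few PPO1 presets as argparse defaults (illustrative subset)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--num_timesteps", default=1e4, type=float)
    parser.add_argument("--timesteps_per_actorbatch", default=2048, type=int)
    parser.add_argument("--clip_param", default=0.2, type=float)
    parser.add_argument("--gamma", default=0.99, type=float)
    return parser


def to_cli_args(hyperparameters):
    """Turn a hyperparameter dict into a flat ['--name', 'value', ...] list."""
    cli_args = []
    for name, value in hyperparameters.items():
        cli_args += ["--{}".format(name), str(value)]
    return cli_args


overrides = {"num_timesteps": 1e5, "instance_type": "ml.c5.xlarge"}
# parse_known_args keeps arguments the parser does not declare instead of failing.
args, unknown = build_parser().parse_known_args(to_cli_args(overrides))
print(args.num_timesteps, args.gamma, unknown)
```

Values that are not overridden keep their preset defaults, and undeclared arguments such as `instance_type` are left for other parts of the launcher to consume.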

#### View the preset-pacman.py hyperparameter file

In [17]:
!pygmentize src/preset-{rl_problem}.py

import argparse

from sagemaker_rl.stable_baselines_launcher import SagemakerStableBaselinesPPO1Launcher, create_env


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--output_path', default="/opt/ml/output/intermediate/", type=str)
    parser.add_argument('--num_timesteps', default=1e4) #default 1e4
    parser.add_argument('--timesteps_per_actorbatch', default=2048, type=int)
    parser.add_argument('--clip_param', default=0.2

#### Write the training code

The training code is in `train_stable_baselines.py` in the `./src` directory.

In [18]:
!pygmentize src/train_stable_baselines.py

import argparse

from sagemaker_rl.mpi_launcher import MPILauncher


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--RLSTABLEBASELINES_PRESET', required=True, type=str)
    parser.add_argument('--output_path', default="/opt/ml/output/intermediate/", type=str)
    parser.add_argument('--instance_type', type=str)

    return parser.parse_known_args()


if __name__ == "__main__":
    a

## Create the RL training job with the SageMaker SDK

You can run the SageMaker training job on either GPU or CPU instances. The SageMaker SDK provides the `RLEstimator` class for creating RL training jobs.

1. Specify the source directory where the environment, presets and training code is uploaded.
2. Specify the entry point as the training code 
3. Specify the choice of RL toolkit and framework. This automatically resolves to the ECR path for the RL Container. 
4. Define the training parameters such as the instance count, job name, and S3 path for output.
5. Specify the hyperparameters for the RL agent algorithm. The `RLSTABLEBASELINES_PRESET` can be used to specify the RL agent algorithm you want to use. 
6. Define the metrics definitions that you are interested in capturing in your logs. These can also be visualized in CloudWatch and SageMaker Notebooks. 
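To see how a metric definition picks a value out of the training log, the sketch below applies the `EpRewMean` regex used in this notebook to a sample line with Python's `re` module. The sample line is hypothetical, modeled on the MPI-prefixed table rows the job prints, and `extract_metric` is an illustrative helper. SageMaker evaluates the regex on its side, so Python's `re` is only an approximation of that matching.

```python
import re

# Same pattern as the EpisodesRewardMean metric definition in this notebook.
EP_REW_MEAN = r"\[.*,.*\]\<stdout\>\:\| *EpRewMean *\| *([-+]?[0-9]*\.?[0-9]*) *\|"


def extract_metric(pattern, log_line):
    """Return the captured number as a float, or None if the line does not match."""
    match = re.search(pattern, log_line)
    if match is None:
        return None
    try:
        return float(match.group(1))
    except ValueError:
        # The capture group can be empty or a lone sign/dot, which is not a number.
        return None


sample_line = "[1,0]<stdout>:| EpRewMean       | 210      |"  # hypothetical log line
print(extract_metric(EP_REW_MEAN, sample_line))
```

Testing a regex locally like this is a cheap way to catch a pattern that never matches before launching a long training job.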

Note that every preset hyperparameter in `preset-pacman.py` can be overridden through `hyperparameters`.

**Note**: The PPO1 algorithm relies on MPI. For this lab, set `instance_count` to `1`.

In [None]:
%%time

instance_type = "ml.c5.xlarge"
custom_image_name = '596030579944.dkr.ecr.us-east-1.amazonaws.com/sagemaker-roboschool-stablebaselines-cpu'
estimator = RLEstimator(entry_point="train_stable_baselines.py",
                        source_dir='src',
                        dependencies=["common/sagemaker_rl"],
                        image_uri=custom_image_name,
                        role=role,
                        instance_type=instance_type,
                        use_spot_instances=True,
                        max_wait = (72 * 60 * 60),
                        instance_count=1,
                        output_path=s3_output_path,
                        base_job_name=job_name_prefix,
                        hyperparameters={
                            "RLSTABLEBASELINES_PRESET":"preset-{}.py".format(rl_problem),
                            "num_timesteps":1e4,
                            "instance_type":instance_type
                        },
                        metric_definitions= [
                            {
                                "Name":"EpisodesLengthMean",
                                "Regex":r"\[.*,.*\]\<stdout\>\:\| *EpLenMean *\| *([-+]?[0-9]*\.?[0-9]*) *\|"
                            },
                            {
                                "Name":"EpisodesRewardMean",
                                "Regex":r"\[.*,.*\]\<stdout\>\:\| *EpRewMean *\| *([-+]?[0-9]*\.?[0-9]*) *\|"
                            },
                            {
                                "Name":"EpisodesSoFar",
                                "Regex":r"\[.*,.*\]\<stdout\>\:\| *EpisodesSoFar *\| *([-+]?[0-9]*\.?[0-9]*) *\|"
                            }
                        ]
                    )

estimator.fit(wait=True)

2021-04-18 02:53:10 Starting - Starting the training job...
2021-04-18 02:53:11 Starting - Launching requested ML instancesProfilerReport-1618714389: InProgress
......
2021-04-18 02:54:38 Starting - Preparing the instances for training......
2021-04-18 02:55:30 Downloading - Downloading input data
2021-04-18 02:55:30 Training - Downloading the training image.........
2021-04-18 02:57:07 Training - Training image download completed. Training in progress..bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2021-04-18 02:57:08,615 sagemaker-containers INFO     Imported framework sagemaker_tensorflow_container.training
2021-04-18 02:57:08,619 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
2021-04-18 02:57:08,775 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
2021-04-18 02:57:08,784 sagemaker-containers INFO     Invoking us

[1,0]<stderr>:The TensorFlow contrib module will not be included in TensorFlow 2.0.
[1,0]<stderr>:For more information, please see:
[1,0]<stderr>:  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
[1,0]<stderr>:  * https://github.com/tensorflow/addons
[1,0]<stderr>:  * https://github.com/tensorflow/io (for I/O related ops)
[1,0]<stderr>:If you depend on functionality not listed there, please file an issue.
[1,0]<stderr>:
[1,0]<stderr>:  "stable-baselines is in maintenance mode, please use [Stable-Baselines3 (SB3)](https://github.com/DLR-RM/stable-baselines3) for an up-to-date version. You can find a [migration guide](https://stable-baselines3.readthedocs.io/en/master/guide/migration.html) in SB3 documentation."
[1,0]<stdout>:Starting training for half cheetah with PPO1
[1,0]<stdout>:Launching training script with stable baselines PPO1
[1,0]<stderr>:

[1,0]<stdout>:Optimizing...
[1,0]<stdout>:     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
[1,0]<stdout>:     -0.00019 |       0.00000 |      76.42161 |       0.00033 |       2.18525
[1,0]<stdout>:     -0.00189 |       0.00000 |      69.33100 |       0.00153 |       2.18453
[1,0]<stdout>:     -0.00335 |       0.00000 |      65.36742 |       0.00366 |       2.18366
[1,0]<stdout>:     -0.00398 |       0.00000 |      63.08681 |       0.00442 |       2.18599
[1,0]<stdout>:     -0.00465 |       0.00000 |      61.77058 |       0.00632 |       2.18273
[1,0]<stdout>:     -0.00465 |       0.00000 |      60.88961 |       0.00659 |       2.18247
[1,0]<stdout>:     -0.00454 |       0.00000 |      60.38831 |       0.00669 |       2.18277
[1,0]<stdout>:     -0.00421 |       0.00000 |      60.07469 |       0.00645 |       2.18206
[1,0]<stdout>:     -0.00453 |       0.00000 |

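## Visualization

RL training can take a long time, so we need several ways to track the progress of a running training job. During training, the job can write intermediate output to S3, which we can use for monitoring and analysis.

### Fetch the videos produced during training
During training, videos of the environment can be written to S3. Next, we fetch all the available videos and render the last one in the notebook.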

In [53]:

job_name = estimator.latest_training_job.job_name
print("Training job: %s" % job_name)

s3_url = "s3://{}/{}".format(s3_bucket,job_name)

output_tar_key = "{}/output/output.tar.gz".format(job_name)

intermediate_folder_key = "{}/output/intermediate".format(job_name)
output_url = "s3://{}/{}".format(s3_bucket, output_tar_key)
intermediate_url = "s3://{}/{}".format(s3_bucket, intermediate_folder_key)

print("S3 job path: {}".format(s3_url))
print("Output.tar.gz location: {}".format(output_url))
print("Intermediate folder path: {}".format(intermediate_url))
    
tmp_dir = "/tmp/{}".format(job_name)
os.makedirs(tmp_dir, exist_ok=True)  # portable, and does not fail if the folder already exists
print("Create local folder {}".format(tmp_dir))
wait_for_s3_object(s3_bucket, intermediate_folder_key, tmp_dir) 

Training job: rl-stabebaselines-pacman-2021-04-09-14-25-04-440
S3 job path: s3://sagemaker-us-east-1-596030579944/rl-stabebaselines-pacman-2021-04-09-14-25-04-440
Output.tar.gz location: s3://sagemaker-us-east-1-596030579944/rl-stabebaselines-pacman-2021-04-09-14-25-04-440/output/output.tar.gz
Intermediate folder path: s3://sagemaker-us-east-1-596030579944/rl-stabebaselines-pacman-2021-04-09-14-25-04-440/output/intermediate
Create local folder /tmp/rl-stabebaselines-pacman-2021-04-09-14-25-04-440
Waiting for s3://sagemaker-us-east-1-596030579944/rl-stabebaselines-pacman-2021-04-09-14-25-04-440/output/intermediate...
Downloading rl-stabebaselines-pacman-2021-04-09-14-25-04-440/output/intermediate/0.monitor.csv
Downloading rl-stabebaselines-pacman-2021-04-09-14-25-04-440/output/intermediate/1.monitor.csv
Downloading rl-stabebaselines-pacman-2021-04-09-14-25-04-440/output/intermediate/2.monitor.csv
Downloading rl-stabebaselines-pacman-2021-04-09-14-25-04-440/output/intermediate/3.monitor.

['/tmp/rl-stabebaselines-pacman-2021-04-09-14-25-04-440/0.monitor.csv',
 '/tmp/rl-stabebaselines-pacman-2021-04-09-14-25-04-440/1.monitor.csv',
 '/tmp/rl-stabebaselines-pacman-2021-04-09-14-25-04-440/2.monitor.csv',
 '/tmp/rl-stabebaselines-pacman-2021-04-09-14-25-04-440/3.monitor.csv',
 '/tmp/rl-stabebaselines-pacman-2021-04-09-14-25-04-440/rl_out.meta.json',
 '/tmp/rl-stabebaselines-pacman-2021-04-09-14-25-04-440/rl_out.mp4']
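Besides the video, the downloaded `*.monitor.csv` files can be used to track progress. The sketch below computes the mean episode reward from one such file, assuming the usual Stable Baselines monitor layout: a first line holding a `#`-prefixed JSON comment, followed by CSV columns `r` (episode reward), `l` (episode length) and `t` (elapsed time). The sample data is made up for illustration, and `mean_episode_reward` is a hypothetical helper.

```python
import csv
import io


def mean_episode_reward(monitor_file):
    """Return the mean of the 'r' column of an open monitor file, or None if empty."""
    header_comment = monitor_file.readline()
    if not header_comment.startswith("#"):
        raise ValueError("unexpected monitor file header")
    rows = list(csv.DictReader(monitor_file))
    if not rows:
        return None
    return sum(float(row["r"]) for row in rows) / len(rows)


# Made-up sample in the assumed monitor layout.
sample = io.StringIO(
    '#{"t_start": 0.0, "env_id": "MsPacman-v0"}\n'
    "r,l,t\n"
    "210.0,500,12.5\n"
    "330.0,650,27.1\n"
)
print(mean_episode_reward(sample))
```

For a downloaded file, pass an open file object instead, for example `open("{}/0.monitor.csv".format(tmp_dir))`.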

### RL video output

In [54]:
import io
import base64
with open("{}/rl_out.mp4".format(tmp_dir), 'rb') as f:  # read-only binary mode is enough here
    video = f.read()
encoded = base64.b64encode(video)
HTML(data='''<video alt="test" controls>
                <source src="data:video/mp4;base64,{0}" type="video/mp4" />
             </video>'''.format(encoded.decode('ascii')))

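### Stable Baselines parameter tuning (optional)

You can tune the Stable Baselines parameters to use more machines and more steps for better results:
* `instance_count`: 10
* `instance_type`: ml.c5.xlarge
* `num_timesteps`: 1e7

Training the model with the settings above took 40 minutes. You can get similar results with fewer instances and a longer training time.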

In [None]:
import io
import base64
video = io.open("{}/rl_out.mp4".format(tmp_dir), 'r+b').read()  # fill in the download folder; the original left "{}" unformatted
encoded = base64.b64encode(video)
HTML(data='''<video alt="test" controls>
                <source src="data:video/mp4;base64,{0}" type="video/mp4" />
             </video>'''.format(encoded.decode('ascii')))