![](https://static.xingzheai.cn/41000bcda2bb4e2195142927a000c106.png)
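# Reinforcement learning training on Amazon SageMaker with algorithms from Xingzhe AI (行者AI)
[Xingzhe AI](http://www.xingzhe.ai) (成都潜在人工智能科技有限公司) focuses on the research and application of artificial intelligence in games. Building on its in-house algorithms, it has launched several products, including game AI, a data platform, and content moderation, and provides these services to major game studios.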

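## Overview

### Environment
A 6v6 soccer match. A model trained on SageMaker controls 6 Agents and tries to put the ball into the opponent's goal as many times as possible. Each goal scored is worth +1 point and each goal conceded is worth -1 point. The side with the higher score when time runs out wins the round. Goals count only if the round is won, and the highest goal count for each level is recorded on the leaderboard.

The competition opens 3 levels with 3 built-in AI bots of different difficulty. Each level has its own difficulty and play style, so go and explore!

**Note**

**Unlike a regular soccer match, the environment is a simplified game with no offside, penalty kicks, or other complex soccer rules. There is only one rule: within the time limit (2 minutes), control the Agents to score as many goals as possible.**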

### Competition URL
https://game.xingzheai.cn/soccer


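## Soccer competition rules

* Goal: clear every level and achieve the highest cumulative goal count.
* Flow: click Start Match, choose a level, choose a model, play the match.
* Match rules: each level lasts 2 minutes. You must score more goals than the official model; scoring fewer or the same number counts as a failed challenge.
* Choosing a model: for your first match you must upload a model to the model library. Afterwards you can upload a new model or pick an existing one.
* Number of attempts: unlimited within the competition period.
* Level selection: the next level unlocks only after the current one is cleared. Cleared levels can be replayed to reach the goal count you want.
* Ranking: only participants who have cleared every level enter the leaderboard. The system sums each participant's highest goal count per level; the highest total ranks first, and so on.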
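
The pass condition and the ranking rule above can be sketched in a few lines of Python. This is only an illustration of the rules as stated, not the platform's actual scoring code; the data layout (a mapping from level number to best goal count on cleared levels) and the function names are assumptions.

```python
from typing import Dict, List, Tuple

NUM_LEVELS = 3  # the competition opens three levels


def level_passed(goals: int, official_goals: int) -> bool:
    """A challenge succeeds only when we score strictly more than the official model."""
    return goals > official_goals


def leaderboard(players: Dict[str, Dict[int, int]]) -> List[Tuple[str, int]]:
    """Rank players by the sum of their best goal count on each level.

    `players` maps a player name to {level: best goals on that cleared level}.
    Players who have not cleared every level are left out (assumed layout).
    """
    ranked = [
        (name, sum(best.values()))
        for name, best in players.items()
        if len(best) == NUM_LEVELS
    ]
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

For example, a player with best counts `{1: 5, 2: 5, 3: 1}` totals 11 goals, while a player who has cleared only two levels does not appear on the board regardless of goal count.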

## Preparing the SageMaker environment

### Imports

Import the Python libraries we need, along with the helper functions `get_execution_role` and `wait_for_s3_object`.

In [1]:
import sagemaker
import boto3
import sys
import os
import subprocess
from IPython.display import HTML
import time
from time import gmtime, strftime
sys.path.append("common")
from misc import wait_for_s3_object

from sagemaker.rl import RLEstimator

### Setting up the S3 bucket

Get the default S3 bucket through the SageMaker SDK. This bucket stores the model, checkpoints, and other metadata.

In [2]:
sage_session = sagemaker.session.Session()
s3_bucket = sage_session.default_bucket()  
s3_output_path = 's3://{}/'.format(s3_bucket)
print("S3 bucket path: {}".format(s3_output_path))

S3 bucket path: s3://sagemaker-us-east-1-596030579944/


### Defining the job/image name variables

We define a prefix variable for the training job and image names: `job_name_prefix`.

In [3]:
#RL problem ID
rl_problem = 'soccer'

# Build the training job name prefix
job_name_prefix = 'xingzhe-'+rl_problem

### Getting the IAM role

Use the SageMaker SDK's `get_execution_role()` to get the role of the SageMaker Notebook: `role = sagemaker.get_execution_role()`

In [4]:
role = sagemaker.get_execution_role()
print("Using IAM role arn: {}".format(role))

Using IAM role arn: arn:aws:iam::596030579944:role/service-role/AmazonSageMaker-ExecutionRole-20191130T110013


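## Getting the prebuilt image
Xingzhe AI has already packaged the environment needed for training into a standard image. Pulling that image is all it takes to set up the training environment.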




In [None]:
# Pull the image and push it to ECR
!sh ./use_public_prebuild.sh


In [8]:
# Set the image name
aws_account = boto3.Session().client("sts").get_caller_identity()['Account']
aws_region =  boto3.Session().region_name
custom_image_name = f'{aws_account}.dkr.ecr.{aws_region}.amazonaws.com/soccer-sagemaker-2.4.1-cpu-py37-ubuntu18.04-20210517'
print(custom_image_name)

596030579944.dkr.ecr.us-east-1.amazonaws.com/soccer-sagemaker-2.4.1-cpu-py37-ubuntu18.04-20210517
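
As a small sketch of how the image name above is composed, the helper below builds an ECR URI from its parts. The function name is hypothetical. The `.cn` suffix handling for China regions is an addition the notebook code does not have; it is included on the assumption that the notebook may also run in a region such as `cn-northwest-1`, where ECR hosts end in `amazonaws.com.cn`.

```python
def ecr_image_uri(account: str, region: str, repository: str) -> str:
    """Compose an ECR image URI from the account ID, region, and repository name."""
    # Assumption: China regions (cn-north-1, cn-northwest-1) use the .com.cn domain.
    domain = "amazonaws.com.cn" if region.startswith("cn-") else "amazonaws.com"
    return f"{account}.dkr.ecr.{region}.{domain}/{repository}"


# Example with a placeholder account ID
repo = "soccer-sagemaker-2.4.1-cpu-py37-ubuntu18.04-20210517"
print(ecr_image_uri("123456789012", "us-east-1", repo))
```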


## Configuring the training hyperparameters


#### Viewing the config.yaml hyperparameter file

In [23]:
!pygmentize source_dir/config.yaml

behaviors:
  SoccerTwos:
    trainer_type: ppo # training algorithm; ppo or sac

    # hyperparameters
    hyperparameters:

      batch_size: 2048
      buffer_size: 20480     # size of the experience buffer
      learning_rate: 0.0003  # learning rate
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
      learning_rate_schedule: constant

    # network settings
    network_settings:
      normalize: false           # whether to normalize the input
      hidden_units: 512          # number of hidden units in each MLP layer
      num_layers: 2              # number of hidden layers
      vis_encode_type: simple    # encoder for visual observations; not needed here

In [39]:
# Modify the hyperparameters to tune the model
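
When changing `batch_size`, `buffer_size`, or `num_epoch`, it helps to see how they interact. The sketch below works through the arithmetic for the values in `config.yaml`, assuming the PPO trainer splits the buffer into minibatches of `batch_size` and makes `num_epoch` passes over it on every policy update. The function name is hypothetical.

```python
# Values copied from config.yaml above
hyperparameters = {
    "batch_size": 2048,
    "buffer_size": 20480,
    "num_epoch": 3,
}


def gradient_steps_per_update(hp: dict) -> int:
    """Number of minibatch gradient steps per policy update (assumed PPO behaviour)."""
    minibatches = hp["buffer_size"] // hp["batch_size"]  # 20480 // 2048 = 10
    return minibatches * hp["num_epoch"]                 # 10 * 3 = 30


print(gradient_steps_per_update(hyperparameters))  # 30
```

Under that assumption, doubling `batch_size` without touching `buffer_size` halves the number of gradient steps per update, so the two are usually scaled together.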

## The soccer RL environment
### State space
All 6 players share an identical state space: an array of length 112.
![](https://static.xingzheai.cn/4969b92678b34d6eaf6abbe70ee35c23_middle.png)


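### Action space

* Discrete space, action shape = (3,3)
* An action is written as: action [dimension0,dimension1,dimension2]
* Each dimension takes one of 3 values, with the following meanings:
* action [0,,] means no action, action [1,,] means move forward, action [2,,] means move backward
* action [,0,] means no action, action [,1,] means move right, action [,2,] means move left
* action [,,0] means no action, action [,,1] means rotate right, action [,,2] means rotate left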
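
The mapping in the list above can be written as a small lookup, which is handy when inspecting what a trained model is doing. This is a sketch for illustration; the table and function names are hypothetical and not part of the environment's API.

```python
from typing import Sequence, Tuple

# One table per action dimension, indexed by the action value (0, 1, or 2)
FORWARD_BACK = ("none", "forward", "backward")
SIDEWAYS = ("none", "right", "left")
ROTATION = ("none", "rotate right", "rotate left")


def describe_action(action: Sequence[int]) -> Tuple[str, str, str]:
    """Translate [dimension0, dimension1, dimension2] into readable names."""
    if len(action) != 3:
        raise ValueError("an action needs exactly 3 dimensions")
    names = []
    for value, table in zip(action, (FORWARD_BACK, SIDEWAYS, ROTATION)):
        if not 0 <= value < len(table):
            raise ValueError(f"action value {value} is out of range")
        names.append(table[value])
    return tuple(names)


print(describe_action([1, 0, 2]))  # ('forward', 'none', 'rotate left')
```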

### Reward design
* Each step gives a reward of -1/maxstep, pushing the Agent to score in as little time as possible
* Own team scores: +1
* Opponent scores: -1
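
To make the reward terms concrete, here is a sketch that adds them up over an episode. The value of `max_steps` is not given in this document, so the number used in the example is purely illustrative, and the function name is hypothetical.

```python
def episode_return(steps: int, max_steps: int, goals_for: int, goals_against: int) -> float:
    """Total reward: -1/max_steps per step, +1 per goal scored, -1 per goal conceded."""
    time_penalty = steps / max_steps  # equals summing 1/max_steps over `steps` steps
    return goals_for - goals_against - time_penalty


# Scoring once after using half of an (illustrative) 1000-step budget
print(episode_return(steps=500, max_steps=1000, goals_for=1, goals_against=0))  # 0.5
```

The time penalty can never exceed 1 in total, so a goal is always worth at least as much as the whole penalty, while scoring earlier still yields a higher return.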


#### Writing the training code

The training code is in the `learn.py` file in the `./source_dir` directory.

In [24]:
!pygmentize source_dir/learn.py

# learn
from mlagents.trainers.learn import parse_command_line

from mlagents.trainers import learn
import os

if __name__ == '__main__':
    env_list = ['SM_OUTPUT_DATA_DIR', 'SM_OUTPUT_DIR', 'SM_MODEL_DIR']
    for _ in env_list:
        print(_, os.getenv(_))
    # learn.run_cli(parse_command_line(['SoccerTwos.yaml', '--env', '/Users/jty/Desktop/3dball/3dball', '--tensorflow', '--resume']))
    learn.run_cli(parse_comman

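## Creating the RL training job with the SageMaker SDK

You can create the SageMaker training job on either GPU or CPU instances. The SageMaker SDK provides the `RLEstimator` class for creating RL training jobs.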

1. Specify the source directory where the environment, presets, and training code are uploaded.
2. Specify the entry point as the training code.
3. Specify the choice of RL toolkit and framework. This automatically resolves to the ECR path for the RL Container.
4. Define the training parameters such as the instance count, instance type, S3 path for output, and job name.
5. Specify the hyperparameters for the RL agent algorithm. The `RLSTABLEBASELINES_PRESET` can be used to specify the RL agent algorithm you want to use. 
6. Define the metrics definitions that you are interested in capturing in your logs. These can also be visualized in CloudWatch and SageMaker Notebooks. 
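
Step 6 refers to the `metric_definitions` argument of SageMaker estimators, a list of `Name`/`Regex` pairs applied to the training logs. The sketch below shows what such a list might look like for this job and checks the patterns locally with `re`. The sample log line is hypothetical; the real ML-Agents log format should be checked before relying on these patterns.

```python
import re
from typing import Dict, List

# Candidate metric definitions; the first capture group is the metric value.
metric_definitions = [
    {"Name": "mean_reward", "Regex": r"Mean Reward: (-?[0-9]+\.?[0-9]*)"},
    {"Name": "step", "Regex": r"Step: ([0-9]+)"},
]


def extract_metrics(line: str, definitions: List[Dict[str, str]]) -> Dict[str, float]:
    """Apply each regex to one log line and collect the captured values."""
    found = {}
    for definition in definitions:
        match = re.search(definition["Regex"], line)
        if match:
            found[definition["Name"]] = float(match.group(1))
    return found


# Hypothetical log line, used only to exercise the patterns
sample_line = "SoccerTwos. Step: 10000. Time Elapsed: 35.2 s. Mean Reward: -0.125."
print(extract_metrics(sample_line, metric_definitions))
```

Once the patterns match the real logs, the same list could be passed as `metric_definitions=metric_definitions` when constructing the estimator.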



In [None]:
%%time

# Choose the training instance type
instance_type = "ml.c5.xlarge"

estimator = RLEstimator(entry_point="learn.py",
                        source_dir='source_dir',
                        dependencies=["common/sagemaker_rl"],
                        image_uri=custom_image_name,
                        role=role,
                        instance_type=instance_type,
                        use_spot_instances=True,     # whether to use spot instances
                        max_wait = (72 * 60 * 60),
                        instance_count=1,
                        output_path=s3_output_path,
                        base_job_name=job_name_prefix,
                        hyperparameters={},
                    )

estimator.fit(wait=True)

2021-05-18 01:25:19 Starting - Starting the training job...
2021-05-18 01:25:43 Starting - Launching requested ML instancesProfilerReport-1621301119: InProgress
............
2021-05-18 01:27:44 Starting - Preparing the instances for training...
2021-05-18 01:28:08 Downloading - Downloading input data
2021-05-18 01:28:08 Training - Downloading the training image......
2021-05-18 01:29:12 Training - Training image download completed. Training in progress.
2021-05-18 01:29:13.924336: W tensorflow/core/profiler/internal/smprofiler_timeline.cc:460] Initializing the SageMaker Profiler.
2021-05-18 01:29:13.929713: W tensorflow/core/profiler/internal/smprofiler_timeline.cc:105] SageMaker Profiler is not enabled. The timeline writer thread will not be started, future recorded events will be dropped.
2021-05-18 01:29:14.046079: W tensorflow/core/profiler/internal/smprofiler_timeline.cc:460] Initializing the SageMaker Profiler.
2021-05-18 01:29:17,626 sagemaker-trai
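
The estimator above enables managed spot training with `max_wait` set to 72 hours. SageMaker requires `max_wait` to be at least as large as the maximum run time (`max_run`). The check below is a sketch of that constraint; the 24-hour default for `max_run` is an assumption about the SDK default, and the function name is hypothetical.

```python
from typing import Optional

ASSUMED_DEFAULT_MAX_RUN = 24 * 60 * 60  # seconds; assumed SDK default for max_run


def check_spot_settings(
    use_spot_instances: bool,
    max_wait: Optional[int],
    max_run: int = ASSUMED_DEFAULT_MAX_RUN,
) -> bool:
    """Return True when the spot settings are consistent, raise ValueError otherwise."""
    if not use_spot_instances:
        return True
    if max_wait is None:
        raise ValueError("max_wait should be set when spot instances are used")
    if max_wait < max_run:
        raise ValueError("max_wait must be greater than or equal to max_run")
    return True


# The values used in this notebook: 72 hours of waiting, default run time
print(check_spot_settings(True, 72 * 60 * 60))
```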

# Fetching the model from S3

In [12]:

job_name = estimator.latest_training_job.job_name
print("Training job: %s" % job_name)

s3_url = "s3://{}/{}".format(s3_bucket, job_name)

model_tar = "{}/output/model.tar.gz".format(job_name)

tmp_dir = "/tmp/{}".format(job_name)
os.system("mkdir {}".format(tmp_dir))
print("Create local folder {}".format(tmp_dir))

model_paths = wait_for_s3_object(s3_bucket, model_tar, tmp_dir) 


Training job: xingzhe-soccer-2021-05-18-01-25-19-342
Create local folder /tmp/xingzhe-soccer-2021-05-18-01-25-19-342
Waiting for s3://sagemaker-us-east-1-596030579944/xingzhe-soccer-2021-05-18-01-25-19-342/output/model.tar.gz...
Downloading xingzhe-soccer-2021-05-18-01-25-19-342/output/model.tar.gz
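
`wait_for_s3_object` comes from the `common/misc` helper module and its source is not shown here. As a rough idea of the pattern such a helper follows, the sketch below polls an existence check until it succeeds or a timeout expires. It is not the helper's actual implementation; `wait_until`, `model_artifact_key`, and the `exists` callable are all hypothetical names.

```python
import time
from typing import Callable


def model_artifact_key(job_name: str) -> str:
    """S3 key of the model archive, matching the layout used in the cell above."""
    return "{}/output/model.tar.gz".format(job_name)


def wait_until(
    exists: Callable[[], bool],
    timeout: float = 600.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `exists` until it returns True or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if exists():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

In a notebook, `exists` would wrap a real S3 lookup for `model_artifact_key(job_name)`. Injecting the check and the sleep function keeps the loop easy to test without touching S3.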


In [13]:
# Extract the archive to get the model .nn file
print(f"tar xvf {model_paths[0]} -C {tmp_dir}")
os.system(f"tar xvf {model_paths[0]} -C {tmp_dir}")

# Path of the model file to download
latest_model_path = os.path.join(tmp_dir, 'results', 'ppo', 'SoccerTwos.nn')
print(latest_model_path)

tar xvf /tmp/xingzhe-soccer-2021-05-18-01-25-19-342/model.tar.gz -C /tmp/xingzhe-soccer-2021-05-18-01-25-19-342
/tmp/xingzhe-soccer-2021-05-18-01-25-19-342/results/ppo/SoccerTwos.nn
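
The cell above shells out to `tar` through `os.system`, which ignores failures. An alternative sketch using the standard-library `tarfile` module is shown below. It raises on errors and refuses archive members that would land outside the destination directory. The function name is hypothetical.

```python
import os
import tarfile
from typing import List


def extract_model(archive_path: str, dest_dir: str) -> List[str]:
    """Extract a .tar.gz archive into dest_dir and return the member names."""
    os.makedirs(dest_dir, exist_ok=True)
    dest_root = os.path.realpath(dest_dir)
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            target = os.path.realpath(os.path.join(dest_root, member.name))
            # Reject paths such as ../../etc/passwd that escape dest_dir
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError("unsafe path in archive: {}".format(member.name))
        names = tar.getnames()
        tar.extractall(dest_root)
    return sorted(names)
```

With the variables from the cells above, the call would be `extract_model(model_paths[0], tmp_dir)`.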


In [14]:

from IPython.display import display, FileLink
# TODO: the model file needs to be downloaded here
local_file = FileLink(latest_model_path, result_html_prefix="Click here to download: ")
display(local_file)

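## The soccer competition
Go to the soccer competition page, **https://game.xingzheai.cn/soccer**, register and log in, then follow these steps:
1. Click Start Competition
2. Choose a level
3. Upload the trained model file from S3
4. Click Start to play the match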


## Leaderboard
On the competition home page https://game.xingzheai.cn/soccer you can see the live leaderboard. The grand prize will be decided by the final leaderboard!
![](https://static.xingzheai.cn/9b35ee46583541e3b8e547aaef19d680_middle.png)

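# Bonus: continue training
Congratulations! If you have read this far, you are already halfway there.

**Note:**

When continuing training, the network-related hyperparameters must not be changed.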
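
Before launching a resumed run, it can be worth comparing the old and new configurations to confirm that nothing under `network_settings` has changed. The sketch below does this for two already-loaded config dictionaries. The function name is hypothetical, and loading the YAML files is left out.

```python
from typing import Dict, List


def changed_network_settings(old_cfg: Dict, new_cfg: Dict) -> List[str]:
    """Return the network_settings keys whose values differ between two configs."""
    old_net = old_cfg.get("network_settings", {})
    new_net = new_cfg.get("network_settings", {})
    keys = set(old_net) | set(new_net)
    return sorted(key for key in keys if old_net.get(key) != new_net.get(key))


original = {
    "hyperparameters": {"learning_rate": 0.0003},
    "network_settings": {"normalize": False, "hidden_units": 512, "num_layers": 2},
}
resumed = {
    "hyperparameters": {"learning_rate": 0.0001},  # changing this is fine
    "network_settings": {"normalize": False, "hidden_units": 256, "num_layers": 2},
}
print(changed_network_settings(original, resumed))  # ['hidden_units']
```

An empty list means the network shape is unchanged and the saved weights should still fit the model.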

In [None]:
estimator = RLEstimator(entry_point="learn_resume.py",
                        source_dir='source_dir',
                        dependencies=["common/sagemaker_rl"],
                        image_uri=custom_image_name,
                        role=role,
                        instance_type=instance_type,
                        use_spot_instances=True,
                        max_wait = (72 * 60 * 60),
                        instance_count=1,
                        output_path=s3_output_path,
                        base_job_name=job_name_prefix,
                        hyperparameters={},
                    )

estimator.fit(wait=True)

2021-05-18 02:00:49 Starting - Starting the training job...
2021-05-18 02:01:14 Starting - Launching requested ML instancesProfilerReport-1621303248: InProgress
......
2021-05-18 02:02:14 Starting - Preparing the instances for training......
2021-05-18 02:03:16 Downloading - Downloading input data
2021-05-18 02:03:16 Training - Downloading the training image......
2021-05-18 02:04:18 Training - Training image download completed. Training in progress..
2021-05-18 02:04:19.005536: W tensorflow/core/profiler/internal/smprofiler_timeline.cc:460] Initializing the SageMaker Profiler.
2021-05-18 02:04:19.011116: W tensorflow/core/profiler/internal/smprofiler_timeline.cc:105] SageMaker Profiler is not enabled. The timeline writer thread will not be started, future recorded events will be dropped.
2021-05-18 02:04:19.107488: W tensorflow/core/profiler/internal/smprofiler_timeline.cc:460] Initializing the SageMaker Profiler.
2021-05-18 02:04:21,761 sagemaker-traini

In [55]:
job_name = estimator.latest_training_job.job_name
print("Training job: %s" % job_name)

s3_url = "s3://{}/{}".format(s3_bucket, job_name)

model_tar = "{}/output/model.tar.gz".format(job_name)

tmp_dir = "/tmp/{}".format(job_name)
os.system("mkdir {}".format(tmp_dir))
print("Create local folder {}".format(tmp_dir))

model_paths = wait_for_s3_object(s3_bucket, model_tar, tmp_dir) 


Training job: xingzhe-soccer-2021-05-17-13-05-46-626
Create local folder /tmp/xingzhe-soccer-2021-05-17-13-05-46-626
Waiting for s3://sagemaker-cn-northwest-1-081191365501/xingzhe-soccer-2021-05-17-13-05-46-626/output/model.tar.gz...
Downloading xingzhe-soccer-2021-05-17-13-05-46-626/output/model.tar.gz


In [11]:
# Extract the archive to get the model .nn file
print(f"tar xvf {model_paths[0]} -C {tmp_dir}")
os.system(f"tar xvf {model_paths[0]} -C {tmp_dir}")

# Path of the model file to download
latest_model_path = os.path.join(tmp_dir, 'results', 'ppo-resume', 'SoccerTwos.nn')
print(latest_model_path)


In [59]:
os.system(f'cp {latest_model_path} .')

In [10]:
from IPython.display import display, FileLink
# TODO: the model file needs to be downloaded here
local_file = FileLink(latest_model_path, result_html_prefix="Click here to download: ")
display(local_file)
