<a href="https://colab.research.google.com/github/matpg/RL-Agent-for-Unreal-Engine/blob/main/Codigo_zoo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# RL Baselines Zoo: Training in Colab



GitHub Repo: [https://github.com/araffin/rl-baselines-zoo](https://github.com/araffin/rl-baselines-zoo)

Stable-Baselines Repo: [https://github.com/hill-a/stable-baselines](https://github.com/hill-a/stable-baselines)

Medium article: [https://medium.com/@araffin/stable-baselines-a-fork-of-openai-baselines-df87c4b2fc82](https://medium.com/@araffin/stable-baselines-a-fork-of-openai-baselines-df87c4b2fc82)

# Install Dependencies



In [None]:
# Stable Baselines only supports tensorflow 1.x for now
%tensorflow_version 1.x
!apt-get update
!apt-get install swig cmake libopenmpi-dev zlib1g-dev ffmpeg freeglut3-dev xvfb
!pip install stable-baselines[mpi] --upgrade
!pip install pybullet
!pip install box2d box2d-kengz pyyaml pytablewriter optuna scikit-optimize


TensorFlow 1.x selected.

## Clone RL Baselines Zoo Repo

In [None]:
!git clone https://github.com/araffin/rl-baselines-zoo

Cloning into 'rl-baselines-zoo'...
remote: Enumerating objects: 33, done.
remote: Counting objects: 100% (33/33), done.
remote: Compressing objects: 100% (25/25), done.
remote: Total 1829 (delta 12), reused 18 (delta 8), pack-reused 1796
Receiving objects: 100% (1829/1829), 375.67 MiB | 40.61 MiB/s, done.
Resolving deltas: 100% (1077/1077), done.
Checking out files: 100% (333/333), done.


In [None]:
cd rl-baselines-zoo/

/content/rl-baselines-zoo


## Train an RL Agent


The trained agent can be found in the `logs/` folder.

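The training cell below runs PPO2 on the `MiniGrid-SimpleCrossingEnvUmaze-v0` environment for 100 000 steps. The later evaluation, video, and continued-training cells assume an A2C agent trained on CartPole-v1.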


To train on Pong (Atari) instead, you just have to pass `--env PongNoFrameskip-v4`.

Note: To support a new environment, add an entry for it to the hyperparameter file of the algorithm you use, `hyperparams/<algo>.yml` (for example, `hyperparams/ppo2.yml`). You can open the file from the side panel of Google Colab (see https://stackoverflow.com/questions/46986398/import-data-into-google-colaboratory).
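
As a rough illustration of what such an entry might look like, the sketch below renders a hyperparameter block as YAML text using only the standard library. The values are copied from the PPO2 hyperparameters that `train.py` prints later in this notebook. The helper name `render_yaml_entry` is hypothetical, and the keys your algorithm accepts should be checked against the existing entries in the file.

```python
def render_yaml_entry(env_id, params):
    """Render a flat dict as one top-level YAML mapping (hypothetical helper)."""
    lines = ["{}:".format(env_id)]
    for key, value in params.items():
        if isinstance(value, bool):
            # Check bool before numbers: YAML booleans are lowercase
            text = "true" if value else "false"
        elif isinstance(value, str):
            text = "'{}'".format(value)
        else:
            text = repr(value)
        lines.append("  {}: {}".format(key, text))
    return "\n".join(lines) + "\n"


# Values taken from the hyperparameters printed by the training run in this notebook
entry = render_yaml_entry("MiniGrid-SimpleCrossingEnvUmaze-v0", {
    "env_wrapper": "gym_minigrid.wrappers.FlatObsWrapper",
    "normalize": True,
    "n_envs": 8,
    "n_timesteps": 100000.0,
    "policy": "MlpPolicy",
    "n_steps": 128,
    "nminibatches": 32,
    "noptepochs": 10,
    "learning_rate": 0.00025,
})
print(entry)
```

The printed block can then be pasted into the YAML file and adjusted by hand.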

In [None]:
!pip install gym-minigrid

In [None]:

!python train.py --algo ppo2 --env MiniGrid-SimpleCrossingEnvUmaze-v0 --gym-packages gym_minigrid

The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.


Seed: 0
OrderedDict([('cliprange', 0.2),
             ('ent_coef', 0.0),
             ('env_wrapper', 'gym_minigrid.wrappers.FlatObsWrapper'),
             ('gamma', 0.99),
             ('lam', 0.95),
             ('learning_rate', 0.00025),
             ('n_envs', 8),
             ('n_steps', 128),
             ('n_timesteps', 100000.0),
             ('nminibatches', 32),
             ('noptepochs', 10),
             ('normalize', True),
             ('policy', 'MlpPolicy')])
Using 8 environments
Normalizing input and reward
Creating test environment
Normalization activated: {'norm_reward': False}
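
To make the printed hyperparameters more concrete, the sketch below works out the rollout and minibatch sizes they imply. It assumes the usual PPO2 bookkeeping in Stable Baselines: one rollout collects `n_envs * n_steps` transitions, the rollout is split into `nminibatches` minibatches, and the number of updates is the integer quotient of `n_timesteps` by the rollout size. Treat it as back-of-the-envelope arithmetic rather than a statement about the exact training loop.

```python
# Hyperparameters as printed by train.py above
n_envs, n_steps = 8, 128
nminibatches, noptepochs = 32, 10
n_timesteps = int(100000.0)

# Transitions collected per rollout across all parallel environments
rollout_size = n_envs * n_steps
# Transitions per gradient step
minibatch_size = rollout_size // nminibatches
# Full rollout/update cycles that fit in the step budget
n_updates = n_timesteps // rollout_size
# Each update makes noptepochs passes over nminibatches minibatches
gradient_steps = n_updates * noptepochs * nminibatches

print(rollout_size, minibatch_size, n_updates, gradient_steps)
# 1024 32 97 31040
```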





#### Evaluate trained agent


Remove the `--folder logs/` argument to evaluate a pretrained agent instead.

In [None]:
!python enjoy.py --algo a2c --env CartPole-v1 --no-render --n-timesteps 5000 --folder logs/

#### Tune Hyperparameters

We use [Optuna](https://optuna.org/) for optimizing the hyperparameters.

Tune the hyperparameters for PPO2 using a TPE sampler and a median pruner, with 2 parallel jobs,
a budget of 1000 trials, and a maximum of 50000 steps:

In [None]:
!python train.py --algo ppo2 --env MountainCar-v0 -n 50000 -optimize --n-trials 1000 --n-jobs 2 --sampler tpe --pruner median
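
The idea behind median pruning can be sketched in a few lines: a running trial is stopped early when its intermediate score is worse than the median score that earlier trials reached at the same step. The code below is a simplified, standard-library illustration of that rule, not Optuna's implementation (which also handles details such as startup trials and warm-up steps). The function name `should_prune` and the reward values are made up for the example.

```python
from statistics import median


def should_prune(current_value, previous_values_at_step, higher_is_better=True):
    """Return True if the running trial is worse than the median of earlier trials."""
    if not previous_values_at_step:
        return False  # nothing to compare against yet
    threshold = median(previous_values_at_step)
    if higher_is_better:
        return current_value < threshold
    return current_value > threshold


# Illustrative mean rewards from three earlier trials at the same evaluation step
earlier = [-200.0, -150.0, -120.0]
print(should_prune(-180.0, earlier))  # True: below the median of -150.0
print(should_prune(-130.0, earlier))  # False: above the median
```

Pruning this way spends less of the trial budget on configurations that already lag behind at an early checkpoint.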

### Record a Video

In [None]:
# Set up display; otherwise rendering will fail
import os
os.system("Xvfb :1 -screen 0 1024x768x24 &")
os.environ['DISPLAY'] = ':1'

In [None]:
!pip install pyglet==1.3.1  # pyglet v1.4.1 throws an error

In [None]:
!python -m utils.record_video --algo a2c --env CartPole-v1 --exp-id 0 -f logs/ -n 1000

### Display the video

In [None]:
import base64
from pathlib import Path

from IPython import display as ipythondisplay

def show_videos(video_path='', prefix=''):
  """
  Taken from https://github.com/eleurent/highway-env

  :param video_path: (str) Path to the folder containing videos
  :param prefix: (str) Filter the videos, showing only the ones starting with this prefix
  """
  html = []
  for mp4 in Path(video_path).glob("{}*.mp4".format(prefix)):
    # Embed each video inline as a base64 data URI
    video_b64 = base64.b64encode(mp4.read_bytes())
    html.append('''<video alt="{}" autoplay
                    loop controls style="height: 400px;">
                    <source src="data:video/mp4;base64,{}" type="video/mp4" />
                </video>'''.format(mp4, video_b64.decode('ascii')))
  ipythondisplay.display(ipythondisplay.HTML(data="<br>".join(html)))

In [None]:
show_videos(prefix='a2c')
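
`show_videos` embeds each video directly in the page as a base64 `data:` URI, so the browser does not need to fetch a separate file. A minimal sketch of that encoding step, using placeholder bytes instead of a real MP4 (the `to_data_uri` name is hypothetical):

```python
import base64


def to_data_uri(payload, mime_type="video/mp4"):
    """Encode raw bytes as a data URI usable in an HTML src attribute."""
    encoded = base64.b64encode(payload).decode("ascii")
    return "data:{};base64,{}".format(mime_type, encoded)


uri = to_data_uri(b"not a real video")
print(uri[:22])  # data:video/mp4;base64,

# The payload can be recovered by decoding everything after the comma
decoded = base64.b64decode(uri.split(",", 1)[1])
```

Base64 makes the payload roughly a third larger, so this approach suits short clips better than long recordings.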

### Continue Training

Here, we continue training the previous model.

In [None]:
!python train.py --algo a2c --env CartPole-v1 --n-timesteps 50000 -i logs/a2c/CartPole-v1.pkl

In [None]:
!python enjoy.py --algo a2c --env CartPole-v1 --no-render --n-timesteps 1000 --folder logs/