<a href="https://colab.research.google.com/github/komazawa-deep-learning/komazawa-deep-learning.github.io/blob/master/2023notebooks/2023_0618rl_baselines_zoo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# RL Baselines3 Zoo: Training in Colab



GitHub Repo: [https://github.com/DLR-RM/rl-baselines3-zoo](https://github.com/DLR-RM/rl-baselines3-zoo)

Stable-Baselines3 Repo: [https://github.com/DLR-RM/stable-baselines3](https://github.com/DLR-RM/stable-baselines3)


# Install Dependencies



In [None]:
# for autoformatting
# %load_ext jupyter_black

In [None]:
!apt-get update && apt-get install -y swig cmake ffmpeg freeglut3-dev xvfb

## Clone RL Baselines3 Zoo Repo

In [3]:
!git clone https://github.com/DLR-RM/rl-baselines3-zoo

Cloning into 'rl-baselines3-zoo'...
remote: Enumerating objects: 5235, done.
remote: Counting objects: 100% (76/76), done.
remote: Compressing objects: 100% (51/51), done.
remote: Total 5235 (delta 34), reused 41 (delta 22), pack-reused 5159
Receiving objects: 100% (5235/5235), 3.78 MiB | 24.99 MiB/s, done.
Resolving deltas: 100% (3456/3456), done.


In [4]:
%cd /content/rl-baselines3-zoo/

/content/rl-baselines3-zoo


### Install pip dependencies

In [None]:
!pip install -r requirements.txt

## Train an RL Agent

The trained agent can be found in the `logs/` folder.

Here we will train A2C on the CartPole-v1 environment for 100,000 steps.

To train it on Pong (Atari), you just have to pass `--env PongNoFrameskip-v4`.

Note: You need to update `hyperparams/algo.yml` (where `algo` is the algorithm name, e.g. `a2c`) to support new environments. You can access the file from the side panel of Google Colab (see https://stackoverflow.com/questions/46986398/import-data-into-google-colaboratory).

In [None]:
!python -m rl_zoo3.train --algo a2c --env CartPole-v1 --n-timesteps 100000
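When switching between environments, it can help to assemble the training command programmatically. The sketch below is an illustration only: `train_command` is a hypothetical helper (not part of `rl_zoo3`) that builds the same argument list as the cell above, using only the `--algo`, `--env`, and `--n-timesteps` flags already shown. The step count used for Pong is an arbitrary placeholder, not a recommended budget.

```python
import shlex
import sys


def train_command(algo, env, n_timesteps):
    """Build the argument list for `python -m rl_zoo3.train` (hypothetical helper)."""
    return [
        sys.executable, "-m", "rl_zoo3.train",
        "--algo", algo,
        "--env", env,
        "--n-timesteps", str(n_timesteps),
    ]


cartpole_cmd = train_command("a2c", "CartPole-v1", 100_000)
# Placeholder budget for Pong; only the --env value changes.
pong_cmd = train_command("a2c", "PongNoFrameskip-v4", 100_000)

# Print the arguments as shell-ready strings (interpreter path omitted).
print(shlex.join(cartpole_cmd[1:]))
print(shlex.join(pong_cmd[1:]))
```

To launch training from Python instead of a `!` cell, the list can be passed to `subprocess.run(cartpole_cmd, check=True)`.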

#### Evaluate trained agent


You can remove the `--folder logs/` argument to evaluate a pretrained agent instead.

In [None]:
!python -m rl_zoo3.enjoy --algo a2c --env CartPole-v1 --no-render --n-timesteps 5000 --folder logs/

#### Tune Hyperparameters

We use [Optuna](https://optuna.org/) for optimizing the hyperparameters.

Tune the hyperparameters for PPO on MountainCar-v0, using a TPE sampler and a median pruner, 2 parallel jobs,
with a budget of 1000 trials and a maximum of 50000 steps.

In [None]:
!python -m rl_zoo3.train --algo ppo --env MountainCar-v0 -n 50000 -optimize --n-trials 1000 --n-jobs 2 --sampler tpe --pruner median
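To give an intuition for the median pruner, here is a simplified sketch of the rule in plain Python. It is not Optuna's implementation: the function name `should_prune`, its `n_startup_trials` argument, and the toy reward values are assumptions made for illustration. The idea is that a trial is stopped early when its intermediate result at some step is worse than the median of earlier trials' results at that same step (rewards are maximized here).

```python
from statistics import median


def should_prune(current_value, previous_values_at_step, n_startup_trials=2):
    """Return True if the running trial looks worse than the median of earlier trials.

    Simplified illustration of median pruning for a maximization problem.
    """
    # Do not prune until enough earlier trials have reported at this step.
    if len(previous_values_at_step) < n_startup_trials:
        return False
    return current_value < median(previous_values_at_step)


# Toy mean rewards reported by earlier trials at the same evaluation step.
earlier = [-200.0, -150.0, -120.0]

print(should_prune(-180.0, earlier))      # True: below the median of -150.0
print(should_prune(-130.0, earlier))      # False: above the median
print(should_prune(-180.0, earlier[:1]))  # False: too few earlier trials
```

Pruning this way spends the trial budget on promising configurations rather than running every trial to the full step limit.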

### Record a Video

In [7]:
# Set up display; otherwise rendering will fail
import os
os.system("Xvfb :1 -screen 0 1024x768x24 &")
os.environ['DISPLAY'] = ':1'

In [8]:
!python -m rl_zoo3.record_video --algo a2c --env CartPole-v1 --exp-id 0 -f logs/ -n 1000

2023-06-17 22:23:26.951585: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Loading latest experiment, id=1
Loading logs/a2c/CartPole-v1_1/CartPole-v1.zip
Loading logs/a2c/CartPole-v1_1/CartPole-v1.zip
Saving video to /content/rl-baselines3-zoo/logs/a2c/CartPole-v1_1/videos/final-model-a2c-CartPole-v1-step-0-to-step-1000.mp4
Moviepy - Building video /content/rl-baselines3-zoo/logs/a2c/CartPole-v1_1/videos/final-model-a2c-CartPole-v1-step-0-to-step-1000.mp4.
Moviepy - Writing video /content/rl-baselines3-zoo/logs/a2c/CartPole-v1_1/videos/final-model-a2c-CartPole-v1-step-0-to-step-1000.mp4

Moviepy - Done !
Moviepy - video ready /content/rl-baselines3-zoo/logs/a2c/CartPole-v1_1/videos/final-model-a2c-CartPole-v1-step-0-to-step-1000.mp4
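
The log above suggests that each run is stored under `logs/<algo>/<env>_<id>/`, and that `--exp-id 0` resolves to the most recent run. A rough sketch of how such a lookup could work, assuming that folder naming; `latest_exp_id` is a hypothetical helper written for this example, not the zoo's own function:

```python
import re
import tempfile
from pathlib import Path


def latest_exp_id(log_root, algo, env):
    """Return the highest <id> among folders named <env>_<id>, or 0 if none exist."""
    pattern = re.compile(re.escape(env) + r"_(\d+)")
    ids = []
    algo_dir = Path(log_root) / algo
    if algo_dir.is_dir():
        for child in algo_dir.iterdir():
            match = pattern.fullmatch(child.name)
            if child.is_dir() and match:
                ids.append(int(match.group(1)))
    return max(ids, default=0)


# Demonstrate on a throwaway directory that mimics the layout seen in the log.
with tempfile.TemporaryDirectory() as tmp:
    for run in ("CartPole-v1_1", "CartPole-v1_2", "MountainCar-v0_7"):
        (Path(tmp) / "a2c" / run).mkdir(parents=True)
    newest = latest_exp_id(tmp, "a2c", "CartPole-v1")
    print(f"logs/a2c/CartPole-v1_{newest}/videos/")
```

A helper like this can be used to build the `video_path` for the display step without hard-coding the run number.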


### Display the video

In [9]:
import base64
from pathlib import Path

from IPython import display as ipythondisplay


def show_videos(video_path="", prefix=""):
    """
    Taken from https://github.com/eleurent/highway-env

    :param video_path: (str) Path to the folder containing videos
    :param prefix: (str) Filter the videos, showing only the ones starting with this prefix
    """
    html = []
    for mp4 in Path(video_path).glob("{}*.mp4".format(prefix)):
        video_b64 = base64.b64encode(mp4.read_bytes())
        html.append(
            """<video alt="{}" autoplay
                    loop controls style="height: 400px;">
                    <source src="data:video/mp4;base64,{}" type="video/mp4" />
                </video>""".format(
                mp4, video_b64.decode("ascii")
            )
        )
    ipythondisplay.display(ipythondisplay.HTML(data="<br>".join(html)))

In [10]:
show_videos(video_path='logs/a2c/CartPole-v1_1/videos/', prefix='')

### Continue Training

Here, we will continue training the previous model.

In [None]:
!python -m rl_zoo3.train --algo a2c --env CartPole-v1 --n-timesteps 50000 -i logs/a2c/CartPole-v1_1/CartPole-v1.zip

In [None]:
!python -m rl_zoo3.enjoy --algo a2c --env CartPole-v1 --no-render --n-timesteps 1000 --folder logs/