# Using RL Baselines3 Zoo


[`RL Baselines3 Zoo`](https://rl-baselines3-zoo.readthedocs.io/en/master/) is a training framework for reinforcement learning (RL) built on Stable Baselines3 (SB3), a set of reliable implementations of reinforcement learning algorithms in PyTorch. It provides scripts for training and evaluating agents, tuning hyperparameters, plotting results, and recording videos. It also includes a collection of tuned hyperparameters for common environments and RL algorithms, along with agents trained with those settings.

GitHub repository: https://github.com/DLR-RM/rl-baselines3-zoo

In this notebook, we will train an agent, record a demo video, and push the trained agent to the Hugging Face Hub, all using RL Baselines3 Zoo (RL Zoo).

RL Zoo is meant to be run from the command line. In a Python notebook, however, we can run shell commands by prefixing them with the `!` (bang) character. For example, to run the Unix `pwd` command, which prints the current working directory, execute `!pwd` in a code cell.

We have been using this approach to install dependencies when running these notebooks in Google Colab, e.g.
```
!pip install "stable-baselines3[extra]"
```
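
Outside a notebook the `!` prefix is not available. As a minimal sketch, assuming you want a similar effect from a plain Python script, the standard-library `subprocess` module can run a command and capture its output. The helper name `run_command` is hypothetical and is not part of RL Zoo.

```python
import subprocess
import sys


def run_command(args):
    """Run a command given as a list of strings and return its standard output."""
    # check=True raises CalledProcessError if the command exits with a non-zero status
    completed = subprocess.run(args, capture_output=True, text=True, check=True)
    return completed.stdout


# Rough equivalent of `!python --version` in a notebook cell,
# using the same interpreter that is running this script
print(run_command([sys.executable, "--version"]).strip())
```

Passing the command as a list avoids shell quoting issues, which matters for arguments such as `stable-baselines3[extra]`.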

## Running in Colab

If you are running this on Colab, uncomment the cell below and run it to install the required dependencies.

In [None]:
# Uncomment and execute this cell to install all the dependencies if running in Google Colab
# The %cd command will change based on which cloud system you are using

# !apt-get update && apt-get install -y swig cmake ffmpeg freeglut3-dev xvfb
# !git clone https://github.com/DLR-RM/rl-baselines3-zoo

## Uncomment for Colab    
# %cd /content/rl-baselines3-zoo/ 

# Uncomment for Kaggle
# %cd /kaggle/working/rl-baselines3-zoo/ 

# !pip install -r requirements.txt
# Quote the version specifiers so the shell does not treat ">" as output redirection
# !pip install "huggingface_sb3>=3.0"
# !pip install "stable-baselines3[extra]>=2.1"
# !pip install sb3-contrib

## Training LunarLander using DQN

This is the same as Listing 2.3, except that this time it is done using RL Zoo.

Note that the default hyperparameters printed at the start of the command's output can be changed; refer to the RL Zoo documentation for details. Note also that these defaults differ from the defaults of the `DQN` class in Stable Baselines3, which apply when you create the model yourself and call `model.learn`: https://stable-baselines3.readthedocs.io/en/master/modules/dqn.html#stable_baselines3.dqn.DQN
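
One way to see which settings differ is to read the constructor defaults with the standard-library `inspect` module and compare them with values copied from the output printed by the training command below. This is a sketch under assumptions: the helper names `constructor_defaults` and `differing_settings` are hypothetical, and the comparison against `DQN` only runs if `stable_baselines3` is installed.

```python
import inspect


def constructor_defaults(cls):
    """Return {parameter name: default value} for cls.__init__ parameters that have a default."""
    params = inspect.signature(cls.__init__).parameters
    return {
        name: p.default
        for name, p in params.items()
        if p.default is not inspect.Parameter.empty
    }


def differing_settings(cls, settings):
    """Return {name: (class default, given value)} for settings that differ from the defaults."""
    defaults = constructor_defaults(cls)
    return {
        name: (defaults[name], value)
        for name, value in settings.items()
        if name in defaults and defaults[name] != value
    }


# Values copied from the RL Zoo output for LunarLander-v2 shown in this notebook
zoo_settings = {
    "batch_size": 128,
    "buffer_size": 50000,
    "exploration_final_eps": 0.1,
    "exploration_fraction": 0.12,
    "gamma": 0.99,
    "gradient_steps": -1,
    "learning_rate": 0.00063,
    "learning_starts": 0,
    "target_update_interval": 250,
}

if __name__ == "__main__":
    try:
        from stable_baselines3 import DQN
    except ImportError:
        print("stable_baselines3 is not installed; skipping the comparison")
    else:
        for name, (sb3_default, zoo_value) in differing_settings(DQN, zoo_settings).items():
            print(f"{name}: SB3 default={sb3_default}, RL Zoo={zoo_value}")
```

Reading the defaults at run time avoids hard-coding values that may change between Stable Baselines3 releases.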

In [2]:
# Train a DQN agent on LunarLander-v2

!python -m rl_zoo3.train --algo dqn --env LunarLander-v2 --n-timesteps 100000 --log-interval 400 --progress

2023-07-13 08:38:39.759355: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Seed: 267716603
Loading hyperparameters from: /content/rl-baselines3-zoo/hyperparams/dqn.yml
Default hyperparameters for environment (ones being tuned will be overridden):
OrderedDict([('batch_size', 128),
             ('buffer_size', 50000),
             ('exploration_final_eps', 0.1),
             ('exploration_fraction', 0.12),
             ('gamma', 0.99),
             ('gradient_steps', -1),
             ('learning_rate', 0.00063),
             ('learning_starts', 0),
             ('n_timesteps', 100000.0),
             ('policy', 'MlpPolicy'),
             ('policy_kwargs', 'dict(net_arch=[256, 256])'),
             ('target_update_interval', 250),
             ('train_freq
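
The defaults printed in the output above can be overridden from the command line. According to the RL Zoo documentation, the training script accepts a `--hyperparams` option that takes `name:value` pairs; treat that flag as an assumption and confirm it with `python -m rl_zoo3.train --help` for your installed version. The helper `build_train_command` below is hypothetical and only assembles the argument list; it does not start training.

```python
import shlex


def build_train_command(algo, env, n_timesteps, overrides=None):
    """Assemble the argument list for an RL Zoo training run with optional overrides."""
    command = [
        "python", "-m", "rl_zoo3.train",
        "--algo", algo,
        "--env", env,
        "--n-timesteps", str(n_timesteps),
    ]
    if overrides:
        # Assumed flag from the RL Zoo docs: --hyperparams name:value [name:value ...]
        command.append("--hyperparams")
        command.extend(f"{name}:{value}" for name, value in overrides.items())
    return command


command = build_train_command(
    "dqn", "LunarLander-v2", 100000, {"learning_rate": 0.001, "batch_size": 64}
)
# shlex.join produces a string that can be pasted after "!" in a notebook cell
print(shlex.join(command))
```

The override values here (`learning_rate` of 0.001, `batch_size` of 64) are illustrative only and are not tuned settings.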

## Evaluating the agent

We will now evaluate the agent trained above by loading the model saved by the training command.



In [3]:
!python -m rl_zoo3.enjoy --algo dqn --env LunarLander-v2 --no-render --n-timesteps 5000 --folder logs/

2023-07-13 08:44:59.503716: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Loading latest experiment, id=1
Loading logs/dqn/LunarLander-v2_1/LunarLander-v2.zip
Episode Reward: 280.47
Episode Length 273
Episode Reward: 156.79
Episode Length 1000
Episode Reward: 290.06
Episode Length 224
Episode Reward: 233.06
Episode Length 292
Episode Reward: 177.18
Episode Length 474
Episode Reward: 141.88
Episode Length 1000
Episode Reward: 265.48
Episode Length 388
Episode Reward: 263.63
Episode Length 222
Episode Reward: 301.48
Episode Length 208
Episode Reward: 263.89
Episode Length 320
Episode Reward: 262.31
Episode Length 207
Episode Reward: 302.28
Episode Length 263
12 Episodes
Mean reward: 244.88 +/- 53.49
Mean episode length: 405.92 +/- 276.12
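
The summary lines can be reproduced from the per-episode values. The sketch below copies the rewards and lengths from the output above and assumes the reported spread is the population standard deviation (dividing by the number of episodes), which is consistent with the printed numbers.

```python
from statistics import fmean, pstdev

# Per-episode values copied from the evaluation output above
rewards = [280.47, 156.79, 290.06, 233.06, 177.18, 141.88,
           265.48, 263.63, 301.48, 263.89, 262.31, 302.28]
lengths = [273, 1000, 224, 292, 474, 1000, 388, 222, 208, 320, 207, 263]

# pstdev is the population standard deviation (divides by N, not N - 1)
print(f"{len(rewards)} Episodes")
print(f"Mean reward: {fmean(rewards):.2f} +/- {pstdev(rewards):.2f}")
print(f"Mean episode length: {fmean(lengths):.2f} +/- {pstdev(lengths):.2f}")
```

The two episodes of length 1000 are the ones with the lowest rewards, which accounts for much of the spread in both statistics.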


## Recording a video

Let us now record a video of the trained agent.

In [5]:
!python -m rl_zoo3.record_video --algo dqn --env LunarLander-v2 --exp-id 0 -f logs/ -n 1000

2023-07-13 08:45:14.804334: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Loading latest experiment, id=1
Loading logs/dqn/LunarLander-v2_1/LunarLander-v2.zip
Loading logs/dqn/LunarLander-v2_1/LunarLander-v2.zip
Saving video to /content/rl-baselines3-zoo/logs/dqn/LunarLander-v2_1/videos/final-model-dqn-LunarLander-v2-step-0-to-step-1000.mp4
Moviepy - Building video /content/rl-baselines3-zoo/logs/dqn/LunarLander-v2_1/videos/final-model-dqn-LunarLander-v2-step-0-to-step-1000.mp4.
Moviepy - Writing video /content/rl-baselines3-zoo/logs/dqn/LunarLander-v2_1/videos/final-model-dqn-LunarLander-v2-step-0-to-step-1000.mp4

Moviepy - Done !
Moviepy - video ready /content/rl-baselines3-zoo/logs/dqn/LunarLander-v2_1/videos/final-model-dqn-LunarLander-v2-step-0-t

## Display the video

In [9]:
import base64
from pathlib import Path

from IPython import display as ipythondisplay


def show_videos(video_path="", prefix=""):
    """
    Taken from https://github.com/eleurent/highway-env

    :param video_path: (str) Path to the folder containing videos
    :param prefix: (str) Filter the videos, showing only the ones starting with this prefix
    """
    html = []
    for mp4 in Path(video_path).glob("{}*.mp4".format(prefix)):
        video_b64 = base64.b64encode(mp4.read_bytes())
        html.append(
            """<video alt="{}" autoplay
                    loop controls style="height: 400px;">
                    <source src="data:video/mp4;base64,{}" type="video/mp4" />
                </video>""".format(
                mp4, video_b64.decode("ascii")
            )
        )
    ipythondisplay.display(ipythondisplay.HTML(data="<br>".join(html)))

In [11]:

show_videos(video_path='logs/dqn/LunarLander-v2_1/videos/', prefix='')

## Pushing to Hugging Face

To share the trained model with others, you can push it to the Hugging Face Hub. First, we need to log in to Hugging Face using an access token.


In [8]:
from huggingface_sb3 import load_from_hub, package_to_hub, push_to_hub
from huggingface_hub import notebook_login  # Log in to our Hugging Face account so we can upload models to the Hub

notebook_login()
!git config --global credential.helper store


In [12]:
# Before you run this, change -orga to your huggingface id

!python -m rl_zoo3.push_to_hub --algo dqn --env LunarLander-v2 -f logs/ -orga nsanghi -m "Initial commit"

Loading latest experiment, id=1
Loading logs/dqn/LunarLander-v2_1/LunarLander-v2.zip
Uploading to nsanghi/dqn-LunarLander-v2, make sure to have the rights
ℹ This function will save, evaluate, generate a video of your agent,
create a model card and push everything to the hub. It might take up to some
minutes if video generation is activated. This is a work in progress: if you
encounter a bug, please open an issue.
Cloning https://huggingface.co/nsanghi/dqn-LunarLander-v2 into local empty directory.
Saving model to: hub/dqn-LunarLander-v2/dqn-LunarLander-v2
Saving video to /tmp/tmpowkaj9ar/-step-0-to-step-1000.mp4
Moviepy - Building video /tmp/tmpowkaj9ar/-step-0-to-step-1000.mp4.
Moviepy - Writing video /tmp/tmpowkaj9ar/-step-0-to-step-1000.mp4

Moviepy - Done !
Moviepy - video ready /tmp/tmpowkaj9ar/-step-0-to-step-1000.mp4
ffmpeg version 4.2.7-0ubuntu0.1 Copyright (c) 2000-2022 the FFmpeg developers
  built with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
  configuration: --pre

## See the model on the Hugging Face Hub

Click the link below to see the stored trained agent and video on Hugging Face:

https://huggingface.co/nsanghi/dqn-LunarLander-v2

In your case, the URL would look like:

`https://huggingface.co/<orga>/<algo>-<env>`
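
As a small illustration of this pattern, the hypothetical helper below (not part of RL Zoo or `huggingface_sb3`) fills in the three placeholders. It assumes the repository name is always `<algo>-<env>`, as in the upload log shown earlier.

```python
def hub_model_url(orga, algo, env):
    """Build the model page URL following the <orga>/<algo>-<env> pattern."""
    return f"https://huggingface.co/{orga}/{algo}-{env}"


# Values used in this notebook; replace "nsanghi" with your own Hugging Face id
print(hub_model_url("nsanghi", "dqn", "LunarLander-v2"))
```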

As an exercise, try following the same steps to train the agent from Listing 2-4 using RL Zoo.