# Using RL Baselines3 Zoo


[RL Baselines3 Zoo](https://rl-baselines3-zoo.readthedocs.io/en/master/) is a training framework for Reinforcement Learning (RL) built on Stable Baselines3 (SB3), a set of reliable implementations of reinforcement learning algorithms in PyTorch. It provides scripts for training and evaluating agents, tuning hyperparameters, plotting results, and recording videos. It also includes a collection of tuned hyperparameters for common environments and RL algorithms, along with agents trained using those settings.

GitHub repository: https://github.com/DLR-RM/rl-baselines3-zoo

In this notebook, we will train an agent, record a demo video, and push the trained agent to the Hugging Face Hub, all using RL Zoo.

RL Zoo is meant to be run from the command line. However, in a Python notebook we can run shell commands by prefixing them with the `!` (bang) character. For example, to run the Unix command `pwd`, which prints the current working directory, we can execute `!pwd` in a code cell.

We have been using this approach to install dependencies when running these notebooks in Google Colab, for example:
```
!pip install "stable-baselines3[extra]"
```
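
Outside a notebook, or when the command's output is needed in a Python variable, the standard library's `subprocess` module does a similar job to the `!` prefix. The following is a minimal sketch; the helper name `run_command` is made up for illustration and is not part of RL Zoo.

```python
import subprocess
import sys


def run_command(args):
    """Run a command given as a list of arguments and return its stdout as text."""
    completed = subprocess.run(args, capture_output=True, text=True, check=True)
    return completed.stdout.strip()


# Portable stand-in for `!pwd`: ask a child Python process for its working directory.
print(run_command([sys.executable, "-c", "import os; print(os.getcwd())"]))
```

Passing the command as a list avoids shell quoting issues, and `check=True` raises an exception if the command exits with a non-zero status.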

## Running in Colab

If you are running this notebook on Colab, uncomment the cell below and run it to install the required dependencies.

In [None]:
# Uncomment and execute this cell to install all the dependencies if running in Google Colab

# !apt-get update && apt-get install swig cmake
# !pip install box2d-py
# !pip install "stable-baselines3[extra]>=2.0.0a4"
# !pip install git+https://github.com/huggingface/huggingface_sb3@gymnasium
# !pip install "rl_zoo3 @ git+https://github.com/DLR-RM/rl-baselines3-zoo@update/hf"
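
After installing, it can help to confirm that the packages are importable before starting a long training run. The sketch below only checks that each module can be located; the helper `missing_modules` is a hypothetical name, and the module list is an assumption based on the install commands above (note that `box2d-py` is imported as `Box2D`).

```python
import importlib.util

# Import names of the packages installed by the cell above.
# box2d-py is imported under the name "Box2D".
REQUIRED_MODULES = ["stable_baselines3", "huggingface_sb3", "rl_zoo3", "Box2D"]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("Missing modules:", missing if missing else "none")
```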

## Training LunarLander using DQN

This is the same task as Listing 2.3, except that this time we train the agent using RL Zoo.

Note that the default hyperparameters printed at the start of the command's output can be changed; refer to the RL Zoo documentation for details. Note also that these defaults differ from the defaults of the `DQN` class in `stable_baselines3`, which apply when you build the model yourself and call `model.learn` - https://stable-baselines3.readthedocs.io/en/master/modules/dqn.html#stable_baselines3.dqn.DQN

In [6]:
# Train a DQN agent on LunarLander-v2

!python -m rl_zoo3.train --algo dqn --env LunarLander-v2 --n-timesteps 100000 --log-interval 400 --progress

Seed: 1726845707
Loading hyperparameters from: /home/nsanghi/miniconda3/envs/apress/lib/python3.10/site-packages/rl_zoo3/hyperparams/dqn.yml
Default hyperparameters for environment (ones being tuned will be overridden):
OrderedDict([('batch_size', 128),
             ('buffer_size', 50000),
             ('exploration_final_eps', 0.1),
             ('exploration_fraction', 0.12),
             ('gamma', 0.99),
             ('gradient_steps', -1),
             ('learning_rate', 0.00063),
             ('learning_starts', 0),
             ('n_timesteps', 100000.0),
             ('policy', 'MlpPolicy'),
             ('policy_kwargs', 'dict(net_arch=[256, 256])'),
             ('target_update_interval', 250),
             ('train_freq', 4)])
Using 1 environments
Overwriting n_timesteps with n=100000
Creating test environment
Using cpu device
Log path: logs/dqn/LunarLander-v2_2
Eval num_timesteps=25000, episode_reward=102.52 +/- 73.72

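To see exactly which of the RL Zoo settings printed above differ from the SB3 defaults, one option is to read the defaults from the `DQN` constructor signature rather than copying them by hand. This is a sketch under assumptions: `diff_against_defaults` is a hypothetical helper, the dictionary is transcribed from the output above, and the comparison is skipped if `stable_baselines3` is not installed.

```python
import inspect

# Values copied from the hyperparameter dump printed by rl_zoo3.train above.
ZOO_DQN_LUNARLANDER = {
    "batch_size": 128,
    "buffer_size": 50000,
    "exploration_final_eps": 0.1,
    "exploration_fraction": 0.12,
    "gamma": 0.99,
    "gradient_steps": -1,
    "learning_rate": 0.00063,
    "learning_starts": 0,
    "target_update_interval": 250,
    "train_freq": 4,
}


def diff_against_defaults(constructor, overrides):
    """Map each overridden name to (default, override) where the two differ.

    Names the constructor does not accept, or that have no default, are skipped.
    """
    params = inspect.signature(constructor).parameters
    diffs = {}
    for name, value in overrides.items():
        param = params.get(name)
        if param is None or param.default is inspect.Parameter.empty:
            continue
        if param.default != value:
            diffs[name] = (param.default, value)
    return diffs


try:
    from stable_baselines3 import DQN
except ImportError:
    print("stable_baselines3 is not installed; skipping the comparison.")
else:
    differences = diff_against_defaults(DQN.__init__, ZOO_DQN_LUNARLANDER)
    for name, (default, zoo) in differences.items():
        print(f"{name}: SB3 default={default}, RL Zoo={zoo}")
```

Reading the defaults through `inspect.signature` keeps the comparison in step with whichever SB3 version is installed.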
## Evaluating the agent

We will now evaluate the agent trained above by loading the model saved by the training command. As the output shows, the script picks up the latest experiment in the `logs/` folder.



In [8]:
!python -m rl_zoo3.enjoy --algo dqn --env LunarLander-v2 --no-render --n-timesteps 5000 --folder logs/

Loading latest experiment, id=2
Loading logs/dqn/LunarLander-v2_2/LunarLander-v2.zip
Episode Reward: 315.41
Episode Length 181
Episode Reward: 39.45
Episode Length 136
Episode Reward: 270.85
Episode Length 205
Episode Reward: 62.67
Episode Length 111
Episode Reward: 207.56
Episode Length 402
Episode Reward: 31.54
Episode Length 104
Episode Reward: 114.15
Episode Length 1000
Episode Reward: 252.52
Episode Length 254
Episode Reward: 248.03
Episode Length 388
Episode Reward: 306.31
Episode Length 181
Episode Reward: 163.08
Episode Length 1000
Episode Reward: 244.22
Episode Length 418
Episode Reward: 287.67
Episode Length 201
Episode Reward: 76.37
Episode Length 95
Episode Reward: 290.65
Episode Length 220
15 Episodes
Mean reward: 194.03 +/- 99.71
Mean episode length: 326.40 +/- 282.88
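
The summary lines can be reproduced from the per-episode values. The sketch below parses the output above and computes the mean and the population standard deviation of the rewards, which gives the reported 194.03 +/- 99.71. It also sums the episode lengths: the 15 completed episodes total 4896 steps, just under the `--n-timesteps 5000` budget, which appears to be why exactly 15 episodes are reported. The variable names are illustrative only.

```python
import re
import statistics

# Per-episode lines copied from the rl_zoo3.enjoy output above.
ENJOY_LOG = """
Episode Reward: 315.41
Episode Length 181
Episode Reward: 39.45
Episode Length 136
Episode Reward: 270.85
Episode Length 205
Episode Reward: 62.67
Episode Length 111
Episode Reward: 207.56
Episode Length 402
Episode Reward: 31.54
Episode Length 104
Episode Reward: 114.15
Episode Length 1000
Episode Reward: 252.52
Episode Length 254
Episode Reward: 248.03
Episode Length 388
Episode Reward: 306.31
Episode Length 181
Episode Reward: 163.08
Episode Length 1000
Episode Reward: 244.22
Episode Length 418
Episode Reward: 287.67
Episode Length 201
Episode Reward: 76.37
Episode Length 95
Episode Reward: 290.65
Episode Length 220
"""

rewards = [float(x) for x in re.findall(r"Episode Reward: (-?\d+(?:\.\d+)?)", ENJOY_LOG)]
lengths = [int(x) for x in re.findall(r"Episode Length (\d+)", ENJOY_LOG)]

mean_reward = statistics.fmean(rewards)
std_reward = statistics.pstdev(rewards)  # population standard deviation

print(f"{len(rewards)} episodes, {sum(lengths)} steps in total")
print(f"Mean reward: {mean_reward:.2f} +/- {std_reward:.2f}")
print(f"Mean episode length: {statistics.fmean(lengths):.2f}")
```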


## Recording a video

Let us now record a video of the trained agent.

In [12]:
# Set up display; otherwise rendering will fail
# Uncomment and run only if running in Google Colab
# import os
# os.system("Xvfb :1 -screen 0 1024x768x24 &")
# os.environ['DISPLAY'] = ':1'

In [13]:
!python -m rl_zoo3.record_video --algo dqn --env LunarLander-v2 --exp-id 0 -f logs/ -n 1000

Loading latest experiment, id=2
Loading logs/dqn/LunarLander-v2_2/LunarLander-v2.zip
Loading logs/dqn/LunarLander-v2_2/LunarLander-v2.zip
Saving video to /home/nsanghi/sandbox/apress/drl-2ed/chapter2/logs/dqn/LunarLander-v2_2/videos/final-model-dqn-LunarLander-v2-step-0-to-step-1000.mp4
Moviepy - Building video /home/nsanghi/sandbox/apress/drl-2ed/chapter2/logs/dqn/LunarLander-v2_2/videos/final-model-dqn-LunarLander-v2-step-0-to-step-1000.mp4.
Moviepy - Writing video /home/nsanghi/sandbox/apress/drl-2ed/chapter2/logs/dqn/LunarLander-v2_2/videos/final-model-dqn-LunarLander-v2-step-0-to-step-1000.mp4

Moviepy - Done !                                                                
Moviepy - video ready /home/nsanghi/sandbox/apress/drl-2ed/chapter2/logs/dqn/LunarLander-v2_2/videos/final-model-dqn-LunarLander-v2-step-0-to-step-1000.mp4


## Display the video

In [14]:
import base64
from pathlib import Path

from IPython import display as ipythondisplay


def show_videos(video_path="", prefix=""):
    """
    Taken from https://github.com/eleurent/highway-env

    :param video_path: (str) Path to the folder containing videos
    :param prefix: (str) Filter the videos, showing only the ones starting with this prefix
    """
    html = []
    for mp4 in Path(video_path).glob("{}*.mp4".format(prefix)):
        video_b64 = base64.b64encode(mp4.read_bytes())
        html.append(
            """<video alt="{}" autoplay 
                    loop controls style="height: 400px;">
                    <source src="data:video/mp4;base64,{}" type="video/mp4" />
                </video>""".format(
                mp4, video_b64.decode("ascii")
            )
        )
    ipythondisplay.display(ipythondisplay.HTML(data="<br>".join(html)))

In [17]:
show_videos(video_path='logs/dqn/LunarLander-v2_2/videos/', prefix='')

## Pushing to Hugging Face

To share the trained model with others, you can push it to the Hugging Face Hub. First, we need to log in to Hugging Face using an access token.


In [22]:
from huggingface_sb3 import load_from_hub, package_to_hub, push_to_hub
from huggingface_hub import notebook_login # To log to our Hugging Face account to be able to upload models to the Hub.

notebook_login()
!git config --global credential.helper store


In [23]:
# Before you run this, change the value of -orga to your Hugging Face username or organization

!python -m rl_zoo3.push_to_hub --algo dqn --env LunarLander-v2 -f logs/ -orga nsanghi -m "Initial commit"

Loading latest experiment, id=2
Loading logs/dqn/LunarLander-v2_2/LunarLander-v2.zip
Uploading to nsanghi/dqn-LunarLander-v2, make sure to have the rights
ℹ This function will save, evaluate, generate a video of your agent,
create a model card and push everything to the hub. It might take up to some
minutes if video generation is activated. This is a work in progress: if you
encounter a bug, please open an issue.
Cloning https://huggingface.co/nsanghi/dqn-LunarLander-v2 into local empty directory.
Download file dqn-LunarLander-v2.zip:  18%|█▌       | 18.4k/103k [00:00<?, ?B/s]
Download file dqn-LunarLander-v2.zip: 100%|██| 103k/103k [00:01<00:00, 86.2kB/s]
Clean file dqn-LunarLander-v2.zip: 100%|██████| 103k/103k [00:01<00:00, 104kB/s]
Saving model to: hub/dqn-LunarLander-v2/dqn-LunarLander-v2
Saving video to /tmp/tmp5hnsm3zw/-step-0-to-step-1000.mp4
Moviepy - Building video /tmp/tmp

## Viewing the model on the Hugging Face Hub

Click the link below to see the stored trained agent and video on the Hugging Face Hub:

https://huggingface.co/nsanghi/dqn-LunarLander-v2

In your case, the URL will have the form:

`https://huggingface.co/<orga>/<algo>-<env>`
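
As a sketch, the repository id and URL can be built from the same three values passed to the push command. The helper names below are hypothetical. The download function assumes that the checkpoint is stored as `<algo>-<env>.zip`, as the upload log above suggests, and it uses `load_from_hub` from `huggingface_sb3`, imported lazily so that the URL helpers work without that package installed.

```python
def hub_repo_id(orga, algo, env_id):
    """Build the `<orga>/<algo>-<env>` repository id seen in the upload log."""
    return f"{orga}/{algo}-{env_id}"


def hub_url(orga, algo, env_id):
    """Build the model page URL for a pushed agent."""
    return f"https://huggingface.co/{hub_repo_id(orga, algo, env_id)}"


def download_checkpoint(orga, algo, env_id):
    """Download the saved model from the Hub and return its local path.

    Assumes the checkpoint file is named `<algo>-<env>.zip` (an assumption
    based on the upload log) and that huggingface_sb3 is installed.
    """
    from huggingface_sb3 import load_from_hub

    return load_from_hub(
        repo_id=hub_repo_id(orga, algo, env_id),
        filename=f"{algo}-{env_id}.zip",
    )


print(hub_url("nsanghi", "dqn", "LunarLander-v2"))
```

The path returned by `download_checkpoint` can be passed to `DQN.load` from `stable_baselines3` to restore the agent.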

As an exercise, follow the same steps to train the agent from Listing 2.4 using RL Zoo.