<a href="https://colab.research.google.com/github/nsanghi/drl-2ed/blob/main/sb3-train-video-hfupload.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Stable Baselines3 - Training, Saving and Loading

GitHub repo: [https://github.com/DLR-RM/stable-baselines3](https://github.com/DLR-RM/stable-baselines3)


[RL Baselines3 Zoo](https://github.com/DLR-RM/rl-baselines3-zoo) is a training framework for Reinforcement Learning (RL), using Stable Baselines3.

It provides scripts for training, evaluating agents, tuning hyperparameters, plotting results and recording videos.

Documentation is available online: [https://stable-baselines3.readthedocs.io/](https://stable-baselines3.readthedocs.io/)

## Install Dependencies and Stable Baselines Using Pip


```
pip install stable-baselines3[extra]
```

In [1]:
# for autoformatting
# %load_ext jupyter_black

In [2]:
!apt-get update && apt-get install swig cmake
!pip install box2d-py
!pip install "stable-baselines3[extra]>=2.0.0a4"


## Import policy, RL agent, ...

In [3]:
import gymnasium as gym
import numpy as np

from stable_baselines3 import DQN



## Create the Gym env and instantiate the agent

For this example, we will use the Lunar Lander environment.

"Landing outside landing pad is possible. Fuel is infinite, so an agent can learn to fly and then land on its first attempt. Four discrete actions available: do nothing, fire left orientation engine, fire main engine, fire right orientation engine. "

Lunar Lander environment: [https://gymnasium.farama.org/environments/box2d/lunar_lander/](https://gymnasium.farama.org/environments/box2d/lunar_lander/)

![Lunar Lander](https://cdn-images-1.medium.com/max/960/1*f4VZPKOI0PYNWiwt0la0Rg.gif)


We chose MlpPolicy because the Lunar Lander observation is a feature vector, not an image.

The type of action to use (discrete or continuous) is deduced automatically from the environment's action space.
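
The policy choice above can be sketched as a simple rule of thumb. The helper below is hypothetical (it is not part of Stable Baselines3); it only maps an observation shape to the SB3 policy names `MlpPolicy` or `CnnPolicy`, assuming a one-dimensional shape means a feature vector and a three-dimensional shape means an image. Lunar Lander's observation is an 8-dimensional vector.

```python
def suggest_policy(obs_shape):
    """Hypothetical helper: suggest an SB3 policy name from an observation shape."""
    if len(obs_shape) == 1:
        # Flat feature vector, e.g. Lunar Lander's 8 state variables
        return "MlpPolicy"
    if len(obs_shape) == 3:
        # Height x width x channels (or channels first): treat as an image
        return "CnnPolicy"
    raise ValueError(f"No suggestion for observation shape {obs_shape}")


LUNAR_LANDER_OBS_SHAPE = (8,)
print(suggest_policy(LUNAR_LANDER_OBS_SHAPE))
```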



In [4]:
model = DQN(
    "MlpPolicy",
    "LunarLander-v2",
    verbose=1,
    exploration_final_eps=0.1,
    target_update_interval=250,
)

Using cuda device
Creating environment from the given name 'LunarLander-v2'
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
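
`exploration_final_eps=0.1` sets the floor of DQN's epsilon-greedy exploration. The sketch below shows a linear decay towards that floor, assuming the start value (1.0) and the exploration fraction (0.1) are left at SB3's defaults for `exploration_initial_eps` and `exploration_fraction`, since the cell above does not set them. `linear_epsilon` is an illustrative name, not an SB3 function.

```python
def linear_epsilon(step, total_steps, initial_eps=1.0, final_eps=0.1, fraction=0.1):
    """Illustrative linear epsilon schedule (assumed defaults, see text)."""
    progress = step / total_steps
    if progress >= fraction:
        # After the exploration phase, epsilon stays at its final value
        return final_eps
    # Interpolate linearly between initial_eps and final_eps
    return initial_eps + progress * (final_eps - initial_eps) / fraction


# With 100,000 training steps, the decay ends after the first 10,000 steps
for step in (0, 5_000, 10_000, 50_000):
    print(step, round(linear_epsilon(step, 100_000), 3))
```

Under these assumptions, the agent acts mostly at random early on and takes a random action 10% of the time for the rest of training.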


We load a helper function to evaluate the agent:

In [5]:
from stable_baselines3.common.evaluation import evaluate_policy

Let's evaluate the untrained agent. With randomly initialized weights, it should perform about as poorly as a random agent.

In [6]:
# Separate env for evaluation
eval_env = gym.make("LunarLander-v2")

# Random Agent, before training
mean_reward, std_reward = evaluate_policy(
    model,
    eval_env,
    n_eval_episodes=10,
    deterministic=True,
)

print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")



mean_reward=-132.76 +/- 27.05665177922836
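
`evaluate_policy` reports the mean and standard deviation of the total reward per episode. A minimal sketch of that summary follows, using made-up episode returns (the numbers are illustrative, not from this run) and assuming the population standard deviation, which is what `np.std` computes by default. `summarize_returns` is a hypothetical helper.

```python
import statistics


def summarize_returns(episode_returns):
    """Return (mean, population std) of a list of per-episode returns."""
    mean_reward = statistics.fmean(episode_returns)
    std_reward = statistics.pstdev(episode_returns)
    return mean_reward, std_reward


# Illustrative returns for five evaluation episodes (not real results)
example_returns = [-150.0, -120.0, -135.0, -110.0, -160.0]
mean_reward, std_reward = summarize_returns(example_returns)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")
```

A large standard deviation relative to the mean suggests that 10 episodes give only a rough estimate of the agent's performance.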


## Train the agent and save it

Warning: training for 100,000 timesteps may take a while.

In [None]:
# Train the agent
model.learn(total_timesteps=int(1e5))
# Save the agent
model.save("dqn_lunar")
del model  # delete trained model to demonstrate loading

## Load the trained agent

In [8]:
model = DQN.load("dqn_lunar")
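
`model.save("dqn_lunar")` writes a zip archive. To our understanding, SB3 appends `.zip` when the given path has no extension, so the file on disk is `dqn_lunar.zip`. The stdlib-only sketch below mirrors that naming rule and lists the archive's contents if the file exists; both helpers are hypothetical and are not SB3 functions.

```python
import zipfile
from pathlib import Path


def saved_model_path(name):
    """Assumed naming rule: add '.zip' when the path has no extension."""
    path = Path(name)
    return path if path.suffix else path.with_suffix(".zip")


def list_saved_files(name):
    """List the files inside a saved model archive, or [] if it is missing."""
    path = saved_model_path(name)
    if not path.exists():
        return []
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()


print(saved_model_path("dqn_lunar"))
print(list_saved_files("dqn_lunar"))
```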

In [9]:
# Evaluate the trained agent
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)

print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")

mean_reward=21.69 +/- 64.75838037237618


In [10]:
import gymnasium as gym
from stable_baselines3.common.vec_env import VecVideoRecorder, DummyVecEnv

env_id = "LunarLander-v2"
video_folder = "logs/videos/"
video_length = 100

vec_env = DummyVecEnv([lambda: gym.make(env_id, render_mode="rgb_array")])

# Record the video starting at the first step.
# Note: the prefix says "random-agent", but `model` is the trained agent loaded above.
vec_env = VecVideoRecorder(
    vec_env,
    video_folder,
    record_video_trigger=lambda x: x == 0,
    video_length=video_length,
    name_prefix=f"random-agent-{env_id}",
)

# Reset the wrapped env and keep the initial observation for the first prediction
obs = vec_env.reset()
for _ in range(video_length + 1):
    action, _state = model.predict(obs)
    obs, _, _, _ = vec_env.step(action)
# Save the video
vec_env.close()



Saving video to /content/logs/videos/random-agent-LunarLander-v2-step-0-to-step-100.mp4
Moviepy - Building video /content/logs/videos/random-agent-LunarLander-v2-step-0-to-step-100.mp4.
Moviepy - Writing video /content/logs/videos/random-agent-LunarLander-v2-step-0-to-step-100.mp4





Moviepy - Done !
Moviepy - video ready /content/logs/videos/random-agent-LunarLander-v2-step-0-to-step-100.mp4
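
The file name in the log above follows from the recorder's settings: the trigger `lambda x: x == 0` starts recording at step 0, and `video_length=100` ends it 100 steps later. The sketch below reproduces that naming pattern as inferred from the log; `video_file_name` and `first_recording_start` are illustrative helpers, not SB3 functions.

```python
def video_file_name(name_prefix, start_step, video_length):
    """Build a file name in the pattern seen in the recorder's log output."""
    end_step = start_step + video_length
    return f"{name_prefix}-step-{start_step}-to-step-{end_step}.mp4"


def first_recording_start(trigger, max_steps):
    """Return the first step at which the trigger fires, or None."""
    for step in range(max_steps):
        if trigger(step):
            return step
    return None


start = first_recording_start(lambda x: x == 0, max_steps=10)
print(video_file_name("random-agent-LunarLander-v2", start, 100))
```

A different trigger, for example `lambda x: x % 1000 == 0`, would start a new recording every 1000 steps instead of only at the beginning.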


In [17]:
%%html
<video controls autoplay><source src="./logs/videos/random-agent-LunarLander-v2-step-0-to-step-100.mp4" type="video/mp4"></video>

In [18]:
from IPython.display import HTML
from base64 import b64encode
# Read the video with a context manager so the file handle is closed
with open("./logs/videos/random-agent-LunarLander-v2-step-0-to-step-100.mp4", "rb") as f:
    mp4 = f.read()
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
HTML("""
<video width=400 controls>
      <source src="%s" type="video/mp4">
</video>
""" % data_url)

In [22]:
from huggingface_sb3 import package_to_hub

eval_env = gym.make("LunarLander-v2", render_mode='rgb_array')

# package_to_hub saves and evaluates the agent, generates a model card, and
# records a replay video before pushing the repo to the Hub
package_to_hub(
    model=model,  # our trained model
    model_name="dqn-LunarLander-v2",  # name of the trained model
    model_architecture="DQN",  # the algorithm we used: DQN
    env_id="LunarLander-v2",  # name of the environment
    eval_env=eval_env,  # evaluation environment
    # id of the model repository on the Hugging Face Hub:
    # repo_id = {organization}/{repo_name}, for instance ThomasSimonini/ppo-LunarLander-v2
    repo_id="nsanghi/dqn-LunarLander-v2",
    commit_message="Push to Hub",
)

# Note: if package_to_hub fails with a rebase error, run the following:
# cd <path_to_repo> && git add . && git commit -m "Add message" && git pull
# Then run "git push" to push the change to the Hub.

ℹ This function will save, evaluate, generate a video of your agent,
create a model card and push everything to the hub. It might take up to 1min.
This is a work in progress: if you encounter a bug, please open an issue.
Saving video to /tmp/tmp_uw7smee/-step-0-to-step-1000.mp4
Moviepy - Building video /tmp/tmp_uw7smee/-step-0-to-step-1000.mp4.
Moviepy - Writing video /tmp/tmp_uw7smee/-step-0-to-step-1000.mp4





Moviepy - Done !
Moviepy - video ready /tmp/tmp_uw7smee/-step-0-to-step-1000.mp4
ℹ Pushing repo nsanghi/dqn-LunarLander-v2 to the Hugging Face Hub


policy.pth:   0%|          | 0.00/44.0k [00:00<?, ?B/s]

policy.optimizer.pth:   0%|          | 0.00/45.0k [00:00<?, ?B/s]

dqn-LunarLander-v2.zip:   0%|          | 0.00/105k [00:00<?, ?B/s]

Upload 4 LFS files:   0%|          | 0/4 [00:00<?, ?it/s]

pytorch_variables.pth:   0%|          | 0.00/431 [00:00<?, ?B/s]

ℹ Your model is pushed to the Hub. You can view your model here:
https://huggingface.co/nsanghi/dqn-LunarLander-v2/tree/main/


'https://huggingface.co/nsanghi/dqn-LunarLander-v2/tree/main/'
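
The `repo_id` passed to `package_to_hub` has the form `{organization}/{repo_name}`. Here is a small sketch of building and sanity-checking such an id. `make_repo_id` is a hypothetical helper, and its check is deliberately loose; it does not reproduce the Hub's actual validation rules.

```python
def make_repo_id(owner, algo, env_id):
    """Hypothetical helper: build '{owner}/{algo}-{env_id}' with a basic check."""
    repo_name = f"{algo.lower()}-{env_id}"
    repo_id = f"{owner}/{repo_name}"
    # Loose sanity check: exactly one slash, a non-empty owner, no spaces
    if not owner or repo_id.count("/") != 1 or " " in repo_id:
        raise ValueError(f"Unexpected repo id: {repo_id!r}")
    return repo_id


print(make_repo_id("nsanghi", "DQN", "LunarLander-v2"))
```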

In [19]:
!pip install huggingface_sb3



Collecting huggingface_sb3
  Downloading huggingface_sb3-2.2.5-py3-none-any.whl (9.5 kB)
Collecting huggingface-hub~=0.8 (from huggingface_sb3)
  Downloading huggingface_hub-0.16.2-py3-none-any.whl (268 kB)
Installing collected packages: huggingface-hub, huggingface_sb3
Successfully installed huggingface-hub-0.16.2 huggingface_sb3-2.2.5


In [20]:
from huggingface_sb3 import load_from_hub, package_to_hub, push_to_hub
from huggingface_hub import notebook_login # To log to our Hugging Face account to be able to upload models to the Hub.

notebook_login()
!git config --global credential.helper store


In [16]:
!pwd

/content
