<a href="https://colab.research.google.com/github/jbpacker/deep-rl-class/blob/main/unit1/HuggingFace_Unit_1_%F0%9F%9A%80.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Unit 1: Train your first Deep Reinforcement Learning Agent 🚀

📄 Copied from [huggingface/deep-rl-class](https://github.com/huggingface/deep-rl-class)

🎮 Environment: [LunarLander-v2](https://www.gymlibrary.ml/environments/box2d/lunar_lander/)

📚 RL-Library: [Stable-Baselines3](https://stable-baselines3.readthedocs.io/en/master/)

### Step 1: Install dependencies 🔽
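The first step is to install the dependencies. We’ll install several of them:

- `gym[box2d]`: Contains the LunarLander-v2 environment 🌛
- `stable-baselines3[extra]`: The deep reinforcement learning library.
- `huggingface_sb3`: Additional code for Stable-Baselines3 to load and upload models from the Hugging Face 🤗 Hub.
- `wandb`: Weights & Biases, used to log training metrics and videos.
- `pyvirtualdisplay`, `xvfb`, `ffmpeg`, `python-opengl`: A virtual display and video tools so the environment can be rendered and recorded on a headless Colab machine.
- `colabgymrender`: Records an episode and plays it back inside the notebook.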
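Some of these packages are imported under a different name than the one passed to `pip` (for example `stable-baselines3` is imported as `stable_baselines3`). A quick way to confirm the installs worked is to check whether each module can be found. This is a minimal sketch using only the standard library; the module list, in particular `Box2D` as the import name of the Box2D bindings, is an assumption to adapt to your runtime, and `missing_modules` is a hypothetical helper.

```python
import importlib.util

# Assumed import names for the packages installed in Step 1.
# Distribution names can differ from import names
# (e.g. `stable-baselines3` is imported as `stable_baselines3`).
REQUIRED_MODULES = ["gym", "Box2D", "stable_baselines3", "huggingface_sb3", "wandb"]


def missing_modules(names):
    """Return the subset of top-level module `names` that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing modules:", missing if missing else "none")
```

If anything is reported missing, re-run the install cells before importing.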

In [None]:
!apt install -y python-opengl
!apt install -y ffmpeg
!apt install -y xvfb
!pip3 install pyvirtualdisplay

# Virtual display
from pyvirtualdisplay import Display

virtual_display = Display(visible=0, size=(1400, 900))
virtual_display.start()

In [None]:
!pip install gym[box2d]
!pip install stable-baselines3[extra]
!pip install huggingface_sb3
!pip install pyglet
!pip install ale-py==0.7.4 # To overcome an issue with gym (https://github.com/DLR-RM/stable-baselines3/issues/875)
!pip install wandb

In [None]:
# Visualizing
!pip install gym pyvirtualdisplay > /dev/null 2>&1
!apt-get install -y xvfb python-opengl ffmpeg > /dev/null 2>&1
!pip install colabgymrender==1.0.2

### Step 2: Import the packages 📦

You can browse all the Deep Reinforcement Learning models shared on the Hub here 👉 https://huggingface.co/models?pipeline_tag=reinforcement-learning&sort=downloads


In [None]:
import gym
import torch as th
import wandb
from wandb.integration.sb3 import WandbCallback

from huggingface_sb3 import load_from_hub, package_to_hub, push_to_hub
from huggingface_hub import notebook_login # To log in to our Hugging Face account so we can upload models to the Hub.

from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.vec_env import DummyVecEnv, VecVideoRecorder

wandb.login(relogin=True)

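### Step 5: Create and train the Model 🤖
- We use the PPO implementation from [Stable Baselines3 (SB3)](https://stable-baselines3.readthedocs.io/en/master/), with a linearly decaying learning rate, training metrics logged to Weights & Biases, and periodic video recordings of the agent.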
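PPO limits how far each update can move the policy by optimizing a clipped surrogate objective. The plain-Python sketch below, assuming the standard PPO-Clip formulation, computes that loss for one minibatch together with the `clip_fraction` and `approx_kl` diagnostics that show up in the training logs. SB3 computes these on tensors; the function name `ppo_clip_stats` is hypothetical, and `clip_range=0.2` matches the value logged for this run.

```python
import math


def ppo_clip_stats(ratios, advantages, clip_range=0.2):
    """Clipped surrogate loss plus clip_fraction and approx_kl for one minibatch.

    ratios: pi_new(a|s) / pi_old(a|s) for each sample (always > 0).
    advantages: advantage estimate for each sample.
    """
    n = len(ratios)
    surrogate, clipped, kl = 0.0, 0, 0.0
    for r, adv in zip(ratios, advantages):
        r_clipped = min(max(r, 1.0 - clip_range), 1.0 + clip_range)
        # Take the pessimistic (smaller) of the unclipped and clipped objectives.
        surrogate += min(r * adv, r_clipped * adv)
        # Count samples whose ratio left the trust region [1 - eps, 1 + eps].
        if abs(r - 1.0) > clip_range:
            clipped += 1
        # Low-variance KL estimator: (r - 1) - log(r), always >= 0.
        kl += (r - 1.0) - math.log(r)
    return {
        "policy_loss": -surrogate / n,  # negated because optimizers minimize
        "clip_fraction": clipped / n,
        "approx_kl": kl / n,
    }


print(ppo_clip_stats([1.0, 1.5, 0.5], [1.0, 1.0, -1.0]))
```

A falling `clip_fraction` and a small `approx_kl` late in training, as in the logs, indicate that successive policies barely differ.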

In [5]:
from typing import Callable

def linear_schedule(initial_value: float) -> Callable[[float], float]:
    """
    Linear learning rate schedule.

    :param initial_value: Initial learning rate.
    :return: schedule that computes
      current learning rate depending on remaining progress
    """
    def func(progress_remaining: float) -> float:
        """
        Progress will decrease from 1 (beginning) to 0.

        :param progress_remaining: fraction of training left, from 1.0 down to 0.0
        :return: current learning rate
        """
        return progress_remaining * initial_value

    return func
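To see which learning rates this schedule actually produces, the sketch below re-creates it and pairs it with a progress computation. It assumes SB3 derives `progress_remaining` as `1 - num_timesteps / total_timesteps`, so the rate decays linearly from `init_learning_rate` to 0 over `total_timesteps`; `progress_remaining()` is a hypothetical helper, not an SB3 function.

```python
from typing import Callable


def linear_schedule(initial_value: float) -> Callable[[float], float]:
    # Same schedule as the cell above: scale the initial value by the remaining progress.
    def func(progress_remaining: float) -> float:
        return progress_remaining * initial_value

    return func


def progress_remaining(num_timesteps: int, total_timesteps: int) -> float:
    # Assumption: mirrors how SB3 computes progress (1.0 at the start, 0.0 at the end).
    return 1.0 - num_timesteps / total_timesteps


schedule = linear_schedule(0.005)
for step in (0, 500_000, 1_000_000, 2_000_000):
    lr = schedule(progress_remaining(step, 2_000_000))
    print(f"step {step:>9,}: learning_rate = {lr:.6f}")
```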

In [6]:
## Parameters
n_steps = 2048
batch_size = 2048
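n_epochs = 5  # note: not passed to PPO below, so SB3's default n_epochs (10) is what gets used
init_learning_rate = 0.005
update_steps = 1  # only recorded in the W&B config; not used by PPO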
total_timesteps = 2000000
num_simulations = 18

env_id = "LunarLander-v2"
model_architecture = "PPO"
policy = 'MlpPolicy'
# default net_arch = [dict(pi=[64, 64], vf=[64, 64])]
policy_kwargs = dict(activation_fn=th.nn.ReLU,
                     net_arch=[dict(pi=[256, 256], vf=[256, 256])])
model_name = f"{model_architecture}-{policy}-{env_id}"
record_video_every_n_steps = n_steps * 4

config = {
    "env_id": env_id,
    "model_architecture":model_architecture,
    "policy": policy,
    "model_name":model_name,
    "total_timesteps": total_timesteps,
    "batch_size": batch_size,
    "n_steps":n_steps,
    "update_steps":update_steps,
}


## Set up logging
run = wandb.init(project="HuggingFace_1", 
                 config=config,
                 sync_tensorboard=True, # auto-upload sb3's tensorboard metrics
                 monitor_gym=True,  # auto-upload the videos of agents playing the game
                 save_code=True)

## Make the environment
def make_env():
    env = gym.make(config["env_id"])
    env = Monitor(env)  # record stats such as returns
    return env

env = DummyVecEnv([make_env] * num_simulations)
env = VecVideoRecorder(
    env, 
    f"videos/{run.id}", 
    record_video_trigger=lambda x: x % record_video_every_n_steps == 0, 
    video_length=200
)

## Make the model
model = PPO(
    policy=policy,
    policy_kwargs=policy_kwargs,
    env=env,
    n_steps=n_steps,
    learning_rate=linear_schedule(init_learning_rate),
    batch_size=batch_size,
    tensorboard_log=f"runs/{run.id}",
)

## Train!
model.learn(
    total_timesteps=config["total_timesteps"],
    callback=WandbCallback(
        verbose=2,
        model_save_path=f"models/{run.id}"
    )
)
run.finish()

# Save the trained model locally (SB3 appends .zip) and reload it from disk
model.save(model_name)
model = PPO.load(model_name)

[34m[1mwandb[0m: Currently logged in as: [33mjefsnacker[0m. Use [1m`wandb login --relogin`[0m to force relogin


Saving video to /content/videos/24cnevl2/rl-video-step-0-to-step-200.mp4
Saving video to /content/videos/24cnevl2/rl-video-step-8192-to-step-8392.mp4
Saving video to /content/videos/24cnevl2/rl-video-step-16384-to-step-16584.mp4
Saving video to /content/videos/24cnevl2/rl-video-step-24576-to-step-24776.mp4
Saving video to /content/videos/24cnevl2/rl-video-step-32768-to-step-32968.mp4
Saving video to /content/videos/24cnevl2/rl-video-step-40960-to-step-41160.mp4
Saving video to /content/videos/24cnevl2/rl-video-step-49152-to-step-49352.mp4
Saving video to /content/videos/24cnevl2/rl-video-step-57344-to-step-57544.mp4



| Metric | Run history | Final value |
|---|---|---|
| global_step | ▁▁▁▂▂▂▂▃▃▃▃▄▄▄▄▅▅▅▅▅▆▆▆▆▇▇▇▇███ | 2031616.0 |
| rollout/ep_len_mean | ▁▁▁▁▂▂▅▆▇███████▅▃▃▂▂▂▂▂▂▂▂▂▂▂▂ | 173.19 |
| rollout/ep_rew_mean | ▁▂▃▃▃▃▄▅▅▅▆▆▆▆▆▆▇▇█████████████ | 273.76819 |
| time/fps | ▁▅▇█▅▅▄▄▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▃▂▃▃▃▃▃▃ | 941.0 |
| train/approx_kl | ▅▅▅▅▅▄▄▃▄▃▃▃▃▂▂▇█▄▂▂▂▁▂▂▂▂▁▂▁▁ | 0.00205 |
| train/clip_fraction | ▅▇██▇▅▃▄▄▃▃▃▃▂▃▅▄▃▂▂▂▁▂▂▂▂▁▂▁▁ | 0.00817 |
| train/clip_range | ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ | 0.2 |
| train/entropy_loss | ▁▁▂▂▃▃▃▃▃▃▄▄▅▅▆▇▇▆▆▆▇▇▇▇▇▇▇███ | -0.55624 |
| train/explained_variance | ▁▆▇▇▇▇▇▇████████▇▆▇▇██████████ | 0.96354 |
| train/learning_rate | ███▇▇▇▇▆▆▆▆▅▅▅▅▄▄▄▄▃▃▃▃▂▂▂▂▁▁▁ | 8e-05 |
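The PPO settings above interact: each rollout collects `n_steps` transitions from each of the `num_simulations` environments, and the `total_timesteps` budget is only checked between rollouts. A small sketch of that bookkeeping, assuming SB3's on-policy loop keeps collecting full rollouts while `num_timesteps < total_timesteps`; `ppo_budget` is a hypothetical helper.

```python
import math


def ppo_budget(n_steps, n_envs, batch_size, total_timesteps):
    """Rollout and minibatch bookkeeping for an SB3-style on-policy training loop."""
    rollout_size = n_steps * n_envs  # transitions collected per rollout
    # Here rollout_size is a multiple of batch_size, so every minibatch is full.
    minibatches_per_epoch = rollout_size // batch_size
    # Training stops at the first rollout boundary at or past total_timesteps.
    n_rollouts = math.ceil(total_timesteps / rollout_size)
    return {
        "rollout_size": rollout_size,
        "minibatches_per_epoch": minibatches_per_epoch,
        "n_rollouts": n_rollouts,
        "timesteps_trained": n_rollouts * rollout_size,
    }


print(ppo_budget(n_steps=2048, n_envs=18, batch_size=2048, total_timesteps=2_000_000))
```

With the values from the training cell this gives 36,864 transitions per rollout, 18 minibatches per epoch and 55 rollouts, so training runs for about 2.03M timesteps, slightly past `total_timesteps`.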


### Step 7: Evaluate the agent 📈

- [sb3 documentation](https://stable-baselines3.readthedocs.io/en/master/guide/examples.html#basic-usage-training-saving-loading)

In [7]:
# Create a new environment for evaluation, wrapped in a Monitor so episode returns are recorded
eval_env = Monitor(gym.make("LunarLander-v2"))

# Evaluate the model
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")



mean_reward=289.44 +/- 12.699150906928754
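`evaluate_policy` reports the mean and the standard deviation of the per-episode returns over `n_eval_episodes`. Assuming it uses NumPy's default population standard deviation (`ddof=0`), the same summary can be reproduced with the standard library. The returns below are made-up numbers for illustration, not results from this run, and `summarize_returns` is a hypothetical helper.

```python
import statistics


def summarize_returns(episode_returns):
    """Mean and population standard deviation of per-episode returns."""
    mean = statistics.fmean(episode_returns)
    std = statistics.pstdev(episode_returns)  # ddof=0, like numpy.std's default
    return mean, std


# Hypothetical returns from 10 evaluation episodes (illustrative numbers only).
returns = [301.2, 275.9, 290.4, 268.3, 305.7, 288.1, 279.6, 296.0, 284.5, 302.8]
mean_reward, std_reward = summarize_returns(returns)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")
```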


### Visualize final model

In [8]:
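import gym
from colabgymrender.recorder import Recorder

# Wrap a fresh environment so the episode is recorded as a video in ./video
env = gym.make('LunarLander-v2')
directory = './video'
env = Recorder(env, directory)

# Run one episode with the trained model, always taking the deterministic action
obs = env.reset()
done = False
while not done:
    action, _state = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)

# Play the recorded episode inside the notebook
env.play()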

Imageio: 'ffmpeg-linux64-v3.3.1' was not found on your computer; downloading it now.
Try 1. Download from https://github.com/imageio/imageio-binaries/raw/master/ffmpeg/ffmpeg-linux64-v3.3.1 (43.8 MB)

100%|██████████| 228/228 [00:00<00:00, 294.29it/s]
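The predict/step loop above can be factored into a reusable function that also adds up the episode return. This sketch assumes the classic Gym API used throughout this notebook (`reset()` returns only the observation, `step()` returns four values) and adds a `max_steps` cap as a safety net. `CountdownEnv` is a hypothetical toy environment so the code runs without Gym; with the trained agent, the policy could be `lambda obs: model.predict(obs, deterministic=True)[0]`.

```python
def run_episode(env, policy, max_steps=1000):
    """Roll out one episode and return (total_reward, steps).

    Assumes env.reset() -> obs and env.step(action) -> (obs, reward, done, info).
    """
    obs = env.reset()
    total_reward, steps = 0.0, 0
    done = False
    while not done and steps < max_steps:
        action = policy(obs)
        obs, reward, done, info = env.step(action)
        total_reward += reward
        steps += 1
    return total_reward, steps


class CountdownEnv:
    """Hypothetical toy env: the state counts down from `start`; action 1 earns +1."""

    def __init__(self, start=5):
        self.start = start
        self.state = start

    def reset(self):
        self.state = self.start
        return self.state

    def step(self, action):
        self.state -= 1
        reward = 1.0 if action == 1 else 0.0
        done = self.state <= 0
        return self.state, reward, done, {}


total, steps = run_episode(CountdownEnv(start=5), policy=lambda obs: 1)
print(total, steps)
```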


### Step 8: Publish our trained model on the Hub 🔥

📚 The library's documentation 👉 https://github.com/huggingface/huggingface_sb3/tree/main#hugging-face--x-stable-baselines3-v20

To log in, create an access token with write permission at https://huggingface.co/settings/tokens and paste it into the `notebook_login()` prompt.

In [9]:
notebook_login()
!git config --global credential.helper store

Login successful
Your token has been saved to /root/.huggingface/token
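`package_to_hub` in the next cell expects `repo_id` in the form `{organization}/{repo_name}`, where the organization can also be your username. A hypothetical helper like the one below can catch a malformed id before a long upload starts; it only checks the shape of the string, not whether the repository exists or whether your token may write to it.

```python
from typing import Tuple


def split_repo_id(repo_id: str) -> Tuple[str, str]:
    """Split a Hub repo_id of the form '{organization}/{repo_name}' (hypothetical helper)."""
    parts = repo_id.split("/")
    # Exactly one slash, with non-empty text on both sides.
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected '{{organization}}/{{repo_name}}', got {repo_id!r}")
    return parts[0], parts[1]


print(split_repo_id("jefsnacker/rl_class"))
```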


In [10]:
import gym

from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv
from stable_baselines3.common.env_util import make_vec_env

from huggingface_sb3 import package_to_hub

## repo_id is the id of the model repository on the Hugging Face Hub (repo_id = {organization}/{repo_name}, for instance ThomasSimonini/ppo-LunarLander-v2)
repo_id = "jefsnacker/rl_class"

## Define the commit message
commit_message = "bigger and better model"

# Create the evaluation env
eval_env = DummyVecEnv([lambda: gym.make(env_id)])

wandb.init()  # Optional: starts a new, separate W&B run; package_to_hub does not require it

# package_to_hub saves and evaluates the model, generates a model card, records a replay video of the agent, and pushes everything to the Hub
package_to_hub(model=model, # Our trained model
               model_name=model_name, # The name of our trained model 
               model_architecture=model_architecture, # The model architecture we used: in our case PPO
               env_id=env_id, # Name of the environment
               eval_env=eval_env, # Evaluation environment
               repo_id=repo_id, # id of the model repository on the Hub ({organization}/{repo_name})
               commit_message=commit_message)

wandb.finish()

ℹ This function will save, evaluate, generate a video of your agent,
create a model card and push everything to the hub. It might take up to 1min.
This is a work in progress: If you encounter a bug, please open an issue and use
push_to_hub instead.


Cloning https://huggingface.co/jefsnacker/rl_class into local empty directory.


Download file replay.mp4:   2%|1         | 3.48k/200k [00:00<?, ?B/s]

Clean file replay.mp4:   0%|          | 1.00k/200k [00:00<?, ?B/s]

Saving video to /content/-step-0-to-step-1000.mp4
ℹ Pushing repo rl_class to the Hugging Face Hub


Upload file PPO-MlpPolicy-LunarLander-v2.zip:   0%|          | 3.34k/1.60M [00:00<?, ?B/s]

Upload file PPO-MlpPolicy-LunarLander-v2/policy.pth:   1%|          | 3.34k/541k [00:00<?, ?B/s]

Upload file replay.mp4:   2%|1         | 3.34k/194k [00:00<?, ?B/s]

Upload file PPO-MlpPolicy-LunarLander-v2/policy.optimizer.pth:   0%|          | 3.34k/1.06M [00:00<?, ?B/s]

remote: Enforcing permissions...        
remote: Allowed refs: all        
To https://huggingface.co/jefsnacker/rl_class
   afb22f5..882423c  main -> main



ℹ Your model is pushed to the hub. You can view your model here:
https://huggingface.co/jefsnacker/rl_class

