### **Step 01** : Setting up the environment / Installing dependencies

In [1]:
!nvidia-smi

Fri May 13 08:17:37 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   38C    P8    26W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [2]:
!apt install python-opengl
!apt install ffmpeg
!apt install xvfb
!pip3 install pyvirtualdisplay

# Virtual display
from pyvirtualdisplay import Display

virtual_display = Display(visible=0, size=(1400, 900))
virtual_display.start()

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages were automatically installed and are no longer required:
  libnvidia-common-460 nsight-compute-2020.2.0
Use 'apt autoremove' to remove them.
Suggested packages:
  libgle3
The following NEW packages will be installed:
  python-opengl
0 upgraded, 1 newly installed, 0 to remove and 42 not upgraded.
Need to get 496 kB of archives.
After this operation, 5,416 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 python-opengl all 3.1.0+dfsg-1 [496 kB]
Fetched 496 kB in 1s (373 kB/s)
Selecting previously unselected package python-opengl.
(Reading database ... 155203 files and directories currently installed.)
Preparing to unpack .../python-opengl_3.1.0+dfsg-1_all.deb ...
Unpacking python-opengl (3.1.0+dfsg-1) ...
Setting up python-opengl (3.1.0+dfsg-1) ...
Reading package lists... Done
Building dependency tree       
Reading s

<pyvirtualdisplay.display.Display at 0x7f066fd80d10>

In [3]:
!pip install gym[box2d]
!pip install stable-baselines3[extra]
!pip install huggingface_sb3
!pip install pyglet
!pip install ale-py==0.7.4 # To overcome an issue with gym (https://github.com/DLR-RM/stable-baselines3/issues/875)

Collecting box2d-py~=2.3.5
  Downloading box2d_py-2.3.8-cp37-cp37m-manylinux1_x86_64.whl (448 kB)
     |████████████████████████████████| 448 kB 4.3 MB/s 
Installing collected packages: box2d-py
Successfully installed box2d-py-2.3.8
Collecting stable-baselines3[extra]
  Downloading stable_baselines3-1.5.0-py3-none-any.whl (177 kB)
     |████████████████████████████████| 177 kB 4.2 MB/s 
Collecting gym==0.21
  Downloading gym-0.21.0.tar.gz (1.5 MB)
     |████████████████████████████████| 1.5 MB 16.4 MB/s 
Collecting ale-py~=0.7.4
  Downloading ale_py-0.7.5-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.6 MB)
     |████████████████████████████████| 1.6 MB 37.3 MB/s 
Collecting autorom[accept-rom-license]~=0.4.2
  Downloading AutoROM-0.4.2-py3-none-any.whl (16 kB)
Collecting AutoROM.accept-rom-license
  Downloading AutoROM.accept-rom-license-0.4.2.tar.gz (9.8 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ...
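
Since the install log pins several versions (for example `gym==0.21` and `stable_baselines3-1.5.0`), it can help to record what actually ended up installed. The following is a minimal sketch, assuming Python 3.8 or newer for `importlib.metadata` (the log above shows a Python 3.7 wheel, where the `importlib_metadata` backport would be needed instead). The helper name `installed_version` and the `PACKAGES` list are illustrative, not part of the original notebook.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Package names as they appear in the pip commands and log above (illustrative list).
PACKAGES = ["gym", "stable-baselines3", "box2d-py", "ale-py", "huggingface_sb3", "pyglet"]

for name in PACKAGES:
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Printing the versions right after the install cell makes it easier to compare a later run against the one recorded here.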

### **Step 02** : Import the packages

In [4]:
import gym

from huggingface_sb3 import load_from_hub, package_to_hub, push_to_hub
from huggingface_hub import notebook_login # To log in to your Hugging Face account so you can upload models to the Hub.

from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.env_util import make_vec_env

### **Step 03** : Create a vectorized LunarLander environment

In [13]:
# Create the environment
env = make_vec_env('LunarLander-v2', n_envs=32)

### **Step 04** : Create the model / Instantiate the Agent

In [14]:
# Define a PPO agent with an MlpPolicy architecture.
# We use a multilayer perceptron (MlpPolicy) because the observation is a vector.
# If the input were image frames, we would use CnnPolicy instead.

model = PPO(
    policy = 'MlpPolicy',
    env = env,
    n_steps = 2048,
    batch_size = 128,
    n_epochs = 16,
    gamma = 0.999,
    gae_lambda = 0.98,
    ent_coef = 0.01,
    verbose=1)

Using cuda device


### **Step 05** : Train the PPO [Proximal Policy Optimization] Agent

In [15]:
# Train it for 1,000,000 timesteps
model.learn(total_timesteps=1000000)
# Save the model
model_name = "ppo-LunarLander-v2"
model.save(model_name)

---------------------------------
| rollout/           |          |
|    ep_len_mean     | 92.5     |
|    ep_rew_mean     | -179     |
| time/              |          |
|    fps             | 3388     |
|    iterations      | 1        |
|    time_elapsed    | 19       |
|    total_timesteps | 65536    |
---------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 92.4        |
|    ep_rew_mean          | -121        |
| time/                   |             |
|    fps                  | 1383        |
|    iterations           | 2           |
|    time_elapsed         | 94          |
|    total_timesteps      | 131072      |
| train/                  |             |
|    approx_kl            | 0.009778874 |
|    clip_fraction        | 0.104       |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.38       |
|    explained_variance   | -0.000511   |
|    learning_rate        | 0.
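
The first two log blocks report `total_timesteps` of 65536 and 131072, which is consistent with each iteration collecting `n_steps` transitions from every one of the `n_envs` environments. Below is a rough sketch of that bookkeeping. It uses only values from the cells above and assumes one optimizer step per minibatch and that training stops after the first rollout that reaches `total_timesteps` (the usual PPO loop, not something this notebook confirms).

```python
import math

# Values copied from the cells above.
n_envs = 32
n_steps = 2048
batch_size = 128
n_epochs = 16
total_timesteps = 1_000_000

# Transitions collected per iteration: n_steps from each parallel environment.
rollout_size = n_envs * n_steps

# Assumption: each epoch splits the rollout into minibatches of batch_size,
# and each minibatch triggers one gradient update.
minibatches_per_epoch = rollout_size // batch_size
gradient_steps_per_iteration = minibatches_per_epoch * n_epochs

# Assumption: whole rollouts are collected until total_timesteps is reached.
iterations = math.ceil(total_timesteps / rollout_size)
timesteps_collected = iterations * rollout_size

print(f"rollout size per iteration: {rollout_size}")
print(f"gradient steps per iteration: {gradient_steps_per_iteration}")
print(f"iterations: {iterations}, timesteps collected: {timesteps_collected}")
```

Under these assumptions the run would collect slightly more than the requested 1,000,000 timesteps, because the last rollout is not cut short.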

### **Step 06** : Evaluate the Agent

In [16]:
# Create a new environment for evaluation
eval_env = gym.make("LunarLander-v2")

# Evaluate the model with 10 evaluation episodes and deterministic=True
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)

# Print the results
print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")



mean_reward=254.54 +/- 18.705463472155806
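
The two numbers printed above summarize the per-episode returns of the evaluation run. As a minimal sketch of that summary, assuming the standard deviation is the population standard deviation (which is what `np.std` computes by default), the snippet below uses hypothetical episode returns rather than the ones from this run. It also formats both values to two decimals, which keeps the printed line short.

```python
from statistics import mean, pstdev


def summarize_returns(episode_returns):
    """Return (mean, population standard deviation) of a list of episode returns."""
    return mean(episode_returns), pstdev(episode_returns)


# Hypothetical returns for illustration only; not taken from the run above.
episode_returns = [250.0, 270.0, 230.0, 260.0]

mean_reward, std_reward = summarize_returns(episode_returns)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")
```

A small standard deviation relative to the mean suggests the agent lands consistently rather than succeeding only in some episodes.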


### **Step 07** : Publish the trained model on the Hub

In [17]:
notebook_login()
!git config --global credential.helper store

Login successful
Your token has been saved to /root/.huggingface/token


In [18]:
import gym

from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv
from stable_baselines3.common.env_util import make_vec_env

from huggingface_sb3 import package_to_hub

# Reuse the values defined in the cells above
# Define the name of the environment
env_id = "LunarLander-v2"

# Define the model architecture we used
model_architecture = "PPO"

## Define a repo_id
## repo_id is the id of the model repository on the Hugging Face Hub (repo_id = {organization}/{repo_name}, for instance ThomasSimonini/ppo-LunarLander-v2)
## Replace with your own repo id
repo_id = "QuickSilver007/rlunit1_ppo-LunarLander-v2"

## Define the commit message
commit_message = "Upload PPO LunarLander-v2 trained agent based on RL Course Unit 1 with modified hyperparameters."

# Create the evaluation env
eval_env = DummyVecEnv([lambda: gym.make(env_id)])

# Package the model, evaluate it, record a replay video, and push everything to the Hub
package_to_hub(model=model, # Our trained model
               model_name=model_name, # The name of our trained model
               model_architecture=model_architecture, # The model architecture we used: in our case PPO
               env_id=env_id, # Name of the environment
               eval_env=eval_env, # Evaluation environment
               repo_id=repo_id, # id of the model repository on the Hugging Face Hub (repo_id = {organization}/{repo_name}, for instance ThomasSimonini/ppo-LunarLander-v2)
               commit_message=commit_message)


ℹ This function will save, evaluate, generate a video of your agent,
create a model card and push everything to the hub. It might take up to 1min.
This is a work in progress: If you encounter a bug, please open an issue and use
push_to_hub instead.


Cloning https://huggingface.co/QuickSilver007/rlunit1_ppo-LunarLander-v2 into local empty directory.


Download file replay.mp4:   4%|4         | 8.37k/187k [00:00<?, ?B/s]

Clean file replay.mp4:   1%|          | 1.00k/187k [00:00<?, ?B/s]

Download file ppo-LunarLander-v2/policy.pth:  20%|#9        | 8.37k/42.2k [00:00<?, ?B/s]

Download file ppo-LunarLander-v2/pytorch_variables.pth: 100%|##########| 431/431 [00:00<?, ?B/s]

Clean file ppo-LunarLander-v2/pytorch_variables.pth: 100%|##########| 431/431 [00:00<?, ?B/s]

Download file ppo-LunarLander-v2/policy.optimizer.pth:   2%|1         | 1.58k/82.8k [00:00<?, ?B/s]

Download file ppo-LunarLander-v2.zip:   2%|2         | 3.48k/141k [00:00<?, ?B/s]

Clean file ppo-LunarLander-v2/policy.pth:   2%|2         | 1.00k/42.2k [00:00<?, ?B/s]

Clean file ppo-LunarLander-v2/policy.optimizer.pth:   1%|1         | 1.00k/82.8k [00:00<?, ?B/s]



Saving video to /content/-step-0-to-step-1000.mp4
ℹ Pushing repo rlunit1_ppo-LunarLander-v2 to the Hugging Face Hub


Upload file replay.mp4:   1%|1         | 3.34k/244k [00:00<?, ?B/s]

Upload file ppo-LunarLander-v2.zip:   2%|2         | 3.34k/141k [00:00<?, ?B/s]

Upload file ppo-LunarLander-v2/policy.optimizer.pth:   4%|4         | 3.34k/82.9k [00:00<?, ?B/s]

Upload file ppo-LunarLander-v2/policy.pth:   8%|7         | 3.34k/42.2k [00:00<?, ?B/s]

remote: Enforcing permissions...        
remote: Allowed refs: all        
To https://huggingface.co/QuickSilver007/rlunit1_ppo-LunarLander-v2
   479b48c..5ff169b  main -> main



ℹ Your model is pushed to the hub. You can view your model here:
https://huggingface.co/QuickSilver007/rlunit1_ppo-LunarLander-v2


'https://huggingface.co/QuickSilver007/rlunit1_ppo-LunarLander-v2'
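
The `repo_id` passed to `package_to_hub` follows the `{organization}/{repo_name}` pattern, and the URL returned above is that id appended to `https://huggingface.co/`. A small sketch of checking the id before pushing could look like the following. The helpers `split_repo_id` and `model_url` are hypothetical names introduced here for illustration and are not part of `huggingface_sb3`.

```python
def split_repo_id(repo_id):
    """Split '{organization}/{repo_name}' into (organization, repo_name)."""
    organization, separator, repo_name = repo_id.partition("/")
    # Reject ids with a missing part or with more than one slash.
    if not separator or not organization or not repo_name or "/" in repo_name:
        raise ValueError(f"expected '{{organization}}/{{repo_name}}', got {repo_id!r}")
    return organization, repo_name


def model_url(repo_id):
    """Build the model page URL for a valid repo_id (pattern taken from the output above)."""
    split_repo_id(repo_id)  # validates the format, raises ValueError otherwise
    return f"https://huggingface.co/{repo_id}"


repo_id = "QuickSilver007/rlunit1_ppo-LunarLander-v2"
print(split_repo_id(repo_id))
print(model_url(repo_id))
```

Validating the id up front avoids waiting for evaluation and video recording only to fail at the push step because of a typo.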