<a href="https://colab.research.google.com/github/nsanghi/drl-2ed/blob/main/chapter2/listing2_4-second_agent.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Train your Second Agent

In this Notebook we will explore yet another environment, train the agent and then share the result using Huggingface ecosystem.


## Running in Colab

If you are running this on Colab, please uncomment below cell and run this to install required dependencies.



In [None]:
#uncomment and execute this cell to install all the the dependencies if running in Google Colab

# !apt-get update && apt-get install swig cmake ffmpeg freeglut3-dev xvfb
# !pip install box2d-py
# !pip install "stable-baselines3[extra]>=2.0.0a4"
# !pip install git+https://github.com/huggingface/huggingface_sb3@gymnasium

## Imports

In [1]:
import gymnasium as gym

from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack
from stable_baselines3 import A2C

  if not hasattr(tensorboard, "__version__") or LooseVersion(


## Create the Gym env and instantiate the agent

For this example, we will use an environment from Atari Simulator. Atari environments are simulated via the Arcade Learning Environment (ALE) [1] through the [Stella](https://github.com/stella-emu/stella) and the [Arcade Learning Environment](https://github.com/mgbellemare/Arcade-Learning-Environment).

![Pong](https://gymnasium.farama.org/_images/pong.gif)


We will use [Pong](https://gymnasium.farama.org/environments/atari/pong/) from the Atari game simulator. We will be using a specific variant  `PongNoFrameskip-v4`. For now we will not delve into the details except telling you that in this case the state is the image of the game and actions are the game controller actions. As we will be using images as state, we will use a different variation of policy network called `CnnPolicy` which is based on CNNs (Convolutional Neural Networks).

Last notebook we used DQN as the algorithm to train the agent. In this notebook we will be used [A2C algorithm](https://stable-baselines3.readthedocs.io/en/master/modules/a2c.html). A2C is a synchronous, deterministic variant of Asynchronous Advantage Actor Critic ([A3C](https://arxiv.org/abs/1602.01783)).

In [3]:
# There already exists an environment generator
# that will make and wrap atari environments correctly.
# Here we are also multi-worker training (n_envs=4 => 4 environments)
env_id = "PongNoFrameskip-v4"

vec_env = make_atari_env(env_id, n_envs=4, seed=0)
# Frame-stacking with 4 frames
vec_env = VecFrameStack(vec_env, n_stack=4)

model = A2C("CnnPolicy", vec_env, verbose=1)


Using cuda device
Wrapping the env in a VecTransposeImage.


## Train the Agent

It is advisable to run this on an envirnment with GPU as otherwise it may take long to run this

In [4]:
# If you have a GPU, you can increase the `total_timesteps` to something like 1_000_000 i.e. one million
# It would take about 45 mins to train
# The more you train the better the result will be

model.learn(total_timesteps=20_000, log_interval=500, progress_bar=True)

Output()

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 3.66e+03 |
|    ep_rew_mean        | -20.4    |
| time/                 |          |
|    fps                | 321      |
|    iterations         | 500      |
|    time_elapsed       | 31       |
|    total_timesteps    | 10000    |
| train/                |          |
|    entropy_loss       | -1.66    |
|    explained_variance | -0.00417 |
|    learning_rate      | 0.0007   |
|    n_updates          | 499      |
|    policy_loss        | -0.25    |
|    value_loss         | 0.183    |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 3.61e+03 |
|    ep_rew_mean        | -20.5    |
| time/                 |          |
|    fps                | 357      |
|    iterations         | 1000     |
|    time_elapsed       | 55       |
|    total_timesteps    | 20000    |
| train/                |          |
|

<stable_baselines3.a2c.a2c.A2C at 0x7c1f18fc9c90>

## Generate a Video

In [5]:
from stable_baselines3.common.vec_env import VecVideoRecorder, DummyVecEnv

video_length = 1000
video_folder = "logs/videos/"

vec_env = make_atari_env(env_id, n_envs=1, seed=0)
# Frame-stacking with 4 frames
vec_env = VecFrameStack(vec_env, n_stack=4)

obs = vec_env.reset()

# Record the video starting at the first step
vec_env = VecVideoRecorder(vec_env, video_folder,
                       record_video_trigger=lambda x: x == 0, video_length=video_length,
                       name_prefix=f"a2c-agent-{env_id}")

vec_env.reset()
for _ in range(video_length + 1):
  action, _state = model.predict(obs)
  obs, _, _, _ = vec_env.step(action)
# Save the video
vec_env.close()


  logger.warn(


Saving video to /content/logs/videos/a2c-agent-PongNoFrameskip-v4-step-0-to-step-1000.mp4
Moviepy - Building video /content/logs/videos/a2c-agent-PongNoFrameskip-v4-step-0-to-step-1000.mp4.
Moviepy - Writing video /content/logs/videos/a2c-agent-PongNoFrameskip-v4-step-0-to-step-1000.mp4



                                                                 

Moviepy - Done !
Moviepy - video ready /content/logs/videos/a2c-agent-PongNoFrameskip-v4-step-0-to-step-1000.mp4




## Watch the Video

To get better results and if you have GPU, increase the `total_timesteps` to something like `1_000_000` in `model.learn` call above.

In [6]:
from IPython.display import HTML
from base64 import b64encode
mp4 = open('./logs/videos/a2c-agent-PongNoFrameskip-v4-step-0-to-step-1000.mp4','rb').read()
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
HTML("""
<video width=400 controls>
      <source src="%s" type="video/mp4">
</video>
""" % data_url)

## Share the Video on Huggingface Account

### First let us login into our huggingface account

In [7]:
from huggingface_sb3 import load_from_hub, package_to_hub, push_to_hub
from huggingface_hub import notebook_login # To log to our Hugging Face account to be able to upload models to the Hub.

notebook_login()
!git config --global credential.helper store

VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

### Push to HuggingFace

You can execute the code below to push the trained agent to huggingface hub. Towards the end of output there will be a link of the hosted model that you can share with your friends and family. Share the url without the ending paths "tree/main" so that the url takes them to Model Card tab where they can see the animation. e.g. in my case it will be:<br/>
`https://huggingface.co/nsanghi/a2c-Atari-Pong/`<br/>
instead of<br/>
`https://huggingface.co/nsanghi/a2c-Atari-Pong/tree/main/`

NOTE: At the time of writing this book, there is a bug in push_to_hub which results in sample video not getting created. You can refer to this bug here - https://github.com/huggingface/huggingface_sb3/issues/33



In [9]:
eval_env = make_atari_env(env_id, n_envs=1, seed=0)
# Frame-stacking with 4 frames
eval_env = VecFrameStack(eval_env, n_stack=4)

obs = vec_env.reset()

# method save, evaluate, generate a model card and record a replay video of your agent before pushing the repo to the hub

# Please note repo_id is of the form <huggingface_id>/<name of repo>
# you will need to change this to "<your_huggingface_id>/dqn-LunarLander-v2"

package_to_hub(model=model, # Our trained model
               model_name="A2C-Atari-Pong", # The name of our trained model
               model_architecture="A2C", # The model architecture we used: in our case PPO
               env_id=env_id, # Name of the environment
               eval_env=eval_env, # Evaluation Environment
               repo_id="nsanghi/a2c-Atari-Pong", # id of the model repository from the Hugging Face Hub (repo_id = {organization}/{repo_name} for instance ThomasSimonini/ppo-LunarLander-v2
               commit_message="Push to Hub")

# Note: if after running the package_to_hub function and it gives an issue of rebasing, please run the following code
# cd <path_to_repo> && git add . && git commit -m "Add message" && git pull
# And don't forget to do a "git push" at the end to push the change to the hub.

Moviepy - Building video /content/logs/videos/a2c-agent-PongNoFrameskip-v4-step-1001-to-step-2001.mp4.
Moviepy - Writing video /content/logs/videos/a2c-agent-PongNoFrameskip-v4-step-1001-to-step-2001.mp4



                                                  

Moviepy - Done !
Moviepy - video ready /content/logs/videos/a2c-agent-PongNoFrameskip-v4-step-1001-to-step-2001.mp4
[38;5;4mℹ This function will save, evaluate, generate a video of your agent,
create a model card and push everything to the hub. It might take up to 1min.
This is a work in progress: if you encounter a bug, please open an issue.[0m




shimmy.atari_env:AtariEnv
[38;5;1m✘ VecEnvWrapper.render() got an unexpected keyword argument
'render_mode'[0m
[38;5;1m✘ We are unable to generate a replay of your agent, the package_to_hub
process continues[0m
[38;5;1m✘ Please open an issue at
https://github.com/huggingface/huggingface_sb3/issues[0m
Moviepy - Building video /tmp/tmpdk41g9iy/-step-0-to-step-1000.mp4.
Moviepy - Writing video /tmp/tmpdk41g9iy/-step-0-to-step-1000.mp4



Exception ignored in: <function VecVideoRecorder.__del__ at 0x7c1f191363b0>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/stable_baselines3/common/vec_env/vec_video_recorder.py", line 113, in __del__
    self.close_video_recorder()
  File "/usr/local/lib/python3.10/dist-packages/stable_baselines3/common/vec_env/vec_video_recorder.py", line 104, in close_video_recorder
    self.video_recorder.close()
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/wrappers/monitoring/video_recorder.py", line 168, in close
    self.write_metadata()
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/wrappers/monitoring/video_recorder.py", line 175, in write_metadata
    with open(self.metadata_path, "w") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpdk41g9iy/-step-0-to-step-1000.meta.json'


Moviepy - Done !
Moviepy - video ready /tmp/tmpdk41g9iy/-step-0-to-step-1000.mp4
[38;5;4mℹ Pushing repo nsanghi/a2c-Atari-Pong to the Hugging Face Hub[0m


A2C-Atari-Pong.zip:   0%|          | 0.00/13.8M [00:00<?, ?B/s]

[38;5;4mℹ Your model is pushed to the Hub. You can view your model here:
https://huggingface.co/nsanghi/a2c-Atari-Pong/tree/main/[0m


'https://huggingface.co/nsanghi/a2c-Atari-Pong/tree/main/'