<a href="https://colab.research.google.com/github/migolan/RL-notebooks/blob/main/HF_RL_unit3_distilled.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Intro
This notebook is based on https://huggingface.co/learn/deep-rl-course.

* Train a DQN agent using [RL-Baselines3-Zoo](https://github.com/DLR-RM/rl-baselines3-zoo), which is built on top of [StableBaselines3](https://github.com/DLR-RM/stable-baselines3), on the [SpaceInvadersNoFrameskip-v4](https://ale.farama.org/environments/space_invaders) Gymnasium environment
* Load and evaluate trained models from the [HF SB3 zoo](https://huggingface.co/sb3)

# Installations

In [None]:
%%capture

!pip install git+https://github.com/DLR-RM/rl-baselines3-zoo
!apt-get install -y swig cmake ffmpeg
!pip install "gymnasium[atari]"
!pip install "gymnasium[accept-rom-license]"
# Recent Ubuntu releases ship the OpenGL bindings as python3-opengl
!apt-get install -y python3-opengl
!apt-get install -y xvfb
!pip3 install pyvirtualdisplay

# Create a training hyperparameter config file for RL-Baselines3-Zoo

`dqn.yml`:

```
SpaceInvadersNoFrameskip-v4:
  env_wrapper:
    - stable_baselines3.common.atari_wrappers.AtariWrapper
  frame_stack: 4
  policy: 'CnnPolicy'
  n_timesteps: !!float 1e6
  buffer_size: 100000
  learning_rate: !!float 1e-4
  batch_size: 32
  learning_starts: 100000
  target_update_interval: 1000
  train_freq: 4
  gradient_steps: 1
  exploration_fraction: 0.1
  exploration_final_eps: 0.01
  # If True, you need to deactivate handle_timeout_termination
  # in the replay_buffer_kwargs
  optimize_memory_usage: False
```

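- `AtariWrapper` preprocesses the frames (frame skipping, downscaling, grayscale); `frame_stack: 4` then stacks the last 4 frames into a single observation
- `CnnPolicy` uses convolutional layers to process the stacked frames
- `n_timesteps` is the total number of environment steps to train for
- `buffer_size` is the capacity of the experience replay buffer: the number of past transitions kept so the agent can be trained on them again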
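
To make the role of `buffer_size` and `batch_size` more concrete, here is a minimal sketch of a fixed-size replay buffer. The `ReplayBuffer` class below is a toy written for illustration and is not the Stable-Baselines3 implementation; the transition fields and the tiny sizes are assumptions chosen to keep the example readable.

```python
import random
from collections import deque


class ReplayBuffer:
    """Toy fixed-size experience replay buffer (illustrative, not the SB3 class)."""

    def __init__(self, buffer_size):
        # A deque with maxlen drops the oldest transition once it is full
        self.storage = deque(maxlen=buffer_size)

    def add(self, obs, action, reward, next_obs, done):
        # Store one transition as a plain tuple
        self.storage.append((obs, action, reward, next_obs, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored transitions
        return random.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)


# Tiny demo: capacity 5, but 8 transitions are added, so the 3 oldest are evicted
buffer = ReplayBuffer(buffer_size=5)
for step in range(8):
    buffer.add(obs=step, action=0, reward=1.0, next_obs=step + 1, done=False)

batch = buffer.sample(batch_size=3)
print(len(buffer))
print(sorted(transition[0] for transition in batch))
```

A larger buffer keeps older experience around for longer, at the cost of memory; with stacked 84x84 frames this is typically the dominant memory cost of the run.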

Training is slow: 1M timesteps takes about an hour, so don't go above that.
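
To get a feel for what the timestep budget means under the config above, the sketch below derives two quantities from it: the exploration rate at a given step and a rough count of gradient updates. The helper names are hypothetical, `initial_eps=1.0` is assumed as the starting exploration rate, and the update count is a single-environment approximation rather than the library's exact bookkeeping.

```python
def epsilon_at(step, total_timesteps, exploration_fraction,
               final_eps, initial_eps=1.0):
    """Linearly anneal epsilon from initial_eps to final_eps, then hold it."""
    progress = step / total_timesteps
    if progress >= exploration_fraction:
        return final_eps
    return initial_eps + progress * (final_eps - initial_eps) / exploration_fraction


def approx_gradient_updates(total_timesteps, learning_starts,
                            train_freq, gradient_steps):
    """Rough estimate: no updates before learning_starts, then
    gradient_steps updates every train_freq environment steps."""
    training_steps = max(total_timesteps - learning_starts, 0)
    return (training_steps // train_freq) * gradient_steps


# Values taken from dqn.yml above
N_TIMESTEPS = 1_000_000
for step in (0, 50_000, 100_000, 500_000):
    eps = epsilon_at(step, N_TIMESTEPS, exploration_fraction=0.1, final_eps=0.01)
    print(step, round(eps, 3))

print(approx_gradient_updates(N_TIMESTEPS, learning_starts=100_000,
                              train_freq=4, gradient_steps=1))
```

With these settings, exploration finishes annealing at step 100,000, which is also when learning starts, so under this approximation nearly all gradient updates happen at the final exploration rate.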

To optimize the hyperparameters, focus on these three:
- `learning_rate`
- `buffer_size` (experience memory size)
- `batch_size`

Check the documentation to understand what each hyperparameter does: https://stable-baselines3.readthedocs.io/en/master/modules/dqn.html#parameters
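
One way to experiment with those three values is to generate `dqn.yml` from a Python cell instead of editing it by hand. The sketch below reuses the config shown earlier and exposes only the three tunables; `write_config` and `CONFIG_TEMPLATE` are hypothetical names introduced for this example, and the defaults are simply the values from the config above.

```python
from pathlib import Path

# Same settings as the dqn.yml shown earlier, with three placeholders
CONFIG_TEMPLATE = """\
SpaceInvadersNoFrameskip-v4:
  env_wrapper:
    - stable_baselines3.common.atari_wrappers.AtariWrapper
  frame_stack: 4
  policy: 'CnnPolicy'
  n_timesteps: !!float 1e6
  buffer_size: {buffer_size}
  learning_rate: !!float {learning_rate:g}
  batch_size: {batch_size}
  learning_starts: 100000
  target_update_interval: 1000
  train_freq: 4
  gradient_steps: 1
  exploration_fraction: 0.1
  exploration_final_eps: 0.01
  optimize_memory_usage: False
"""


def write_config(path="dqn.yml", learning_rate=1e-4,
                 buffer_size=100_000, batch_size=32):
    """Fill in the three tunables and write the config file to disk."""
    text = CONFIG_TEMPLATE.format(
        learning_rate=learning_rate,
        buffer_size=buffer_size,
        batch_size=batch_size,
    )
    Path(path).write_text(text)
    return text


# Writes dqn.yml in the current working directory and shows its content
print(write_config())
```

Changing one value at a time, for example `write_config(batch_size=64)`, makes it easier to attribute any change in the results to that hyperparameter.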

# Train a Deep Q-Learning Agent to Play Space Invaders

In [None]:
!python -m rl_zoo3.train  --algo dqn  --env SpaceInvadersNoFrameskip-v4  -f logs/  -c dqn.yml

2024-12-28 08:00:50.267470: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-12-28 08:00:50.288026: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-12-28 08:00:50.293833: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-12-28 08:00:50.309205: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Seed: 3060691216
Loading hyperparameters from

# Evaluate the agent
RL-Baselines3-Zoo provides `enjoy.py`, a Python script to evaluate the agent. Many RL libraries use the same name, `enjoy.py`, for their evaluation scripts.

In [None]:
!python -m rl_zoo3.enjoy  --algo dqn  --env SpaceInvadersNoFrameskip-v4  --n-timesteps 5000  --folder logs/

Loading latest experiment, id=1
Loading logs/

# Generate replay video

In [None]:
!python -m rl_zoo3.record_video  --algo dqn  --env SpaceInvadersNoFrameskip-v4  -n 1000  --load-best  -f logs/

Loading latest experiment, id=1
Loading logs/

# Publish the trained model on the HF Hub

In [None]:
# Log in to your Hugging Face account so you can upload models to the Hub
from huggingface_hub import notebook_login
notebook_login()

In [None]:
!python -m rl_zoo3.push_to_hub  --algo dqn  --no-render  --env SpaceInvadersNoFrameskip-v4  --repo-name dqn-SpaceInvadersNoFrameskip-v4  -orga migolan  -f logs/

Loading latest experiment, id=1
Loading logs/

# Load and evaluate trained models from the HF SB3 zoo
The Stable-Baselines3 team has uploaded more than 150 trained Deep Reinforcement Learning agents to the HF Hub.

You can find them here: https://huggingface.co/sb3

In [None]:
# Download the model and save it into the rl_trained/ folder
!python -m rl_zoo3.load_from_hub --algo dqn --env BeamRiderNoFrameskip-v4 -orga sb3 -f rl_trained/
# Evaluate model
!python -m rl_zoo3.enjoy --algo dqn --env BeamRiderNoFrameskip-v4 -n 5000  -f rl_trained/ --no-render

# Some additional challenges
The best way to learn **is to try things on your own**!

Here's a list of environments you can try training your agent on:
- BeamRiderNoFrameskip-v4
- BreakoutNoFrameskip-v4
- EnduroNoFrameskip-v4
- PongNoFrameskip-v4
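
As a starting point, the sketch below builds one training command per environment, reusing the flags from the training cell earlier in this notebook. The `train_command` helper is hypothetical, and the sketch assumes that `dqn.yml` has been extended with a top-level entry for each environment ID, since the config above only defines `SpaceInvadersNoFrameskip-v4`.

```python
ENV_IDS = [
    "BeamRiderNoFrameskip-v4",
    "BreakoutNoFrameskip-v4",
    "EnduroNoFrameskip-v4",
    "PongNoFrameskip-v4",
]


def train_command(env_id, config="dqn.yml", log_folder="logs/"):
    """Build the same rl_zoo3.train invocation used above for another env."""
    # Assumption: `config` contains a hyperparameter entry keyed by env_id
    return (
        f"python -m rl_zoo3.train --algo dqn --env {env_id} "
        f"-f {log_folder} -c {config}"
    )


commands = [train_command(env_id) for env_id in ENV_IDS]
for command in commands:
    print(command)
```

Each printed line can be pasted into a notebook cell with a leading `!` to launch that run.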

**If you want to learn to implement Deep Q-Learning yourself**, look at the CleanRL implementation: https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/dqn_atari.py
