# Stable Baselines3 Demo

A few things about Stable Baselines3:
* [`stable-baselines3`](https://github.com/DLR-RM/stable-baselines3) contains the core algorithms.
* [`rl-baselines3-zoo`](https://github.com/DLR-RM/rl-baselines3-zoo) contains additional scripts for training, evaluation, tuning and recording.

In [1]:
# Install dependencies
# pip install stable-baselines3[extra]
#!apt-get update && apt-get install swig cmake
#!pip install box2d-py
#!pip install "stable-baselines3[extra]>=2.0.0a4"

### Demo - Playing Donkey Kong using DQN

Suppose that we have to design an RL agent to play the Atari variant of [Donkey Kong](https://gymnasium.farama.org/environments/atari/donkey_kong/). The following information is given:

| Information        | Value                       |
|--------------------|-----------------------------|
| Action Space       | Discrete(18)                |
| Observation Space  | Box(0, 255, (210, 160, 3), uint8) |
| Import             | `gymnasium.make("ALE/DonkeyKong-v5")` |

[Deep Q-learning](https://en.wikipedia.org/wiki/Q-learning) was originally demonstrated working directly on frames from Atari games (the states are images from the game). This is possible because the action-value function $q(s, a, \mathbf{w})$ is approximated by a convolutional neural network (CNN), which naturally handles image data. DQN can also be used with other function approximators, but with a multilayer perceptron (MLP), for example, the observations have to be feature vectors rather than raw images.

Deep Q-learning can be seen as an extension of Q-learning. As in Q-learning, $\epsilon$-greedy action selection is used for exploration, and the target policy is the deterministic greedy policy:
$$
    \pi_*(s) = \underset{a \in \mathcal{A}}{\text{argmax}} \; q_*(s,a).
$$
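To make the exploration rule concrete, here is a minimal sketch of $\epsilon$-greedy and greedy action selection over a list of action values. It is plain Python for illustration only; the function names and the toy Q-values are assumptions and not part of Stable Baselines3.

```python
import random


def greedy_action(q_values):
    """Index of the largest action value (ties go to the lowest index)."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def epsilon_greedy_action(q_values, epsilon, rng=random):
    """With probability epsilon pick a uniformly random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return greedy_action(q_values)


# Toy action values for a single state with 18 actions, as in Donkey Kong.
q_values = [0.0] * 18
q_values[5] = 1.0

rng = random.Random(0)  # fixed seed so the run is reproducible
actions = [epsilon_greedy_action(q_values, epsilon=0.1, rng=rng) for _ in range(1000)]
print("greedy action:", greedy_action(q_values))
print("share of greedy picks:", actions.count(5) / len(actions))
```

In Stable Baselines3's `DQN`, $\epsilon$ is annealed during training, and `exploration_final_eps` sets its final value.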
Just like Q-learning, DQN is not compatible with continuous action spaces, because the greedy policy takes an argmax over a finite set of actions. The Stable Baselines3 documentation lists which Gymnasium spaces DQN supports:

| Space            | Action     | Observation |
|------------------|------------|-------------|
| Discrete         | ✔️          | ✔️          |
| Box              | ❌          | ✔️          |
| MultiDiscrete    | ❌          | ✔️          |
| MultiBinary      | ❌          | ✔️          |
| Dict             | ❌          | ✔️          |




In [2]:
import gymnasium as gym
from stable_baselines3 import DQN
from stable_baselines3.common.vec_env import VecFrameStack
from stable_baselines3.common.env_util import make_atari_env
from ale_py import ALEInterface
from ale_py.roms import DonkeyKong

ale = ALEInterface()
ale.loadROM(DonkeyKong)

A.L.E: Arcade Learning Environment (version 0.8.1+53f58b7)
[Powered by Stella]
Game console created:
  ROM file:  /home/matthijs/bsc/BachelorProject/.venv/lib/python3.10/site-packages/AutoROM/roms/donkey_kong.bin
  Cart Name: Donkey Kong (198x)
  Cart MD5:  36b20c427975760cb9cf4a47e41369e4
  Display Format:  AUTO-DETECT ==> NTSC
  ROM Size:        4096
  Bankswitch Type: AUTO-DETECT ==> 4K

Running ROM file...
Random seed is 1710203264


In [3]:
env = gym.make("ALE/DonkeyKong-v5")

A.L.E: Arcade Learning Environment (version 0.8.1+53f58b7)
[Powered by Stella]


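First, we apply the pre-processing that is commonly used for Atari environments. The `make_atari_env` function creates a vectorized environment and wraps each copy in `AtariWrapper`, which by default applies frame skipping, resizing to 84×84 grayscale, and reward clipping.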

In [4]:
vec_env = make_atari_env("PongNoFrameskip-v4", n_envs=4, seed=0)
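Atari pre-processing typically includes steps such as grayscale conversion, reward clipping and frame stacking (the latter is provided separately in Stable Baselines3 by `VecFrameStack`, imported above). The sketch below illustrates these ideas on a toy 2×2 frame in plain Python. It is a conceptual illustration, not the library's implementation: the helper names, the luminance weights and the padding behaviour on the first frame are all assumptions made for this example.

```python
from collections import deque


def to_grayscale(frame):
    """Convert an RGB frame (rows of (r, g, b) tuples) to luminance values 0..255."""
    # Assumed weights: the common 0.299 / 0.587 / 0.114 luminance formula.
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row] for row in frame]


def clip_reward(reward):
    """Map a reward to its sign: -1, 0 or +1."""
    return (reward > 0) - (reward < 0)


class FrameStack:
    """Keep the last n frames so the agent can infer motion from the stack."""

    def __init__(self, n):
        self.frames = deque(maxlen=n)

    def push(self, frame):
        # On the first frame, pad with copies so the stack always holds n frames.
        while len(self.frames) < self.frames.maxlen - 1:
            self.frames.append(frame)
        self.frames.append(frame)  # the deque drops the oldest frame when full
        return list(self.frames)


rgb_frame = [[(255, 255, 255), (0, 0, 0)], [(255, 0, 0), (0, 0, 255)]]  # toy 2x2 frame
gray = to_grayscale(rgb_frame)
stack = FrameStack(n=4)
obs = stack.push(gray)
print(gray)
print(len(obs), clip_reward(200.0), clip_reward(-3.0))
```

Each of these steps shrinks or regularizes what the agent has to learn from: one channel instead of three, rewards on a common scale across games, and a short history of frames in every observation.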

Next, we instantiate a DQN object. The first argument of the constructor is the policy, i.e. the model used to approximate the Q-function; `"CnnPolicy"` selects a CNN.

In [5]:
env = gym.make("ALE/DonkeyKong-v5", render_mode="human")

model = DQN(
    "CnnPolicy",               # What model to use to approximate Q-function.
    env,
    verbose=1,
    exploration_final_eps=0.1,
    target_update_interval=250,
)

Using cuda device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Wrapping the env in a VecTransposeImage.


MemoryError: Unable to allocate 93.9 GiB for an array with shape (1000000, 1, 3, 210, 160) and data type uint8
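The `MemoryError` follows directly from the array shape in the message. As a rough sketch (plain Python, no Stable Baselines3 calls; the helper name `buffer_gib` and the smaller comparison shape are assumptions made for this illustration), the size of one `uint8` observation array can be estimated like this:

```python
from math import prod

GIB = 2**30  # bytes per gibibyte


def buffer_gib(shape, bytes_per_element=1):
    """Size in GiB of an array with the given shape (1 byte per element for uint8)."""
    return prod(shape) * bytes_per_element / GIB


# Shape reported in the MemoryError: 1,000,000 transitions of raw 3x210x160 frames.
raw = buffer_gib((1_000_000, 1, 3, 210, 160))

# Assumption for comparison: 4 stacked 84x84 grayscale frames per observation,
# stored for 100,000 transitions instead of 1,000,000.
small = buffer_gib((100_000, 1, 4, 84, 84))

print(f"raw frames:          {raw:.1f} GiB")
print(f"preprocessed frames: {small:.1f} GiB")
```

The first figure reproduces the 93.9 GiB from the error. The leading dimension of 1,000,000 matches the default `buffer_size` of `DQN`, so two plausible ways to shrink the buffer are passing a smaller `buffer_size` to the constructor, or training on a preprocessed environment such as the one returned by `make_atari_env` instead of on raw frames. Note that this estimate covers a single observation array only; the replay buffer stores more than that, so the real footprint is larger.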