# Deep Value-based Reinforcement Learning

<img src="https://github.com/jeremiedecock/polytechnique-inf639-2024-students/blob/main/assets/logo.jpg?raw=true" style="float: left; width: 15%" />

[CSC_53439_EP-2024](https://moodle.polytechnique.fr/course/view.php?id=19358) Lab session #1

2019-2024 Jérémie Decock

[![Open in Google Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/jeremiedecock/polytechnique-inf639-2024-students/blob/main/lab1_deep_value-based_reinforcement_learning.ipynb)

[![My Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/jeremiedecock/polytechnique-inf639-2024-students/main?filepath=lab1_deep_value-based_reinforcement_learning.ipynb)

[![NbViewer](https://raw.githubusercontent.com/jupyter/design/master/logos/Badges/nbviewer_badge.svg)](https://nbviewer.jupyter.org/github/jeremiedecock/polytechnique-inf639-2024-students/blob/main/lab1_deep_value-based_reinforcement_learning.ipynb)

[![Local](https://img.shields.io/badge/Local-Save%20As...-blue)](https://github.com/jeremiedecock/polytechnique-inf639-2024-students/raw/main/lab1_deep_value-based_reinforcement_learning.ipynb)

## Introduction

The aim of this lab is to provide an in-depth exploration of the most renowned value-based reinforcement learning techniques, specifically *Deep Q-Networks* and its enhancements.

In this Python notebook, you will implement and evaluate *Deep Q-Networks* (DQN) and its various adaptations.

You can either:
- open, edit and execute the notebook in *Google Colab* following this link: https://colab.research.google.com/github/jeremiedecock/polytechnique-inf639-2024-students/blob/main/lab1_deep_value-based_reinforcement_learning.ipynb ; this is the **recommended** choice as you have nothing to install on your computer
- open, edit and execute the notebook in *MyBinder* (if for any reason the Google Colab solution doesn't work): https://mybinder.org/v2/gh/jeremiedecock/polytechnique-inf639-2024-students/main?filepath=lab1_deep_value-based_reinforcement_learning.ipynb
- download, edit and execute the notebook on your computer if Python 3 and JupyterLab are already installed: https://github.com/jeremiedecock/polytechnique-inf639-2024-students/raw/main/lab1_deep_value-based_reinforcement_learning.ipynb

If you work with Google Colab or MyBinder, **remember to save or download your work regularly or you may lose it!**

### Important note

This tutorial has been tested with Python 3.10 and Python 3.11.

It is important to note that **Bonus Section 4: *Test and train a DQN agent to play Atari games*** [is not compatible with Python 3.12](https://github.com/Farama-Foundation/Gymnasium/issues/1081) (the latest stable version of Python). If you plan to complete this bonus section locally on your machine rather than on Google Colab, make sure you have Python 3.10 or 3.11 installed.

If you are using Python 3.12 and prefer not to run this notebook on Google Colab, an alternative solution via Docker will be provided at the beginning of Bonus Section 4.
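
If you are unsure which interpreter your notebook kernel runs on, a quick check such as the following can help. This is only a sketch: the helper name `supports_atari_bonus` is hypothetical and not part of the lab's code, and it simply encodes the two versions listed above as tested.

```python
import sys

def supports_atari_bonus(version_info=sys.version_info):
    """Return True if the interpreter is Python 3.10 or 3.11 (the versions this lab was tested with)."""
    return (version_info[0], version_info[1]) in {(3, 10), (3, 11)}

if not supports_atari_bonus():
    # Only a hint: the main sections of the lab are not affected by this check
    print(f"Python {sys.version_info[0]}.{sys.version_info[1]} detected: "
          "consider Google Colab or Docker for Bonus Section 4.")
```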

### Survey

Please answer the following survey to help us improve this lab session: https://moodle.polytechnique.fr/mod/questionnaire/view.php?id=535310

### Deep value-based reinforcement learning

Deep reinforcement learning methods like DQN (Deep Q-Networks) are significant advancements over tabular methods such as Q-Learning because they can handle complex, high-dimensional environments that were previously intractable. While Q-Learning is limited to environments where the state and action spaces are sufficiently small to maintain a table of values, DQN uses neural networks to approximate the Q-value function, allowing it to generalize across similar states and scale to problems with vast state spaces. This enables DQN to learn optimal policies for tasks like video games, robotic control, and other applications where the number of possible states is extraordinarily large.
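
To make the limitation of tabular methods concrete, here is a minimal sketch of a tabular Q-Learning update. The constants `ALPHA`, `GAMMA` and `N_ACTIONS`, the function name and the example states are assumptions chosen for illustration; they are not taken from the lab's code.

```python
from collections import defaultdict

ALPHA = 0.1     # learning rate (assumed value)
GAMMA = 0.99    # discount factor (assumed value)
N_ACTIONS = 2   # hypothetical number of actions

# One table row per distinct state; rows are created lazily on first access
q_table = defaultdict(lambda: [0.0] * N_ACTIONS)

def q_learning_update(state, action, reward, next_state, terminated):
    """Apply one tabular Q-Learning update and return the new Q(state, action)."""
    # No bootstrapping from the next state when the episode has terminated
    bootstrap = 0.0 if terminated else max(q_table[next_state])
    td_target = reward + GAMMA * bootstrap
    q_table[state][action] += ALPHA * (td_target - q_table[state][action])
    return q_table[state][action]

new_value = q_learning_update(state=(0, 1), action=1, reward=1.0,
                              next_state=(0, 2), terminated=False)
print(new_value, len(q_table))
```

The table needs one row per distinct state and nothing learned for one state transfers to another. With continuous observations, almost every step would create a new row, which is why DQN replaces the table with a neural network that maps a state to one Q-value per action.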

While DQN was designed to tackle large environments like Atari games, the primary focus of this lab is to delve into the underlying algorithms, understand them thoroughly, and evaluate them comprehensively. It's important to note that working with not-so-deep networks captures the essence of deep reinforcement learning, excluding the computational expense. The transition from tabular Q-learning to DQN involves significant implications, primarily due to the ability of DQN to handle high-dimensional state spaces. Moving from DQN to very-deep-DQN is primarily a matter of scale and computational resources. The core principles remain the same, and understanding these principles is the key to mastering reinforcement learning, regardless of the complexity of the network used.
For these reasons, in this lab, we will focus on studying the CartPole environment. The CartPole problem is a classic in reinforcement learning, and it provides a simpler and more manageable context for understanding the principles of DQN. Convergence in the CartPole environment is much faster than in Atari games: typically within a minute, as opposed to hours on a well-equipped personal computer for Atari games. This allows us to experiment and iterate more quickly, facilitating a deeper understanding of the algorithms at play.

We will therefore focus on algorithms in a simple environment. However, once the key concepts are mastered and tested in these settings, Bonus Section 4 will offer you the opportunity to refine your technical skills by applying them to the game Breakout in the Atari environment.

## Setup the Python environment

This notebook relies on several libraries including `torch`, `gymnasium`, `numpy`, `pandas`, `seaborn`, `imageio`, `pygame`, and `tqdm`.
A complete list of dependencies can be found in the provided [requirements-minimal.txt](https://raw.githubusercontent.com/jeremiedecock/polytechnique-inf639-2024-students/main/requirements-minimal.txt) and [requirements.txt](https://raw.githubusercontent.com/jeremiedecock/polytechnique-inf639-2024-students/main/requirements.txt) files.

- [requirements-minimal.txt](https://raw.githubusercontent.com/jeremiedecock/polytechnique-inf639-2024-students/main/requirements-minimal.txt) contains the minimal dependencies required to run this notebook without the optional sections (e.g. the Atari environment, Aim, TensorBoard, Optuna, Weights & Biases, ...).
- [requirements.txt](https://raw.githubusercontent.com/jeremiedecock/polytechnique-inf639-2024-students/main/requirements.txt) contains all the dependencies required to run this notebook with all the optional sections.

### If you use Google Colab

If you use Google Colab, execute the next cell to install required libraries.

In [None]:
import sys, subprocess

def is_colab():
    return "google.colab" in sys.modules

def run_subprocess_command(cmd):
    # run the command
    process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE)
    # print the output
    for line in process.stdout:
        print(line.decode().strip())

if is_colab():
    run_subprocess_command("apt install -y xvfb x11-utils")   # -y: no interactive confirmation is possible from a subprocess
    run_subprocess_command("pip install -r https://raw.githubusercontent.com/jeremiedecock/polytechnique-inf639-2024-students/main/requirements-google-colab.txt")

In [None]:
#! apt install xvfb x11-utils && pip install gymnasium pyvirtualdisplay

### If you have downloaded the notebook on your computer and execute it in your own Python environment

To set up the necessary dependencies, first download either [requirements.txt](https://raw.githubusercontent.com/jeremiedecock/polytechnique-inf639-2024-students/main/requirements.txt) or [requirements-minimal.txt](https://raw.githubusercontent.com/jeremiedecock/polytechnique-inf639-2024-students/main/requirements-minimal.txt), depending on whether or not you want to run the optional sections of this notebook (cf. the *Setup the Python environment* section above).

Ensure the downloaded file is located in the same directory as this notebook. Next, run the following commands to create a [Python virtual environment (venv)](https://docs.python.org/3/library/venv.html) that includes all the essential libraries for this lab.

#### On Posix systems (Linux, MacOSX, WSL, ...)

```bash
python3 -m venv env
source env/bin/activate
python3 -m pip install --upgrade pip
python3 -m pip install -r requirements.txt
```

Adapt the name of the requirements file if you have chosen to use the minimal version.

#### On Windows

```bash
python3 -m venv env
env\Scripts\activate.bat
python3 -m pip install --upgrade pip
python3 -m pip install -r requirements.txt
```

Adapt the name of the requirements file if you have chosen to use the minimal version.

### Run notebooks locally in a dedicated Docker container

If you are familiar with Docker, an image is available on Docker Hub for this lab:

```bash
docker run -it --rm -p 8888:8888 -v "${PWD}":/home/jovyan/work jdhp/inf639-lab1:latest
```

If you encounter an error during the notebook's execution indicating that a file cannot be written, the user ID within the container probably lacks the necessary permissions in the project directory. This can be resolved by modifying the directory's permissions, for example with the commands below. Note that they also delete any previously generated GIF, PNG and model files:

```bash
chmod 777 . figs models
rm -rf figs/*.gif
rm -rf figs/*.png
rm -rf models/*.pth
```

### Import required packages

In [None]:
import collections
import gymnasium as gym
import itertools
import numpy as np
from numpy.typing import NDArray
import pandas as pd
from pathlib import Path
import random
import time
import torch
from torch.optim.lr_scheduler import _LRScheduler
from typing import List, Tuple, Deque, Optional, Callable
# from inf639 import *

In [None]:
%matplotlib inline

import matplotlib.pyplot as plt

import seaborn as sns
from tqdm.notebook import tqdm

In [None]:
gym.__version__

In [None]:
sns.set_context("talk")

In [None]:
FIGS_DIR = Path("figs/")       # Where to save figures (.gif files)
MODELS_DIR = Path("models/")   # Where to save models (.pth files)

In [None]:
FIGS_DIR.mkdir(parents=True, exist_ok=True)     # Create the output directories if they do not exist yet
MODELS_DIR.mkdir(parents=True, exist_ok=True)

### Create a Gymnasium rendering wrapper to visualize environments as GIF images within the notebook

This notebook allows you to visualize the episodes as animated GIFs. Run the cell below to enable this feature.

In [None]:
# To display GIF images in the notebook

import imageio     # To render episodes in GIF images (otherwise there would be no render on Google Colab)
                   # C.f. https://stable-baselines.readthedocs.io/en/master/guide/examples.html#bonus-make-a-gif-of-a-trained-agent
import IPython
from IPython.display import Image

if is_colab():
    import pyvirtualdisplay

    _display = pyvirtualdisplay.Display(visible=False,  # use False with Xvfb
                                        size=(1400, 900))
    _ = _display.start()

class RenderWrapper:
    def __init__(self, env, force_gif=False):
        self.env = env
        self.force_gif = force_gif
        self.reset()

    def reset(self):
        self.images = []

    def render(self):
        if not is_colab():
            self.env.render()
            time.sleep(1./60.)

        if is_colab() or self.force_gif:
            img = self.env.render()         # Assumes env.render_mode == 'rgb_array'
            self.images.append(img)

    def make_gif(self, filename="render"):
        if is_colab() or self.force_gif:
            gif_path = Path(filename).with_suffix('.gif')   # Accept both str and Path (a plain str has no with_suffix method)
            # Keep every other frame to reduce the size of the GIF
            imageio.mimsave(gif_path, [np.array(img) for i, img in enumerate(self.images) if i%2 == 0], fps=29, loop=0)
            return Image(open(gif_path,'rb').read())

    @classmethod
    def register(cls, env, force_gif=False):
        env.render_wrapper = cls(env, force_gif=force_gif)   # Forward the caller's choice (it was previously ignored and always True)

## Define some parameters

### Number of trainings

To achieve more representative outcomes at the conclusion of each exercise, we average the results across multiple training sessions. The `DEFAULT_NUMBER_OF_TRAININGS` variable defined below sets the default number of training sessions conducted before the results are displayed.

We recommend setting a lower value (such as 1 or 2) during the development and testing phases of your implementations. Once you have completed your work and are confident in its functionality, you can increase the number of training sessions to minimize the variance in results. Be aware that a higher number of training sessions will extend the execution time, so adjust this setting in accordance with your computer's capabilities.

Additionally, you have the option to assign a specific value to the `NUMBER_OF_TRAININGS` variable for each exercise directly within the cells where the training loop is defined (the `NUMBER_OF_TRAININGS` variable is commented out at the beginning of these cells).
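
The averaging itself can be sketched as follows. This is an illustration only: `fake_training_run` is a hypothetical stand-in that produces noisy synthetic returns instead of training an agent, and the way the lab actually aggregates its results may differ.

```python
import random
import statistics

def fake_training_run(seed, n_episodes=50):
    """Stand-in for one training session: returns a noisy, increasing list of episode returns."""
    rng = random.Random(seed)
    return [episode + rng.gauss(0.0, 10.0) for episode in range(n_episodes)]

NUMBER_OF_TRAININGS = 3   # keep this small while developing

runs = [fake_training_run(seed) for seed in range(NUMBER_OF_TRAININGS)]

# Aggregate the runs episode by episode: zip(*runs) groups the i-th episode of every run
mean_curve = [statistics.mean(values) for values in zip(*runs)]
std_curve = [statistics.pstdev(values) for values in zip(*runs)]   # pstdev also works with a single run

print(len(mean_curve), round(mean_curve[-1], 1))
```

With more runs, the mean curve becomes smoother and the spread around it gives an idea of how much a single training session can deviate from the average.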

In [None]:
DEFAULT_NUMBER_OF_TRAININGS = 3

## PyTorch Refresher and Cheat Sheet

In this lab, we will be implementing our deep reinforcement learning algorithms using PyTorch.
If you need a refresher, you might find this [PyTorch Cheat Sheet](https://pytorch.org/tutorials/beginner/ptcheat.html) helpful. It provides a quick reference for many of the most commonly used PyTorch functions and concepts, and can be a valuable resource as you work through this lab.

You can also refer to the [official documentation](https://pytorch.org/docs/stable/index.html).

### PyTorch device

PyTorch can run on both CPUs and GPUs. The following cell will determine the device PyTorch will use. If a GPU is available, PyTorch will use it; otherwise, it will use the CPU.

To use a GPU on Google Colab, you must also enable it by following the steps outlined [here](https://colab.research.google.com/notebooks/gpu.ipynb).

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  # Set the device to CUDA if available, otherwise use CPU

Note that a GPU brings little benefit for CartPole (although it is useful for Atari Breakout in *Bonus section 4*): the networks and batches involved are so small that transferring data between the CPU and the GPU takes longer than running the computation directly on the CPU.
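
As a reminder of what using a device means in practice, both the network and its inputs must be on the same device. The sketch below assumes a small illustrative network (4 inputs, 2 outputs, 64 hidden units); these sizes and the name `q_network` are assumptions, not the architecture used later in the lab.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A small illustrative network; .to(device) moves its parameters to the selected device
q_network = torch.nn.Sequential(
    torch.nn.Linear(4, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 2),
).to(device)

# Tensors are created on the CPU by default and must be moved explicitly
observation = torch.zeros(1, 4)
q_values = q_network(observation.to(device))

# .item() copies the selected index back to the host as a plain Python int
best_action = q_values.argmax(dim=1).item()
print(q_values.shape, best_action)
```

Each `.to(device)` and `.item()` call implies a transfer when a GPU is used, which is the overhead mentioned above.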

You can uncomment the next cell to explicitly instruct PyTorch to train neural networks using the CPU.

In [None]:
# device = "cpu"

In [None]:
print(f"PyTorch will train and test neural networks on {device}")

In [None]:
if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        print(f"Device {i}: {torch.cuda.get_device_name(i)}")

If you have a recent GPU (e.g. RTX 4060 Ti 16G) and want to use it, you may need to install a specific version of PyTorch compatible with your CUDA version (e.g. CUDA 12.4). To do so, edit the `requirements.txt` file and replace the current version of PyTorch with the one compatible with your CUDA version. Check the [official PyTorch website](https://pytorch.org/get-started/locally/) for more information.

## Bonus section 1: Monitoring the training process with TensorBoard and Aim

Monitoring the training process is crucial for understanding the agent's learning progress. In this section, we will introduce two open-source tools that can help you visualize the training process: TensorBoard and Aim.

A third tool, [Weights & Biases (W&B)](https://wandb.ai/site) will be presented in *Bonus section 3*.

TensorBoard and Aim are particularly useful for monitoring the training process on a local machine, as they run entirely locally and do not require an external service.
In contrast, W&B is a cloud-based solution that allows you to monitor the training process from anywhere.
Other solutions will be covered in the next tutorials.

### Monitoring the training process with TensorBoard

TensorBoard is a visualization tool developed by Google, primarily used to track and visualize metrics during the training of deep learning models. While initially designed for TensorFlow, it can also be integrated with **PyTorch** through the **torch.utils.tensorboard** API. This allows real-time tracking of metrics such as losses, accuracy, computation graphs, and even examining images, weight distributions, histograms, and more.

The following cell demonstrates how to use TensorBoard with PyTorch.

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.tensorboard import SummaryWriter

# Simple example: model and data
class SimpleModel(nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.fc = nn.Linear(10, 1)

    def forward(self, x):
        return self.fc(x)

# Creating a model, optimizer, and loss function
model = SimpleModel()
optimizer = optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

# Initializing TensorBoard SummaryWriter
writer = SummaryWriter()

# Training example
for epoch in range(100):
    # Fake data
    inputs = torch.randn(32, 10)
    targets = torch.randn(32, 1)
    
    # Forward pass
    outputs = model(inputs)
    loss = loss_fn(outputs, targets)
    
    # Backpropagation
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
    # Log the loss values to TensorBoard
    writer.add_scalar('Loss/train', loss.item(), epoch)

# Closing SummaryWriter
writer.close()

**`SummaryWriter`** is the central tool for logging data to TensorBoard. It captures metric values, like loss, for each iteration/epoch.

This example trains a simple model on fake data (32 samples with 10 features) over 100 epochs, and at each epoch, the loss value is sent to TensorBoard via `writer.add_scalar`.

To visualize the data from a local machine, run the following command in the terminal:

```bash
tensorboard --logdir=runs
```

then open a web browser and go to the address mentioned in the terminal (usually http://localhost:6006/).

If you are using Google Colab, you can use the **TensorBoard magic** to visualize the data directly in the notebook.

In [None]:
# Load TensorBoard extension in Colab
%load_ext tensorboard
%tensorboard --logdir ./runs

For more information, check the official documentation:
- [TensorBoard for PyTorch Documentation](https://pytorch.org/docs/stable/tensorboard.html)
- [Official TensorBoard Documentation (General)](https://www.tensorflow.org/tensorboard)

### Monitoring the training process with Aim

Aim is an open-source alternative to TensorBoard for managing and visualizing machine learning experiments.
Designed to be simpler and more flexible, **Aim** allows tracking and analyzing metrics, hyperparameters, model outputs, and more for projects using PyTorch, TensorFlow, or other frameworks.

<img src="https://user-images.githubusercontent.com/13848158/136374529-af267918-5dc6-4a4e-8ed2-f6333a332f96.gif" width="600px"></img>

The following cell demonstrates how to use Aim with PyTorch.

In [None]:
import torch
import aim

# Initialize Aim
run = aim.Run()

# set training hyperparameters
run['hparams'] = {
    'learning_rate': 0.01,
    'batch_size': 32,
}

# Simple model example
model = torch.nn.Linear(1, 1)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(10):
    inputs = torch.randn(10, 1)
    targets = torch.randn(10, 1)

    # Forward pass
    outputs = model(inputs)
    loss = criterion(outputs, targets)

    # Backward pass and optimization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Track the 'loss' metric
    run.track(loss.item(), name='loss', epoch=epoch)    # Log metric with Aim

This example trains a simple model on fake data over 10 epochs, and at each epoch, the loss value is sent to Aim via `run.track`.

To visualize the data from a local machine, run the following command in the terminal:

```bash
aim up
```

then open a web browser and go to the address mentioned in the terminal (usually http://localhost:43800/).

Or alternatively, you can use the **Aim magic** to [visualize the data directly in the notebook](https://aimstack.readthedocs.io/en/latest/using/jupyter_notebook_ui.html) (for instance if you are using Google Colab).

In [None]:
%load_ext aim
%aim up

For more information, check the official documentation:
- [Aim Documentation](https://aimstack.readthedocs.io/en/latest/)
- [GitHub Aim](https://github.com/aimhubio/aim)
- [Quick start](https://github.com/aimhubio/aim#-quick-start)
- [Getting started](https://aimstack.readthedocs.io/en/latest/quick_start/setup.html)