In [6]:
from utils import *

### Isak Andersson - AI23 - Deep Learning

# <p style="text-align:center;">Laboration</p>

---

## Intro

This notebook documents the process of learning to implement a reinforcement learning network on several platforms.

The platforms used are TensorFlow-Keras, PyTorch and JAX, with Keras serving as the baseline for comparison. Keras was chosen as the baseline because most of its code was provided before this study began. At the same time, Keras is a dying breed, which suggests that future ML engineers should focus on PyTorch and JAX.

As the assignment requests, each platform's RL model will be tested on the game Space Invaders. The models will not necessarily use the same settings or size, since this notebook is *not* about which platform is best at playing video games, but about how syntax and workflow differ when describing a model on each platform.

But don't worry, there will be graphs of their different scores!


## The Network

All of the platforms initially use the same kind of network, which was then adjusted slightly during development. The base model looks like this (keras_trainer.ipynb):

    Conv2D(filters=32, kernel_size=8, strides=4, activation="relu"),
    Conv2D(filters=64, kernel_size=4, strides=2, activation="relu"),
    Conv2D(filters=64, kernel_size=3, activation="relu"),
    Flatten(),
    Dense(units=512, activation="relu"),
    Dense(units=256, activation="relu"),
    Dense(units=num_actions, activation="linear")  # num_actions = 6

This is slightly modified from the code given before the assignment: an extra convolutional layer has been added, and the Dense (Keras) / Linear (PyTorch) layers have been doubled in number and made larger.

The **PyTorch** network is very similar, but with some key differences (torch_trainer.py):

    nn.Conv2d(in_channels=self.num_frames, out_channels=32, kernel_size=8, stride=4), # num_frames = 4; Output: (32, 20, 20)
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=4, stride=2), # Output: (64, 9, 9)
    nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, stride=1), # Output: (64, 7, 7)
    nn.ReLU(),
    nn.Flatten(), # Flatten to shape: (batch_size, 3136)
    nn.Linear(64 * 7 * 7, 512), # Input size: 3136
    nn.ReLU(),
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Linear(256, self.num_actions) # num_actions = 6

Here, the activation functions are separate layers, which might be the most apparent difference. Note also that the coder has to do the input/output math for every layer manually. Is this perhaps an oversight? Or does it allow for more flexible options? It seems like this would be fairly easy to automate if the same math is needed every time. In any case, the output size of a convolutional layer is calculated like so:

<center>

$$
\text{Output Size} = \left\lfloor \frac{\text{Input Size} - \text{Kernel Size} + 2 \cdot \text{Padding}}{\text{Stride}} \right\rfloor + 1
$$

</center>

where padding refers to the number of border pixels added to each side of the input, and the result of the division is rounded down.
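
The formula can be checked against the shape comments in the PyTorch network above. The following is a minimal sketch, not code from the trainer files: the helper name `conv_output_size` is hypothetical, and the 84x84 input frame size is an assumption (it is the size that reproduces the commented shapes).

```python
def conv_output_size(input_size: int, kernel_size: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution along one dimension."""
    # Integer division rounds down, matching the formula above.
    return (input_size - kernel_size + 2 * padding) // stride + 1


# Assumption: 84x84 input frames, the size implied by the shape comments.
size = 84

# (kernel_size, stride) for the three convolutional layers of the base network.
for kernel_size, stride in [(8, 4), (4, 2), (3, 1)]:
    size = conv_output_size(size, kernel_size, stride)
    print(size)  # prints 20, then 9, then 7

# The last convolutional layer has 64 output channels.
flat_features = 64 * size * size
print(flat_features)  # prints 3136, the input size of the first Linear layer
```

Under that assumption the numbers agree with the `(64, 7, 7)` comment and the `64 * 7 * 7` input of the first `nn.Linear` layer.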

**JAX**

## Keras - The Baseline Dinosaur



In [7]:
plot_model_stats([
    "./logs/keras/modelstats_08-12.csv",
    "./logs/keras/modelstats_09-12.csv",
    "./logs/keras/modelstats_10-12.csv",
    "./logs/keras/modelstats_11-12.csv",
    "./logs/keras/modelstats_12-12.csv"
    ])

### First Try at PyTorch

In [8]:
plot_model_stats([
    "./logs/torch/modelstats_16-12.csv",
    "./logs/torch/modelstats_17-12-1.csv",
])