# 3 - Deep Q-Learning with Atari games

In the last unit, we learned our first reinforcement learning algorithm: Q-Learning. We implemented it from scratch using Numpy and trained it in two environments, FrozenLake-v1 and Taxi-v3.

We got excellent results with this simple algorithm, but these environments were relatively simple because **the state space was discrete and small** (16 different states for FrozenLake-v1 and 500 for Taxi-v3). For comparison, the state space in Atari games can contain $10^{9}$ to $10^{11}$ states.

But as we'll see, producing and updating a **Q-table can become ineffective in large state space environments**. Instead of using a Q-table, **deep Q-Learning uses a neural network that takes a state and approximates Q-values for each action based on that state**.

We'll train our agent to play Space Invaders and other Atari environments using RL-Zoo, a training framework for RL built on Stable-Baselines that provides scripts for training and evaluating agents, tuning hyperparameters, plotting results, and recording videos.

<img src="images/atari-envs.gif" title="" alt="" width="600" data-align="center">

## 3.1 - From Q-Learning to Deep Q-Learning

We learned that Q-Learning is an algorithm we use to train our Q-Function, an action-value function that determines the value of being at a particular state and taking a specific action at that state. The Q comes from “the Quality” of that action at that state.

Internally, our Q-function has a Q-table, a table where each cell corresponds to a state-action pair value. Think of this Q-table as the memory or cheat sheet of our Q-function.
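As a reminder of what that cheat sheet looks like in practice, here is a minimal sketch of a Q-table as a NumPy array. The sizes match FrozenLake-v1 on its 4x4 map (16 states, 4 actions); the values written into the table and the helper name `greedy_action` are hypothetical and only serve the illustration.

```python
import numpy as np

# FrozenLake-v1 (4x4 map): 16 states, 4 actions.
n_states, n_actions = 16, 4

# The Q-table: one row per state, one column per action, initialized to zero.
q_table = np.zeros((n_states, n_actions))

# Hypothetical Q-values for state 0, set by hand purely for illustration.
q_table[0] = [0.1, 0.5, 0.2, 0.0]


def greedy_action(q_table, state):
    """Look up the row for `state` and return the index of its largest Q-value."""
    return int(np.argmax(q_table[state]))


best = greedy_action(q_table, state=0)
print(q_table.shape, best)
```

Reading a Q-value is a single array lookup, which is why the tabular approach works so well when the number of rows stays small.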

The problem is that Q-Learning is a *tabular method*. Tabular methods apply to problems in which the state and action spaces are **small enough for value functions to be represented as arrays and tables**. As a result, Q-Learning is **not scalable.** It worked well with small state space environments of 16 and 500 states. But to train an agent to play Space Invaders or other Atari games, we are going to use the frames as input.

A single frame in Atari is composed of an image of 210x160 pixels. Given the images are in color (RGB), there are 3 channels. As a result, Atari environments have an observation space with a shape of (210, 160, 3), where each pixel contains a value ranging from 0 to 255. That gives us a gigantic state space: $256^{210 \times 160 \times 3} = 256^{100800}$.
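To get a feel for that number, the short sketch below redoes the arithmetic. It counts the entries of one frame, then reports how many decimal digits $256^{100800}$ has rather than the number itself. The Q-table sizes used for comparison assume 4 actions for FrozenLake-v1 and 6 for Taxi-v3.

```python
import math

frame_shape = (210, 160, 3)  # height, width, RGB channels
values_per_entry = 256       # each entry is an integer in [0, 255]

# Number of entries in one frame: 210 * 160 * 3.
n_entries = math.prod(frame_shape)

# The number of distinct frames is 256 ** n_entries. It is far too large to
# print comfortably, so we compute its number of decimal digits instead.
n_digits = math.floor(n_entries * math.log10(values_per_entry)) + 1

# For comparison: number of cells in the Q-tables from the previous unit
# (assuming 4 actions for FrozenLake-v1 and 6 actions for Taxi-v3).
frozenlake_cells = 16 * 4
taxi_cells = 500 * 6

print(f"entries per frame: {n_entries}")
print(f"digits in 256**{n_entries}: {n_digits}")
print(f"Q-table cells: FrozenLake={frozenlake_cells}, Taxi={taxi_cells}")
```

A table with one row per possible frame could never be stored, let alone visited often enough to be learned.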

Therefore, creating and updating a Q-table for this environment would not be efficient. In this case, the best idea is to approximate the Q-values using a parametrized Q-function $Q_{\theta}(s,a)$.

In Deep Q-Learning, this parametrized function is a neural network: given a state, it approximates the Q-value of each possible action at that state.

<img src="images/deep.jpg" title="" alt="" width="600" data-align="center">
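The idea of a parametrized Q-function can be sketched with a tiny two-layer network in NumPy. This is only an illustration under stated assumptions: the state is a short feature vector rather than an image, the layer sizes are hypothetical, and the weights are random and untrained. It is not the architecture used for Atari.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen for illustration only.
state_dim, hidden_dim, n_actions = 8, 32, 6

# theta: the parameters of Q_theta, here two weight matrices and two biases.
theta = {
    "W1": rng.normal(0.0, 0.1, size=(state_dim, hidden_dim)),
    "b1": np.zeros(hidden_dim),
    "W2": rng.normal(0.0, 0.1, size=(hidden_dim, n_actions)),
    "b2": np.zeros(n_actions),
}


def q_values(theta, state):
    """Forward pass: map one state vector to one Q-value per action."""
    hidden = np.maximum(0.0, state @ theta["W1"] + theta["b1"])  # ReLU
    return hidden @ theta["W2"] + theta["b2"]


state = rng.normal(size=state_dim)
q = q_values(theta, state)
n_params = sum(p.size for p in theta.values())

print(q.shape, n_params)
```

The number of parameters depends on the layer sizes, not on the number of possible states, which is what makes this approach usable when a table is not.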

## 3.2 - The Deep Q-Network (DQN)

As input, the network takes a **stack of 4 frames** as the state, and it outputs a vector of Q-values, one for each possible action at that state. Then, like with Q-Learning, we just need to use our epsilon-greedy policy to select which action to take.

Just as with the Q-table, **when the neural network is initialized, the Q-value estimation is terrible**. But during training, our Deep Q-Network agent will **associate a situation with the appropriate action** and learn to play the game well.
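The flow from stacked frames to a chosen action can be sketched as follows. This is a minimal illustration, not a trained agent: `dummy_q_network` is a hypothetical stand-in that returns random Q-values, the frames are blank raw-size images, and the action count of 6 is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions = 6  # assumed number of actions, for illustration


def stack_frames(frames):
    """Stack the 4 most recent frames along a new leading axis."""
    return np.stack(frames[-4:], axis=0)


def dummy_q_network(state):
    """Hypothetical stand-in for a DQN: returns random Q-values, one per action."""
    return rng.normal(size=n_actions)


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a random action, otherwise the best one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


# Five blank frames at the raw Atari resolution; only the last four are used.
frames = [np.zeros((210, 160, 3), dtype=np.uint8) for _ in range(5)]
state = stack_frames(frames)

q = dummy_q_network(state)
action = epsilon_greedy(q, epsilon=0.1, rng=rng)
greedy = epsilon_greedy(q, epsilon=0.0, rng=rng)  # epsilon=0 is always greedy

print(state.shape, action, greedy)
```

With an untrained network the Q-values carry no information, so early behavior is driven mostly by the random exploration branch.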

### 3.2.1 - Preprocessing the input and temporal limitation

We need to preprocess the input. It's an essential step because reducing the complexity of our state cuts the computation time needed for training.

To achieve