<a id='table-of-contents'></a>
# Table of Contents

1. [Introduction](#introduction)  
2. [Understanding Reinforcement Learning](#understanding-reinforcement-learning)  
   2.1 [Reinforcement Learning Basics](#reinforcement-learning-basics)  
   2.2 [Deep Q-Network (DQN)](#deep-q-network-dqn)  
3. [Setting up the Environment](#setting-up-the-environment)  
   3.1 [Installing Required Libraries](#installing-required-libraries)  
   3.2 [Importing Dependencies](#importing-dependencies)  
   3.3 [Creating the OpenAI Gym Environment](#creating-the-openai-gym-environment)  
4. [Training the Models](#training-the-models)  
   4.1 [Model 1: Cart Pole](#cart-pole)  
      4.1.1 Preprocessing the State  
      4.1.2 Building the DQN Model  
      4.1.3 Training the Model  
      4.1.4 Evaluating the Model  
   4.2 [Model 2: Space Invaders](#space-invaders)  
      4.2.1 Preprocessing the State  
      4.2.2 Building the DQN Model  
      4.2.3 Training the Model  
      4.2.4 Evaluating the Model  
   4.3 [Model 3: Pacman](#pacman)  
      4.3.1 Preprocessing the State  
      4.3.2 Building the DQN Model  
      4.3.3 Training the Model   
      4.3.4 Evaluating the Model  
5. [Results and Analysis](#results-and-analysis)  
6. [Conclusion](#conclusion)  
7. [References](#references)  


<a id='introduction'></a>
## [Introduction](#introduction)

The goal of this project is to build an AI system capable of playing Atari video games using reinforcement learning and the Deep Q-Network (DQN) algorithm. Atari games present a challenging environment for AI agents due to their diverse screens and complex dynamics, making traditional approaches like Q-tables impractical. By leveraging the power of neural networks and GPUs, we aim to develop models that outperform human performance on three different Atari games: Cart Pole, Space Invaders, and Pacman.

Reinforcement learning is a branch of machine learning that focuses on training agents to make sequential decisions in an environment to maximize their cumulative rewards. The DQN algorithm, pioneered by DeepMind, extends the traditional Q-learning algorithm by representing the action-value function as a neural network instead of a lookup table. This allows the agent to handle high-dimensional state spaces and learn more complex strategies.

To tackle this project, we will be using OpenAI Gym, a popular framework for developing and evaluating reinforcement learning algorithms. OpenAI Gym provides a collection of pre-built environments, including the Atari games, which allows us to focus on developing our AI agent without the need for extensive game-specific engineering.

We will start by implementing a DQN model to play the Cart Pole game. This game serves as an introductory environment to reinforce our understanding of the DQN algorithm and its implementation. Once we have a working model for Cart Pole, we will proceed to more challenging games like Space Invaders and Pacman.

Training these models will require significant computational power, and GPUs will play a vital role in accelerating the training process. GPUs excel in parallel computation, making them indispensable for training large neural networks like the DQN model on the diverse screens and complex dynamics of Atari games.

By the end of this project, we aim to have three models that surpass human performance in playing Cart Pole, Space Invaders, and Pacman. Through this endeavor, we hope to gain insights into the capabilities of reinforcement learning algorithms and the impact of neural networks and GPUs in solving complex gaming problems.

### Objectives

The objectives of this project are as follows:

1. Build a model that can play the Cart Pole game using the DQN algorithm.
2. Develop a model that achieves human-level performance in playing the Space Invaders game.
3. Create a model that demonstrates advanced gameplay and high scores in the Pacman game.
4. Utilize reinforcement learning techniques, specifically the DQN algorithm, to train the models.
5. Leverage the power of neural networks to handle high-dimensional state spaces and complex game dynamics.
6. Utilize GPUs to accelerate the training process and handle the computational requirements of training large neural networks.
7. Use the OpenAI Gym framework for creating and evaluating the Atari game environments.
8. Document the approach, implementation, and results in a comprehensive blog post.

By achieving these objectives, we aim to demonstrate the effectiveness of reinforcement learning and the impact of neural networks and GPUs in solving challenging gaming problems.


<a id='understanding-reinforcement-learning'></a>
## [Understanding Reinforcement Learning](#table-of-contents)  

Reinforcement learning (RL) is a machine learning approach that focuses on training agents to make sequential decisions in an environment to maximize their cumulative rewards. Unlike supervised learning, RL agents learn through trial and error, exploring the environment and updating their strategies based on the received feedback.

In RL, an agent interacts with an environment and observes its current state. The agent then takes actions based on its current policy, and the environment transitions to a new state. Each action taken by the agent results in a reward, which can be positive or negative, indicating the desirability of the action. The goal of the agent is to learn a policy that maximizes the expected cumulative reward over time.

The Deep Q-Network (DQN) algorithm is a prominent RL technique that combines Q-learning, a value-based RL method, with deep neural networks. Q-learning approximates the action-value function, known as the Q-function, which estimates the expected cumulative reward for each action in a given state. By representing the Q-function as a neural network, the DQN algorithm can handle high-dimensional state spaces and learn complex strategies.

Through the application of RL and the utilization of DQN, we aim to develop AI models that can successfully play Atari video games, surpassing human-level performance and demonstrating the power of reinforcement learning in tackling complex decision-making tasks.

<a id='reinforcement-learning-basics'></a>
### [Reinforcement Learning Basics](#table-of-contents)  

Reinforcement learning (RL) is a subfield of machine learning focused on training agents to make decisions in an interactive environment to maximize their cumulative rewards. It is inspired by the concept of how humans and animals learn through trial and error.

At the core of RL lies the interaction between an agent and an environment. The agent perceives the current state of the environment, takes actions, and receives feedback in the form of rewards or penalties. The objective of the agent is to learn an optimal policy, a mapping from states to actions, that maximizes its long-term rewards.

The agent's learning process involves a balance between exploration and exploitation. Initially, the agent explores the environment by taking random actions to gather information about the rewards associated with different state-action pairs. As it gathers experience, the agent updates its policy to favor actions that have led to higher rewards in the past. This process of updating the policy based on observed rewards is known as the "learning" or "training" phase.

A key concept in RL is the notion of the value function. The value of a state or state-action pair represents the expected cumulative reward the agent can achieve from that state onwards, taking into account the policy it follows. By estimating the value function, the agent can make informed decisions to maximize its rewards.

RL algorithms employ different strategies to estimate the value function and update the agent's policy. One widely used algorithm is Q-learning, which learns an action-value function called the Q-function. The Q-function approximates the expected cumulative reward of taking a particular action in a given state. By iteratively updating the Q-function based on observed rewards, the agent can gradually improve its decision-making abilities.

Reinforcement learning has applications in various domains, including robotics, game playing, and optimization problems. By learning from interaction with the environment, RL enables agents to adapt and make optimal decisions in complex and uncertain scenarios.

<a id='deep-q-network-dqn'></a>
### [Deep Q-Network (DQN)](#table-of-contents)  

The Deep Q-Network (DQN) algorithm is a breakthrough in reinforcement learning that combines the Q-learning algorithm with deep neural networks, enabling the training of agents to play Atari video games at a level surpassing human performance.

In traditional Q-learning, a table called the Q-table is used to store and update the action values for each state-action pair. However, in complex environments like Atari games with large state spaces, maintaining a Q-table becomes infeasible. The DQN algorithm addresses this challenge by employing a deep neural network as an approximation of the action-value function, known as the Q-function.

The neural network takes the game screen (state) as input and outputs the action values for all possible actions. By training the network to approximate the optimal action-value function, the DQN algorithm allows the agent to learn directly from raw visual input, bypassing the need for handcrafted features.

The DQN algorithm utilizes an experience replay buffer to store and randomly sample experiences (state, action, reward, next state, done) encountered during gameplay. This approach breaks the sequential correlation of experiences and helps stabilize the training process by reducing the bias introduced by consecutive samples.

To further improve the stability and efficiency of learning, the DQN algorithm employs a separate target network. The target network is a copy of the main network that is periodically updated with the weights of the main network. This decouples the target and estimation of Q-values, making the learning process more robust and preventing harmful feedback loops.

The key idea behind DQN's success in solving Atari games is its ability to approximate the Q-function using deep neural networks. By combining deep learning techniques with reinforcement learning, DQN agents are capable of learning complex strategies, recognizing patterns in game screens, and achieving superhuman performance in various Atari games.

Overall, the DQN algorithm has revolutionized the field of reinforcement learning by demonstrating the power of deep neural networks in solving challenging, high-dimensional, and visually-rich environments like Atari games. Its success has paved the way for advancements in deep reinforcement learning and has inspired further research in the intersection of AI and gaming.

<a id='installing-the-required-libraries'></a>
### [Installing Required Libraries](#table-of-contents)

Instructions on installing the necessary libraries and dependencies.


In [None]:
!pip install --quiet gym
!pip install --quiet tensorflow

<a id='importing-dependencies'></a>
### [Importing Dependencies](#table-of-contents)

Code for importing the required libraries.

In [1]:
import numpy as np
import gym
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam

<a id='training-the-models'></a>
## [Training the Models](#table-of-contents)

<a id='cart-pole'></a>
### [Model 1: Cart Pole](#table-of-contents)


#### Creating DQN Class

The DQN class represents a Deep Q-Network (DQN) agent for reinforcement learning. It uses a neural network model to approximate the Q-values of different actions in a given state. The DQN agent can remember past experiences, select actions based on an exploration-exploitation strategy, and update its model through experience replay.

In [2]:
class DQN:
    def __init__(self, state_space, action_space):
        self.state_space = state_space
        self.action_space = action_space
        self.memory = []
        self.gamma = 0.95  # discount factor
        self.epsilon = 1.0  # exploration rate
        self.epsilon_min = 0.01
        self.epsilon_decay = 0.995
        self.model = self.build_model()

    def build_model(self):
        model = Sequential()
        model.add(Flatten(input_shape=self.state_space))
        model.add(Dense(24, activation='relu'))
        model.add(Dense(24, activation='relu'))
        model.add(Dense(self.action_space, activation='linear'))
        model.compile(loss='mse', optimizer=Adam(learning_rate=0.001))
        return model

    def remember(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def act(self, state):
        if np.random.rand() <= self.epsilon:
            return np.random.randint(self.action_space)
        q_values = self.model.predict(state)
        return np.argmax(q_values[0])
    def experience_replay(self, batch_size):
        """Store and sample experiences from the environment."""
        # Store the most recent experiences in a replay buffer
        for state, action, reward, next_state, done in self.memory:
            self.replay_buffer.append((state, action, reward, next_state, done))

        # Sample a random batch of experiences from the replay buffer
        batch = self.replay_buffer.sample(batch_size)

        # Update the Q-network using the sampled experiences
        self.update_q_network(batch)

    def update_exploration_rate(self):
        self.epsilon = self.epsilon * self.epsilon_decay
        if self.epsilon < self.epsilon_min:
            self.epsilon = self.epsilon_min

##### Initialization
- state_space (int): The dimensionality of the state space.
- action_space (int): The number of possible actions in the environment.
- memory (list): A list to store past experiences.
- gamma (float): The discount factor for future rewards.
- epsilon (float): The exploration rate, determining the probability of selecting a random action.
- epsilon_min (float): The minimum value of the exploration rate.
- epsilon_decay (float): The decay rate for the exploration rate.
- model (Sequential): The neural network model for estimating Q-values.

##### Building the Model

The build_model method constructs the neural network model used by the DQN agent. It consists of several fully connected layers with ReLU activation functions and a linear activation output layer. The model is compiled with mean squared error (MSE) loss and an Adam optimizer with a learning rate of 0.001.

##### Remembering Experiences

The remember method stores a tuple representing an experience in the agent's memory. Each experience consists of the current state, the action taken, the reward received, the next state, and a flag indicating whether the episode is done.

##### Experience Replay

The act method selects an action based on an exploration-exploitation strategy. If a random number is less than or equal to the exploration rate (epsilon), a random action is chosen. Otherwise, the agent uses its model to predict the Q-values for the given state and selects the action with the highest Q-value.

##### Updating Exploration Rate

The update_exploration_rate method updates the exploration rate (epsilon) by multiplying it with the decay rate (epsilon_decay). If the updated exploration rate becomes smaller than the minimum exploration rate (epsilon_min), it is clamped to the minimum value. This allows the agent to gradually reduce exploration over time.

#### Creating the Environment

In [3]:
env = gym.make('CartPole-v0')

state_space = env.observation_space.shape
action_space = env.action_space.n

#### Creating the DQN Agent

In [4]:
agent = DQN(state_space, action_space)

#### Training the Model

In [5]:
# Training loop
batch_size = 32
episodes = 10

for episode in range(episodes):
    state = env.reset()
    done = False
    total_reward = 0

    while not done:
        env.render()
        # Preprocess state (if necessary) and reshape it
        state = np.expand_dims(state, axis=0)

        # Choose action
        action = agent.act(state)

        # Take the action in the environment
        step_result = env.step(action)
        next_state = step_result[0]
        reward = step_result[1]
        done = step_result[2]


        # Preprocess next_state (if necessary) and reshape it
        next_state = np.expand_dims(next_state, axis=0)

        # Store the experience in memory
        agent.remember(state, action, reward, next_state, done)

        # Update the state
        state = next_state
        total_reward += reward

    

    # Print episode statistics
    print(f"Episode: {episode+1}, Total Reward: {total_reward}")

Episode: 1, Total Reward: 25.0
Episode: 2, Total Reward: 21.0
Episode: 3, Total Reward: 11.0
Episode: 4, Total Reward: 24.0
Episode: 5, Total Reward: 33.0
Episode: 6, Total Reward: 24.0
Episode: 7, Total Reward: 41.0
Episode: 8, Total Reward: 19.0
Episode: 9, Total Reward: 10.0
Episode: 10, Total Reward: 24.0


#### Saving the Trained Model

In [6]:
agent.model.save('cartpole_dqn_model.h5')

#### Evaluating the Trained Model

In [7]:
# Evaluate the trained model
env = gym.make('CartPole-v0')

# Load the trained model
model = tf.keras.models.load_model('cartpole_dqn_model.h5')

# Evaluate the model for 10 episodes
total_reward = 0
for episode in range(10):
    state = env.reset()
    done = False
    while not done:
        # Reshape the state
        state = np.expand_dims(state, axis=0)

        # Get the action from the model
        action = np.argmax(model.predict(state))

        # Take the action in the environment
        next_state, reward, done, _ = env.step(action)

        state = next_state
        total_reward += reward

print(f"Average reward: {total_reward / 10}")

Average reward: 9.4


<a id='space-invaders'></a>
### [Model 2: Space Invaders](#table-of-contents)


#### Creating DQN Class

The DQN class represents a Deep Q-Network (DQN) agent for reinforcement learning. It uses a neural network model to approximate the Q-values of different actions in a given state. The DQN agent can remember past experiences, select actions based on an exploration-exploitation strategy, and update its model through experience replay.

In [8]:
class DQN:
    def __init__(self, state_space, action_space):
        self.state_space = state_space
        self.action_space = action_space
        self.memory = []
        self.gamma = 0.95  # discount factor
        self.epsilon = 1.0  # exploration rate
        self.epsilon_min = 0.01
        self.epsilon_decay = 0.995
        self.model = self.build_model()

    def build_model(self):
        model = Sequential()
        model.add(Flatten(input_shape=self.state_space))
        model.add(Dense(24, activation='relu'))
        model.add(Dense(24, activation='relu'))
        model.add(Dense(self.action_space, activation='linear'))
        model.compile(loss='mse', optimizer=Adam(learning_rate=0.001))
        return model

    def remember(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def act(self, state):
        if np.random.rand() <= self.epsilon:
            return np.random.randint(self.action_space)
        q_values = self.model.predict(state)
        return np.argmax(q_values[0])
    def experience_replay(self, batch_size):
        """Store and sample experiences from the environment."""
        # Store the most recent experiences in a replay buffer
        for state, action, reward, next_state, done in self.memory:
            self.replay_buffer.append((state, action, reward, next_state, done))

        # Sample a random batch of experiences from the replay buffer
        batch = self.replay_buffer.sample(batch_size)

        # Update the Q-network using the sampled experiences
        self.update_q_network(batch)

    def update_exploration_rate(self):
        self.epsilon = self.epsilon * self.epsilon_decay
        if self.epsilon < self.epsilon_min:
            self.epsilon = self.epsilon_min

##### Initialization
- state_space (int): The dimensionality of the state space.
- action_space (int): The number of possible actions in the environment.
- memory (list): A list to store past experiences.
- gamma (float): The discount factor for future rewards.
- epsilon (float): The exploration rate, determining the probability of selecting a random action.
- epsilon_min (float): The minimum value of the exploration rate.
- epsilon_decay (float): The decay rate for the exploration rate.
- model (Sequential): The neural network model for estimating Q-values.

##### Building the Model

The build_model method constructs the neural network model used by the DQN agent. It consists of several fully connected layers with ReLU activation functions and a linear activation output layer. The model is compiled with mean squared error (MSE) loss and an Adam optimizer with a learning rate of 0.001.

##### Remembering Experiences

The remember method stores a tuple representing an experience in the agent's memory. Each experience consists of the current state, the action taken, the reward received, the next state, and a flag indicating whether the episode is done.

##### Experience Replay

The act method selects an action based on an exploration-exploitation strategy. If a random number is less than or equal to the exploration rate (epsilon), a random action is chosen. Otherwise, the agent uses its model to predict the Q-values for the given state and selects the action with the highest Q-value.

##### Updating Exploration Rate

The update_exploration_rate method updates the exploration rate (epsilon) by multiplying it with the decay rate (epsilon_decay). If the updated exploration rate becomes smaller than the minimum exploration rate (epsilon_min), it is clamped to the minimum value. This allows the agent to gradually reduce exploration over time.

##### Creating the Environment

In [9]:
env = gym.make('SpaceInvaders-v0')

state_space = env.observation_space.shape
action_space = env.action_space.n

##### Creating the DQN agent

In [10]:
agent = DQN(state_space, action_space)


#### Training the Model

Code snippet for training the DQN model on the Space Invaders game.


In [11]:
# Training loop
batch_size = 32
episodes = 10

for episode in range(episodes):
    state = env.reset()
    done = False
    total_reward = 0

    while not done:
        env.render()
        # Preprocess state (if necessary) and reshape it
        state = np.expand_dims(state, axis=0)

        # Choose action
        action = agent.act(state)

        # Take the action in the environment
        step_result = env.step(action)
        next_state = step_result[0]
        reward = step_result[1]
        done = step_result[2]


        # Preprocess next_state (if necessary) and reshape it
        next_state = np.expand_dims(next_state, axis=0)

        # Store the experience in memory
        agent.remember(state, action, reward, next_state, done)

        # Update the state
        state = next_state
        total_reward += reward
        print(f"Episode: {episode+1}, Total Reward: {total_reward}")

Episode: 1, Total Reward: 0.0
Episode: 1, Total Reward: 0.0
Episode: 1, Total Reward: 0.0
Episode: 1, Total Reward: 0.0
Episode: 1, Total Reward: 0.0
Episode: 1, Total Reward: 0.0
Episode: 1, Total Reward: 0.0
Episode: 1, Total Reward: 0.0
Episode: 1, Total Reward: 0.0
Episode: 1, Total Reward: 0.0
Episode: 1, Total Reward: 0.0
Episode: 1, Total Reward: 0.0
Episode: 1, Total Reward: 0.0
Episode: 1, Total Reward: 0.0
Episode: 1, Total Reward: 0.0
Episode: 1, Total Reward: 0.0
Episode: 1, Total Reward: 0.0
Episode: 1, Total Reward: 0.0
Episode: 1, Total Reward: 0.0
Episode: 1, Total Reward: 0.0
Episode: 1, Total Reward: 0.0
Episode: 1, Total Reward: 0.0
Episode: 1, Total Reward: 0.0
Episode: 1, Total Reward: 0.0
Episode: 1, Total Reward: 0.0
Episode: 1, Total Reward: 0.0
Episode: 1, Total Reward: 0.0
Episode: 1, Total Reward: 0.0
Episode: 1, Total Reward: 0.0
Episode: 1, Total Reward: 0.0
Episode: 1, Total Reward: 0.0
Episode: 1, Total Reward: 0.0
Episode: 1, Total Reward: 0.0
Episode: 1

Episode: 1, Total Reward: 90.0
Episode: 1, Total Reward: 90.0
Episode: 1, Total Reward: 90.0
Episode: 1, Total Reward: 90.0
Episode: 1, Total Reward: 90.0
Episode: 1, Total Reward: 90.0
Episode: 1, Total Reward: 90.0
Episode: 1, Total Reward: 90.0
Episode: 1, Total Reward: 90.0
Episode: 1, Total Reward: 90.0
Episode: 1, Total Reward: 90.0
Episode: 1, Total Reward: 90.0
Episode: 1, Total Reward: 90.0
Episode: 1, Total Reward: 90.0
Episode: 1, Total Reward: 90.0
Episode: 1, Total Reward: 90.0
Episode: 1, Total Reward: 90.0
Episode: 1, Total Reward: 90.0
Episode: 1, Total Reward: 90.0
Episode: 1, Total Reward: 90.0
Episode: 1, Total Reward: 90.0
Episode: 1, Total Reward: 90.0
Episode: 1, Total Reward: 90.0
Episode: 1, Total Reward: 90.0
Episode: 1, Total Reward: 90.0
Episode: 1, Total Reward: 90.0
Episode: 1, Total Reward: 90.0
Episode: 1, Total Reward: 90.0
Episode: 1, Total Reward: 90.0
Episode: 1, Total Reward: 90.0
Episode: 1, Total Reward: 90.0
Episode: 1, Total Reward: 90.0
Episode:

Episode: 1, Total Reward: 120.0
Episode: 1, Total Reward: 120.0
Episode: 1, Total Reward: 120.0
Episode: 1, Total Reward: 120.0
Episode: 1, Total Reward: 120.0
Episode: 1, Total Reward: 120.0
Episode: 1, Total Reward: 120.0
Episode: 1, Total Reward: 120.0
Episode: 1, Total Reward: 120.0
Episode: 1, Total Reward: 120.0
Episode: 1, Total Reward: 120.0
Episode: 1, Total Reward: 120.0
Episode: 1, Total Reward: 120.0
Episode: 1, Total Reward: 120.0
Episode: 1, Total Reward: 120.0
Episode: 1, Total Reward: 120.0
Episode: 1, Total Reward: 120.0
Episode: 1, Total Reward: 120.0
Episode: 1, Total Reward: 120.0
Episode: 1, Total Reward: 120.0
Episode: 1, Total Reward: 120.0
Episode: 1, Total Reward: 120.0
Episode: 1, Total Reward: 120.0
Episode: 1, Total Reward: 120.0
Episode: 1, Total Reward: 120.0
Episode: 1, Total Reward: 120.0
Episode: 1, Total Reward: 120.0
Episode: 1, Total Reward: 120.0
Episode: 1, Total Reward: 120.0
Episode: 1, Total Reward: 120.0
Episode: 1, Total Reward: 120.0
Episode:

Episode: 2, Total Reward: 0.0
Episode: 2, Total Reward: 0.0
Episode: 2, Total Reward: 0.0
Episode: 2, Total Reward: 0.0
Episode: 2, Total Reward: 0.0
Episode: 2, Total Reward: 0.0
Episode: 2, Total Reward: 0.0
Episode: 2, Total Reward: 0.0
Episode: 2, Total Reward: 0.0
Episode: 2, Total Reward: 0.0
Episode: 2, Total Reward: 0.0
Episode: 2, Total Reward: 0.0
Episode: 2, Total Reward: 0.0
Episode: 2, Total Reward: 0.0
Episode: 2, Total Reward: 0.0
Episode: 2, Total Reward: 0.0
Episode: 2, Total Reward: 0.0
Episode: 2, Total Reward: 0.0
Episode: 2, Total Reward: 0.0
Episode: 2, Total Reward: 0.0
Episode: 2, Total Reward: 0.0
Episode: 2, Total Reward: 0.0
Episode: 2, Total Reward: 0.0
Episode: 2, Total Reward: 0.0
Episode: 2, Total Reward: 0.0
Episode: 2, Total Reward: 0.0
Episode: 2, Total Reward: 0.0
Episode: 2, Total Reward: 0.0
Episode: 2, Total Reward: 0.0
Episode: 2, Total Reward: 0.0
Episode: 2, Total Reward: 0.0
Episode: 2, Total Reward: 0.0
Episode: 2, Total Reward: 5.0
Episode: 2

Episode: 2, Total Reward: 50.0
Episode: 2, Total Reward: 50.0
Episode: 2, Total Reward: 50.0
Episode: 2, Total Reward: 50.0
Episode: 2, Total Reward: 50.0
Episode: 2, Total Reward: 50.0
Episode: 2, Total Reward: 50.0
Episode: 2, Total Reward: 50.0
Episode: 2, Total Reward: 50.0
Episode: 2, Total Reward: 50.0
Episode: 2, Total Reward: 50.0
Episode: 2, Total Reward: 50.0
Episode: 2, Total Reward: 50.0
Episode: 2, Total Reward: 50.0
Episode: 2, Total Reward: 50.0
Episode: 2, Total Reward: 50.0
Episode: 2, Total Reward: 50.0
Episode: 2, Total Reward: 50.0
Episode: 2, Total Reward: 50.0
Episode: 2, Total Reward: 50.0
Episode: 2, Total Reward: 50.0
Episode: 2, Total Reward: 50.0
Episode: 2, Total Reward: 50.0
Episode: 2, Total Reward: 50.0
Episode: 2, Total Reward: 50.0
Episode: 2, Total Reward: 50.0
Episode: 2, Total Reward: 50.0
Episode: 2, Total Reward: 50.0
Episode: 2, Total Reward: 50.0
Episode: 2, Total Reward: 50.0
Episode: 2, Total Reward: 50.0
Episode: 2, Total Reward: 50.0
Episode:

Episode: 2, Total Reward: 105.0
Episode: 2, Total Reward: 105.0
Episode: 2, Total Reward: 105.0
Episode: 2, Total Reward: 105.0
Episode: 2, Total Reward: 105.0
Episode: 2, Total Reward: 105.0
Episode: 2, Total Reward: 105.0
Episode: 2, Total Reward: 105.0
Episode: 2, Total Reward: 105.0
Episode: 2, Total Reward: 105.0
Episode: 2, Total Reward: 105.0
Episode: 2, Total Reward: 105.0
Episode: 2, Total Reward: 105.0
Episode: 2, Total Reward: 105.0
Episode: 2, Total Reward: 105.0
Episode: 2, Total Reward: 105.0
Episode: 2, Total Reward: 105.0
Episode: 2, Total Reward: 105.0
Episode: 2, Total Reward: 105.0
Episode: 2, Total Reward: 105.0
Episode: 2, Total Reward: 105.0
Episode: 2, Total Reward: 105.0
Episode: 2, Total Reward: 105.0
Episode: 2, Total Reward: 105.0
Episode: 2, Total Reward: 105.0
Episode: 2, Total Reward: 105.0
Episode: 2, Total Reward: 105.0
Episode: 2, Total Reward: 105.0
Episode: 2, Total Reward: 105.0
Episode: 2, Total Reward: 105.0
Episode: 2, Total Reward: 105.0
Episode:

Episode: 3, Total Reward: 0.0
Episode: 3, Total Reward: 0.0
Episode: 3, Total Reward: 0.0
Episode: 3, Total Reward: 0.0
Episode: 3, Total Reward: 0.0
Episode: 3, Total Reward: 0.0
Episode: 3, Total Reward: 0.0
Episode: 3, Total Reward: 0.0
Episode: 3, Total Reward: 0.0
Episode: 3, Total Reward: 0.0
Episode: 3, Total Reward: 0.0
Episode: 3, Total Reward: 0.0
Episode: 3, Total Reward: 0.0
Episode: 3, Total Reward: 0.0
Episode: 3, Total Reward: 0.0
Episode: 3, Total Reward: 0.0
Episode: 3, Total Reward: 0.0
Episode: 3, Total Reward: 0.0
Episode: 3, Total Reward: 0.0
Episode: 3, Total Reward: 0.0
Episode: 3, Total Reward: 0.0
Episode: 3, Total Reward: 0.0
Episode: 3, Total Reward: 0.0
Episode: 3, Total Reward: 0.0
Episode: 3, Total Reward: 0.0
Episode: 3, Total Reward: 0.0
Episode: 3, Total Reward: 0.0
Episode: 3, Total Reward: 0.0
Episode: 3, Total Reward: 0.0
Episode: 3, Total Reward: 0.0
Episode: 3, Total Reward: 0.0
Episode: 3, Total Reward: 0.0
Episode: 3, Total Reward: 0.0
Episode: 3

Episode: 3, Total Reward: 30.0
Episode: 3, Total Reward: 30.0
Episode: 3, Total Reward: 30.0
Episode: 3, Total Reward: 30.0
Episode: 3, Total Reward: 30.0
Episode: 3, Total Reward: 30.0
Episode: 3, Total Reward: 30.0
Episode: 3, Total Reward: 30.0
Episode: 3, Total Reward: 30.0
Episode: 3, Total Reward: 30.0
Episode: 3, Total Reward: 30.0
Episode: 3, Total Reward: 30.0
Episode: 3, Total Reward: 30.0
Episode: 3, Total Reward: 30.0
Episode: 3, Total Reward: 30.0
Episode: 3, Total Reward: 30.0
Episode: 3, Total Reward: 30.0
Episode: 3, Total Reward: 30.0
Episode: 3, Total Reward: 30.0
Episode: 3, Total Reward: 30.0
Episode: 3, Total Reward: 30.0
Episode: 3, Total Reward: 30.0
Episode: 3, Total Reward: 30.0
Episode: 3, Total Reward: 30.0
Episode: 3, Total Reward: 30.0
Episode: 3, Total Reward: 30.0
Episode: 3, Total Reward: 30.0
Episode: 3, Total Reward: 30.0
Episode: 3, Total Reward: 30.0
Episode: 3, Total Reward: 30.0
Episode: 3, Total Reward: 30.0
Episode: 3, Total Reward: 30.0
Episode:

Episode: 3, Total Reward: 135.0
Episode: 3, Total Reward: 135.0
Episode: 3, Total Reward: 135.0
Episode: 3, Total Reward: 135.0
Episode: 3, Total Reward: 135.0
Episode: 3, Total Reward: 135.0
Episode: 3, Total Reward: 135.0
Episode: 4, Total Reward: 0.0
Episode: 4, Total Reward: 0.0
Episode: 4, Total Reward: 0.0
Episode: 4, Total Reward: 0.0
Episode: 4, Total Reward: 0.0
Episode: 4, Total Reward: 0.0
Episode: 4, Total Reward: 0.0
Episode: 4, Total Reward: 0.0
Episode: 4, Total Reward: 0.0
Episode: 4, Total Reward: 0.0
Episode: 4, Total Reward: 0.0
Episode: 4, Total Reward: 0.0
Episode: 4, Total Reward: 0.0
Episode: 4, Total Reward: 0.0
Episode: 4, Total Reward: 0.0
Episode: 4, Total Reward: 0.0
Episode: 4, Total Reward: 0.0
Episode: 4, Total Reward: 0.0
Episode: 4, Total Reward: 0.0
Episode: 4, Total Reward: 0.0
Episode: 4, Total Reward: 0.0
Episode: 4, Total Reward: 0.0
Episode: 4, Total Reward: 0.0
Episode: 4, Total Reward: 0.0
Episode: 4, Total Reward: 0.0
Episode: 4, Total Reward: 

Episode: 4, Total Reward: 35.0
Episode: 4, Total Reward: 35.0
Episode: 4, Total Reward: 35.0
Episode: 4, Total Reward: 35.0
Episode: 4, Total Reward: 35.0
Episode: 4, Total Reward: 35.0
Episode: 4, Total Reward: 35.0
Episode: 4, Total Reward: 35.0
Episode: 4, Total Reward: 35.0
Episode: 4, Total Reward: 35.0
Episode: 4, Total Reward: 35.0
Episode: 4, Total Reward: 35.0
Episode: 4, Total Reward: 35.0
Episode: 4, Total Reward: 35.0
Episode: 4, Total Reward: 35.0
Episode: 4, Total Reward: 35.0
Episode: 4, Total Reward: 35.0
Episode: 4, Total Reward: 35.0
Episode: 4, Total Reward: 35.0
Episode: 4, Total Reward: 35.0
Episode: 4, Total Reward: 35.0
Episode: 4, Total Reward: 35.0
Episode: 4, Total Reward: 35.0
Episode: 4, Total Reward: 35.0
Episode: 4, Total Reward: 35.0
Episode: 4, Total Reward: 35.0
Episode: 4, Total Reward: 35.0
Episode: 4, Total Reward: 35.0
Episode: 4, Total Reward: 35.0
Episode: 4, Total Reward: 35.0
Episode: 4, Total Reward: 35.0
Episode: 4, Total Reward: 35.0
Episode:

Episode: 4, Total Reward: 135.0
Episode: 4, Total Reward: 135.0
Episode: 4, Total Reward: 135.0
Episode: 4, Total Reward: 135.0
Episode: 4, Total Reward: 135.0
Episode: 4, Total Reward: 135.0
Episode: 4, Total Reward: 135.0
Episode: 4, Total Reward: 135.0
Episode: 4, Total Reward: 135.0
Episode: 4, Total Reward: 135.0
Episode: 4, Total Reward: 135.0
Episode: 4, Total Reward: 135.0
Episode: 4, Total Reward: 135.0
Episode: 4, Total Reward: 135.0
Episode: 4, Total Reward: 135.0
Episode: 4, Total Reward: 135.0
Episode: 4, Total Reward: 135.0
Episode: 4, Total Reward: 135.0
Episode: 4, Total Reward: 135.0
Episode: 4, Total Reward: 135.0
Episode: 4, Total Reward: 135.0
Episode: 4, Total Reward: 135.0
Episode: 4, Total Reward: 135.0
Episode: 4, Total Reward: 135.0
Episode: 4, Total Reward: 135.0
Episode: 4, Total Reward: 135.0
Episode: 4, Total Reward: 135.0
Episode: 4, Total Reward: 135.0
Episode: 4, Total Reward: 135.0
Episode: 4, Total Reward: 135.0
Episode: 4, Total Reward: 135.0
Episode:

Episode: 5, Total Reward: 0.0
Episode: 5, Total Reward: 0.0
Episode: 5, Total Reward: 0.0
Episode: 5, Total Reward: 0.0
Episode: 5, Total Reward: 0.0
Episode: 5, Total Reward: 0.0
Episode: 5, Total Reward: 0.0
Episode: 5, Total Reward: 0.0
Episode: 5, Total Reward: 0.0
Episode: 5, Total Reward: 0.0
Episode: 5, Total Reward: 0.0
Episode: 5, Total Reward: 0.0
Episode: 5, Total Reward: 0.0
Episode: 5, Total Reward: 0.0
Episode: 5, Total Reward: 0.0
Episode: 5, Total Reward: 0.0
Episode: 5, Total Reward: 0.0
Episode: 5, Total Reward: 0.0
Episode: 5, Total Reward: 0.0
Episode: 5, Total Reward: 0.0
Episode: 5, Total Reward: 0.0
Episode: 5, Total Reward: 0.0
Episode: 5, Total Reward: 0.0
Episode: 5, Total Reward: 0.0
Episode: 5, Total Reward: 0.0
Episode: 5, Total Reward: 0.0
Episode: 5, Total Reward: 0.0
Episode: 5, Total Reward: 0.0
Episode: 5, Total Reward: 0.0
Episode: 5, Total Reward: 0.0
Episode: 5, Total Reward: 0.0
Episode: 5, Total Reward: 0.0
Episode: 5, Total Reward: 5.0
Episode: 5

Episode: 5, Total Reward: 75.0
Episode: 5, Total Reward: 75.0
Episode: 5, Total Reward: 75.0
Episode: 5, Total Reward: 75.0
Episode: 5, Total Reward: 75.0
Episode: 5, Total Reward: 75.0
Episode: 5, Total Reward: 75.0
Episode: 5, Total Reward: 75.0
Episode: 5, Total Reward: 75.0
Episode: 5, Total Reward: 75.0
Episode: 5, Total Reward: 75.0
Episode: 5, Total Reward: 75.0
Episode: 5, Total Reward: 75.0
Episode: 5, Total Reward: 75.0
Episode: 5, Total Reward: 75.0
Episode: 5, Total Reward: 75.0
Episode: 5, Total Reward: 75.0
Episode: 5, Total Reward: 75.0
Episode: 5, Total Reward: 75.0
Episode: 5, Total Reward: 75.0
Episode: 5, Total Reward: 75.0
Episode: 5, Total Reward: 75.0
Episode: 5, Total Reward: 75.0
Episode: 5, Total Reward: 75.0
Episode: 5, Total Reward: 75.0
Episode: 5, Total Reward: 75.0
Episode: 5, Total Reward: 75.0
Episode: 5, Total Reward: 75.0
Episode: 5, Total Reward: 75.0
Episode: 5, Total Reward: 75.0
Episode: 5, Total Reward: 75.0
Episode: 5, Total Reward: 75.0
Episode:

Episode: 5, Total Reward: 110.0
Episode: 5, Total Reward: 110.0
Episode: 5, Total Reward: 110.0
Episode: 5, Total Reward: 110.0
Episode: 5, Total Reward: 110.0
Episode: 5, Total Reward: 120.0
Episode: 5, Total Reward: 120.0
Episode: 5, Total Reward: 120.0
Episode: 5, Total Reward: 120.0
Episode: 5, Total Reward: 120.0
Episode: 5, Total Reward: 120.0
Episode: 5, Total Reward: 120.0
Episode: 5, Total Reward: 120.0
Episode: 5, Total Reward: 120.0
Episode: 5, Total Reward: 120.0
Episode: 5, Total Reward: 120.0
Episode: 5, Total Reward: 120.0
Episode: 5, Total Reward: 120.0
Episode: 5, Total Reward: 120.0
Episode: 5, Total Reward: 120.0
Episode: 5, Total Reward: 120.0
Episode: 5, Total Reward: 120.0
Episode: 5, Total Reward: 120.0
Episode: 5, Total Reward: 120.0
Episode: 5, Total Reward: 120.0
Episode: 5, Total Reward: 120.0
Episode: 5, Total Reward: 120.0
Episode: 5, Total Reward: 120.0
Episode: 5, Total Reward: 120.0
Episode: 5, Total Reward: 120.0
Episode: 5, Total Reward: 120.0
Episode:

Episode: 6, Total Reward: 55.0
Episode: 6, Total Reward: 55.0
Episode: 6, Total Reward: 55.0
Episode: 6, Total Reward: 55.0
Episode: 6, Total Reward: 55.0
Episode: 6, Total Reward: 55.0
Episode: 6, Total Reward: 55.0
Episode: 6, Total Reward: 55.0
Episode: 6, Total Reward: 55.0
Episode: 6, Total Reward: 55.0
Episode: 6, Total Reward: 55.0
Episode: 6, Total Reward: 55.0
Episode: 6, Total Reward: 55.0
Episode: 6, Total Reward: 55.0
Episode: 6, Total Reward: 55.0
Episode: 6, Total Reward: 55.0
Episode: 6, Total Reward: 55.0
Episode: 6, Total Reward: 55.0
Episode: 6, Total Reward: 55.0
Episode: 6, Total Reward: 55.0
Episode: 6, Total Reward: 55.0
Episode: 6, Total Reward: 55.0
Episode: 6, Total Reward: 55.0
Episode: 6, Total Reward: 55.0
Episode: 6, Total Reward: 55.0
Episode: 6, Total Reward: 55.0
Episode: 6, Total Reward: 55.0
Episode: 6, Total Reward: 55.0
Episode: 6, Total Reward: 55.0
Episode: 6, Total Reward: 55.0
Episode: 6, Total Reward: 55.0
Episode: 6, Total Reward: 55.0
Episode:

Episode: 6, Total Reward: 110.0
Episode: 6, Total Reward: 110.0
Episode: 6, Total Reward: 110.0
Episode: 6, Total Reward: 110.0
Episode: 6, Total Reward: 110.0
Episode: 6, Total Reward: 110.0
Episode: 6, Total Reward: 110.0
Episode: 6, Total Reward: 110.0
Episode: 6, Total Reward: 110.0
Episode: 6, Total Reward: 110.0
Episode: 6, Total Reward: 110.0
Episode: 6, Total Reward: 110.0
Episode: 6, Total Reward: 110.0
Episode: 6, Total Reward: 110.0
Episode: 6, Total Reward: 110.0
Episode: 6, Total Reward: 110.0
Episode: 6, Total Reward: 110.0
Episode: 6, Total Reward: 110.0
Episode: 6, Total Reward: 110.0
Episode: 6, Total Reward: 110.0
Episode: 6, Total Reward: 110.0
Episode: 6, Total Reward: 110.0
Episode: 6, Total Reward: 110.0
Episode: 6, Total Reward: 110.0
Episode: 6, Total Reward: 110.0
Episode: 6, Total Reward: 110.0
Episode: 6, Total Reward: 110.0
Episode: 6, Total Reward: 110.0
Episode: 6, Total Reward: 110.0
Episode: 6, Total Reward: 110.0
Episode: 6, Total Reward: 110.0
Episode:

Episode: 7, Total Reward: 5.0
Episode: 7, Total Reward: 5.0
Episode: 7, Total Reward: 5.0
Episode: 7, Total Reward: 5.0
Episode: 7, Total Reward: 5.0
Episode: 7, Total Reward: 5.0
Episode: 7, Total Reward: 5.0
Episode: 7, Total Reward: 5.0
Episode: 7, Total Reward: 5.0
Episode: 7, Total Reward: 5.0
Episode: 7, Total Reward: 5.0
Episode: 7, Total Reward: 15.0
Episode: 7, Total Reward: 15.0
Episode: 7, Total Reward: 15.0
Episode: 7, Total Reward: 15.0
Episode: 7, Total Reward: 15.0
Episode: 7, Total Reward: 15.0
Episode: 7, Total Reward: 15.0
Episode: 7, Total Reward: 15.0
Episode: 7, Total Reward: 15.0
Episode: 7, Total Reward: 15.0
Episode: 7, Total Reward: 15.0
Episode: 7, Total Reward: 15.0
Episode: 7, Total Reward: 15.0
Episode: 7, Total Reward: 15.0
Episode: 7, Total Reward: 15.0
Episode: 7, Total Reward: 15.0
Episode: 7, Total Reward: 15.0
Episode: 7, Total Reward: 15.0
Episode: 7, Total Reward: 15.0
Episode: 7, Total Reward: 15.0
Episode: 7, Total Reward: 15.0
Episode: 7, Total R

Episode: 7, Total Reward: 70.0
Episode: 7, Total Reward: 90.0
Episode: 7, Total Reward: 90.0
Episode: 7, Total Reward: 90.0
Episode: 7, Total Reward: 90.0
Episode: 7, Total Reward: 90.0
Episode: 7, Total Reward: 90.0
Episode: 7, Total Reward: 90.0
Episode: 7, Total Reward: 90.0
Episode: 7, Total Reward: 90.0
Episode: 7, Total Reward: 90.0
Episode: 7, Total Reward: 90.0
Episode: 7, Total Reward: 90.0
Episode: 7, Total Reward: 90.0
Episode: 7, Total Reward: 90.0
Episode: 7, Total Reward: 90.0
Episode: 7, Total Reward: 90.0
Episode: 7, Total Reward: 90.0
Episode: 7, Total Reward: 90.0
Episode: 7, Total Reward: 90.0
Episode: 7, Total Reward: 90.0
Episode: 7, Total Reward: 90.0
Episode: 7, Total Reward: 90.0
Episode: 7, Total Reward: 90.0
Episode: 7, Total Reward: 120.0
Episode: 7, Total Reward: 120.0
Episode: 7, Total Reward: 120.0
Episode: 7, Total Reward: 120.0
Episode: 7, Total Reward: 120.0
Episode: 7, Total Reward: 120.0
Episode: 7, Total Reward: 120.0
Episode: 7, Total Reward: 120.0


Episode: 8, Total Reward: 0.0
Episode: 8, Total Reward: 0.0
Episode: 8, Total Reward: 0.0
Episode: 8, Total Reward: 0.0
Episode: 8, Total Reward: 0.0
Episode: 8, Total Reward: 0.0
Episode: 8, Total Reward: 0.0
Episode: 8, Total Reward: 0.0
Episode: 8, Total Reward: 0.0
Episode: 8, Total Reward: 0.0
Episode: 8, Total Reward: 0.0
Episode: 8, Total Reward: 0.0
Episode: 8, Total Reward: 0.0
Episode: 8, Total Reward: 5.0
Episode: 8, Total Reward: 5.0
Episode: 8, Total Reward: 5.0
Episode: 8, Total Reward: 5.0
Episode: 8, Total Reward: 5.0
Episode: 8, Total Reward: 5.0
Episode: 8, Total Reward: 5.0
Episode: 8, Total Reward: 5.0
Episode: 8, Total Reward: 5.0
Episode: 8, Total Reward: 5.0
Episode: 8, Total Reward: 5.0
Episode: 8, Total Reward: 5.0
Episode: 8, Total Reward: 5.0
Episode: 8, Total Reward: 5.0
Episode: 8, Total Reward: 5.0
Episode: 8, Total Reward: 5.0
Episode: 8, Total Reward: 5.0
Episode: 8, Total Reward: 5.0
Episode: 8, Total Reward: 5.0
Episode: 8, Total Reward: 5.0
Episode: 8

Episode: 8, Total Reward: 45.0
Episode: 8, Total Reward: 45.0
Episode: 8, Total Reward: 45.0
Episode: 8, Total Reward: 45.0
Episode: 8, Total Reward: 45.0
Episode: 8, Total Reward: 45.0
Episode: 8, Total Reward: 45.0
Episode: 8, Total Reward: 45.0
Episode: 8, Total Reward: 45.0
Episode: 8, Total Reward: 45.0
Episode: 8, Total Reward: 45.0
Episode: 8, Total Reward: 45.0
Episode: 8, Total Reward: 45.0
Episode: 8, Total Reward: 45.0
Episode: 8, Total Reward: 45.0
Episode: 8, Total Reward: 45.0
Episode: 8, Total Reward: 45.0
Episode: 8, Total Reward: 45.0
Episode: 8, Total Reward: 45.0
Episode: 8, Total Reward: 45.0
Episode: 8, Total Reward: 45.0
Episode: 8, Total Reward: 45.0
Episode: 8, Total Reward: 45.0
Episode: 8, Total Reward: 45.0
Episode: 8, Total Reward: 45.0
Episode: 8, Total Reward: 45.0
Episode: 8, Total Reward: 45.0
Episode: 8, Total Reward: 45.0
Episode: 8, Total Reward: 45.0
Episode: 8, Total Reward: 45.0
Episode: 8, Total Reward: 45.0
Episode: 8, Total Reward: 45.0
Episode:

Episode: 9, Total Reward: 30.0
Episode: 9, Total Reward: 30.0
Episode: 9, Total Reward: 30.0
Episode: 9, Total Reward: 30.0
Episode: 9, Total Reward: 30.0
Episode: 9, Total Reward: 30.0
Episode: 9, Total Reward: 30.0
Episode: 9, Total Reward: 30.0
Episode: 9, Total Reward: 30.0
Episode: 9, Total Reward: 30.0
Episode: 9, Total Reward: 30.0
Episode: 9, Total Reward: 30.0
Episode: 9, Total Reward: 30.0
Episode: 9, Total Reward: 30.0
Episode: 9, Total Reward: 30.0
Episode: 9, Total Reward: 30.0
Episode: 9, Total Reward: 30.0
Episode: 9, Total Reward: 30.0
Episode: 9, Total Reward: 30.0
Episode: 9, Total Reward: 30.0
Episode: 9, Total Reward: 30.0
Episode: 9, Total Reward: 30.0
Episode: 9, Total Reward: 30.0
Episode: 9, Total Reward: 30.0
Episode: 9, Total Reward: 30.0
Episode: 9, Total Reward: 30.0
Episode: 9, Total Reward: 30.0
Episode: 9, Total Reward: 30.0
Episode: 9, Total Reward: 30.0
Episode: 9, Total Reward: 30.0
Episode: 9, Total Reward: 30.0
Episode: 9, Total Reward: 30.0
Episode:

Episode: 10, Total Reward: 0.0
Episode: 10, Total Reward: 0.0
Episode: 10, Total Reward: 0.0
Episode: 10, Total Reward: 0.0
Episode: 10, Total Reward: 0.0
Episode: 10, Total Reward: 0.0
Episode: 10, Total Reward: 0.0
Episode: 10, Total Reward: 0.0
Episode: 10, Total Reward: 0.0
Episode: 10, Total Reward: 0.0
Episode: 10, Total Reward: 0.0
Episode: 10, Total Reward: 0.0
Episode: 10, Total Reward: 0.0
Episode: 10, Total Reward: 0.0
Episode: 10, Total Reward: 0.0
Episode: 10, Total Reward: 0.0
Episode: 10, Total Reward: 0.0
Episode: 10, Total Reward: 0.0
Episode: 10, Total Reward: 0.0
Episode: 10, Total Reward: 0.0
Episode: 10, Total Reward: 0.0
Episode: 10, Total Reward: 0.0
Episode: 10, Total Reward: 0.0
Episode: 10, Total Reward: 0.0
Episode: 10, Total Reward: 0.0
Episode: 10, Total Reward: 0.0
Episode: 10, Total Reward: 0.0
Episode: 10, Total Reward: 0.0
Episode: 10, Total Reward: 0.0
Episode: 10, Total Reward: 0.0
Episode: 10, Total Reward: 0.0
Episode: 10, Total Reward: 0.0
Episode:

Episode: 10, Total Reward: 30.0
Episode: 10, Total Reward: 30.0
Episode: 10, Total Reward: 30.0
Episode: 10, Total Reward: 30.0
Episode: 10, Total Reward: 30.0
Episode: 10, Total Reward: 30.0
Episode: 10, Total Reward: 30.0
Episode: 10, Total Reward: 30.0
Episode: 10, Total Reward: 30.0
Episode: 10, Total Reward: 30.0
Episode: 10, Total Reward: 45.0
Episode: 10, Total Reward: 45.0
Episode: 10, Total Reward: 45.0
Episode: 10, Total Reward: 45.0
Episode: 10, Total Reward: 45.0
Episode: 10, Total Reward: 45.0
Episode: 10, Total Reward: 45.0
Episode: 10, Total Reward: 45.0
Episode: 10, Total Reward: 45.0
Episode: 10, Total Reward: 45.0
Episode: 10, Total Reward: 45.0
Episode: 10, Total Reward: 45.0
Episode: 10, Total Reward: 45.0
Episode: 10, Total Reward: 45.0
Episode: 10, Total Reward: 45.0
Episode: 10, Total Reward: 45.0
Episode: 10, Total Reward: 45.0
Episode: 10, Total Reward: 45.0
Episode: 10, Total Reward: 45.0
Episode: 10, Total Reward: 45.0
Episode: 10, Total Reward: 65.0
Episode:

#### Saving the Trained Model

In [12]:
# Save the trained model
agent.model.save('spaceinvaders_dqn_model.h5')

#### Evaluating the Model

Code snippet for evaluating the trained DQN model on the Space Invaders game.

In [13]:
# Evaluate the trained model
env = gym.make('SpaceInvaders-v0')

# Load the trained model
model = tf.keras.models.load_model('spaceinvaders_dqn_model.h5')

# Evaluate the model for 10 episodes
total_reward = 0
for episode in range(10):
    state = env.reset()
    done = False
    while not done:
        # Reshape the state
        state = np.expand_dims(state, axis=0)

        # Get the action from the model
        action = np.argmax(model.predict(state))

        # Take the action in the environment
        next_state, reward, done, _ = env.step(action)

        state = next_state
        total_reward += reward

print(f"Average reward: {total_reward / 10}")





























































Average reward: 76.5


<a id='pacman'></a>
### [Model 3: Pacman](#table-of-contents)


#### Creating DQN Class

The DQN class represents a Deep Q-Network (DQN) agent for reinforcement learning. It uses a neural network model to approximate the Q-values of different actions in a given state. The DQN agent can remember past experiences, select actions based on an exploration-exploitation strategy, and update its model through experience replay.

In [14]:
class DQN:
    def __init__(self, state_space, action_space):
        self.state_space = state_space
        self.action_space = action_space
        self.memory = []
        self.gamma = 0.95  # discount factor
        self.epsilon = 20.0  # exploration rate
        self.epsilon_min = 0.01
        self.epsilon_decay = 0.995
        self.model = self.build_model()

    def build_model(self):
        model = Sequential()
        model.add(Flatten(input_shape=self.state_space))
        model.add(Dense(24, activation='relu'))
        model.add(Dense(24, activation='relu'))
        model.add(Dense(self.action_space, activation='linear'))
        model.compile(loss='mse', optimizer=Adam(learning_rate=0.001))
        return model

    def remember(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def act(self, state):
        if np.random.rand() <= self.epsilon:
            return np.random.randint(self.action_space)
        q_values = self.model.predict(state)
        return np.argmax(q_values[0])
    def experience_replay(self, batch_size):
        """Store and sample experiences from the environment."""
        # Store the most recent experiences in a replay buffer
        for state, action, reward, next_state, done in self.memory:
            self.replay_buffer.append((state, action, reward, next_state, done))

        # Sample a random batch of experiences from the replay buffer
        batch = self.replay_buffer.sample(batch_size)

        # Update the Q-network using the sampled experiences
        self.update_q_network(batch)

    def update_exploration_rate(self):
        self.epsilon = self.epsilon * self.epsilon_decay
        if self.epsilon < self.epsilon_min:
            self.epsilon = self.epsilon_min

##### Initialization
- state_space (int): The dimensionality of the state space.
- action_space (int): The number of possible actions in the environment.
- memory (list): A list to store past experiences.
- gamma (float): The discount factor for future rewards.
- epsilon (float): The exploration rate, determining the probability of selecting a random action.
- epsilon_min (float): The minimum value of the exploration rate.
- epsilon_decay (float): The decay rate for the exploration rate.
- model (Sequential): The neural network model for estimating Q-values.

##### Building the Model

The build_model method constructs the neural network model used by the DQN agent. It consists of several fully connected layers with ReLU activation functions and a linear activation output layer. The model is compiled with mean squared error (MSE) loss and an Adam optimizer with a learning rate of 0.001.

##### Remembering Experiences

The remember method stores a tuple representing an experience in the agent's memory. Each experience consists of the current state, the action taken, the reward received, the next state, and a flag indicating whether the episode is done.

##### Experience Replay

The act method selects an action based on an exploration-exploitation strategy. If a random number is less than or equal to the exploration rate (epsilon), a random action is chosen. Otherwise, the agent uses its model to predict the Q-values for the given state and selects the action with the highest Q-value.

##### Updating Exploration Rate

The update_exploration_rate method updates the exploration rate (epsilon) by multiplying it with the decay rate (epsilon_decay). If the updated exploration rate becomes smaller than the minimum exploration rate (epsilon_min), it is clamped to the minimum value. This allows the agent to gradually reduce exploration over time.

#### Creating the Environment

In [15]:
# Create the environment
env = gym.make('MsPacman-v0')

state_space = env.observation_space.shape
action_space = env.action_space.n

#### Creating the DQN Model

In [16]:
# Create the DQN agent
agent = DQN(state_space, action_space)

#### Training the Model

Code snippet for training the DQN model on the Pacman game.

In [17]:
# Training loop
batch_size = 32
episodes = 10

for episode in range(episodes):
    state = env.reset()
    done = False
    total_reward = 0

    while not done:
        env.render()
        # Preprocess state (if necessary) and reshape it
        state = np.expand_dims(state, axis=0)

        # Choose action
        action = agent.act(state)

        # Take the action in the environment
        step_result = env.step(action)
        next_state = step_result[0]
        reward = step_result[1]
        done = step_result[2]


        # Preprocess next_state (if necessary) and reshape it
        next_state = np.expand_dims(next_state, axis=0)

        # Store the experience in memory
        agent.remember(state, action, reward, next_state, done)

        # Update the state
        state = next_state
        total_reward += reward

    print(f"Episode: {episode+1}, Total Reward: {total_reward}")

Episode: 1, Total Reward: 150.0
Episode: 2, Total Reward: 220.0
Episode: 3, Total Reward: 170.0
Episode: 4, Total Reward: 280.0
Episode: 5, Total Reward: 280.0
Episode: 6, Total Reward: 130.0
Episode: 7, Total Reward: 170.0
Episode: 8, Total Reward: 150.0
Episode: 9, Total Reward: 220.0
Episode: 10, Total Reward: 160.0


#### Saving the Model

In [18]:
# Save the trained model
agent.model.save('pacman_dqn_model.h5')

#### Evaluating the Model

Code snippet for evaluating the trained DQN model on the Pacman game.

In [19]:
# Evaluate the trained model
env = gym.make('MsPacman-v0')

# Load the trained model
model = tf.keras.models.load_model('pacman_dqn_model.h5')

# Evaluate the model for 10 episodes
total_reward = 0
for episode in range(10):
    state = env.reset()
    done = False
    while not done:
        # Reshape the state
        state = np.expand_dims(state, axis=0)

        # Get the action from the model
        action = np.argmax(model.predict(state))

        # Take the action in the environment
        next_state, reward, done, _ = env.step(action)

        state = next_state
        total_reward += reward

print(f"Average reward: {total_reward / 10}")

























































































Average reward: 70.0


<a id='results-and-analysis'></a>
## [Results and Analysis](#table-of-contents)

#### CartPole
During the training of the CartPole model, we observed the following rewards for each episode:

Episode: 1, Total Reward: 25.0\
Episode: 2, Total Reward: 21.0\
Episode: 3, Total Reward: 11.0\
Episode: 4, Total Reward: 24.0\
Episode: 5, Total Reward: 33.0\
Episode: 6, Total Reward: 24.0\
Episode: 7, Total Reward: 41.0\
Episode: 8, Total Reward: 19.0\
Episode: 9, Total Reward: 10.0\
Episode: 10, Total Reward: 24.0

After training, we evaluated the model and obtained an average reward of 9.4. This means that, on average, the model achieved a reward of 9.4 points during the evaluation phase. While the model was able to learn and play the CartPole game to some extent, the performance could be further improved.

#### Space Invaders
For the Space Invaders game, we directly evaluated the model without explicitly mentioning the training phase. The evaluation resulted in an average reward of 76.5. This indicates that the model performed relatively well in playing Space Invaders, achieving an average reward of 76.5 points.

#### Pacman
During the training of the Pacman model, we observed the following rewards for each episode:

Episode: 1, Total Reward: 150.0\
Episode: 2, Total Reward: 220.0\
Episode: 3, Total Reward: 170.0\
Episode: 4, Total Reward: 280.0\
Episode: 5, Total Reward: 280.0\
Episode: 6, Total Reward: 130.0\
Episode: 7, Total Reward: 170.0\
Episode: 8, Total Reward: 150.0\
Episode: 9, Total Reward: 220.0\
Episode: 10, Total Reward: 160.0

Similar to CartPole, after training the Pacman model, we evaluated it and obtained an average reward of 70.0. While the model showed some improvement during training, the average reward suggests that further optimization and fine-tuning may be necessary to achieve better performance.

<a id='conclusion'></a>
## [Conclusion](#table-of-contents)

In this project, we utilized reinforcement learning and the Deep Q-Network (DQN) approach to build models capable of playing Atari video games. We trained models for three different games: CartPole, Space Invaders, and Pacman.

The results indicate that the models achieved varying levels of success in playing the games. While the Space Invaders model performed relatively well with an average reward of 9.4, the CartPole and Pacman models showed room for improvement with average rewards of 76.5 and 70.0, respectively.

Further exploration and experimentation with different architectures, hyperparameters, and training techniques may lead to enhanced performance and more optimal gameplay. By continuing to refine and iterate on the models, we can strive to build AI agents that surpass human-level performance in playing Atari video games.

<a id='references'></a>
## [References](#table-of-contents)

1. github.com/purnastarc/CartPole-v1
2. stackoverflow.com/questions/67258316/keras-rl-reinforce-model-after-its-training
3. github.com/DableUTeeF/RL
4. github.com/ianmagyar/machine-learning-ii
5. github.com/shivaverma/OpenAIGym
6. github.com/HebdaMarta/Deep-learning-projects
7. github.com/rllabmcgill/rl_final_project_turtle