# Navigation Project

This project implements a deep Q-network (DQN), similar to the one described in [Human-level control through deep reinforcement learning](https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf), to train an agent that navigates to yellow bananas in a Unity environment.

## General DQN Learning Algorithm

![DQN Algorithm](images/DQN_algorithm1.png)

## Implementation

The primary difference between the DQN in [Human-level control through deep reinforcement learning](https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf) and this Navigation project is that the agent here has direct knowledge of the environment. The original DQN learned Atari games from screen images alone, preprocessing the frames with a function $\phi$ so that one architecture could generalize across many games. Our environment provides a state of only 37 inputs, so a simple multi-layer perceptron (MLP) Q-network serves as the function approximator in place of the CNN architecture described in the paper.

The files `dqn_agent_nav.py` and `model_nav.py` contain the implementation, which is written in Python 3 with PyTorch.

See `navigation_unity_environment.ipynb` to interact with the environment.

### Hyperparameters

```python
BUFFER_SIZE = int(1e5)  # replay buffer size
BATCH_SIZE = 64         # minibatch size
GAMMA = 0.99            # discount factor
TAU = 1e-3              # for soft update of target parameters
LR = 5e-4               # learning rate
UPDATE_EVERY = 4        # how often to update the network
```
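To illustrate how `TAU` enters training, here is a minimal sketch of a soft target update, assuming the agent keeps a local network and a target network with identical architectures. The function name `soft_update` and the small `nn.Linear` stand-ins are illustrative assumptions, not code taken from `dqn_agent_nav.py`.

```python
import torch
import torch.nn as nn

TAU = 1e-3  # same value as in the hyperparameters above


def soft_update(local_model, target_model, tau):
    """Blend local weights into the target: target = tau * local + (1 - tau) * target."""
    with torch.no_grad():
        for target_param, local_param in zip(target_model.parameters(),
                                             local_model.parameters()):
            target_param.copy_(tau * local_param + (1.0 - tau) * target_param)


# Hypothetical stand-in networks, used only to demonstrate the update.
torch.manual_seed(0)
local_net = nn.Linear(37, 4)
target_net = nn.Linear(37, 4)

before = target_net.weight.clone()
soft_update(local_net, target_net, TAU)
expected = TAU * local_net.weight + (1.0 - TAU) * before
```

With a small `TAU`, the target network trails the local network slowly, which keeps the Q targets from shifting abruptly between learning steps.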

### Q-Network Architecture

The Q-network is a multi-layer perceptron (MLP) with three fully connected layers and ReLU activations. Training minimizes the mean squared error (MSE) between the expected Q-values and the target Q-values.
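The loss described above can be sketched on a random minibatch as follows. This is a hedged illustration that assumes separate local and target networks and the `GAMMA` value from the hyperparameters; the single-layer stand-in networks and all tensor names are hypothetical and do not come from `dqn_agent_nav.py`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

GAMMA = 0.99
state_size, action_size, batch_size = 37, 4, 64

torch.manual_seed(0)
# Hypothetical stand-ins for the local and target Q-networks.
local_net = nn.Linear(state_size, action_size)
target_net = nn.Linear(state_size, action_size)

# Random minibatch standing in for samples drawn from a replay buffer.
states = torch.randn(batch_size, state_size)
actions = torch.randint(0, action_size, (batch_size, 1))
rewards = torch.randn(batch_size, 1)
next_states = torch.randn(batch_size, state_size)
dones = torch.zeros(batch_size, 1)

# Q target: reward plus the discounted best next-state value from the target network.
with torch.no_grad():
    q_next = target_net(next_states).max(dim=1, keepdim=True)[0]
    q_targets = rewards + GAMMA * q_next * (1 - dones)

# Q expected: the local network's value for the action actually taken.
q_expected = local_net(states).gather(1, actions)

loss = F.mse_loss(q_expected, q_targets)
```

The `(1 - dones)` factor removes the bootstrapped term for terminal transitions, so the target reduces to the reward alone when an episode ends.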

```python
import torch
from model_nav import QNetwork

state_size = 37
action_size = 4
seed = 0

# Q-Network
qnetwork = QNetwork(state_size, action_size, seed)

print(qnetwork)
```

```
QNetwork(
  (fc1): Linear(in_features=37, out_features=64, bias=True)
  (fc2): Linear(in_features=64, out_features=64, bias=True)
  (fc3): Linear(in_features=64, out_features=4, bias=True)
)
```
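For readers without `model_nav.py` at hand, a module that would print the same layer structure might look like the sketch below. It is a reconstruction from the printout, not the author's file: the layer sizes come from the output above, while the class name `QNetworkSketch`, the placement of ReLU after the first two layers only, and seeding through `torch.manual_seed` are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class QNetworkSketch(nn.Module):
    """Hypothetical MLP mapping a state vector to one Q-value per action."""

    def __init__(self, state_size, action_size, seed, fc1_units=64, fc2_units=64):
        super().__init__()
        torch.manual_seed(seed)  # assumed use of the seed argument
        self.fc1 = nn.Linear(state_size, fc1_units)
        self.fc2 = nn.Linear(fc1_units, fc2_units)
        self.fc3 = nn.Linear(fc2_units, action_size)

    def forward(self, state):
        x = F.relu(self.fc1(state))
        x = F.relu(self.fc2(x))
        return self.fc3(x)  # raw Q-values, no activation on the output


net = QNetworkSketch(state_size=37, action_size=4, seed=0)
q_values = net(torch.zeros(1, 37))  # one Q-value per action
```

The output layer is left linear in this sketch because Q-values are unbounded estimates of return rather than probabilities.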


## Results

The goal was an average score of 13 over 100 episodes. I trained until the average reached 15 to be certain of meeting that threshold. The DQN agent solved the environment in 626 episodes with an average score of 15.02.

![](images/results.png)
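The stopping rule described above, a 100-episode average reaching a target score, can be sketched with a fixed-length window. The function name `first_solved_episode` and the synthetic scores are illustrative assumptions; how the project itself counts the reported episode number is not specified here.

```python
from collections import deque


def first_solved_episode(scores, target=15.0, window=100):
    """Return the 1-based episode at which the mean of the last `window`
    scores first reaches `target`, or None if it never does."""
    recent = deque(maxlen=window)
    for episode, score in enumerate(scores, start=1):
        recent.append(score)
        if len(recent) == window and sum(recent) / window >= target:
            return episode
    return None


# Synthetic scores for demonstration only: 50 episodes at 10, then 150 at 16.
demo_scores = [10.0] * 50 + [16.0] * 150
solved_at = first_solved_episode(demo_scores)
print(solved_at)  # 134
```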

## Ideas for future work

This project implements a basic DQN. Performance would likely improve with double DQN, dueling DQN, and/or prioritized experience replay. The hyperparameters were taken from an implementation that solves the OpenAI Gym lunar lander environment; tuning them with a grid search could shorten the training time of this Navigation implementation.
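As one example of the extensions mentioned above, double DQN changes only how the target is formed: the local network selects the next action and the target network evaluates it. The sketch below contrasts the two targets on random data. The stand-in networks and tensor names are hypothetical, and none of this is part of the project's current code.

```python
import torch
import torch.nn as nn

GAMMA = 0.99
torch.manual_seed(0)

# Hypothetical stand-ins for the local and target Q-networks.
local_net = nn.Linear(37, 4)
target_net = nn.Linear(37, 4)

rewards = torch.randn(64, 1)
next_states = torch.randn(64, 37)
dones = torch.zeros(64, 1)

with torch.no_grad():
    # Basic DQN: the target network both selects and evaluates the next action.
    dqn_next = target_net(next_states).max(dim=1, keepdim=True)[0]

    # Double DQN: the local network selects, the target network evaluates.
    best_actions = local_net(next_states).argmax(dim=1, keepdim=True)
    ddqn_next = target_net(next_states).gather(1, best_actions)

dqn_targets = rewards + GAMMA * dqn_next * (1 - dones)
ddqn_targets = rewards + GAMMA * ddqn_next * (1 - dones)
```

Because the double DQN value is read from the same target-network output that the maximum is taken over, it can never exceed the basic DQN value, which is how the method counteracts overestimated Q-values.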