# Project Navigation Report

## Goal

The goal of this project is to use Deep Reinforcement Learning to train an agent that collects ripe (yellow) bananas and avoids spoiled (blue) ones.

Collecting a yellow banana gives a reward of +1, while collecting a blue banana gives a reward of -1.

The environment is considered solved when the agent achieves an average score of at least +13 over 100 consecutive episodes.
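The usual way to check this requirement is to average the per-episode scores over the most recent 100 consecutive episodes and compare that average with 13. The sketch below does this in plain Python; `first_solved_episode` and the sample scores are hypothetical and are not taken from the project code.

```python
from collections import deque

def first_solved_episode(scores, target=13.0, window=100):
    """Return the 1-based episode at which the average score over the last
    `window` consecutive episodes first reaches `target`, or None if it never does."""
    recent = deque(maxlen=window)  # sliding window holding the most recent scores
    for episode, score in enumerate(scores, start=1):
        recent.append(score)
        # Only judge once a full window of episodes is available.
        if len(recent) == window and sum(recent) / window >= target:
            return episode
    return None

# Made-up scores for illustration (not results from this project).
scores = [10.0] * 50 + [14.0] * 100
print(first_solved_episode(scores))  # → 125
```

The check waits for a full window, so an agent cannot be declared solved before episode 100 even if its early scores are high.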

## Implementation details

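The task is solved with a Double Deep Q-Network (Double DQN). Like DQN, it uses a neural network to estimate the action values Q(s, a) instead of maintaining a Q table in memory, which would be impractical for a continuous, 37-dimensional state space.

This variant also mitigates the overestimation of action values that affects standard Q-learning, by using two separate networks with the same architecture: one selects the best next action and the other evaluates it.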

![double_deep_q_formula.PNG](attachment:double_deep_q_formula.PNG)
> Qa and Qb are two separate neural networks, each with its own set of weights
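For reference, the standard Double Q-learning update, written with the $Q_A$ and $Q_B$ of the caption and a learning rate $\alpha$, is

$$Q_A(s,a) \leftarrow Q_A(s,a) + \alpha\left[r + \gamma\, Q_B\!\left(s', \arg\max_{a'} Q_A(s',a')\right) - Q_A(s,a)\right]$$

The sketch below computes the target $r + \gamma\, Q_B(s', \arg\max_{a'} Q_A(s',a'))$ for a small batch in plain Python, assuming (as in the common deep variant) that the local network plays the role of $Q_A$ and the target network that of $Q_B$. The function name and the toy numbers are hypothetical.

```python
def double_dqn_targets(rewards, dones, next_q_local, next_q_target, gamma=0.99):
    """Compute Double DQN TD targets for a batch of transitions.

    next_q_local[i]  -- Q values of next state i from the local (online) network
    next_q_target[i] -- Q values of next state i from the target network
    """
    targets = []
    for r, done, q_local, q_target in zip(rewards, dones, next_q_local, next_q_target):
        # 1) The local network *selects* the greedy next action...
        best_action = max(range(len(q_local)), key=lambda a: q_local[a])
        # 2) ...and the target network *evaluates* that action.
        bootstrap = q_target[best_action]
        # Terminal transitions do not bootstrap from the next state.
        targets.append(r + gamma * bootstrap * (1.0 - float(done)))
    return targets

# Toy numbers, made up for illustration.
y = double_dqn_targets(
    rewards=[1.0, -1.0],
    dones=[False, True],
    next_q_local=[[1.0, 3.0, 2.0, 0.0], [0.0, 0.0, 0.0, 0.0]],
    next_q_target=[[0.5, 1.5, 4.0, 0.2], [9.0, 9.0, 9.0, 9.0]],
)
print(y)
```

For the first transition, the local network picks action 1, so the target uses the target network's value 1.5 rather than its maximum of 4.0, which is what a vanilla DQN target would use. Decoupling selection from evaluation in this way is what dampens the overestimation.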

The implementation also uses Experience Replay, a technique that separates the accumulation of exploratory experience from the learning step.

Thanks to this separation, the agent can sample random batches of stored experiences to learn from, which avoids learning from sequences of highly correlated experiences.
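A replay buffer of this kind can be sketched with the standard library alone. The buffer size of 100,000 and batch size of 64 match the training parameters below; the class and field names are hypothetical, and the sketch assumes callers only sample once at least `batch_size` experiences are stored (otherwise `random.sample` raises `ValueError`).

```python
import random
from collections import deque, namedtuple

# One transition collected while the agent interacts with the environment.
Experience = namedtuple("Experience", ["state", "action", "reward", "next_state", "done"])

class ReplayBuffer:
    """Fixed-size buffer: the oldest experiences are discarded once it is full."""

    def __init__(self, buffer_size=100_000, batch_size=64, seed=0):
        self.memory = deque(maxlen=buffer_size)
        self.batch_size = batch_size
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        # Acting side: store each experience as it is gathered.
        self.memory.append(Experience(state, action, reward, next_state, done))

    def sample(self):
        # Learning side: draw a uniformly random batch, which breaks up the
        # temporal correlation between consecutive transitions.
        return self.rng.sample(self.memory, k=self.batch_size)

    def __len__(self):
        return len(self.memory)

# Usage with dummy transitions.
buffer = ReplayBuffer(buffer_size=1000, batch_size=4)
for t in range(10):
    buffer.add(state=[float(t)], action=t % 4, reward=0.0, next_state=[float(t + 1)], done=False)
batch = buffer.sample()
print(len(buffer), len(batch))  # → 10 4
```

Using a `deque` with `maxlen` gives first-in, first-out eviction for free, so the buffer always holds the most recent experiences.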


### Neural network architecture

The network has an input size of 37 (the number of state dimensions), two hidden layers of 64 neurons each, and an output layer of 4 neurons (the number of available actions).

![neural_net_shape.PNG](attachment:neural_net_shape.PNG)
> The neural network

Both networks, the local and the target network, share this architecture.
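The 37 → 64 → 64 → 4 architecture can be sketched as a plain-Python forward pass to keep the example dependency-free; a real implementation would more likely use a deep-learning library such as PyTorch. ReLU activations on the hidden layers and the uniform weight initialization are assumptions, since the report does not specify them, and `QNetwork` is a hypothetical name.

```python
import random

STATE_SIZE, HIDDEN, ACTION_SIZE = 37, 64, 4  # layer sizes from the report

def init_layer(n_in, n_out, rng):
    """Weights as an n_out x n_in matrix plus a bias vector (assumed small uniform values)."""
    bound = 1.0 / n_in ** 0.5
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-bound, bound) for _ in range(n_out)]
    return weights, biases

def linear(x, layer):
    """Fully connected layer: W x + b."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]

def relu(v):
    return [max(0.0, a) for a in v]

class QNetwork:
    """Maps a 37-dimensional state to one Q value per action (4 outputs)."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.layers = [
            init_layer(STATE_SIZE, HIDDEN, rng),
            init_layer(HIDDEN, HIDDEN, rng),
            init_layer(HIDDEN, ACTION_SIZE, rng),
        ]

    def forward(self, state):
        x = state
        for layer in self.layers[:-1]:
            x = relu(linear(x, layer))  # hidden layers use ReLU (assumed)
        return linear(x, self.layers[-1])  # raw Q values, no output activation

    def num_parameters(self):
        return sum(len(w) * len(w[0]) + len(b) for w, b in self.layers)

local_net, target_net = QNetwork(seed=0), QNetwork(seed=1)  # same shape, separate weights
q_values = local_net.forward([0.0] * STATE_SIZE)
print(len(q_values), local_net.num_parameters())  # → 4 6852
```

The parameter count follows directly from the layer sizes: (37·64 + 64) + (64·64 + 64) + (64·4 + 4) = 6,852 weights and biases per network.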




### Training parameters

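| Parameter | Value | Description |
|---|---|---|
| Number of episodes | 1000 | Maximum number of training episodes |
| Maximum time steps per episode | 1000 | |
| Epsilon initial value | 1.0 | Starting exploration rate |
| Epsilon minimum value | 0.01 | Lower bound on the exploration rate |
| Epsilon decay | 0.005 | Rate at which epsilon decreases during training |
| Experience replay buffer size | 100,000 | Maximum number of stored experiences |
| Batch size | 64 | Experiences sampled per learning step |
| Gamma (discount factor) | 0.99 | Discount applied to future rewards |
| Tau | 0.0003 | Soft-update factor for the target network weights |
| Learning rate | 0.00005 | Optimizer step size |
| Update rate | 4 | How often the network is updated, in time steps |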
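The exploration and target-update parameters can be illustrated with a small plain-Python sketch. The report does not say how the epsilon decay of 0.005 is applied, so `epsilon_for_episode` assumes it is subtracted once per episode; learning every 4 time steps is likewise an assumption. All function names are hypothetical, and the soft update operates on flat lists of numbers rather than real network weights.

```python
import random

EPS_START, EPS_END, EPS_DECAY = 1.0, 0.01, 0.005  # values from the table
TAU = 0.0003
UPDATE_EVERY = 4

def epsilon_for_episode(episode):
    """Assumed linear schedule: subtract EPS_DECAY per episode, floored at EPS_END."""
    return max(EPS_END, EPS_START - EPS_DECAY * episode)

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon explore (random action), otherwise exploit (greedy action)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def should_learn(step):
    """Assumed: learn and soft-update the target every UPDATE_EVERY time steps."""
    return step % UPDATE_EVERY == 0

def soft_update(local_params, target_params, tau=TAU):
    """theta_target <- tau * theta_local + (1 - tau) * theta_target, element-wise."""
    return [tau * l + (1.0 - tau) * t for l, t in zip(local_params, target_params)]

rng = random.Random(0)
print(epsilon_for_episode(0), epsilon_for_episode(100), epsilon_for_episode(500))
print(epsilon_greedy([0.1, 0.7, 0.2, 0.0], epsilon=0.0, rng=rng))  # → 1
print(soft_update([1.0, 1.0], [0.0, 2.0]))
```

Under this linear assumption epsilon reaches its floor of 0.01 after about 200 episodes; if 0.005 instead meant multiplying epsilon by 0.995 each episode, the floor would be reached only after roughly 920 episodes. With tau = 0.0003, each soft update moves the target weights only 0.03% of the way toward the local weights.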



### Training result

The agent reached the target score of 13 after roughly 400 episodes, and a score of 15 after 441 episodes.

![score_plot.PNG](attachment:score_plot.PNG)

### Thoughts for improvement

Extend the current implementation with, or benchmark it against, the following algorithms:
  - DDQN
  - Prioritized DDQN
  - Dueling DDQN
  - A3C
  - Distributional DQN
  - Noisy DQN
  - Rainbow
  
Fine-tune the hyperparameters for each implementation and benchmark their performance.