# Results of the Navigation Project

### 1. Introduction & goal of the project

The objective of this project is to train an agent to navigate (and collect bananas!) in a large, square world. It's a Unity environment provided by [Udacity](https://www.udacity.com/).

<img src="https://user-images.githubusercontent.com/10624937/42135619-d90f2f28-7d12-11e8-8823-82b970a54d7e.gif"/>

The state space has 37 dimensions and contains the agent's velocity, along with ray-based perception of objects around the agent's forward direction. Given this information, the agent has to learn how to best select actions. Four discrete actions are available, corresponding to:
- **`0`** - move forward.
- **`1`** - move backward.
- **`2`** - turn left.
- **`3`** - turn right.

The task is episodic. To solve the environment, the agent must reach an average score of at least +13 over 100 consecutive episodes.
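
The solving criterion can be checked with a rolling window over the episode scores. The following is a minimal sketch, not the notebook's actual code; the function name `first_solved_episode` and its parameters are hypothetical.

```python
from collections import deque

SOLVED_SCORE = 13.0  # required average score
WINDOW = 100         # number of consecutive episodes to average over


def first_solved_episode(scores, target=SOLVED_SCORE, window=WINDOW):
    """Return the 1-based episode at which the mean of the last
    `window` scores first reaches `target`, or None if it never does."""
    recent = deque(maxlen=window)  # keeps only the latest `window` scores
    for episode, score in enumerate(scores, start=1):
        recent.append(score)
        if len(recent) == window and sum(recent) / window >= target:
            return episode
    return None


# Toy data: 50 episodes with score 0, followed by 200 episodes with score 14.
print(first_solved_episode([0.0] * 50 + [14.0] * 200))
```

Because the window must be full before the check applies, the environment cannot count as solved before episode 100.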

A reward of +1 is provided for collecting a yellow banana, and a reward of -1 is provided for collecting a blue banana.  Thus, the goal of the agent is to collect as many yellow bananas as possible while avoiding blue bananas.  

### 2. Settings
* ML Framework: **`PyTorch 1.4.0`**
* GPU acceleration: **`CUDAToolkit 10.1`**
* Operating system: **`Ubuntu 18.04`**
* Unity: **`Version 2019.3`**
* Unity environment: **`Banana_Linux/Banana.x86_64`** provided by Udacity.

### 3. Packages
**Implementation:** The project is implemented in `Python 3.6` within a `Jupyter Notebook`.
<br>
The following packages were imported:
```python
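# Unity ML-Agents wrapper used to load the Banana environment
from unityagents import UnityEnvironment
import numpy as np
# PyTorch: network layers, functional ops, and optimizers
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

import time
import random

# Containers for the replay buffer and the rolling score window
from collections import namedtuple, deque

import matplotlib.pyplot as plt
# Jupyter magic: render plots inline in the notebook
%matplotlib inline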
```

### 4. Agent
The project has been solved with a **Double-DQN** agent. 
- The replay buffer size is 100000
- The batch size is 32
- Gamma = 0.99
- Tau = 0.01 (for soft update of target parameters)
- Learning rate = 0.0005
- The weights will be updated every 4 steps

The implementation of the Double-DQN agent follows [`Mnih et al., 2015`](https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf) and [`van Hasselt et al., 2015`](https://arxiv.org/pdf/1509.06461.pdf).
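
To illustrate how Gamma and Tau enter the update, here is a minimal sketch of the Double-DQN target and the soft update of the target parameters. It is written with NumPy arrays rather than PyTorch tensors to keep it short, and it is not the notebook's actual code; the function names and the toy Q-values are assumptions made for illustration.

```python
import numpy as np

GAMMA = 0.99  # discount factor (from the list above)
TAU = 0.01    # soft-update rate (from the list above)


def double_dqn_targets(rewards, dones, q_local_next, q_target_next, gamma=GAMMA):
    """Double-DQN target: the local network selects the next action,
    the target network evaluates it."""
    best_actions = np.argmax(q_local_next, axis=1)
    rows = np.arange(len(best_actions))
    next_values = q_target_next[rows, best_actions]
    # No bootstrapping from terminal states (done == 1).
    return rewards + gamma * next_values * (1.0 - dones)


def soft_update(local_params, target_params, tau=TAU):
    """Blend the target parameters towards the local ones:
    tau * local + (1 - tau) * target."""
    return [tau * l + (1.0 - tau) * t for l, t in zip(local_params, target_params)]


# Toy batch of two transitions with four actions each (made-up numbers).
rewards = np.array([1.0, 0.0])
dones = np.array([0.0, 1.0])
q_local_next = np.array([[0.1, 0.9, 0.2, 0.0], [0.5, 0.1, 0.1, 0.1]])
q_target_next = np.array([[1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0]])

targets = double_dqn_targets(rewards, dones, q_local_next, q_target_next)
new_target_params = soft_update([np.ones(2)], [np.zeros(2)])
```

In the first toy transition the local network prefers action 1, so the target uses the target network's value for action 1 (2.0) instead of its maximum (4.0). This decoupling of action selection from action evaluation is what distinguishes Double-DQN from plain DQN.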

### 5. Training

* The maximum number of training episodes is 2000.
* The maximum number of timesteps per episode is 1000.
* The starting value of epsilon for epsilon-greedy action selection is 1.0.
* The minimum value of epsilon is 0.1.
* The multiplicative factor (per episode) for decreasing epsilon is 0.995.
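
The epsilon settings above can be sketched as follows. This is an illustration under the stated values, not the notebook's actual code; the function names `epsilon_schedule` and `epsilon_greedy` are hypothetical.

```python
import random


def epsilon_schedule(n_episodes, eps_start=1.0, eps_end=0.1, eps_decay=0.995):
    """Epsilon used in each episode: multiplied by `eps_decay` after
    every episode and never allowed to drop below `eps_end`."""
    schedule = []
    eps = eps_start
    for _ in range(n_episodes):
        schedule.append(eps)
        eps = max(eps_end, eps * eps_decay)
    return schedule


def epsilon_greedy(q_values, eps, rng=random):
    """Pick a random action with probability `eps`, otherwise the greedy one."""
    if rng.random() < eps:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


schedule = epsilon_schedule(2000)
# Index of the first episode that runs at the minimum epsilon.
first_floor = schedule.index(0.1)
print(first_floor)
```

With these values, epsilon reaches its floor of 0.1 after roughly 460 episodes, so the remaining training episodes run with 10% random actions.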

### 6. Results depending on the Model
#### Model

The value function has been represented by a deep neural network:
- one input layer of size 37 (=size of the state space).
- variable hidden layers of different sizes (see below).
- one output layer of size 4 (= size of the action space).
- The activation function is **ReLU**.
- The **dropout probability is 0.1** (=10%).
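
A network of this shape could be built as in the sketch below. The class name `QNetwork`, the constructor arguments, and the placement of dropout after every hidden activation are assumptions made for illustration; the notebook's actual model may differ.

```python
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Fully connected Q-network with a configurable stack of hidden layers."""

    def __init__(self, state_size=37, action_size=4,
                 hidden_sizes=(64, 64, 64), dropout_p=0.1, seed=0):
        super().__init__()
        torch.manual_seed(seed)  # reproducible weight initialisation
        layers = []
        in_features = state_size
        for hidden in hidden_sizes:
            # Assumption: dropout follows the ReLU of every hidden layer.
            layers += [nn.Linear(in_features, hidden), nn.ReLU(), nn.Dropout(p=dropout_p)]
            in_features = hidden
        layers.append(nn.Linear(in_features, action_size))  # one Q-value per action
        self.net = nn.Sequential(*layers)

    def forward(self, state):
        return self.net(state)


# Example: the single-hidden-layer variant with 32 units.
net = QNetwork(hidden_sizes=(32,))
net.eval()  # disables dropout for action selection and validation
q_values = net(torch.zeros(5, 37))  # batch of 5 dummy states
```

Passing a different `hidden_sizes` tuple, for example `(256, 256, 128)` or `(24,)`, yields the other configurations compared in the results.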

#### Results
The results are grouped by the sizes of the hidden layers.

##### Size of the hidden layers: 64, 64, 64
Training results:

<img src="images/training_results_ddqn_64_64_64.png" width="500" alt="Training Results" align="left"/>

- Mean value of the 1st validation of 50 samples with 400 steps: **16.58**
- Mean value of the 2nd validation of 50 samples with 400 steps: **16.64**
- Mean value of the 3rd validation of 50 samples with 400 steps: **16.26**
- Mean value of the 4th validation of 50 samples with 400 steps: **16.14**
- Mean value of the 5th validation of 50 samples with 400 steps: **16.88**

Total mean value: **16.50**
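
The total mean values in this section are consistent with an unweighted mean of the five validation means, rounded to two decimals. A short check for this configuration, with hypothetical variable names:

```python
from statistics import mean

# Validation means for the hidden layers 64, 64, 64 (copied from the list above).
validation_means = [16.58, 16.64, 16.26, 16.14, 16.88]

total_mean = round(mean(validation_means), 2)
print(f"{total_mean:.2f}")  # 16.50
```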

##### Size of the hidden layers: 128, 128, 128
Training results:

<img src="images/training_results_ddqn_128_128_128.png" width="500" alt="Training Results" align="left"/>

- Mean value of the 1st validation of 50 samples with 400 steps: **16.08**
- Mean value of the 2nd validation of 50 samples with 400 steps: **15.38**
- Mean value of the 3rd validation of 50 samples with 400 steps: **14.92**
- Mean value of the 4th validation of 50 samples with 400 steps: **16.26**
- Mean value of the 5th validation of 50 samples with 400 steps: **15.08**

Total mean value: **15.54**

##### Size of the hidden layers: 256, 256, 256
Training results:

<img src="images/training_results_ddqn_256_256_256.png" width="500" alt="Training Results" align="left"/>

- Mean value of the 1st validation of 50 samples with 400 steps: **15.44**
- Mean value of the 2nd validation of 50 samples with 400 steps: **16.14**
- Mean value of the 3rd validation of 50 samples with 400 steps: **15.92**
- Mean value of the 4th validation of 50 samples with 400 steps: **15.0**
- Mean value of the 5th validation of 50 samples with 400 steps: **15.68**

Total mean value: **15.64**

##### Size of the hidden layers: 64, 64, 128
Training results:

<img src="images/training_results_ddqn_64_64_128.png" width="500" alt="Training Results" align="left"/>

- Mean value of the 1st validation of 50 samples with 400 steps: **15.56**
- Mean value of the 2nd validation of 50 samples with 400 steps: **15.0**
- Mean value of the 3rd validation of 50 samples with 400 steps: **16.0**
- Mean value of the 4th validation of 50 samples with 400 steps: **15.4**
- Mean value of the 5th validation of 50 samples with 400 steps: **15.5**

Total mean value: **15.49**

##### Size of the hidden layers: 256, 256, 128
Training results:

<img src="images/training_results_ddqn_256_256_128.png" width="500" alt="Training Results" align="left"/>

- Mean value of the 1st validation of 50 samples with 400 steps: **17.06**
- Mean value of the 2nd validation of 50 samples with 400 steps: **17.06**
- Mean value of the 3rd validation of 50 samples with 400 steps: **16.1**
- Mean value of the 4th validation of 50 samples with 400 steps: **16.92**
- Mean value of the 5th validation of 50 samples with 400 steps: **16.5**

Total mean value: **16.73**

##### Size of the hidden layers: 256, 256, 64
Training results:

<img src="images/training_results_ddqn_256_256_64.png" width="500" alt="Training Results" align="left"/>

- Mean value of the 1st validation of 50 samples with 400 steps: **15.5**
- Mean value of the 2nd validation of 50 samples with 400 steps: **15.64**
- Mean value of the 3rd validation of 50 samples with 400 steps: **16.28**
- Mean value of the 4th validation of 50 samples with 400 steps: **15.42**
- Mean value of the 5th validation of 50 samples with 400 steps: **16.0**

Total mean value: **15.77**

##### Size of the hidden layers: 32, 32, 32
Training results:

<img src="images/training_results_ddqn_32_32_32.png" width="500" alt="Training Results" align="left"/>

- Mean value of the 1st validation of 50 samples with 400 steps: **16.0**
- Mean value of the 2nd validation of 50 samples with 400 steps: **16.16**
- Mean value of the 3rd validation of 50 samples with 400 steps: **15.92**
- Mean value of the 4th validation of 50 samples with 400 steps: **16.28**
- Mean value of the 5th validation of 50 samples with 400 steps: **15.88**

Total mean value: **16.05**

##### Size of the hidden layers: 16, 16, 16
**The network did not solve the exercise within 1000 episodes!** 

Training results:

<img src="images/training_results_ddqn_16_16_16.png" width="400" alt="Training Results" align="left"/>

- Mean value of the 1st validation of 50 samples with 400 steps: **9.68**
- Mean value of the 2nd validation of 50 samples with 400 steps: **10.06**
- Mean value of the 3rd validation of 50 samples with 400 steps: **10.5**
- Mean value of the 4th validation of 50 samples with 400 steps: **9.26**
- Mean value of the 5th validation of 50 samples with 400 steps: **10.34**

Total mean value: **9.97**

##### Size of the hidden layers: 32, 32
Training results:

<img src="images/training_results_ddqn_32_32.png" width="500" alt="Training Results" align="left"/>

- Mean value of the 1st validation of 50 samples with 400 steps: **15.52**
- Mean value of the 2nd validation of 50 samples with 400 steps: **15.34**
- Mean value of the 3rd validation of 50 samples with 400 steps: **15.22**
- Mean value of the 4th validation of 50 samples with 400 steps: **14.86**
- Mean value of the 5th validation of 50 samples with 400 steps: **15.94**

Total mean value: **15.38**

##### Size of the hidden layers: 32
Training results:

<img src="images/training_results_ddqn_32.png" width="500" alt="Training Results" align="left"/>

- Mean value of the 1st validation of 50 samples with 400 steps: **15.22**
- Mean value of the 2nd validation of 50 samples with 400 steps: **16.16**
- Mean value of the 3rd validation of 50 samples with 400 steps: **15.02**
- Mean value of the 4th validation of 50 samples with 400 steps: **15.08**
- Mean value of the 5th validation of 50 samples with 400 steps: **15.64**

Total mean value: **15.42**

##### Size of the hidden layers: 24
Training results:

<img src="images/training_results_ddqn_24.png" width="500" alt="Training Results" align="left"/>

- Mean value of the 1st validation of 50 samples with 400 steps: **16.26**
- Mean value of the 2nd validation of 50 samples with 400 steps: **16.14**
- Mean value of the 3rd validation of 50 samples with 400 steps: **16.52**
- Mean value of the 4th validation of 50 samples with 400 steps: **16.04**
- Mean value of the 5th validation of 50 samples with 400 steps: **15.84**

Total mean value: **16.16**

##### Size of the hidden layers: 20
Training results:

<img src="images/training_results_ddqn_20.png" width="500" alt="Training Results" align="left"/>

- Mean value of the 1st validation of 50 samples with 400 steps: **15.0**
- Mean value of the 2nd validation of 50 samples with 400 steps: **15.24**
- Mean value of the 3rd validation of 50 samples with 400 steps: **14.36**
- Mean value of the 4th validation of 50 samples with 400 steps: **14.6**
- Mean value of the 5th validation of 50 samples with 400 steps: **15.02**

Total mean value: **14.84**

##### Size of the hidden layers: 18
Training results:

<img src="images/training_results_ddqn_18.png" width="500" alt="Training Results" align="left"/>

- Mean value of the 1st validation of 50 samples with 400 steps: **16.14**
- Mean value of the 2nd validation of 50 samples with 400 steps: **14.78**
- Mean value of the 3rd validation of 50 samples with 400 steps: **16.12**
- Mean value of the 4th validation of 50 samples with 400 steps: **15.72**
- Mean value of the 5th validation of 50 samples with 400 steps: **16.02**

Total mean value: **15.76**

##### Size of the hidden layers: 16
Training results:

<img src="images/training_results_ddqn_16.png" width="500" alt="Training Results" align="left"/>

- Mean value of the 1st validation of 50 samples with 400 steps: **14.96**
- Mean value of the 2nd validation of 50 samples with 400 steps: **15.7**
- Mean value of the 3rd validation of 50 samples with 400 steps: **15.8**
- Mean value of the 4th validation of 50 samples with 400 steps: **15.54**
- Mean value of the 5th validation of 50 samples with 400 steps: **15.62**

Total mean value: **15.52**

##### Size of the hidden layers: 10
Training results:

**The network did not solve the exercise within 1000 episodes!** 

<img src="images/training_results_ddqn_10.png" width="400" alt="Training Results" align="left"/>

- Mean value of the 1st validation of 50 samples with 400 steps: **12.94**
- Mean value of the 2nd validation of 50 samples with 400 steps: **12.44**
- Mean value of the 3rd validation of 50 samples with 400 steps: **13.08**
- Mean value of the 4th validation of 50 samples with 400 steps: **13.26**
- Mean value of the 5th validation of 50 samples with 400 steps: **13.32**

Total mean value: **13.01**
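
For a side-by-side view, the total mean values reported above can be ranked with a few lines of Python. The numbers are copied from this section; the variable names are hypothetical.

```python
# Total mean validation score per hidden-layer configuration (from this section).
total_means = {
    "64, 64, 64": 16.50,
    "128, 128, 128": 15.54,
    "256, 256, 256": 15.64,
    "64, 64, 128": 15.49,
    "256, 256, 128": 16.73,
    "256, 256, 64": 15.77,
    "32, 32, 32": 16.05,
    "16, 16, 16": 9.97,
    "32, 32": 15.38,
    "32": 15.42,
    "24": 16.16,
    "20": 14.84,
    "18": 15.76,
    "16": 15.52,
    "10": 13.01,
}

# Sort from the highest to the lowest total mean.
ranked = sorted(total_means.items(), key=lambda item: item[1], reverse=True)
for hidden_layers, score in ranked:
    print(f"{hidden_layers:>15}: {score:.2f}")
```

In this ranking the `256, 256, 128` network comes first and the `16, 16, 16` network last, while most configurations fall between roughly 15 and 17.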