# Report
---
This notebook presents my implementation of the first project of the Deep Reinforcement Learning Nanodegree (DRLND).

## Implementation Details

### Summary
I implement a Deep Q-Network (DQN), based on the DRLND project sample.

### Details
The Deep Q-Network consists of the following fully connected layers:

- First fully connected layer
  - inputs (states) -> $N_1$ 
- Second fully connected layer
  - $N_1$ -> $N_2$
- Third fully connected layer
  - $N_2$ -> outputs (actions)
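
The layer stack above can be sketched as a plain NumPy forward pass. This is only an illustration of the shapes involved: the actual network lives in `model.py`, which is not shown in this report, and the ReLU activations, the random weight initialisation, and the names `init_q_network` and `q_values` are assumptions. The sizes 37 and 4 come from the environment output in the Execution section.

```python
import numpy as np

def init_q_network(state_size, action_size, hidden=(128, 128), seed=0):
    """Create weights and biases for three fully connected layers (illustrative init)."""
    rng = np.random.default_rng(seed)
    sizes = [state_size, hidden[0], hidden[1], action_size]
    return [
        (rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]

def q_values(layers, states):
    """Forward pass: ReLU after the first two layers (assumed), linear output layer."""
    x = np.asarray(states, dtype=float)
    for i, (w, b) in enumerate(layers):
        x = x @ w + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x

layers = init_q_network(state_size=37, action_size=4)
batch = np.zeros((2, 37))             # two dummy states
print(q_values(layers, batch).shape)  # one Q-value per action for each state
```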
  
### Hyperparameters

Agent hyperparameters are:

|parameter | value | description |
|----------|-------|-------------|
|buffer_size|100000| Number of experiences to hold in the replay memory |
|batch_size|64| Minibatch size used at each step |
|gamma | 0.95 | Discount applied to future rewards |
|tau | 0.001 | Interpolation factor applied to the soft update of the target network |
|LR | 0.0005 | Learning rate for Adam optimizer |
|update_interval | 4 | Number of agent steps between update operations |
|hidden_layer| (128, 128) | Number of nodes for hidden layer ($N_1$, $N_2$)|
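
To make the roles of `gamma` and `tau` concrete, the sketch below shows the standard DQN target and soft-update rules in plain Python. It assumes the agent in `dqn_agent.py` (not shown in this report) follows the usual DQN formulation. The function names `td_target` and `soft_update` are illustrative, and parameters are represented as flat lists of floats rather than tensors.

```python
def td_target(reward, next_q_values, done, gamma):
    """Target for Q(s, a): r + gamma * max_a' Q_target(s', a'); no bootstrap at episode end."""
    bootstrap = 0.0 if done else max(next_q_values)
    return reward + gamma * bootstrap

def soft_update(local_params, target_params, tau):
    """Blend target parameters toward local ones: tau * local + (1 - tau) * target."""
    return [tau * l + (1.0 - tau) * t for l, t in zip(local_params, target_params)]

# gamma as listed in the table; tau as set in the params cell of this notebook
print(td_target(1.0, [0.5, 2.0, -1.0, 0.0], done=False, gamma=0.95))
print(soft_update([1.0, 1.0], [0.0, 0.0], tau=1e-3))
```

A small `tau` means the target network changes slowly, which keeps the bootstrap targets stable while the local network is being trained.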

Training parameters are:

|parameter | value | description |
|----------|-------|-------------|
|episodes  | 2000 | Maximum number of training episodes |
|eps_start, eps_end, eps_decay | 1.0, 0.01, 0.995 | Start value, minimum value, and per-episode decay factor of epsilon for the epsilon-greedy policy |
| expect | 13 | Target score at which the environment is considered solved |
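
The epsilon schedule implied by these values can be previewed with a few lines of Python. The update rule `eps = max(eps_end, eps_decay * eps)` is the one used in the training loop of this notebook; the helper name `episodes_until_floor` is only illustrative.

```python
def episodes_until_floor(eps_start=1.0, eps_end=0.01, eps_decay=0.995):
    """Count how many per-episode decays it takes for epsilon to reach eps_end."""
    eps, episodes = eps_start, 0
    while eps > eps_end:
        eps = max(eps_end, eps_decay * eps)
        episodes += 1
    return episodes

print(episodes_until_floor())
```

With the values in the table, epsilon reaches its floor after roughly 900 episodes, so exploration keeps tapering off through almost the first half of the 2000-episode budget.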

## Execution


### Imports and environment setup

In [1]:
# Import requirements

from unityagents import UnityEnvironment
import numpy as np

# My helpers
from scores import *
from dqn_agent import *
from model import *

cpu


In [2]:
# for mac
env = UnityEnvironment(file_name="Banana.app")

# for linux
#env = UnityEnvironment(file_name="Banana_Linux_NoVis/Banana.x86_64")

# for windows
#env = UnityEnvironment(file_name="Banana_Windows_x86_64/Banana.exe")

INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		
Unity brain name: BananaBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 37
        Number of stacked Vector Observation: 1
        Vector Action space type: discrete
        Vector Action space size (per agent): 4
        Vector Action descriptions: , , , 


In [None]:
# get the default brain
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

# Reset the environment
env_info = env.reset(train_mode=True)[brain_name]

# number of agents in the environment
print('Number of agents:', len(env_info.agents))

# number of actions
action_size = brain.vector_action_space_size
print('Number of actions:', action_size)

# examine the state space 
state = env_info.vector_observations[0]
print('States look like:', state)
state_size = len(state)
print('States have length:', state_size)

### Functions for single run or training

In [None]:
# Helper to run a single episode, and the training loop

def runOnce(env, agent, eps):
    """Run one training episode and return its total score."""
    score = 0
    # brain_name is the global defined in the environment setup cell
    env_info = env.reset(train_mode=True)[brain_name]
    state = env_info.vector_observations[0]
    done = False
    while not done:
        action = agent.act(state, eps)
        env_info = env.step(action)[brain_name]
        next_state = env_info.vector_observations[0]
        reward = env_info.rewards[0]
        done = env_info.local_done[0]
        # pass the transition to the agent
        agent.step(state, action, reward, next_state, done)
        score += reward
        state = next_state
    return score

def TrainAgent(env, agent, expect, prefix, episodes=2000, window_size=100,
               eps_start=1.0, eps_end=0.01, eps_decay=0.995):
    """Train until Scores reports that `expect` is reached or `episodes` run out."""
    scores = Scores(expect, window_size)
    success = False
    eps = eps_start
    for i_episode in range(1, episodes+1):
        score = runOnce(env, agent, eps)
        eps = max(eps_end, eps_decay*eps)   # decay epsilon after every episode
        if scores.AddScore(score) == True:  # target reached: save the model and stop
            agent.Save('model.pt')
            success = True
            break

    scores.FlushLog(prefix, True)
    return success
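
The `Scores` helper comes from `scores.py`, which is not included in this report. As a rough idea of what `AddScore` might do, here is a minimal sketch of a moving-average check. The class name `ScoreWindow`, its method `add_score`, and the rule "solved once the mean of the last `window_size` scores reaches `expect`" are all assumptions, not the actual helper.

```python
from collections import deque

class ScoreWindow:
    """Hypothetical stand-in for Scores: tracks a moving average of episode scores."""

    def __init__(self, expect, window_size=100):
        self.expect = expect
        self.window = deque(maxlen=window_size)  # keeps only the most recent scores

    def add_score(self, score):
        """Record a score; return True once a full window averages at least `expect`."""
        self.window.append(score)
        full = len(self.window) == self.window.maxlen
        return full and sum(self.window) / len(self.window) >= self.expect

tracker = ScoreWindow(expect=13, window_size=3)
for s in [10, 14, 13, 15]:
    print(s, tracker.add_score(s))
```

Averaging over a window rather than checking a single episode avoids stopping on one lucky run.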

### Hyperparameters

In [None]:
# Hyperparameters
params = {
    'buffer_size': int(1e5),    # replay buffer size
    'batch_size': 64,           # minibatch size
    'gamma': 0.95,              # discount factor
    'tau': 1e-3,                # for soft update of target parameters
    'LR': 5e-4,                 # learning rate 
    'update_interval': 4,       # how often to update the network
    'hidden_layer': (128, 128)  # hidden layer info
}
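
For readers unfamiliar with `buffer_size` and `batch_size`, the following sketch shows a fixed-size replay memory with uniform sampling. The real buffer is part of `dqn_agent.py`, which is not shown here, so the class `ReplayBuffer`, the `Experience` tuple, and the small sizes used in the demo are illustrative assumptions.

```python
import random
from collections import deque, namedtuple

Experience = namedtuple("Experience", ["state", "action", "reward", "next_state", "done"])

class ReplayBuffer:
    """Fixed-size experience store with uniform random sampling (illustrative)."""

    def __init__(self, buffer_size, batch_size, seed=0):
        self.memory = deque(maxlen=buffer_size)  # oldest experiences are dropped first
        self.batch_size = batch_size
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.memory.append(Experience(state, action, reward, next_state, done))

    def sample(self):
        """Draw batch_size distinct experiences uniformly at random."""
        return self.rng.sample(list(self.memory), self.batch_size)

    def __len__(self):
        return len(self.memory)

buffer = ReplayBuffer(buffer_size=5, batch_size=2)
for t in range(8):
    buffer.add(state=t, action=0, reward=0.0, next_state=t + 1, done=False)
print(len(buffer), [e.state for e in buffer.sample()])
```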

### Training and results

In [None]:
# Train it!
agent = Agent(state_size=state_size, action_size=action_size, seed=0, params=params)

TrainAgent(env, agent, 13, 'banana13')

### Training for a higher score

In [None]:
# Train toward a higher target score (15)

agent = Agent(state_size=state_size, action_size=action_size, seed=0, params=params)

TrainAgent(env, agent, 15, 'banana_15')

### Comments
My DQN implementation performs reasonably well. However, with the target score set to 13, some episodes still scored poorly (below 5).

I then raised the target score to 15. Training took more episodes and the average score increased, but some episodes still scored very low. I think this is because the agent learns only from the state vector and its actions. If the agent received richer information, such as the pixel input used in the optional project, I expect the results would improve.

## Ideas for future work

- Implement DQN extensions such as Dueling Networks, Prioritized Experience Replay, or Rainbow DQN.
- Change NN Model
  - More hidden layers
  - More nodes in hidden layers
- Learning from pixel information (optional project)