# Navigation

---

In this notebook, you will learn how to use the Unity ML-Agents environment for the first project of the [Deep Reinforcement Learning Nanodegree](https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893).

### 1. Start the Environment

We begin by importing some necessary packages.  If the code cell below returns an error, please revisit the project instructions to double-check that you have installed [Unity ML-Agents](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Installation.md) and [NumPy](http://www.numpy.org/).

In [1]:
from unityagents import UnityEnvironment
import numpy as np
import os, sys
from collections import deque
import matplotlib.pyplot as plt
%matplotlib inline

Next, we will start the environment!  **_Before running the code cell below_**, change the `file_name` parameter to match the location of the Unity environment that you downloaded.

- **Mac**: `"path/to/Banana.app"`
- **Windows** (x86): `"path/to/Banana_Windows_x86/Banana.exe"`
- **Windows** (x86_64): `"path/to/Banana_Windows_x86_64/Banana.exe"`
- **Linux** (x86): `"path/to/Banana_Linux/Banana.x86"`
- **Linux** (x86_64): `"path/to/Banana_Linux/Banana.x86_64"`
- **Linux** (x86, headless): `"path/to/Banana_Linux_NoVis/Banana.x86"`
- **Linux** (x86_64, headless): `"path/to/Banana_Linux_NoVis/Banana.x86_64"`

For instance, if you are using a Mac, then you downloaded `Banana.app`.  If this file is in the same folder as the notebook, then the line below should appear as follows:
```
env = UnityEnvironment(file_name="Banana.app")
```
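If the same notebook is run on several machines, `file_name` can be chosen from the current platform. The sketch below is a hypothetical helper (`banana_file_name` is not part of `unityagents`). It assumes the unzipped folders listed above sit inside a directory `base`, and it uses the `headless` flag to pick the `NoVis` Linux builds.

```python
import os
import platform
import sys


def banana_file_name(base=".", headless=False, system=None, machine=None):
    """Return the Banana build path for this OS, following the folder names listed above.

    Hypothetical convenience helper, not part of unityagents. `system` and `machine`
    default to the running interpreter's values and can be overridden for testing.
    """
    system = system or sys.platform
    machine = machine or platform.machine()
    is_64bit = machine.lower() in ("x86_64", "amd64")  # Windows reports "AMD64"

    if system == "darwin":
        return os.path.join(base, "Banana.app")
    if system.startswith("win"):
        folder = "Banana_Windows_x86_64" if is_64bit else "Banana_Windows_x86"
        return os.path.join(base, folder, "Banana.exe")
    if system.startswith("linux"):
        folder = "Banana_Linux_NoVis" if headless else "Banana_Linux"
        binary = "Banana.x86_64" if is_64bit else "Banana.x86"
        return os.path.join(base, folder, binary)
    raise ValueError("unsupported platform: {}".format(system))
```

Usage: `env = UnityEnvironment(file_name=banana_file_name("path/to"))`.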

In [2]:
env = UnityEnvironment(file_name="M:/notebooks/Jonathan/Udacity_Project1/p1_navigation/Banana_Windows_x86_64/Banana_Windows_x86_64/Banana.exe")

INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		
Unity brain name: BananaBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 37
        Number of stacked Vector Observation: 1
        Vector Action space type: discrete
        Vector Action space size (per agent): 4
        Vector Action descriptions: , , , 


Environments contain **_brains_** which are responsible for deciding the actions of their associated agents. Here we check for the first brain available, and set it as the default brain we will be controlling from Python.

In [3]:
# get the default brain
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

### 2. Examine the State and Action Spaces

The simulation contains a single agent that navigates a large environment.  At each time step, it has four actions at its disposal:
- `0` - walk forward 
- `1` - walk backward
- `2` - turn left
- `3` - turn right

The state space has `37` dimensions and contains the agent's velocity, along with ray-based perception of objects around the agent's forward direction.  A reward of `+1` is provided for collecting a yellow banana, and a reward of `-1` is provided for collecting a blue banana.
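A uniformly random policy gives a useful baseline score before any learning. The sketch below uses only the calls that appear in this notebook (`env.reset(train_mode=True)[brain_name]`, `env.step(action)[brain_name]`, `rewards`, `local_done`). The function name `run_random_episode` and its `max_steps` default are hypothetical.

```python
import numpy as np


def run_random_episode(env, brain_name, action_size, rng=None, max_steps=1000):
    """Play one episode with uniformly random actions and return the total reward.

    Each yellow banana adds +1 and each blue banana adds -1 to the score.
    """
    rng = rng if rng is not None else np.random.default_rng()
    env.reset(train_mode=True)                    # start a new episode
    score = 0.0
    for _ in range(max_steps):
        action = int(rng.integers(action_size))   # one of 0, 1, 2, 3
        env_info = env.step(action)[brain_name]   # advance the simulation by one step
        score += env_info.rewards[0]
        if env_info.local_done[0]:                # the episode has ended
            break
    return score
```

Once the environment is running, `run_random_episode(env, brain_name, action_size)` returns the score of one random episode.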

Run the code cell below to print some information about the environment.

In [4]:
# reset the environment
env_info = env.reset(train_mode=True)[brain_name]

# number of agents in the environment
print('Number of agents:', len(env_info.agents))

# number of actions
action_size = brain.vector_action_space_size
print('Number of actions:', action_size)

# examine the state space 
state = env_info.vector_observations[0]
print('States look like:', state)
state_size = len(state)
print('States have length:', state_size)

Number of agents: 1
Number of actions: 4
States look like: [1.         0.         0.         0.         0.84408134 0.
 0.         1.         0.         0.0748472  0.         1.
 0.         0.         0.25755    1.         0.         0.
 0.         0.74177343 0.         1.         0.         0.
 0.25854847 0.         0.         1.         0.         0.09355672
 0.         1.         0.         0.         0.31969345 0.
 0.        ]
States have length: 37
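The printed vector has a regular layout: seven groups of five values, each with at most one flag set among the first four and a fraction in the fifth, followed by two more values. A common description of this Udacity build, treated here as an assumption rather than something the environment reports, is that each group is one perception ray with a one-hot object type (yellow banana, wall, blue banana, agent) plus a distance, and that the last two values are the agent's velocity. The hypothetical helper below splits a state under that assumption.

```python
import numpy as np

# Assumed layout (not reported by the environment): 7 rays x 5 values + 2 velocity values.
NUM_RAYS = 7
RAY_SIZE = 5
OBJECT_TYPES = ("yellow banana", "wall", "blue banana", "agent")  # assumed one-hot order


def decode_state(state):
    """Split a 37-dimensional state into per-ray (object, distance) pairs and the velocity part."""
    state = np.asarray(state, dtype=float)
    if state.shape != (NUM_RAYS * RAY_SIZE + 2,):
        raise ValueError("expected a 37-dimensional state")
    rays = state[:NUM_RAYS * RAY_SIZE].reshape(NUM_RAYS, RAY_SIZE)
    detections = []
    for ray in rays:
        one_hot, distance = ray[:4], ray[4]
        # report None when no object flag is set for this ray
        label = OBJECT_TYPES[int(np.argmax(one_hot))] if one_hot.any() else None
        detections.append((label, float(distance)))
    velocity = state[NUM_RAYS * RAY_SIZE:]
    return detections, velocity
```

Under the assumed ordering, `decode_state(state)` on the state printed above reports a yellow banana at distance 0.844 on the first ray and a blue banana at 0.075 on the second.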


### Suggestions for future work
#### Implementing Prioritized Experience Replay, Noisy DQN, or Distributional DQN
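Proportional prioritized experience replay samples transition `i` with probability `P(i) = p_i**alpha / sum_k p_k**alpha`, where the priority `p_i = |delta_i| + eps` comes from the transition's TD error, and it corrects the resulting bias with importance-sampling weights `w_i = (N * P(i))**(-beta)`, divided by their maximum. The sketch below keeps priorities in a flat NumPy array for readability, so sampling costs O(N). Practical implementations usually store them in a sum tree instead. The `agent` module used in the next cell is not shown, so the class name and the `alpha`, `beta` and `eps` defaults here are illustrative assumptions.

```python
import numpy as np


class PrioritizedReplaySketch:
    """Proportional prioritized replay with a flat priority array (O(N) sampling)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-5, seed=0):
        self.capacity = capacity
        self.alpha = alpha      # how strongly priorities shape sampling (0 = uniform)
        self.eps = eps          # keeps every priority strictly positive
        self.data = [None] * capacity
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0            # next slot to overwrite (circular buffer)
        self.size = 0
        self.rng = np.random.default_rng(seed)

    def add(self, transition):
        # new transitions get the current maximum priority so they are replayed at least once
        max_p = self.priorities[:self.size].max() if self.size > 0 else 1.0
        self.data[self.pos] = transition
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size, beta=0.4):
        scaled = self.priorities[:self.size] ** self.alpha
        probs = scaled / scaled.sum()                        # P(i)
        idx = self.rng.choice(self.size, size=batch_size, p=probs)
        weights = (self.size * probs[idx]) ** (-beta)        # importance-sampling weights
        weights /= weights.max()                             # largest weight becomes 1
        batch = [self.data[i] for i in idx]
        return batch, idx, weights

    def update_priorities(self, idx, td_errors):
        # flatten so a (batch, 1) TD-error array matches the (batch,) index array
        td_errors = np.asarray(td_errors, dtype=np.float64).reshape(-1)
        self.priorities[idx] = np.abs(td_errors) + self.eps
```

The importance-sampling weights returned by `sample` are meant to multiply the per-sample loss, and `beta` is typically annealed toward 1 over training.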


In [5]:
# Double DQN with a dueling network and prioritized experience replay.
# The agent, util and options modules are loaded from the current working directory.
path_Prioritized_DQN = os.getcwd()
sys.path.append(path_Prioritized_DQN)

import torch

from agent import Agent
import util
from options import options

# parse the hyperparameters and print them
opts = options().parse()
for arg in vars(opts):
    print(arg, getattr(opts, arg))

agent = Agent(state_size, action_size, opts=opts, seed=0)


****************************
Loading Dueling Q Learning PyTorch Model
****************************
****************************
Loading Dueling Q Learning Util
****************************
****************************
Loading Dueling Q Learning Options
****************************
batch 64
memory_size 1000000
update_freq 64
lr 0.0001
discount_rate 0.9
transfer_rate 0.001
env Unity_Banana
env_seed 0
num_episodes 3000
max_iteration 1000
min_epsilon 0.1
decay 0.995
win_cond 13
render True
f C:\Users\jonathanoh\AppData\Roaming\jupyter\runtime\kernel-e608cf32-e19b-438d-a402-4cb164434a7c.json

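The cell above builds a Double DQN agent with a dueling network and prints `discount_rate 0.9` and `transfer_rate 0.001` among its options. Because the `agent` module is not shown, the NumPy sketch below only illustrates the standard formulas, assuming `transfer_rate` is the soft-update coefficient `tau`. The dueling head combines `Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)`. The Double DQN target selects `a* = argmax_a Q_local(s', a)` and evaluates it with the target network, `y = r + gamma * Q_target(s', a*) * (1 - done)`. The soft update moves each target parameter by `theta_target = tau * theta_local + (1 - tau) * theta_target`. The function names are hypothetical.

```python
import numpy as np


def dueling_q(value, advantage):
    """Combine state values (batch, 1) and advantages (batch, actions) into Q-values."""
    return value + advantage - advantage.mean(axis=1, keepdims=True)


def double_dqn_targets(rewards, dones, q_local_next, q_target_next, gamma=0.9):
    """Double DQN targets: the local network selects a*, the target network evaluates it."""
    best_actions = np.argmax(q_local_next, axis=1)                       # a* for each sample
    q_eval = q_target_next[np.arange(len(best_actions)), best_actions]   # Q_target(s', a*)
    return rewards + gamma * q_eval * (1.0 - dones)                      # no bootstrap after terminal states


def soft_update(local_params, target_params, tau=0.001):
    """Blend each target parameter a small step toward its local counterpart."""
    return [tau * lp + (1.0 - tau) * tp for lp, tp in zip(local_params, target_params)]
```

Selecting the action with one network and evaluating it with the other is what reduces the overestimation bias of plain DQN, where the same target network does both.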

In [6]:

def Prioritized_DQN(num_episodes=opts.num_episodes, max_iteration=opts.max_iteration,
                    init_epsilon=1.0, min_epsilon=opts.min_epsilon, decay=opts.decay):
    '''
    Train the agent with epsilon-greedy exploration.

    :param num_episodes: maximum number of training episodes
    :param max_iteration: maximum number of time steps per episode
    :param init_epsilon: initial exploration rate
    :param min_epsilon: lower bound on the exploration rate
    :param decay: multiplicative epsilon decay applied after each episode
    :return: list of per-episode scores
    '''

    total_reward = []                          # scores of all episodes
    total_reward_window = deque(maxlen=100)    # scores of the last 100 episodes
    epsilon = init_epsilon

    for i in range(num_episodes):
        rewards = 0
        env_info = env.reset(train_mode=True)[brain_name]  # reset the environment
        state = env_info.vector_observations[0]            # get the current state
        for k in range(max_iteration):
            action = (agent.act(state, epsilon)).astype(int)     # epsilon-greedy action
            env_info = env.step(action)[brain_name]              # send the action to the environment
            next_state = env_info.vector_observations[0]
            reward = env_info.rewards[0]
            done = env_info.local_done[0]
            agent.step(state, action, reward, next_state, done)  # store the transition and learn

            state = next_state
            rewards += reward
            if done:
                break

        total_reward_window.append(rewards)
        total_reward.append(rewards)

        epsilon = max(min_epsilon, epsilon * decay)  # decay the exploration rate

        print('\rEpisode {}\tAverage Score: {:.2f}'.format(i, np.mean(total_reward_window)), end="")
        if i % 100 == 0:
            print('\rEpisode {}\tAverage Score: {:.2f}'.format(i, np.mean(total_reward_window)))

        # solved once the average over 100 consecutive episodes reaches win_cond
        if len(total_reward_window) == 100 and np.mean(total_reward_window) >= opts.win_cond:
            print('\nEnvironment solved in {:d} episodes!\tAverage Score: {:.2f}'.format(
                i + 1, np.mean(total_reward_window)))
            torch.save(agent.local_model.state_dict(),
                       os.path.join(path_Prioritized_DQN, 'checkpoint.pth'))
            break

    torch.save(agent.local_model.state_dict(),
               os.path.join(path_Prioritized_DQN, 'checkpoint_end.pth'))
    return total_reward

scores_Prioritized_dqn = Prioritized_DQN()

# plot the scores
fig = plt.figure()
ax = fig.add_subplot(111)
plt.plot(np.arange(len(scores_Prioritized_dqn)), scores_Prioritized_dqn)
plt.ylabel('Score')
plt.xlabel('Episode #')
plt.show()



ERROR:root:
UnicodeDecodeError while processing traceback.



ValueError: shape mismatch: value array of shape (64,1) could not be broadcast to indexing result of shape (64,)
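The traceback itself was lost, but this is the message NumPy raises when an array of shape `(64, 1)` is assigned through a 1-D index array of length 64. That is what happens when per-sample priorities are written back from a column-shaped TD-error batch (the options above use `batch 64`). Assuming this is the cause inside the unseen `agent` code, flattening the errors before the assignment resolves it. The variable names below are illustrative.

```python
import numpy as np

priorities = np.zeros(1000)                 # one priority per stored transition
indices = np.arange(64)                     # indices of the sampled batch
td_errors = np.random.rand(64, 1)           # column-shaped, like a (batch, 1) Q-value output

try:
    priorities[indices] = np.abs(td_errors)     # (64, 1) cannot broadcast to (64,)
except ValueError as err:
    print(err)

# Fix: flatten to shape (64,) before writing the priorities back.
priorities[indices] = np.abs(td_errors).reshape(-1)
```

With a PyTorch tensor `td` of shape `(64, 1)`, the equivalent is `td.detach().cpu().numpy().reshape(-1)`.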

In [None]:
import pickle
import pandas as pd


def load_scores(path):
    """Load a pickled list of per-episode scores into a pandas Series."""
    with open(path, "rb") as f:
        return pd.Series(np.array(pickle.load(f)))


scores_vanilla_dqn = load_scores("scores_vanilla_dqn.dat")
scores_double_dqn = load_scores("scores_double_dqn.dat")
scores_dueling_dqn = load_scores("scores_dueling_dqn.dat")

# plot the 100-episode moving average of each variant
fig = plt.figure()
ax = fig.add_subplot(111)
plt.plot(scores_vanilla_dqn.rolling(100).mean(), label='Vanilla')
plt.plot(scores_double_dqn.rolling(100).mean(), label='Double')
plt.plot(scores_dueling_dqn.rolling(100).mean(), label='Dueling')
plt.ylabel('Score')
plt.xlabel('Episode #')
plt.legend()
plt.show()
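The comparison plot shows the 100-episode moving average of each variant. One way to reduce each curve to a single number is to report the first episode at which that average reaches `win_cond` (13 in the options printed earlier). The helper below is a hypothetical addition. It uses `pandas.Series.rolling` as the plotting cell does and returns `None` for a run that never reaches the target.

```python
import numpy as np
import pandas as pd


def episodes_to_solve(scores, window=100, target=13.0):
    """Return the 1-based episode at which the `window`-episode mean first reaches `target`."""
    rolling = pd.Series(np.asarray(scores, dtype=float)).rolling(window).mean()
    # the first window-1 entries are NaN, and NaN >= target evaluates to False
    hits = np.flatnonzero(rolling.to_numpy() >= target)
    return int(hits[0]) + 1 if hits.size else None
```

For example, `episodes_to_solve(scores_double_dqn)` gives the solving episode of the Double DQN run, or `None` if its average never reached 13.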