# Navigation

---

This notebook walks through setting up the Unity environment, training the agent, and observing the trained agent.

### 1. Start the Environment

We begin by importing some necessary packages.  If the code cell below returns an error, please revisit the project instructions to double-check that you have installed [Unity ML-Agents](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Installation.md) and [NumPy](http://www.numpy.org/).

For this code you can either install Unity or download one of the prebuilt environments.
The environment has not been committed to this repo because of its size, but it can be obtained from the following locations:

* [Linux](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P1/Banana/Banana_Linux.zip)
* [Mac OSX](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P1/Banana/Banana.app.zip)
* [Windows (32-bit)](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P1/Banana/Banana_Windows_x86.zip)
* [Windows (64-bit)](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P1/Banana/Banana_Windows_x86_64.zip)

It is also assumed that a Python conda environment has been set up and that the Jupyter kernel points to it. See the readme file for more details on this process.

In [1]:
from unityagents import UnityEnvironment
import numpy as np

Next, start the environment. Point `file_name` to the file that matches your operating system:

- **Mac**: `"path/to/Banana.app"`
- **Windows** (x86): `"path/to/Banana_Windows_x86/Banana.exe"`
- **Windows** (x86_64): `"path/to/Banana_Windows_x86_64/Banana.exe"`
- **Linux** (x86): `"path/to/Banana_Linux/Banana.x86"`
- **Linux** (x86_64): `"path/to/Banana_Linux/Banana.x86_64"`
- **Linux** (x86, headless): `"path/to/Banana_Linux_NoVis/Banana.x86"`
- **Linux** (x86_64, headless): `"path/to/Banana_Linux_NoVis/Banana.x86_64"`
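
The choice above can also be made programmatically. The following is a minimal sketch and not part of the original project: the helper name `banana_build_path` and its `base_dir` argument are hypothetical, and it assumes the archives were unzipped under `base_dir` with the folder names listed above.

```python
import platform


def banana_build_path(base_dir="path/to", system=None, machine=None, headless=False):
    """Return the Banana build path for a platform (hypothetical helper).

    Assumes the downloaded archives were unzipped under base_dir using the
    folder names from the list above.
    """
    system = system or platform.system()        # 'Darwin', 'Windows' or 'Linux'
    machine = (machine or platform.machine()).lower()
    is_64bit = machine in ("x86_64", "amd64")

    if system == "Darwin":
        parts = ["Banana.app"]
    elif system == "Windows":
        folder = "Banana_Windows_x86_64" if is_64bit else "Banana_Windows_x86"
        parts = [folder, "Banana.exe"]
    elif system == "Linux":
        # The headless build lives in its own folder.
        folder = "Banana_Linux_NoVis" if headless else "Banana_Linux"
        parts = [folder, "Banana.x86_64" if is_64bit else "Banana.x86"]
    else:
        raise ValueError("No prebuilt Banana environment for " + system)

    # Forward slashes are used on every platform to match the list above.
    return "/".join([base_dir] + parts)


print(banana_build_path(system="Linux", machine="x86_64", headless=True))
```

The returned string could then be passed as `file_name` when constructing `UnityEnvironment`.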

In [2]:
env = UnityEnvironment(file_name="../Banana.app")

INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		
Unity brain name: BananaBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 37
        Number of stacked Vector Observation: 1
        Vector Action space type: discrete
        Vector Action space size (per agent): 4
        Vector Action descriptions: , , , 


Environments contain **_brains_** which are responsible for deciding the actions of their associated agents. Here we check for the first brain available, and set it as the default brain we will be controlling from Python.

In [3]:
# get the default brain
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

### 2. Examine the State and Action Spaces

The simulation contains a single agent that navigates a large environment.  At each time step, it has four actions at its disposal:
- `0` - walk forward 
- `1` - walk backward
- `2` - turn left
- `3` - turn right

The state space has `37` dimensions and contains the agent's velocity, along with ray-based perception of objects around the agent's forward direction.  A reward of `+1` is provided for collecting a yellow banana, and a reward of `-1` is provided for collecting a blue banana.
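
As a small illustration of the action indices and the reward scheme described above, the sketch below maps each action to its meaning and sums a reward sequence into an episode score. The names `ACTION_NAMES` and `episode_score` and the example rewards are hypothetical and only serve to make the scoring concrete.

```python
# Action indices and meanings as listed above.
ACTION_NAMES = {
    0: "walk forward",
    1: "walk backward",
    2: "turn left",
    3: "turn right",
}


def episode_score(rewards):
    """Sum per-step rewards: +1 per yellow banana, -1 per blue banana."""
    return sum(rewards)


# Hypothetical reward sequence: two yellow bananas, one blue banana,
# and steps on which nothing was collected.
example_rewards = [0.0, 1.0, 0.0, -1.0, 1.0]
print(ACTION_NAMES[2], episode_score(example_rewards))  # turn left 1.0
```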

Run the code cell below to print some information about the environment.

In [4]:
# reset the environment
env_info = env.reset(train_mode=True)[brain_name]

# number of agents in the environment
print('Number of agents:', len(env_info.agents))

# number of actions
action_size = brain.vector_action_space_size
print('Number of actions:', action_size)

# examine the state space 
state = env_info.vector_observations[0]
print('States look like:', state)
state_size = len(state)
print('States have length:', state_size)

Number of agents: 1
Number of actions: 4
States look like: [1.         0.         0.         0.         0.84408134 0.
 0.         1.         0.         0.0748472  0.         1.
 0.         0.         0.25755    1.         0.         0.
 0.         0.74177343 0.         1.         0.         0.
 0.25854847 0.         0.         1.         0.         0.09355672
 0.         1.         0.         0.         0.31969345 0.
 0.        ]
States have length: 37


### 3. Train The Agent

In the next code cell the agent can be trained.

Once this cell is executed, the agent will start exploring the environment. 
A window will be displayed that allows you to observe the agent as it moves through the environment.  
For training it is recommended that you disable the UI.

The training runs for 1500 episodes.
Whenever the average score over the most recent episodes (at most 100) reaches the current target, which starts at 13, the weights of the local Q-network are saved to the `checkpoint.pth` file and the target is raised by 1.
This file currently holds the pre-trained weights.

This section can be skipped if you would prefer to look at the trained agent instead.
The file `Training.py` is provided and can be run from a terminal to train independently of the Jupyter notebook.
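
The training cell that follows decays the exploration rate epsilon multiplicatively after every episode and clips it at a floor. The sketch below reproduces only that schedule, using the same constants as the training cell, to show how quickly exploration fades. The function name `epsilon_schedule` is hypothetical and is not part of the project code.

```python
EPS_START = 1.0
EPS_END = 0.01
EPS_DECAY = 0.995


def epsilon_schedule(n_episodes, start=EPS_START, end=EPS_END, decay=EPS_DECAY):
    """Return the epsilon value in effect after each of n_episodes updates."""
    eps, values = start, []
    for _ in range(n_episodes):
        eps = max(end, decay * eps)  # same update rule as the training loop
        values.append(eps)
    return values


schedule = epsilon_schedule(1500)

# First episode (1-based) after which epsilon sits at the floor EPS_END.
first_floor_episode = schedule.index(EPS_END) + 1
print(first_floor_episode)  # 919
```

With these constants, epsilon reaches its floor of `0.01` after 919 episodes, so the remaining episodes of a 1500-episode run act almost greedily.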

In [6]:
from collections import deque
import torch  # required for torch.save below
from Agent import Agent

MAX_T = 277
EPS_START = 1.0
EPS_END = 0.01
EPS_DECAY = 0.995

agent = Agent(state_size,action_size,seed=0)
scores_window = deque(maxlen=100)
scores = []

eps = EPS_START
success_score = 13
for i_episode in range(1,1501):  # episodes 1 through 1500
    env_info = env.reset(train_mode=True)[brain_name]
    state = env_info.vector_observations[0]
    score = 0
    for t in range(MAX_T):
        action = agent.get_action(state,eps)
        env_info = env.step(action)[brain_name]
        next_state = env_info.vector_observations[0]
        reward = env_info.rewards[0]
        done = env_info.local_done[0]
        agent.step(state,action,reward,next_state,done)
        score += reward
        state = next_state
        if done:
            break
    eps = max(EPS_END,EPS_DECAY*eps)
    scores_window.append(score)
    scores.append(score)

    if i_episode % 100 == 0:
        print('\rEpisode {} \tAverage Score: {:.2f} \tEpsilon: {:.5f}'.format(i_episode, np.mean(scores_window),eps))

    if np.mean(scores_window) >= success_score:
        print('Environment solved in {:d} episodes. Average Score: {:.2f} Saving model parameters.'.format(i_episode-100,np.mean(scores_window)))
        torch.save(agent.local_qnetwork.state_dict(), 'checkpoint.pth')
        success_score+=1


### 4. Running The Trained Agent

Now that the agent is trained we can load the weights from the `checkpoint.pth` file and observe it as it navigates its environment.

This code assumes that you have not changed the architecture of the Q-networks.
If the architecture has been changed, a new `checkpoint.pth` will have to be generated by training with the new architecture.
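
One way to see whether a checkpoint still matches the network is to compare parameter names and shapes before loading. The sketch below is an assumption-laden illustration rather than project code: `find_mismatches` is a hypothetical helper, and the layer names and sizes in the example are invented. It works on plain `{name: shape}` mappings so that it runs without PyTorch.

```python
def find_mismatches(saved_shapes, model_shapes):
    """Compare two {parameter_name: shape} mappings and describe differences."""
    problems = []
    for name in sorted(set(saved_shapes) | set(model_shapes)):
        if name not in model_shapes:
            problems.append("unexpected in checkpoint: " + name)
        elif name not in saved_shapes:
            problems.append("missing from checkpoint: " + name)
        elif tuple(saved_shapes[name]) != tuple(model_shapes[name]):
            problems.append("shape differs for {}: {} vs {}".format(
                name, tuple(saved_shapes[name]), tuple(model_shapes[name])))
    return problems


# Hypothetical layer names and sizes, for illustration only.
# With PyTorch, such a mapping could be built from a state dict as
#   {k: tuple(v.shape) for k, v in state_dict.items()}
saved = {"fc1.weight": (64, 37), "fc1.bias": (64,),
         "fc2.weight": (4, 64), "fc2.bias": (4,)}
model = {"fc1.weight": (128, 37), "fc1.bias": (128,),
         "fc2.weight": (4, 128), "fc2.bias": (4,)}

for line in find_mismatches(saved, model):
    print(line)
```

An empty result suggests the checkpoint can be loaded into the network as is; any reported difference means the weights were trained under a different architecture.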

To watch the trained agent, the environment is reset with `train_mode=False`:

```python
env_info = env.reset(train_mode=False)[brain_name]
```

In [12]:
from Agent import Agent
import torch

MAX_T = 277

agent = Agent(state_size,action_size,seed=0)
agent.local_qnetwork.load_state_dict(torch.load('checkpoint.pth'))

env_info = env.reset(train_mode=False)[brain_name] # reset the environment
state = env_info.vector_observations[0]
score = 0
eps = 0.01
for t in range(MAX_T):
    action = agent.get_action(state,eps)
    env_info = env.step(action)[brain_name]
    next_state = env_info.vector_observations[0]
    reward = env_info.rewards[0]
    done = env_info.local_done[0]
    agent.step(state,action,reward,next_state,done)
    score += reward
    state = next_state
    if done:
        break
print('\rThe agent received a score of {:.2f}'.format(score))

The agent received a score of 8.00


When you are finished, you can close the environment.

In [13]:
env.close()