# Navigation

---

You are welcome to use this coding environment to train your agent for the project.  Follow the instructions below to get started!

### 1. Start the Environment

Run the next code cell, changing only `file_name` so that it points to the Banana environment on your machine.

In [2]:
from unityagents import UnityEnvironment
import numpy as np
from collections import deque
import torch

# set file_name to the path of the Banana environment on your machine
env = UnityEnvironment(file_name=r"D:\deep-reinforcement-learning\p1_navigation\Banana_Windows_x86_64\Banana.exe")

INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		
Unity brain name: BananaBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 37
        Number of stacked Vector Observation: 1
        Vector Action space type: discrete
        Vector Action space size (per agent): 4
        Vector Action descriptions: , , , 


Environments contain **_brains_** which are responsible for deciding the actions of their associated agents. Here we check for the first brain available, and set it as the default brain we will be controlling from Python.

In [3]:
# get the default brain
brain_name = env.brain_names[0]
brain = env.brains[brain_name]
# create the agent (37-dimensional state, 4 discrete actions, matching the brain above)
from navigation_agent import Agent
agent = Agent(state_size=37, action_size=4, seed=0)
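
The `Agent` class comes from `navigation_agent`, which is not shown here. Judging from how it is called later (`agent.act(state, eps)`), action selection is presumably epsilon-greedy. A minimal sketch of that idea follows, assuming a hypothetical list `q_values` with one value estimate per action; the function name `epsilon_greedy` is illustrative and not part of the module.

```python
import random

def epsilon_greedy(q_values, eps, rng=random):
    """Pick a random action with probability eps, else the highest-valued one."""
    if rng.random() < eps:
        return rng.randrange(len(q_values))                       # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])   # exploit

q_values = [0.1, 0.7, -0.2, 0.3]                      # hypothetical estimates for the 4 actions
greedy_action = epsilon_greedy(q_values, eps=0.0)     # eps=0 always exploits
random_action = epsilon_greedy(q_values, eps=1.0)     # eps=1 always explores
```

With `eps=0.0` the call always returns the index of the largest estimate; with `eps=1.0` it ignores the estimates entirely.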

### 2. Set Up Variables

Run the code cell below to define the training hyperparameters and the containers that track scores.

In [4]:
#setup
n_episodes    = 1000                     # number of training episodes
scores        = []                       # list containing scores from each episode
scores_window = deque(maxlen=100)        # last 100 scores
eps           = 1                        # starting value of epsilon
eps_end       = 0.01                     # minimum value of epsilon
eps_decay     = eps_end**(1/n_episodes)  # per-episode decay factor; eps falls from 1 to eps_end over n_episodes
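
The choice `eps_decay = eps_end**(1/n_episodes)` means that multiplying epsilon by `eps_decay` once per episode takes it from 1 to `eps_end` in exactly `n_episodes` episodes. A small standalone check of that arithmetic, using the same values as the cell above (the name `eps_start` is introduced here only for illustration):

```python
n_episodes = 1000
eps_start = 1.0
eps_end = 0.01
eps_decay = eps_end ** (1 / n_episodes)   # roughly 0.9954 per episode

# Apply the same update rule as the training loop, once per episode.
eps = eps_start
for _ in range(n_episodes):
    eps = max(eps_end, eps_decay * eps)

# Halfway through, epsilon is the geometric midpoint: sqrt(1 * 0.01) = 0.1.
halfway = eps_start * eps_decay ** (n_episodes // 2)
```

If training stops early because the solve threshold is reached, epsilon will still be above `eps_end` at that point.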

### 3. Take Actions in the Environment

In the next code cell, a Python API is used to control the agent and receive feedback from the environment.

Note that **in this coding environment, you will not be able to watch the agent while it is training**, so reset the environment with `train_mode=True` at the start of each training episode.

In [6]:
for i_episode in range(n_episodes):
    env_info = env.reset(train_mode=True)[brain_name] # reset the environment
    score    = 0                                       # initialize the score
    while True:
        state      = env_info.vector_observations[0] # get the current state
        action     = agent.act(state, eps)           # select an action
        env_info   = env.step(action)[brain_name]    # send the action to the environment
        next_state = env_info.vector_observations[0] # get the next state
        reward     = env_info.rewards[0]             # get the reward
        done       = env_info.local_done[0]          # see if episode has finished
        agent.step(state, action, reward, 
                   next_state, done)                 # Save experience and learn 
        score     += reward                          # update the score
        state      = next_state                      # roll over the state to next time step
        if done:                                     # exit loop when the episode finishes (after 300 time steps)
            break
    scores_window.append(score)       # add score to the rolling 100-episode window
    scores.append(score)              # record score in the full history
    eps = max(eps_end, eps_decay*eps) # decrease epsilon
    print('\rEpisode {}\tAverage Score: {:.2f}'.format(i_episode, np.mean(scores_window)), end="")
    if np.mean(scores_window)>=13.0:
        print('\nEnvironment solved in episode {:d}!\tAverage Score: {:.2f}'.format(i_episode, np.mean(scores_window)))
        torch.save(agent.qnetwork_local.state_dict(), 'model.pth')
        break    

Episode 0	Average Score: 13.18
Environment solved in episode 0!	Average Score: 13.18
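
Because `scores_window` is a `deque(maxlen=100)`, its mean covers fewer than 100 episodes until the window fills, and it keeps scores from any earlier run of the loop unless the setup cell is re-run. If the intent is to require a full window before declaring the environment solved, one possible check is sketched below. The helper name `is_solved` is hypothetical and not part of the notebook.

```python
from collections import deque

def is_solved(window, target=13.0):
    """True only when the window is full and its mean reaches the target."""
    return len(window) == window.maxlen and sum(window) / len(window) >= target

window = deque(maxlen=100)
window.append(15.0)
early = is_solved(window)        # a single high score does not count
window.extend([14.0] * 99)
full = is_solved(window)         # 100 scores with a mean above 13
```

In the training loop, a check like this could stand in for the bare `np.mean(scores_window)>=13.0` comparison.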


### 4. Watch an Agent

In the next code cell, you will load the trained weights from file to watch an agent!

In [10]:
agent.qnetwork_local.load_state_dict(torch.load('model.pth'))
env_info = env.reset(train_mode=False)[brain_name] # reset the environment
score    = 0                                       # initialize the score
while True:
    state      = env_info.vector_observations[0] # get the current state
    action     = agent.act(state, eps)           # select an action
    env_info   = env.step(action)[brain_name]    # send the action to the environment
    next_state = env_info.vector_observations[0] # get the next state
    reward     = env_info.rewards[0]             # get the reward
    done       = env_info.local_done[0]          # see if episode has finished
    agent.step(state, action, reward, 
               next_state, done)                 # Save experience and learn 
    score     += reward                          # update the score
    state      = next_state                      # roll over the state to next time step
    print('\rScore {}'.format(score), end="")
    if done:                                     # exit loop when the episode finishes (after 300 time steps)
        break

Score 16.0
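
The cell above reuses whatever value `eps` holds after training and still calls `agent.step`, so the agent may keep exploring and learning while it is being watched. For a pure evaluation run, a common alternative is to act greedily and skip the learning step. The sketch below shows that loop shape only; `StubEnv` and `StubAgent` are hypothetical stand-ins for the Unity environment and for `navigation_agent.Agent`, not their real interfaces.

```python
import random

class StubEnv:
    """Hypothetical stand-in for the environment: fixed-length episodes."""
    def __init__(self, max_steps=300, seed=0):
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0] * 37                      # 37-dimensional state vector

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 0 and self.rng.random() < 0.1 else 0.0
        done = self.t >= self.max_steps
        return [0.0] * 37, reward, done

class StubAgent:
    """Hypothetical stand-in for the agent: prefers action 0 when greedy."""
    def act(self, state, eps=0.0):
        if random.random() < eps:
            return random.randrange(4)
        return 0

def evaluate(env, agent):
    """Run one episode greedily (eps=0) with no learning step."""
    state = env.reset()
    score, steps, done = 0.0, 0, False
    while not done:
        action = agent.act(state, eps=0.0)
        state, reward, done = env.step(action)
        score += reward
        steps += 1
    return score, steps

score, steps = evaluate(StubEnv(), StubAgent())
```

With the real environment, the same structure would read the state, reward, and done flag from `env_info` as in the cells above.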

When finished, you can close the environment.

In [11]:
env.close()