# Navigation

---

In this notebook, you will find a solution for the first project of the [Deep Reinforcement Learning Nanodegree](https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893). This project involves the use of the Unity ML-Agents environment.

First, choose how to run this notebook:

- Set `run_mode` to `'train'` to train a new agent from scratch.
- Set `run_mode` to `'test'` to evaluate a pre-trained model loaded from `checkpoint.pth`.

To run without visualizing the agent during training and testing, set `no_rendering` to `True`.


In [1]:
# Use 'train' to train an agent from scratch, or 'test' to evaluate a saved agent.
run_mode = 'test'

# Set to True to disable rendering (the agent is not visualized).
no_rendering = True


### 1. Start the Environment

We begin by importing some necessary packages.

In [2]:
from collections import deque
import random

import matplotlib.pyplot as plt
import numpy as np
import torch
from unityagents import UnityEnvironment

from dqn_agent import Agent

%matplotlib inline

Next, we're going to start the environment! **_Before you run the code cell below_**, ensure the `file_name` parameter matches the location of the Unity environment that you downloaded.

- **Mac**: `"path/to/Banana.app"`
- **Windows** (x86): `"path/to/Banana_Windows_x86/Banana.exe"`
- **Windows** (x86_64): `"path/to/Banana_Windows_x86_64/Banana.exe"`
- **Linux** (x86): `"path/to/Banana_Linux/Banana.x86"`
- **Linux** (x86_64): `"path/to/Banana_Linux/Banana.x86_64"`
- **Linux** (x86, headless): `"path/to/Banana_Linux_NoVis/Banana.x86"`
- **Linux** (x86_64, headless): `"path/to/Banana_Linux_NoVis/Banana.x86_64"`
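
As a rough sketch, the choice among the non-headless paths above could be automated with the standard `platform` and `sys` modules. The helper name `default_env_path` and the lookup table are hypothetical, and the `path/to` placeholders are copied unchanged from the list, so they still need to be replaced with a real location.

```python
import platform
import sys

# Placeholder paths copied verbatim from the list above (headless builds omitted).
# Keys are (operating system, running a 64-bit Python build).
ENV_PATHS = {
    ("Darwin", True): "path/to/Banana.app",
    ("Darwin", False): "path/to/Banana.app",
    ("Windows", False): "path/to/Banana_Windows_x86/Banana.exe",
    ("Windows", True): "path/to/Banana_Windows_x86_64/Banana.exe",
    ("Linux", False): "path/to/Banana_Linux/Banana.x86",
    ("Linux", True): "path/to/Banana_Linux/Banana.x86_64",
}


def default_env_path(system=None, is_64bit=None):
    """Return the placeholder path for an OS (hypothetical helper, not part of unityagents)."""
    if system is None:
        system = platform.system()        # 'Darwin', 'Windows' or 'Linux'
    if is_64bit is None:
        is_64bit = sys.maxsize > 2**32    # True on a 64-bit Python build
    try:
        return ENV_PATHS[(system, is_64bit)]
    except KeyError:
        raise ValueError(f"No Banana build listed for {system!r}")
```

The returned string would then be passed as `file_name` when creating the `UnityEnvironment`.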

Since I'm running this on 64-bit Windows 10 and the environment is located in `./Banana_Windows_x86_64/`, I set up the environment like so:

```python
env = UnityEnvironment(file_name="./Banana_Windows_x86_64/banana.exe")
```


In [3]:
env = UnityEnvironment(file_name="./Banana_Windows_x86_64/banana.exe", no_graphics=no_rendering)

INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		
Unity brain name: BananaBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 37
        Number of stacked Vector Observation: 1
        Vector Action space type: discrete
        Vector Action space size (per agent): 4
        Vector Action descriptions: , , , 


Environments contain **_brains_** which are responsible for deciding the actions of their associated agents. Here we check for the first brain available, and set it as the default brain we will be controlling from Python.

In [4]:
# get the default brain
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

### 2. Examine the State and Action Spaces

The simulation contains a single agent that navigates a large environment.  At each time step, it has four actions at its disposal:
- `0` - walk forward 
- `1` - walk backward
- `2` - turn left
- `3` - turn right

The state space has `37` dimensions and contains the agent's velocity, along with ray-based perception of objects around the agent's forward direction. A reward of `+1` is provided for collecting a yellow banana, and a reward of `-1` is provided for collecting a blue banana.
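
To make the action encoding and reward scheme concrete, here is a minimal sketch in plain Python. The names `ACTION_NAMES`, `BANANA_REWARD`, and `episode_score` are illustrative only and are not part of the Unity environment's API.

```python
# Action indices and their meaning, as listed above.
ACTION_NAMES = {
    0: "walk forward",
    1: "walk backward",
    2: "turn left",
    3: "turn right",
}

# Reward received for each banana colour, as described above.
BANANA_REWARD = {"yellow": 1.0, "blue": -1.0}


def episode_score(collected):
    """Sum the rewards for a sequence of collected banana colours."""
    return sum(BANANA_REWARD[colour] for colour in collected)


# Example: three yellow bananas and one blue banana.
score = episode_score(["yellow", "blue", "yellow", "yellow"])
print(ACTION_NAMES[2], score)
```

An episode's score is therefore the number of yellow bananas collected minus the number of blue ones.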

Run the code cell below to print some information about the environment.

In [5]:
# reset the environment
env_info = env.reset(train_mode=False)[brain_name]

# number of agents in the environment
print('Number of agents:', len(env_info.agents))

# number of actions
action_size = brain.vector_action_space_size
print('Number of actions:', action_size)

# examine the state space 
state = env_info.vector_observations[0]
print('States look like:', state)
state_size = len(state)
print('States have length:', state_size)

Number of agents: 1
Number of actions: 4
States look like: [1.         0.         0.         0.         0.84408134 0.
 0.         1.         0.         0.0748472  0.         1.
 0.         0.         0.25755    1.         0.         0.
 0.         0.74177343 0.         1.         0.         0.
 0.25854847 0.         0.         1.         0.         0.09355672
 0.         1.         0.         0.         0.31969345 0.
 0.        ]
States have length: 37


### 4. Creating a Smart Agent

In this section, we will instantiate the Deep Q-Learning Agent, defined in `dqn_agent.py`. We only need to specify the state size, the action size, and a seed for generating random numbers.
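
The contents of `dqn_agent.py` are not shown in this notebook. As an assumption about what `agent.act(state, eps)` does, the sketch below shows plain epsilon-greedy selection over a list of Q-values: with probability `eps` pick a random action, otherwise pick the action with the highest estimated value. The function name `epsilon_greedy` is hypothetical, and the real agent may differ.

```python
import random


def epsilon_greedy(q_values, eps, rng=random):
    """Pick an action index from a list of Q-values (illustrative sketch).

    With probability `eps` a uniformly random action is returned (exploration);
    otherwise the index of the largest Q-value is returned (exploitation).
    """
    if rng.random() < eps:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


# With eps=0.0 the choice is always greedy: index 1 holds the largest value.
greedy_action = epsilon_greedy([0.1, 0.9, 0.3, 0.2], eps=0.0)
```

In a DQN agent the Q-values would come from a forward pass of the local Q-network on the current state rather than from a fixed list.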


In [6]:
from dqn_agent import Agent

agent = Agent(state_size=state_size, action_size=action_size, seed=0)

### 5. Training Loop

In this step, we train the agent using the `dqn` function. Training runs for at most `n_episodes` episodes and stops early once the average score over the last 100 episodes reaches 15 or higher. After training, the per-episode scores are plotted to show how the agent's performance evolved.
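
The early-stopping rule can be sketched in isolation with a bounded `deque`, which keeps only the most recent scores. The helper name `first_solved_episode` is hypothetical. As in the training cell, the average is taken over however many scores the window currently holds, so the rule can trigger before 100 episodes have been played.

```python
from collections import deque


def first_solved_episode(scores, target=15.0, window=100):
    """Return the 1-based episode at which the rolling mean first reaches `target`.

    Returns None if the target is never reached.
    """
    recent = deque(maxlen=window)   # the oldest score is dropped automatically
    for i_episode, score in enumerate(scores, start=1):
        recent.append(score)
        if sum(recent) / len(recent) >= target:
            return i_episode
    return None


# Toy example with a window of 2: the mean of the last two scores
# first reaches 15 at episode 3 (scores 20 and 20).
solved_at = first_solved_episode([0, 20, 20], target=15.0, window=2)
```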




In [7]:
if run_mode == 'train':
    def dqn(n_episodes=2000, max_t=1000, eps_start=1.0, eps_end=0.01, eps_decay=0.995):
        """Deep Q-Learning.

        Params
        ======
            n_episodes (int): maximum number of training episodes
            max_t (int): maximum number of timesteps per episode
            eps_start (float): starting value of epsilon, for epsilon-greedy action selection
            eps_end (float): minimum value of epsilon
            eps_decay (float): multiplicative factor (per episode) for decreasing epsilon
        """
        scores = []                        # list containing scores from each episode
        scores_window = deque(maxlen=100)  # last 100 scores
        eps = eps_start                    # initialize epsilon
        for i_episode in range(1, n_episodes+1):
            env_info = env.reset(train_mode=True)[brain_name] # reset the environment
            state = env_info.vector_observations[0]            # get the current state

            score = 0
            for t in range(max_t):
                action = agent.act(state, eps)                 # select an action (epsilon-greedy)
                action = int(action)                           # cast to a plain int for env.step
                env_info = env.step(action)[brain_name]        # send the action to the environment
                next_state = env_info.vector_observations[0]   # get the next state
                reward = env_info.rewards[0]                   # get the reward
                done = env_info.local_done[0]                  # see if episode has finished

                agent.step(state, action, reward, next_state, done)
                state = next_state
                score += reward
                if done:
                    break 
            scores_window.append(score)       # update the window of the last 100 scores
            scores.append(score)              # record the score for this episode
            eps = max(eps_end, eps_decay*eps) # decrease epsilon
            print('\rEpisode {}\tAverage Score: {:.2f}'.format(i_episode, np.mean(scores_window)), end="")
            if i_episode % 10 == 0:
                print('\rEpisode {}\tAverage Score: {:.2f}'.format(i_episode, np.mean(scores_window)))
            if np.mean(scores_window)>=15:
                print('\nEnvironment solved in {:d} episodes!\tAverage Score: {:.2f}'.format(i_episode-100, np.mean(scores_window)))
                torch.save(agent.qnetwork_local.state_dict(), 'checkpoint.pth')
                break
        return scores

    scores = dqn()

    # plot the scores
    fig = plt.figure()
    ax = fig.add_subplot(111)
    plt.plot(np.arange(len(scores)), scores)
    plt.ylabel('Score')
    plt.xlabel('Episode #')
    plt.show()

else:
    # Load the weights
    agent.qnetwork_local.load_state_dict(torch.load('checkpoint.pth'))


![Learning curve](./scores.png)
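
For intuition about the exploration schedule, the sketch below reproduces the update `eps = max(eps_end, eps_decay*eps)` from the training loop using its default parameters. The helper names `epsilon_schedule` and `episodes_until_floor` are hypothetical, and the second helper assumes `eps_decay < 1`, otherwise it would never terminate.

```python
def epsilon_schedule(n_episodes, eps_start=1.0, eps_end=0.01, eps_decay=0.995):
    """Return the epsilon value used in each of the first `n_episodes` episodes."""
    eps = eps_start
    values = []
    for _ in range(n_episodes):
        values.append(eps)
        eps = max(eps_end, eps_decay * eps)   # same update as in the training loop
    return values


def episodes_until_floor(eps_start=1.0, eps_end=0.01, eps_decay=0.995):
    """Count the decay steps needed before epsilon is clamped at `eps_end`."""
    eps, n = eps_start, 0
    while eps > eps_end:
        eps = max(eps_end, eps_decay * eps)
        n += 1
    return n


print(epsilon_schedule(3))        # epsilon for the first three episodes
print(episodes_until_floor())     # 919 with the default parameters
```

With the defaults, the agent explores for roughly the first 900 episodes before epsilon settles at its minimum of 0.01.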


### 6. Testing

In this section, we evaluate the performance of the trained model over 100 episodes. Our agent is considered successful if it achieves an average score of 13 or higher. If the average score falls below this threshold, it indicates that the agent requires further training or parameter tuning.
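
The pass/fail rule can be sketched as a small function. The name `evaluate` is hypothetical. The sample scores are the first ten episode scores from the run logged in this section, not the full 100-episode evaluation.

```python
from statistics import mean


def evaluate(scores, threshold=13.0):
    """Return (average score, whether the average meets the threshold)."""
    avg = mean(scores)
    return avg, avg >= threshold


# First ten episode scores from the logged test run in this section.
first_ten = [22.0, 17.0, 15.0, 18.0, 20.0, 19.0, 20.0, 15.0, 0.0, 21.0]
avg, passed = evaluate(first_ten)
print("Average score = {:.1f}, passed = {}".format(avg, passed))
```

Individual episodes can fall well below the threshold, such as the score of 0.0 here, without causing a failure, because only the average is compared against 13.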


In [9]:
n_episodes = 100  
episodes_score = [] 
for i_episode in range(1, n_episodes+1):
    env_info = env.reset(train_mode=False)[brain_name] # reset the environment
    state = env_info.vector_observations[0]            # get the current state   
    score = 0                                   # initialize the score
    
    while True:
        eps=0.01
        action = agent.act(state, eps)
        action = int(action)
        env_info = env.step(action)[brain_name]        # send the action to the environment
        next_state = env_info.vector_observations[0]   # get the next state
        reward = env_info.rewards[0]                   # get the reward
        done = env_info.local_done[0]                  # see if episode has finished
        score += reward                                # update the score
        state = next_state                             # roll over the state to next time step
        if done:                                       # exit loop if episode finished
            break
    episodes_score.append(score)
    print("Episode {} Score: {}".format(i_episode, score))

score_avg = sum(episodes_score) / len(episodes_score)
if score_avg >= 13:
    print("Smart Agent PASSED :) Average score =", score_avg)
else:
    print("Smart Agent FAILED :( Average score =", score_avg)

Episode 1 Score: 22.0
Episode 2 Score: 17.0
Episode 3 Score: 15.0
Episode 4 Score: 18.0
Episode 5 Score: 20.0
Episode 6 Score: 19.0
Episode 7 Score: 20.0
Episode 8 Score: 15.0
Episode 9 Score: 0.0
Episode 10 Score: 21.0
Episode 11 Score: 18.0
Episode 12 Score: 23.0
Episode 13 Score: 17.0
Episode 14 Score: 17.0
Episode 15 Score: 19.0
Episode 16 Score: 14.0
Episode 17 Score: 19.0
Episode 18 Score: 17.0
Episode 19 Score: 17.0
Episode 20 Score: 15.0
Episode 21 Score: 15.0
Episode 22 Score: 10.0
Episode 23 Score: 19.0
Episode 24 Score: 24.0
Episode 25 Score: 11.0
Episode 26 Score: 16.0
Episode 27 Score: 19.0
Episode 28 Score: 19.0
Episode 29 Score: 18.0
Episode 30 Score: 18.0
Episode 31 Score: 23.0
Episode 32 Score: 10.0
Episode 33 Score: 18.0
Episode 34 Score: 19.0
Episode 35 Score: 13.0
Episode 36 Score: 2.0
Episode 37 Score: 12.0
Episode 38 Score: 20.0
Episode 39 Score: 18.0
Episode 40 Score: 4.0
Episode 41 Score: 20.0
Episode 42 Score: 13.0
Episode 43 Score: 5.0
Episode 44 Score: 18.0
...

In [10]:
env.close()