# Navigation

---

In this notebook, you will learn how to use the Unity ML-Agents environment to train an agent to collect bananas!

### 1. Start the Environment

We begin by importing some necessary packages.  If the code cell below returns an error, please revisit the project instructions to double-check that you have installed [Unity ML-Agents](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Installation.md) and [NumPy](http://www.numpy.org/).

In [None]:
# Watch for changes
%load_ext autoreload
%autoreload 2

from torch.utils.tensorboard import SummaryWriter
from unityagents import UnityEnvironment
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Monkey patch missing attributes for newer numpy versions
if not hasattr(np, "float_"):
    np.float_ = np.float64
    
if not hasattr(np, "int_"):
    np.int_ = np.int64

Next, we will start the environment!  **_Before running the code cell below_**, change the `file_name` parameter to match the location of the Unity environment that you downloaded.

- **Mac**: `"path/to/Banana.app"`
- **Windows** (x86): `"path/to/Banana_Windows_x86/Banana.exe"`
- **Windows** (x86_64): `"path/to/Banana_Windows_x86_64/Banana.exe"`
- **Linux** (x86): `"path/to/Banana_Linux/Banana.x86"`
- **Linux** (x86_64): `"path/to/Banana_Linux/Banana.x86_64"`
- **Linux** (x86, headless): `"path/to/Banana_Linux_NoVis/Banana.x86"`
- **Linux** (x86_64, headless): `"path/to/Banana_Linux_NoVis/Banana.x86_64"`

For instance, if you are using a Mac, then you downloaded `Banana.app`.  If this file is in the same folder as the notebook, then the line below should appear as follows:
```
env = UnityEnvironment(file_name="Banana.app")
```
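
If you move between machines, the choice above can be sketched as a small lookup. This is only an illustration: `banana_file_name` and `BANANA_BUILDS` are hypothetical names that are not part of the project, the paths are the 64-bit placeholders from the list above, and `path/to` still has to be replaced with your real download location.

```python
import platform

# 64-bit build paths copied from the list above ("path/to" is a placeholder).
# Keys are (platform.system() value, headless flag).
BANANA_BUILDS = {
    ("Darwin", False): "path/to/Banana.app",
    ("Windows", False): "path/to/Banana_Windows_x86_64/Banana.exe",
    ("Linux", False): "path/to/Banana_Linux/Banana.x86_64",
    ("Linux", True): "path/to/Banana_Linux_NoVis/Banana.x86_64",
}


def banana_file_name(system=None, headless=False):
    """Return the build path for an OS (hypothetical helper, not part of the project)."""
    system = system or platform.system()
    try:
        return BANANA_BUILDS[(system, headless)]
    except KeyError:
        raise ValueError(f"No Banana build listed for {system!r} (headless={headless})")


print(banana_file_name("Linux"))
```

The returned string would then be passed as `file_name` to `UnityEnvironment`.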

In [2]:
# print current working directory
import os
print(os.getcwd())

/home/oliver/project-showroom/projects/reinforcement-learning/navigation/p1_navigation


In [2]:
env = UnityEnvironment(file_name="Banana_Linux/Banana.x86_64")

Found path: /home/oliver/project-showroom/projects/reinforcement-learning/navigation/p1_navigation/Banana_Linux/Banana.x86_64
Mono path[0] = '/home/oliver/project-showroom/projects/reinforcement-learning/navigation/p1_navigation/Banana_Linux/Banana_Data/Managed'
Mono config path = '/home/oliver/project-showroom/projects/reinforcement-learning/navigation/p1_navigation/Banana_Linux/Banana_Data/MonoBleedingEdge/etc'
Preloaded 'libgrpc_csharp_ext.x64.so'
Unable to preload the following plugins:
	ScreenSelector.so
	libgrpc_csharp_ext.x86.so
	ScreenSelector.so
Logging to /home/oliver/.config/unity3d/Unity Technologies/Unity Environment/Player.log


INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		
Unity brain name: BananaBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 37
        Number of stacked Vector Observation: 1
        Vector Action space type: discrete
        Vector Action space size (per agent): 4
        Vector Action descriptions: , , , 


Environments contain **_brains_** which are responsible for deciding the actions of their associated agents. Here we check for the first brain available, and set it as the default brain we will be controlling from Python.

In [3]:
# get the default brain
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

### 2. Examine the State and Action Spaces

The simulation contains a single agent that navigates a large environment.  At each time step, it has four actions at its disposal:
- `0` - walk forward 
- `1` - walk backward
- `2` - turn left
- `3` - turn right

The state space has `37` dimensions and contains the agent's velocity, along with ray-based perception of objects around the agent's forward direction.  A reward of `+1` is provided for collecting a yellow banana, and a reward of `-1` is provided for collecting a blue banana. 
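
To make the action ids and the reward rule concrete, here is a minimal sketch. The names `ACTIONS`, `REWARDS` and `episode_score` are made up for illustration, and the example episode is invented, not taken from a real run.

```python
# Action ids and their meaning, as listed above.
ACTIONS = {0: "walk forward", 1: "walk backward", 2: "turn left", 3: "turn right"}

# Reward per collected banana, as described above.
REWARDS = {"yellow": 1.0, "blue": -1.0}


def episode_score(collected):
    """Sum the rewards for a sequence of collected banana colours (hypothetical helper)."""
    return sum(REWARDS[colour] for colour in collected)


# An invented episode: three yellow bananas and one blue one.
print(episode_score(["yellow", "blue", "yellow", "yellow"]))  # 2.0
```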

Run the code cell below to print some information about the environment and to initialize **action_size** and **state_size**.

In [None]:
# reset the environment
env_info = env.reset(train_mode=True)[brain_name]

# number of agents in the environment
print('Number of agents:', len(env_info.agents))

# number of actions
action_size = brain.vector_action_space_size
print('Number of actions:', action_size)

# examine the state space 
state = env_info.vector_observations[0]
print('States look like:', state)

state_size = len(state)
print('States have length:', state_size)

Number of agents: 1
Number of actions: 4
States look like: [1.         0.         0.         0.         0.84408134 0.
 0.         1.         0.         0.0748472  0.         1.
 0.         0.         0.25755    1.         0.         0.
 0.         0.74177343 0.         1.         0.         0.
 0.25854847 0.         0.         1.         0.         0.09355672
 0.         1.         0.         0.         0.31969345 0.
 0.        ]
States have length: 37


### 3. Take Random Actions in the Environment

In the next code cell, you will learn how to use the Python API to control the agent and receive feedback from the environment.

Once this cell is executed, you can watch how the agent performs when it selects an action uniformly at random at each time step.  A window should pop up that allows you to observe the agent as it moves through the environment.  

Of course, as part of the project, you'll have to change the code so that the agent is able to use its experience to gradually choose better actions when interacting with the environment!
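
One common way to move from purely random actions to experience-driven ones is epsilon-greedy selection, which the training loop later in this notebook uses through `agent.act(state, eps)`. The sketch below is a generic NumPy version, not the code from `codebase/v2`; `epsilon_greedy` and the Q-value estimates are made up for illustration.

```python
import numpy as np


def epsilon_greedy(q_values, eps, rng):
    """With probability eps pick a uniformly random action, otherwise the highest-valued one."""
    if rng.random() < eps:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


rng = np.random.default_rng(0)
q_values = np.array([0.1, 0.7, 0.2, 0.0])  # invented estimates for the 4 actions

# eps=0.0 never explores, so the action with the largest estimate is chosen.
print(epsilon_greedy(q_values, eps=0.0, rng=rng))  # 1
```

With `eps=1.0` the function behaves like the random agent in the next cell; lowering `eps` over time shifts the balance towards the learned estimates.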

In [8]:
env_info = env.reset(train_mode=False)[brain_name] # reset the environment
state = env_info.vector_observations[0]            # get the current state
score = 0                                          # initialize the score
while True:
    action = np.random.randint(action_size)        # select an action
    env_info = env.step(action)[brain_name]        # send the action to the environment
    next_state = env_info.vector_observations[0]   # get the next state
    reward = env_info.rewards[0]                   # get the reward
    done = env_info.local_done[0]                  # see if episode has finished
    score += reward                                # update the score
    state = next_state                             # roll over the state to next time step
    if done:                                       # exit loop if episode finished
        break
    
print("Score: {}".format(score))

Score: -1.0


### 4. Now train the agent in the banana environment!

Now it's your turn to train your own agent to solve the environment!  When training the agent, set `train_mode=True`, so that the line for resetting the environment looks like the following:
```python
env_info = env.reset(train_mode=True)[brain_name]
```

<div style="border-left: 5px solid #2196F3; background-color: #E3F2FD; padding: 10px;">
    <b>ℹ️:</b> The source code for the agent and the neural network model can be found in the 'codebase/v2' directory.
</div>


In [None]:
from collections import deque
import torch
from codebase.v2.dqn_agent import Agent
from tqdm.notebook import tqdm
from datetime import datetime

# Generate a timestamped directory name
timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
log_dir = f"runs/banana_experiment_{timestamp}"

# Initialize TensorBoard writer with a unique directory
writer = SummaryWriter(log_dir=log_dir)

agent = Agent(state_size=state_size, action_size=action_size, seed=20)

def dqn(env, n_episodes=2000, max_t=1000, eps_start=1.0, eps_end=0.01, eps_decay=0.99):
    """Deep Q-Learning with tqdm progress bar updates.
    
    Params
    ======
        env (UnityEnvironment): environment to train in
        n_episodes (int): maximum number of training episodes
        max_t (int): maximum number of timesteps per episode
        eps_start (float): starting value of epsilon for epsilon-greedy action selection
        eps_end (float): minimum value of epsilon
        eps_decay (float): multiplicative factor (per episode) for decreasing epsilon
    """
    scores = []                        # list containing scores from each episode
    scores_window = deque(maxlen=100)  # last 100 scores
    eps = eps_start

    for i_episode in range(1, n_episodes+1):
        env_info = env.reset(train_mode=True)[brain_name]  # reset the environment in training mode
        state = env_info.vector_observations[0]
        score = 0

        # Use tqdm to create a progress bar for the steps within an episode.
        with tqdm(total=max_t, desc=f"Episode {i_episode}", leave=False) as pbar:
            for _ in range(max_t):
                # Select an action and let the agent record its Q-values.
                action = agent.act(state, eps)
                
                # Access the Q-values after act() is called.                                
                writer.add_histogram("Q-values", agent.last_q_values, global_step=i_episode)
            
                q_values = agent.last_q_values
                
                env_info = env.step(action)[brain_name]  # send the action to the environment
                next_state = env_info.vector_observations[0]  # get the next state
                reward = env_info.rewards[0]                  # get the reward
                done = env_info.local_done[0]                 # check if episode is finished
                score += reward

                agent.step(state, action, reward, next_state, done)

                # Retrieve the loss from the last learning update (None if no update has happened yet).
                loss = getattr(agent, "last_loss", None)

                if loss is not None:  # only log once a loss value is available
                    writer.add_scalar("Loss", loss, global_step=i_episode)
                
                state = next_state

                # Update the progress bar's postfix with the current loss and Q-values.
                pbar.set_postfix(loss=loss, q_values=q_values)
                pbar.update(1)

                writer.add_scalar("Episode/Beta", agent.last_beta, i_episode)

                if done:
                    break

        scores_window.append(score)  # save most recent score
        scores.append(score)         # save most recent score
        eps = max(eps_end, eps_decay * eps)  # decrease epsilon

        # Log episode-level statistics.
        writer.add_scalar("Episode/Average_Score", np.mean(scores_window), i_episode)
        writer.add_scalar("Episode/Score", score, i_episode)
        writer.add_scalar("Episode/Epsilon", eps, i_episode)
        
        print('\rEpisode {}\tAverage Score: {:.2f}'.format(i_episode, np.mean(scores_window)), end="")
        if i_episode % 100 == 0:
            print('\rEpisode {}\tAverage Score: {:.2f}'.format(i_episode, np.mean(scores_window)))
        
        # save every 5 episodes
        if i_episode % 5 == 0:
            torch.save(agent.qnetwork_local.state_dict(), 'checkpoint.pth')

        if np.mean(scores_window) >= 13.0:  # solved threshold; it was mistakenly still 200 during the recorded run
            print('\nEnvironment solved in {:d} episodes!\tAverage Score: {:.2f}'
                  .format(i_episode - 100, np.mean(scores_window)))
            
            torch.save(agent.qnetwork_local.state_dict(), 'checkpoint.pth')
            break
    
    writer.close()
    
    return scores

Buffer size: 100000, Batch size: 64, Beta start: 0.4, Beta frames: 100000


In [6]:
scores = dqn(
    env, n_episodes=2000, max_t=300, eps_start=1.0, eps_end=0.01, eps_decay=0.99)

# plot the scores
fig = plt.figure()
ax = fig.add_subplot(111)
plt.plot(np.arange(len(scores)), scores)
plt.ylabel('Score')
plt.xlabel('Episode #')
plt.show()

Episode 267	Average Score: 9.53
Episode 268	Average Score: 9.52
Episode 269	Average Score: 9.55
Episode 270	Average Score: 9.53
Episode 271	Average Score: 9.65
Episode 272	Average Score: 9.76
Episode 273	Average Score: 9.72
Episode 274	Average Score: 9.77
Episode 275	Average Score: 9.67
Episode 276	Average Score: 9.72
Episode 277	Average Score: 9.75
Episode 278	Average Score: 9.77
Episode 279	Average Score: 9.87
Episode 280	Average Score: 9.87
Episode 281	Average Score: 9.84
Episode 282	Average Score: 9.92
Episode 283	Average Score: 9.94
Episode 284	Average Score: 9.96
Episode 285	Average Score: 10.02
Episode 286	Average Score: 10.11
Episode 287	Average Score: 10.13
Episode 288	Average Score: 10.13
Episode 289	Average Score: 10.20
Episode 290	Average Score: 10.17
Episode 291	Average Score: 10.08
Episode 292	Average Score: 10.05
Episode 293	Average Score: 9.97
Episode 294	Average Score: 9.97
Episode 295	Average Score: 9.94
Episode 296	Average Score: 9.86
Episode 297	Average Score: 9.96
Episode 298	Average Score: 9.96
Episode 299	Average Score: 9.96
Episode 300	Average Score: 10.00
Episode 301	Average Score: 9.98
Episode 302	Average Score: 10.01
Episode 303	Average Score: 10.01
Episode 304	Average Score: 10.08
Episode 305	Average Score: 10.16
Episode 306	Average Score: 10.14
Episode 307	Average Score: 10.14
Episode 308	Average Score: 10.21
Episode 309	Average Score: 10.23
Episode 310	Average Score: 10.18
Episode 311	Average Score: 10.24
Episode 312	Average Score: 10.24
Episode 313	Average Score: 10.29
Episode 314	Average Score: 10.29
Episode 315	Average Score: 10.26
Episode 316	Average Score: 10.40
Episode 317	Average Score: 10.42
Episode 318	Average Score: 10.43
Episode 319	Average Score: 10.47
Episode 320	Average Score: 10.39
Episode 321	Average Score: 10.35
Episode 322	Average Score: 10.39
Episode 323	Average Score: 10.30
Episode 324	Average Score: 10.35
Episode 325	Average Score: 10.38
Episode 326	Average Score: 10.36
Episode 327	Average Score: 10.34
Episode 328	Average Score: 10.42
Episode 329	Average Score: 10.40
Episode 330	Average Score: 10.35
Episode 331	Average Score: 10.33
Episode 332	Average Score: 10.36
Episode 333	Average Score: 10.36
Episode 334	Average Score: 10.34
Episode 335	Average Score: 10.36
Episode 336	Average Score: 10.45
Episode 337	Average Score: 10.41
Episode 338	Average Score: 10.51
Episode 339	Average Score: 10.47
Episode 340	Average Score: 10.42
Episode 341	Average Score: 10.42
Episode 342	Average Score: 10.34
Episode 343	Average Score: 10.30
Episode 344	Average Score: 10.30
Episode 345	Average Score: 10.31
Episode 346	Average Score: 10.25
Episode 347	Average Score: 10.23
Episode 348	Average Score: 10.21
Episode 349	Average Score: 10.22
Episode 350	Average Score: 10.34
Episode 351	Average Score: 10.36
Episode 352	Average Score: 10.31
Episode 353	Average Score: 10.34
Episode 354	Average Score: 10.40
Episode 355	Average Score: 10.49
Episode 356	Average Score: 10.47
Episode 357	Average Score: 10.51
Episode 358	Average Score: 10.59
Episode 359	Average Score: 10.52
Episode 360	Average Score: 10.46
Episode 361	Average Score: 10.55
Episode 362	Average Score: 10.49
Episode 363	Average Score: 10.50
Episode 364	Average Score: 10.48
Episode 365	Average Score: 10.52
Episode 366	Average Score: 10.53
Episode 367	Average Score: 10.54
Episode 368	Average Score: 10.61
Episode 369	Average Score: 10.64
Episode 370	Average Score: 10.71
Episode 371	Average Score: 10.71
Episode 372	Average Score: 10.70
Episode 373	Average Score: 10.76
Episode 374	Average Score: 10.76
Episode 375	Average Score: 10.80
Episode 376	Average Score: 10.85
Episode 377	Average Score: 10.84
Episode 378	Average Score: 10.89
Episode 379	Average Score: 10.81
Episode 380	Average Score: 10.81
Episode 381	Average Score: 10.89
Episode 382	Average Score: 10.83
Episode 383	Average Score: 10.84
Episode 384	Average Score: 10.90
Episode 385	Average Score: 10.90
Episode 386	Average Score: 10.88
Episode 387	Average Score: 10.74
Episode 388	Average Score: 10.79
Episode 389	Average Score: 10.75
Episode 390	Average Score: 10.79
Episode 391	Average Score: 10.91
Episode 392	Average Score: 11.00
Episode 393	Average Score: 10.96
Episode 394	Average Score: 10.92
Episode 395	Average Score: 11.03
Episode 396	Average Score: 11.14
Episode 397	Average Score: 11.15
Episode 398	Average Score: 11.15
Episode 399	Average Score: 11.22
Episode 400	Average Score: 11.36
Episode 401	Average Score: 11.43
Episode 402	Average Score: 11.46
Episode 403	Average Score: 11.56
Episode 404	Average Score: 11.54
Episode 405	Average Score: 11.55
Episode 406	Average Score: 11.53
Episode 407	Average Score: 11.49
Episode 408	Average Score: 11.44
Episode 409	Average Score: 11.45
Episode 410	Average Score: 11.53
Episode 411	Average Score: 11.51
Episode 412	Average Score: 11.55
Episode 413	Average Score: 11.62
Episode 414	Average Score: 11.63
Episode 415	Average Score: 11.71
Episode 416	Average Score: 11.68
Episode 417	Average Score: 11.70
Episode 418	Average Score: 11.78
Episode 419	Average Score: 11.75
Episode 420	Average Score: 11.85
Episode 421	Average Score: 11.95
Episode 422	Average Score: 11.96
Episode 423	Average Score: 12.08
Episode 424	Average Score: 12.04
Episode 425	Average Score: 12.05
Episode 426	Average Score: 12.04
Episode 427	Average Score: 12.13
Episode 428	Average Score: 12.07
Episode 429	Average Score: 12.05
Episode 430	Average Score: 12.17
Episode 431	Average Score: 12.22
Episode 432	Average Score: 12.12
Episode 433	Average Score: 12.14
Episode 434	Average Score: 12.25
Episode 435	Average Score: 12.30
Episode 436	Average Score: 12.31
Episode 437	Average Score: 12.42
Episode 438	Average Score: 12.41
Episode 439	Average Score: 12.45
Episode 440	Average Score: 12.48
Episode 441	Average Score: 12.58
Episode 442	Average Score: 12.71
Episode 443	Average Score: 12.77
Episode 444	Average Score: 12.78
Episode 445	Average Score: 12.91
Episode 446	Average Score: 12.91
Episode 447	Average Score: 12.99
Episode 448	Average Score: 13.09
Episode 449	Average Score: 13.14
Episode 450	Average Score: 13.14
Episode 451	Average Score: 13.07
Episode 452	Average Score: 13.13
Episode 453	Average Score: 13.20
Episode 454	Average Score: 13.18

KeyboardInterrupt: 

My mistake: I forgot to adapt the solved-threshold check in my training loop (taken over from my LunarLander code) from 200 to 13, so the run never stopped on its own and I had to terminate the training manually. Let's hope the weights were saved properly.
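
For reference, the stopping rule the loop is meant to apply (mean of the most recent 100 episode scores reaching 13) can be sketched in isolation. `first_solved_episode` is a hypothetical helper and the score sequence is synthetic; in the log above, the running average first reaches 13 at episode 448.

```python
from collections import deque


def first_solved_episode(scores, target=13.0, window=100):
    """Return the 1-based episode at which the mean of the last `window` scores
    first reaches `target`, or None if it never does."""
    recent = deque(maxlen=window)
    for episode, score in enumerate(scores, start=1):
        recent.append(score)
        if sum(recent) / len(recent) >= target:
            return episode
    return None


# Synthetic data: 50 episodes scoring 0, then a constant score of 14.
synthetic_scores = [0] * 50 + [14] * 200
print(first_solved_episode(synthetic_scores))  # 143
```

Like the notebook's loop, this sketch averages over however many scores exist so far, so the window does not need to be full before the check can pass.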

Here are some **metrics** I recorded with **TensorBoard** to document the training progress:

Development of average score per episode:

![Episode / Average Score](screenshots/Episode_Average_Score.png)

---

Development of epsilon:

![Episode / Epsilon Value](screenshots/Episode_Epsilon_Score.png)

---

Loss of the Q-model:

![Loss / Q-Model](screenshots/Loss_QModel.png)

---

Development of the Q-values:

![Q Values growing](screenshots/QValues_orange.png)

---

**Explanation**: The bluish curves are from a previous, unsuccessful training run that crashed after several hours (the Unity environment stopped responding).
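
The shape of the epsilon curve follows from the multiplicative update in the training loop, `eps = max(eps_end, eps_decay * eps)`. The sketch below replays that rule outside the notebook with the values passed to `dqn` (`eps_start=1.0`, `eps_end=0.01`, `eps_decay=0.99`); `epsilon_schedule` is a hypothetical helper written for this illustration.

```python
import math


def epsilon_schedule(n_episodes, eps_start=1.0, eps_end=0.01, eps_decay=0.99):
    """Epsilon used in each episode under eps = max(eps_end, eps_decay * eps)."""
    eps, values = eps_start, []
    for _ in range(n_episodes):
        values.append(eps)
        eps = max(eps_end, eps_decay * eps)
    return values


# Number of decay steps until epsilon reaches its floor:
# ceil(ln(eps_end / eps_start) / ln(eps_decay))
episodes_to_floor = math.ceil(math.log(0.01 / 1.0) / math.log(0.99))
print(episodes_to_floor)  # 459
```

Under these settings epsilon only reaches its floor after 459 decay steps, which is slightly more than the 454 completed episodes shown in the training log.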

## Let's get our trained agent into the wild!
OK, now let's load the saved weights and check whether our trained agent can collect enough bananas:

In [None]:
from codebase.v2.dqn_agent import Agent
import torch

# the environment was started some cells above
env_info = env.reset(train_mode=False)[brain_name] # reset the environment
state = env_info.vector_observations[0]            # get the current state
score = 0                                          # initialize the score

bananaCollector = Agent(state_size=state_size, action_size=action_size, seed=20)
bananaCollector.qnetwork_local.load_state_dict(torch.load('checkpoint.pth', weights_only=True))

while True:
    action = bananaCollector.act(state)            # select an action
    env_info = env.step(action)[brain_name]        # send the action to the environment
    next_state = env_info.vector_observations[0]   # get the next state
    reward = env_info.rewards[0]                   # get the reward
    done = env_info.local_done[0]                  # see if episode has finished
    score += reward                                # update the score
    state = next_state                             # roll over the state to next time step
    if done:                                       # exit loop if episode finished
        break
    
print("Score: {}".format(score))

Buffer size: 100000, Batch size: 64, Beta start: 0.4, Beta frames: 100000
Agent's Device: cuda:0
Score: 20.0


Results: **Wow, wow, wow!!** <b>20</b> is a nice result. Training paid off ;-)