# Navigation

---

In this notebook, you will learn how to use the Unity ML-Agents environment for the first project of the [Deep Reinforcement Learning Nanodegree](https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893).

### 1. Start the Environment

We begin by importing some necessary packages.  If the code cell below returns an error, please revisit the project instructions to double-check that you have installed [Unity ML-Agents](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Installation.md) and [NumPy](http://www.numpy.org/).

In [1]:
from unityagents import UnityEnvironment
import numpy as np

Next, we will start the environment!  **_Before running the code cell below_**, change the `file_name` parameter to match the location of the Unity environment that you downloaded.

- **Mac**: `"path/to/Banana.app"`
- **Windows** (x86): `"path/to/Banana_Windows_x86/Banana.exe"`
- **Windows** (x86_64): `"path/to/Banana_Windows_x86_64/Banana.exe"`
- **Linux** (x86): `"path/to/Banana_Linux/Banana.x86"`
- **Linux** (x86_64): `"path/to/Banana_Linux/Banana.x86_64"`
- **Linux** (x86, headless): `"path/to/Banana_Linux_NoVis/Banana.x86"`
- **Linux** (x86_64, headless): `"path/to/Banana_Linux_NoVis/Banana.x86_64"`

For instance, if you are using a Mac, then you downloaded `Banana.app`.  If this file is in the same folder as the notebook, then the line below should appear as follows:
```
env = UnityEnvironment(file_name="Banana.app")
```
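If the notebook is shared across machines, the choice from the list above can be made in code. The block below is a minimal sketch, not part of the project: `BANANA_BUILDS` and `banana_file_name` are hypothetical names, `"path/to"` is still a placeholder for your own download folder, and the mapping assumes that `platform.machine()` reports `x86_64` or `AMD64` on 64-bit systems.

```python
import platform

# Paths copied from the list above; "path/to" is a placeholder, not a real location.
BANANA_BUILDS = {
    ("Darwin", "any"): "path/to/Banana.app",
    ("Windows", "x86"): "path/to/Banana_Windows_x86/Banana.exe",
    ("Windows", "x86_64"): "path/to/Banana_Windows_x86_64/Banana.exe",
    ("Linux", "x86"): "path/to/Banana_Linux/Banana.x86",
    ("Linux", "x86_64"): "path/to/Banana_Linux/Banana.x86_64",
}


def banana_file_name(system=None, machine=None, headless=False):
    """Return the file_name for this platform (hypothetical helper)."""
    system = system or platform.system()
    machine = (machine or platform.machine()).lower()
    if system == "Darwin":
        return BANANA_BUILDS[("Darwin", "any")]
    # Assumption: anything that is not clearly 64-bit is treated as x86.
    arch = "x86_64" if machine in ("x86_64", "amd64") else "x86"
    path = BANANA_BUILDS[(system, arch)]  # raises KeyError for an unlisted system
    if headless and system == "Linux":
        # The headless Linux builds live in Banana_Linux_NoVis.
        path = path.replace("Banana_Linux/", "Banana_Linux_NoVis/")
    return path


print(banana_file_name())
```

The returned string can then be passed as `file_name` to `UnityEnvironment`.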

In [2]:
env = UnityEnvironment(file_name="/home/koyal-il/deep-reinforcement-learning/p1_navigation/Banana_Linux/Banana.x86")

INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		
Unity brain name: BananaBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 37
        Number of stacked Vector Observation: 1
        Vector Action space type: discrete
        Vector Action space size (per agent): 4
        Vector Action descriptions: , , , 


Environments contain **_brains_**, which are responsible for deciding the actions of their associated agents. Here we take the first brain available and set it as the default brain that we will control from Python.

In [3]:
# get the default brain
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

### 2. Examine the State and Action Spaces

The simulation contains a single agent that navigates a large environment.  At each time step, it has four actions at its disposal:
- `0` - walk forward 
- `1` - walk backward
- `2` - turn left
- `3` - turn right

The state space has `37` dimensions and contains the agent's velocity, along with ray-based perception of objects around the agent's forward direction.  A reward of `+1` is provided for collecting a yellow banana, and a reward of `-1` is provided for collecting a blue banana.

Run the code cell below to print some information about the environment.

In [4]:
# reset the environment
env_info = env.reset(train_mode=True)[brain_name]

# number of agents in the environment
print('Number of agents:', len(env_info.agents))

# number of actions
action_size = brain.vector_action_space_size
print('Number of actions:', action_size)

# examine the state space 
state = env_info.vector_observations[0]
print('States look like:', state)
state_size = len(state)
print('States have length:', state_size)

Number of agents: 1
Number of actions: 4
States look like: [1.         0.         0.         0.         0.84408134 0.
 0.         1.         0.         0.0748472  0.         1.
 0.         0.         0.25755    1.         0.         0.
 0.         0.74177343 0.         1.         0.         0.
 0.25854847 0.         0.         1.         0.         0.09355672
 0.         1.         0.         0.         0.31969345 0.
 0.        ]
States have length: 37
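When reading logs, it can help to turn the action indices and rewards described above into text. This is a small sketch for illustration only: `ACTION_NAMES` and `describe_step` are hypothetical names, and they are not part of the Unity ML-Agents API.

```python
# Action indices as listed in this section.
ACTION_NAMES = {
    0: "walk forward",
    1: "walk backward",
    2: "turn left",
    3: "turn right",
}


def describe_step(action, reward):
    """Summarize one time step as text (hypothetical helper)."""
    name = ACTION_NAMES[int(action)]
    if reward > 0:
        outcome = "collected a yellow banana"
    elif reward < 0:
        outcome = "collected a blue banana"
    else:
        outcome = "no banana collected"
    return "{}: {} (reward {:+.0f})".format(name, outcome, reward)


print(describe_step(2, 1.0))
```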


### 3. Take Random Actions in the Environment

In the next code cell, you will learn how to use the Python API to control the agent and receive feedback from the environment.

Once this cell is executed, you can watch how the agent performs when it selects an action uniformly at random at each time step.  A window should pop up that allows you to observe the agent as it moves through the environment.

Of course, as part of the project, you'll have to change the code so that the agent is able to use its experience to gradually choose better actions when interacting with the environment!

In [5]:
env_info = env.reset(train_mode=False)[brain_name] # reset the environment
state = env_info.vector_observations[0]            # get the current state
score = 0                                          # initialize the score
steps = 0                                          # count the time steps taken
while True:
    steps += 1
    action = np.random.randint(action_size)        # select an action
    env_info = env.step(action)[brain_name]        # send the action to the environment
    next_state = env_info.vector_observations[0]   # get the next state
    reward = env_info.rewards[0]                   # get the reward
    done = env_info.local_done[0]                  # see if episode has finished
    score += reward                                # update the score
    state = next_state                             # roll over the state to next time step
    print('reward:', reward)
    print('score:', score)
    if done:                                       # exit loop if episode finished
        break
print("Steps: {}".format(steps))
print("Score: {}".format(score))

reward: 0.0
score: 0.0

In [6]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class QNetwork(nn.Module):
    """Q-Network: maps a state to one estimated Q-value per action."""

    def __init__(self, state_size, action_size, seed, fc1_units=64, fc2_units=64):
        """Initialize parameters and build model.
        Params
        ======
            state_size (int): Dimension of each state
            action_size (int): Dimension of each action
            seed (int): Random seed
            fc1_units (int): Number of nodes in first hidden layer
            fc2_units (int): Number of nodes in second hidden layer
        """
        super(QNetwork, self).__init__()
        self.seed = torch.manual_seed(seed)
        self.fc1 = nn.Linear(state_size, fc1_units)
        self.fc2 = nn.Linear(fc1_units, fc2_units)
        self.fc3 = nn.Linear(fc2_units, action_size)

    def forward(self, state):
        """Build a network that maps state -> action values."""
        x = F.relu(self.fc1(state))
        x = F.relu(self.fc2(x))
        return self.fc3(x)
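To get a feel for the size of this network, the number of trainable parameters can be worked out by hand. The sketch below assumes the default `fc1_units=64` and `fc2_units=64` with the `37`-dimensional state and `4` actions from this environment; `linear_params` and `qnetwork_param_count` are hypothetical helper names. Each `nn.Linear(n_in, n_out)` layer holds an `n_in * n_out` weight matrix plus `n_out` biases.

```python
def linear_params(n_in, n_out):
    """Parameters in one fully connected layer: weights plus biases."""
    return n_in * n_out + n_out


def qnetwork_param_count(state_size, action_size, fc1_units=64, fc2_units=64):
    """Total parameters for the three-layer network state -> fc1 -> fc2 -> actions."""
    sizes = [state_size, fc1_units, fc2_units, action_size]
    return sum(linear_params(a, b) for a, b in zip(sizes, sizes[1:]))


# 37*64 + 64 = 2432, 64*64 + 64 = 4160, 64*4 + 4 = 260
print(qnetwork_param_count(37, 4))  # 6852
```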


In [7]:
import numpy as np
import random
from collections import namedtuple, deque

# from model import QNetwork

import torch
import torch.nn.functional as F
import torch.optim as optim

BUFFER_SIZE = int(1e5)  # replay buffer size
BATCH_SIZE = 64         # minibatch size
GAMMA = 0.99            # discount factor
TAU = 1e-3              # for soft update of target parameters
LR = 5e-4               # learning rate 
UPDATE_EVERY = 4        # how often to update the network

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

class Agent():
    """Interacts with and learns from the environment."""

    def __init__(self, state_size, action_size, seed):
        """Initialize an Agent object.
        
        Params
        ======
            state_size (int): dimension of each state
            action_size (int): dimension of each action
            seed (int): random seed
        """
        self.state_size = state_size
        self.action_size = action_size
        random.seed(seed)  # random.seed() returns None, so keep the seed value itself
        self.seed = seed

        # Q-Network
        self.qnetwork_local = QNetwork(state_size, action_size, seed).to(device)
        self.qnetwork_target = QNetwork(state_size, action_size, seed).to(device)
        self.optimizer = optim.Adam(self.qnetwork_local.parameters(), lr=LR)

        # Replay memory
        self.memory = ReplayBuffer(action_size, BUFFER_SIZE, BATCH_SIZE, seed)
        # Initialize time step (for updating every UPDATE_EVERY steps)
        self.t_step = 0
    
    def step(self, state, action, reward, next_state, done):
        # Save experience in replay memory
        self.memory.add(state, action, reward, next_state, done)
        
        # Learn every UPDATE_EVERY time steps.
        self.t_step = (self.t_step + 1) % UPDATE_EVERY
        if self.t_step == 0:
            # If enough samples are available in memory, get random subset and learn
            if len(self.memory) > BATCH_SIZE:
                experiences = self.memory.sample()
                self.learn(experiences, GAMMA)

    def act(self, state, eps=0.):
        """Returns actions for given state as per current policy.
        
        Params
        ======
            state (array_like): current state
            eps (float): epsilon, for epsilon-greedy action selection
        """
        state = torch.from_numpy(state).float().unsqueeze(0).to(device)
        self.qnetwork_local.eval()
        with torch.no_grad():
            action_values = self.qnetwork_local(state)
        self.qnetwork_local.train()

        # Epsilon-greedy action selection
        if random.random() > eps:
            return np.argmax(action_values.cpu().data.numpy())
        else:
            return random.choice(np.arange(self.action_size))

    def learn(self, experiences, gamma):
        """Update value parameters using given batch of experience tuples.

        Params
        ======
            experiences (Tuple[torch.Tensor]): batched (s, a, r, s', done) tensors
            gamma (float): discount factor
        """
        states, actions, rewards, next_states, dones = experiences

        # Get max predicted Q values (for next states) from target model
        Q_targets_next = self.qnetwork_target(next_states).detach().max(1)[0].unsqueeze(1)
        # Compute Q targets for current states 
        Q_targets = rewards + (gamma * Q_targets_next * (1 - dones))

        # Get expected Q values from local model
        Q_expected = self.qnetwork_local(states).gather(1, actions)

        # Compute loss
        loss = F.mse_loss(Q_expected, Q_targets)
        # Minimize the loss
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()

        # ------------------- update target network ------------------- #
        self.soft_update(self.qnetwork_local, self.qnetwork_target, TAU)                     

    def soft_update(self, local_model, target_model, tau):
        """Soft update model parameters.
        θ_target = τ*θ_local + (1 - τ)*θ_target

        Params
        ======
            local_model (PyTorch model): weights will be copied from
            target_model (PyTorch model): weights will be copied to
            tau (float): interpolation parameter 
        """
        for target_param, local_param in zip(target_model.parameters(), local_model.parameters()):
            target_param.data.copy_(tau*local_param.data + (1.0-tau)*target_param.data)


class ReplayBuffer:
    """Fixed-size buffer to store experience tuples."""

    def __init__(self, action_size, buffer_size, batch_size, seed):
        """Initialize a ReplayBuffer object.

        Params
        ======
            action_size (int): dimension of each action
            buffer_size (int): maximum size of buffer
            batch_size (int): size of each training batch
            seed (int): random seed
        """
        self.action_size = action_size
        self.memory = deque(maxlen=buffer_size)  
        self.batch_size = batch_size
        self.experience = namedtuple("Experience", field_names=["state", "action", "reward", "next_state", "done"])
        random.seed(seed)  # random.seed() returns None, so keep the seed value itself
        self.seed = seed
    
    def add(self, state, action, reward, next_state, done):
        """Add a new experience to memory."""
        e = self.experience(state, action, reward, next_state, done)
        self.memory.append(e)
    
    def sample(self):
        """Randomly sample a batch of experiences from memory."""
        experiences = random.sample(self.memory, k=self.batch_size)

        states = torch.from_numpy(np.vstack([e.state for e in experiences if e is not None])).float().to(device)
        actions = torch.from_numpy(np.vstack([e.action for e in experiences if e is not None])).long().to(device)
        rewards = torch.from_numpy(np.vstack([e.reward for e in experiences if e is not None])).float().to(device)
        next_states = torch.from_numpy(np.vstack([e.next_state for e in experiences if e is not None])).float().to(device)
        dones = torch.from_numpy(np.vstack([e.done for e in experiences if e is not None]).astype(np.uint8)).float().to(device)
  
        return (states, actions, rewards, next_states, dones)

    def __len__(self):
        """Return the current size of internal memory."""
        return len(self.memory)
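The two numerical steps inside `learn` and `soft_update` can be checked on plain Python lists, without PyTorch. The block below is a simplified sketch under that assumption, not the agent's implementation: `td_targets` mirrors `rewards + gamma * max_a Q_target(s', a) * (1 - dones)`, and `soft_update_values` mirrors `θ_target = τ*θ_local + (1 - τ)*θ_target`. Both function names are hypothetical.

```python
def td_targets(rewards, next_q_values, dones, gamma=0.99):
    """One TD target per transition; terminal transitions keep only the reward."""
    targets = []
    for reward, q_next, done in zip(rewards, next_q_values, dones):
        # max over the target network's action values for the next state
        targets.append(reward + gamma * max(q_next) * (1 - done))
    return targets


def soft_update_values(local, target, tau=1e-3):
    """Blend local parameters into target parameters, element by element."""
    return [tau * l + (1.0 - tau) * t for l, t in zip(local, target)]


# Two transitions: the second one ends the episode (done = 1).
print(td_targets([1.0, 0.0], [[0.5, 2.0], [3.0, 1.0]], [0, 1], gamma=0.9))

# With tau = 0.5 the target moves halfway toward the local value.
print(soft_update_values([1.0, 4.0], [0.0, 0.0], tau=0.5))  # [0.5, 2.0]
```

With the notebook's `TAU = 1e-3`, each update moves the target network only 0.1% of the way toward the local network, which keeps the TD targets changing slowly.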

In [8]:
# # from dqn_agent import Agent

agent = Agent(state_size=37, action_size=4, seed=0)

# # watch an untrained agent
# state = env.reset()
# img = plt.imshow(env.render(mode='rgb_array'))
# for j in range(200):
#     action = agent.act(state)
#     img.set_data(env.render(mode='rgb_array')) 
#     plt.axis('off')
#     display.display(plt.gcf())
#     display.clear_output(wait=True)
#     state, reward, done, _ = env.step(action)
#     if done:
#         break 
        
# # env.close()

In [None]:
def dqn(n_episodes=2000, max_t=1000, eps_start=1.0, eps_end=0.01, eps_decay=0.995):
    """Deep Q-Learning.
    
    Params
    ======
        n_episodes (int): maximum number of training episodes
        max_t (int): maximum number of timesteps per episode
        eps_start (float): starting value of epsilon, for epsilon-greedy action selection
        eps_end (float): minimum value of epsilon
        eps_decay (float): multiplicative factor (per episode) for decreasing epsilon
    """
    scores = []                        # list containing scores from each episode
    scores_window = deque(maxlen=100)  # last 100 scores
    eps = eps_start                    # initialize epsilon
    for i_episode in range(1, n_episodes+1):
        env_info = env.reset(train_mode=True)[brain_name]  # reset the environment
        state = env_info.vector_observations[0]            # get the initial state
        score = 0                                          # initialize the episode score
        for t in range(max_t):
            action = agent.act(state, eps)
            env_info = env.step(action)[brain_name]        # send the action to the environment
            next_state = env_info.vector_observations[0]   # get the next state
            reward = env_info.rewards[0]                   # get the reward
            done = env_info.local_done[0]                  # see if episode has finished
            agent.step(state, action, reward, next_state, done)  # store the transition and learn
            state = next_state
            score += reward
            if done:
                break
        scores_window.append(score)       # save most recent score
        scores.append(score)              # save most recent score
        eps = max(eps_end, eps_decay*eps) # decrease epsilon
        print('\rEpisode {}\tAverage Score: {:.2f}'.format(i_episode, np.mean(scores_window)), end="")
        if i_episode % 100 == 0:
            print('\rEpisode {}\tAverage Score: {:.2f}'.format(i_episode, np.mean(scores_window)))
        # the Navigation project counts as solved at an average score of +13 over 100 episodes
        if np.mean(scores_window) >= 13.0:
            print('\nEnvironment solved in {:d} episodes!\tAverage Score: {:.2f}'.format(i_episode-100, np.mean(scores_window)))
            torch.save(agent.qnetwork_local.state_dict(), 'checkpoint.pth')
            break
    return scores

scores = dqn()

# plot the scores (matplotlib is not imported in any earlier cell)
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(111)
plt.plot(np.arange(len(scores)), scores)
plt.ylabel('Score')
plt.xlabel('Episode #')
plt.show()

Episode 12	Average Score: -0.42
Episode 17	Average Score: -0.18
Episode 19	Average Score: -0.16
avg score -0.005681818181818182
avg score -0.00564971

avg score -0.004784688995215311
avg score -0.004761904761904762
avg score -0.004739336492890996
avg score -0.0047169811320754715
avg score -0.004694835680751174
avg score -0.004672897196261682
avg score -0.004651162790697674
avg score -0.004629629629629629
avg score -0.004608294930875576
avg score -0.0045871559633027525
avg score -0.0045662100456621
avg score -0.004545454545454545
avg score -0.004524886877828055
avg score -0.0045045045045045045
avg score -0.004484304932735426
avg score -0.004464285714285714
avg score -0.0044444444444444444
avg score -0.004424778761061947
avg score -0.004405286343612335
avg score -0.0043859649122807015
avg score -0.004366812227074236
avg score -0.004347826086956522
avg score -0.004329004329004329
avg score -0.004310344827586207
avg score -0.004291845493562232
avg score -0.004273504273504274
avg score -0.00425531914893617
avg score -0.00423728813559322
avg score -0.004219409282700422
avg score -0.004201680672268907
avg score -0.0041841004184100415
avg sc

avg score -0.009615384615384616
avg score -0.009569377990430622
avg score -0.009523809523809525
avg score -0.009478672985781991
avg score -0.009433962264150943
avg score -0.009389671361502348
avg score -0.009345794392523364
avg score -0.009302325581395349
avg score -0.009259259259259259
avg score -0.009216589861751152
avg score -0.009174311926605505
avg score -0.0091324200913242
avg score -0.00909090909090909
avg score -0.00904977375565611
avg score -0.009009009009009009
avg score -0.008968609865470852
avg score -0.008928571428571428
avg score -0.008888888888888889
avg score -0.008849557522123894
avg score -0.00881057268722467
avg score -0.008771929824561403
avg score -0.008733624454148471
avg score -0.008695652173913044
avg score -0.008658008658008658
avg score -0.008620689655172414
avg score -0.008583690987124463
avg score -0.008547008547008548
avg score -0.00851063829787234
avg score -0.00847457627118644
avg score -0.008438818565400843
avg score -0.008403361344537815
avg score -0.00

avg score -0.003424657534246575
avg score -0.0034129692832764505
avg score -0.003401360544217687
avg score -0.003389830508474576
avg score -0.0033783783783783786
avg score -0.003367003367003367
avg score -0.003355704697986577
avg score -0.0033444816053511705
avg score -0.0033333333333333335
Episode 25	Average Score: -0.20avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.037037037037037035
avg score 0.03571428571428571
avg score 0.034482758620689655
avg score 0.03333333333333333
avg score 0.03225806451612903
avg score 0.03125
avg score 0.030303030303030304
avg score 0.029411764705882353
avg score 0.02857142857142857
avg score 0.027777777777777776
avg score 0.02702702

avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.029411764705882353
avg score 0.02857142857142857
avg score 0.027777777777777776
avg score 0.02702702702702703
avg score 0.02631578947368421
avg score 0.02564102564102564
avg score 0.025
avg score 0.024390243902439025
avg score 0.023809523809523808
avg score 0.023255813953488372
avg score 0.022727272727272728
avg score 0.022222222222222223
avg score 0.021739130434782608
avg score 0.02127659574468085
avg score 0.020833333333333332
avg score 0.02040816326530612
avg score 0.02
avg score 0.0196078431372549
avg score 0.019230769230769232
avg score 0.018867924528301886
avg score 0.018518518518518

avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg sc

avg score -0.008
avg score -0.007936507936507936
avg score -0.007874015748031496
avg score -0.0078125
avg score -0.007751937984496124
avg score -0.007692307692307693
avg score -0.007633587786259542
avg score -0.007575757575757576
avg score -0.007518796992481203
avg score -0.007462686567164179
avg score -0.007407407407407408
avg score -0.007352941176470588
avg score -0.0072992700729927005
avg score -0.007246376811594203
avg score -0.007194244604316547
avg score -0.007142857142857143
avg score -0.0070921985815602835
avg score -0.007042253521126761
avg score -0.006993006993006993
avg score -0.006944444444444444
avg score -0.006896551724137931
avg score -0.00684931506849315
avg score -0.006802721088435374
avg score -0.006756756756756757
avg score -0.006711409395973154
avg score -0.006666666666666667
avg score -0.006622516556291391
avg score -0.006578947368421052
avg score -0.006535947712418301
avg score -0.006493506493506494
avg score -0.0064516129032258064
avg score -0.00641025641025641
a

avg score -0.007692307692307693
avg score -0.007633587786259542
avg score -0.007575757575757576
avg score -0.007518796992481203
avg score -0.007462686567164179
avg score -0.007407407407407408
avg score -0.007352941176470588
avg score -0.0072992700729927005
avg score -0.007246376811594203
avg score -0.007194244604316547
avg score -0.007142857142857143
avg score -0.0070921985815602835
avg score -0.007042253521126761
avg score -0.006993006993006993
avg score -0.006944444444444444
avg score -0.006896551724137931
avg score -0.00684931506849315
avg score -0.006802721088435374
avg score -0.006756756756756757
avg score -0.006711409395973154
avg score -0.006666666666666667
avg score -0.006622516556291391
avg score -0.006578947368421052
avg score -0.006535947712418301
avg score -0.006493506493506494
avg score -0.0064516129032258064
avg score -0.00641025641025641
avg score -0.006369426751592357
avg score -0.006329113924050633
avg score -0.006289308176100629
avg score -0.00625
avg score -0.0062111

avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
Episode 31	Average Score: -0.19avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg

avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
Episode 33	Average Score: -0.18avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg

avg score 0.027777777777777776
avg score 0.02702702702702703
avg score 0.02631578947368421
avg score 0.02564102564102564
avg score 0.025
avg score 0.024390243902439025
avg score 0.023809523809523808
avg score 0.023255813953488372
avg score 0.022727272727272728
avg score 0.044444444444444446
avg score 0.043478260869565216
avg score 0.0425531914893617
avg score 0.041666666666666664
avg score 0.04081632653061224
avg score 0.04
avg score 0.0392156862745098
avg score 0.038461538461538464
avg score 0.03773584905660377
avg score 0.037037037037037035
avg score 0.03636363636363636
avg score 0.03571428571428571
avg score 0.03508771929824561
avg score 0.034482758620689655
avg score 0.03389830508474576
avg score 0.03333333333333333
avg score 0.03278688524590164
avg score 0.03225806451612903
avg score 0.031746031746031744
avg score 0.03125
avg score 0.03076923076923077
avg score 0.030303030303030304
avg score 0.029850746268656716
avg score 0.029411764705882353
avg score 0.028985507246376812
avg sco

avg score 0.04
avg score 0.038461538461538464
avg score 0.037037037037037035
avg score 0.03571428571428571
avg score 0.034482758620689655
avg score 0.03333333333333333
avg score 0.03225806451612903
avg score 0.03125
avg score 0.030303030303030304
avg score 0.029411764705882353
avg score 0.02857142857142857
avg score 0.027777777777777776
avg score 0.02702702702702703
avg score 0.02631578947368421
avg score 0.02564102564102564
avg score 0.025
avg score 0.024390243902439025
avg score 0.023809523809523808
avg score 0.023255813953488372
avg score 0.022727272727272728
avg score 0.022222222222222223
avg score 0.021739130434782608
avg score 0.02127659574468085
avg score 0.020833333333333332
avg score 0.02040816326530612
avg score 0.02
avg score 0.0196078431372549
avg score 0.019230769230769232
avg score 0.018867924528301886
avg score 0.018518518518518517
avg score 0.01818181818181818
avg score 0.017857142857142856
avg score 0.017543859649122806
avg score 0.017241379310344827
avg score 0.016949

avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg sc

avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg sc

avg score 0.009259259259259259
avg score 0.009216589861751152
avg score 0.009174311926605505
avg score 0.0091324200913242
avg score 0.00909090909090909
avg score 0.00904977375565611
avg score 0.009009009009009009
avg score 0.008968609865470852
avg score 0.008928571428571428
avg score 0.008888888888888889
avg score 0.008849557522123894
avg score 0.00881057268722467
avg score 0.008771929824561403
avg score 0.008733624454148471
avg score 0.008695652173913044
avg score 0.008658008658008658
avg score 0.008620689655172414
avg score 0.008583690987124463
avg score 0.008547008547008548
avg score 0.00851063829787234
avg score 0.00847457627118644
avg score 0.008438818565400843
avg score 0.008403361344537815
avg score 0.008368200836820083
avg score 0.008333333333333333
avg score 0.008298755186721992
avg score 0.008264462809917356
avg score 0.00823045267489712
avg score 0.00819672131147541
avg score 0.00816326530612245
avg score 0.008130081300813009
avg score 0.008097165991902834
avg score 0.008064

avg score 0.00851063829787234
avg score 0.00847457627118644
avg score 0.008438818565400843
avg score 0.008403361344537815
avg score 0.008368200836820083
avg score 0.008333333333333333
avg score 0.008298755186721992
avg score 0.008264462809917356
avg score 0.00823045267489712
avg score 0.00819672131147541
avg score 0.00816326530612245
avg score 0.008130081300813009
avg score 0.008097165991902834
avg score 0.008064516129032258
avg score 0.008032128514056224
avg score 0.008
avg score 0.00796812749003984
avg score 0.007936507936507936
avg score 0.007905138339920948
avg score 0.007874015748031496
avg score 0.00784313725490196
avg score 0.0078125
avg score 0.007782101167315175
avg score 0.007751937984496124
avg score 0.007722007722007722
avg score 0.007692307692307693
avg score 0.007662835249042145
avg score 0.007633587786259542
avg score 0.0076045627376425855
avg score 0.007575757575757576
avg score 0.007547169811320755
avg score 0.007518796992481203
avg score 0.00749063670411985
avg score 

avg score 0.004016064257028112
avg score 0.004
avg score 0.00398406374501992
avg score 0.003968253968253968
avg score 0.003952569169960474
avg score 0.003937007874015748
avg score 0.00392156862745098
avg score 0.00390625
avg score 0.0038910505836575876
avg score 0.003875968992248062
avg score 0.003861003861003861
avg score 0.0038461538461538464
avg score 0.0038314176245210726
avg score 0.003816793893129771
avg score 0.0038022813688212928
avg score 0.003787878787878788
avg score 0.0037735849056603774
avg score 0.0037593984962406013
avg score 0.003745318352059925
avg score 0.0037313432835820895
avg score 0.0037174721189591076
avg score 0.003703703703703704
avg score 0.0036900369003690036
avg score 0.003676470588235294
avg score 0.003663003663003663
avg score 0.0036496350364963502
avg score 0.0036363636363636364
avg score 0.0036231884057971015
avg score 0.0036101083032490976
avg score 0.0035971223021582736
avg score 0.0035842293906810036
avg score 0.0035714285714285713
avg score 0.0035587

avg score 0.008695652173913044
avg score 0.008620689655172414
avg score 0.008547008547008548
avg score 0.01694915254237288
avg score 0.01680672268907563
avg score 0.016666666666666666
avg score 0.01652892561983471
avg score 0.01639344262295082
avg score 0.016260162601626018
avg score 0.016129032258064516
avg score 0.016
avg score 0.015873015873015872
avg score 0.015748031496062992
avg score 0.015625
avg score 0.015503875968992248
avg score 0.015384615384615385
avg score 0.015267175572519083
avg score 0.007575757575757576
avg score 0.007518796992481203
avg score 0.007462686567164179
avg score 0.007407407407407408
avg score 0.007352941176470588
avg score 0.0072992700729927005
avg score 0.007246376811594203
avg score 0.007194244604316547
avg score 0.007142857142857143
avg score 0.0070921985815602835
avg score 0.007042253521126761
avg score 0.006993006993006993
avg score 0.006944444444444444
avg score 0.013793103448275862
avg score 0.0136986301369863
avg score 0.013605442176870748
avg scor

avg score -0.008
avg score -0.007936507936507936
avg score -0.007874015748031496
avg score -0.0078125
avg score -0.007751937984496124
avg score -0.007692307692307693
avg score -0.007633587786259542
avg score -0.007575757575757576
avg score -0.007518796992481203
avg score -0.007462686567164179
avg score -0.007407407407407408
avg score -0.007352941176470588
avg score -0.0072992700729927005
avg score -0.007246376811594203
avg score -0.007194244604316547
avg score -0.007142857142857143
avg score -0.0070921985815602835
avg score -0.007042253521126761
avg score -0.006993006993006993
avg score -0.006944444444444444
avg score -0.006896551724137931
avg score -0.00684931506849315
avg score -0.006802721088435374
avg score -0.006756756756756757
avg score -0.006711409395973154
avg score -0.006666666666666667
avg score -0.006622516556291391
avg score -0.006578947368421052
avg score -0.006535947712418301
avg score -0.006493506493506494
avg score -0.0064516129032258064
avg score -0.00641025641025641
a

avg score 0.0035211267605633804
avg score 0.0035087719298245615
avg score 0.0034965034965034965
avg score 0.003484320557491289
avg score 0.003472222222222222
avg score 0.0034602076124567475
avg score 0.0034482758620689655
avg score 0.003436426116838488
avg score 0.003424657534246575
avg score 0.0034129692832764505
avg score 0.003401360544217687
avg score 0.003389830508474576
avg score 0.0033783783783783786
avg score 0.003367003367003367
avg score 0.003355704697986577
avg score 0.0033444816053511705
avg score 0.0033333333333333335
Episode 45	Average Score: 0.20avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0625
avg score 0.058823529411764705
avg score 0.05555555555555555
avg score 0.05263157894736842
avg score 0.05
avg score 0.047619047619047616
avg score 0.045454545454545456
avg score 0.043478260869565216
avg scor

avg score 0.007326007326007326
avg score 0.0072992700729927005
avg score 0.007272727272727273
avg score 0.007246376811594203
avg score 0.007220216606498195
avg score 0.0035971223021582736
avg score 0.0035842293906810036
avg score 0.0035714285714285713
avg score 0.0035587188612099642
avg score 0.0035460992907801418
avg score 0.0035335689045936395
avg score 0.0035211267605633804
avg score 0.0035087719298245615
avg score 0.0034965034965034965
avg score 0.003484320557491289
avg score 0.003472222222222222
avg score 0.0034602076124567475
avg score 0.0034482758620689655
avg score 0.003436426116838488
avg score 0.003424657534246575
avg score 0.0034129692832764505
avg score 0.003401360544217687
avg score 0.003389830508474576
avg score 0.0033783783783783786
avg score 0.003367003367003367
avg score 0.003355704697986577
avg score 0.0033444816053511705
avg score 0.0033333333333333335
Episode 46	Average Score: 0.22avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
av

avg score 0.0035714285714285713
avg score 0.0035587188612099642
avg score 0.0035460992907801418
avg score 0.0035335689045936395
avg score 0.0035211267605633804
avg score 0.0035087719298245615
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
Episode 47	Average Score: 0.21avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg score 0.0
avg scor

When finished, you can close the environment.

In [None]:
env.close()
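
If an exception interrupts a run before the cell above executes, the environment may be left open. A minimal sketch of one way to guard against that, using the standard-library `contextlib.closing` helper, which calls `.close()` when the `with` block exits. `DummyEnv` and `env_stub` are hypothetical stand-ins so the snippet runs without Unity; in the notebook you would pass the real `env` instead.

```python
from contextlib import closing


class DummyEnv:
    """Hypothetical stand-in for UnityEnvironment; only close() is modeled."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


env_stub = DummyEnv()
try:
    # closing() guarantees env_stub.close() runs, even if the body raises.
    with closing(env_stub):
        raise RuntimeError("simulated failure mid-run")
except RuntimeError:
    pass

print(env_stub.closed)  # True
```

The same pattern should apply to any object that exposes a `close()` method.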

### 4. It's Your Turn!

Now it's your turn to train your own agent to solve the environment!  When training the agent, set `train_mode=True`, so that the line for resetting the environment looks like the following:
```python
env_info = env.reset(train_mode=True)[brain_name]
```
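
A minimal sketch of how that reset line might sit inside a single training episode. It assumes the `env` and `brain_name` objects created earlier in this notebook, and that the returned info object exposes `vector_observations`, `rewards`, and `local_done` as in the `unityagents` package. The `run_episode` helper, its `select_action` callback, the `max_steps` cap, and the default `action_size=4` are hypothetical choices made for illustration; the random policy is only a placeholder for a real agent.

```python
import random


def run_episode(env, brain_name, select_action, max_steps=1000):
    """Play one episode in training mode and return the total reward."""
    # Reset with train_mode=True, as described above.
    env_info = env.reset(train_mode=True)[brain_name]
    state = env_info.vector_observations[0]
    score = 0.0
    for _ in range(max_steps):
        action = select_action(state)            # choose an action for this state
        env_info = env.step(action)[brain_name]  # send the action to the environment
        state = env_info.vector_observations[0]  # observe the next state
        score += env_info.rewards[0]             # accumulate the reward
        if env_info.local_done[0]:               # stop when the episode ends
            break
    return score


def random_policy(state, action_size=4):
    """Placeholder policy: ignore the state and pick a random action index."""
    return random.randrange(action_size)
```

With the environment still open, a call such as `score = run_episode(env, brain_name, random_policy)` would run one episode; a learning agent would replace `random_policy` and update itself from each transition.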