# Navigation

---

You are welcome to use this coding environment to train your agent for the project.  Follow the instructions below to get started!

### 1. Start the Environment

Run the next code cell to install a few packages.  This cell may take a few minutes to run!

In [1]:
!pip -q install ./python

tensorflow 1.7.1 has requirement numpy>=1.13.3, but you'll have numpy 1.12.1 which is incompatible.
ipython 6.5.0 has requirement prompt-toolkit<2.0.0,>=1.0.15, but you'll have prompt-toolkit 2.0.10 which is incompatible.


The environment is already saved in the Workspace and can be accessed at the file path provided below.  Please run the next code cell without making any changes.

In [2]:
from unityagents import UnityEnvironment
import numpy as np

# please do not modify the line below
env = UnityEnvironment(file_name="/data/Banana_Linux_NoVis/Banana.x86_64")

INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		
Unity brain name: BananaBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 37
        Number of stacked Vector Observation: 1
        Vector Action space type: discrete
        Vector Action space size (per agent): 4
        Vector Action descriptions: , , , 


Environments contain **_brains_** which are responsible for deciding the actions of their associated agents. Here we check for the first brain available, and set it as the default brain we will be controlling from Python.

In [3]:
# get the default brain
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

### 2. Examine the State and Action Spaces

Run the code cell below to print some information about the environment.

In [4]:
# reset the environment
env_info = env.reset(train_mode=True)[brain_name]

# number of agents in the environment
print('Number of agents:', len(env_info.agents))

# number of actions
action_size = brain.vector_action_space_size
print('Number of actions:', action_size)

# examine the state space 
state = env_info.vector_observations[0]
print('States look like:', state)
state_size = len(state)
print('States have length:', state_size)

Number of agents: 1
Number of actions: 4
States look like: [ 1.          0.          0.          0.          0.84408134  0.          0.
  1.          0.          0.0748472   0.          1.          0.          0.
  0.25755     1.          0.          0.          0.          0.74177343
  0.          1.          0.          0.          0.25854847  0.          0.
  1.          0.          0.09355672  0.          1.          0.          0.
  0.31969345  0.          0.        ]
States have length: 37


### 3. Take Random Actions in the Environment

In the next code cell, you will learn how to use the Python API to control the agent and receive feedback from the environment.

Note that **in this coding environment, you will not be able to watch the agent while it is training**, and you should set `train_mode=True` to restart the environment.

In [5]:
env_info = env.reset(train_mode=True)[brain_name] # reset the environment
state = env_info.vector_observations[0]            # get the current state
score = 0                                          # initialize the score
while True:
    action = np.random.randint(action_size)        # select an action
    env_info = env.step(action)[brain_name]        # send the action to the environment
    next_state = env_info.vector_observations[0]   # get the next state
    reward = env_info.rewards[0]                   # get the reward
    done = env_info.local_done[0]                  # see if episode has finished
    score += reward                                # update the score
    state = next_state                             # roll over the state to next time step
    if done:                                       # exit loop if episode finished
        break
    
print("Score: {}".format(score))

Score: 0.0



When finished, you can close the environment.

In [6]:
env.close()

### 4. It's Your Turn!

Now it's your turn to train your own agent to solve the environment!  A few **important notes**:
- When training the agent, set `train_mode=True`, so that the line for resetting the environment looks like the following:
```python
env_info = env.reset(train_mode=True)[brain_name]
```
- To structure your work, you're welcome to work directly in this Jupyter notebook, or you might like to start over with a new file!  You can see the list of files in the workspace by clicking on **_Jupyter_** in the top left corner of the notebook.
- In this coding environment, you will not be able to watch the agent while it is training.  However, **_after training the agent_**, you can download the saved model weights to watch the agent on your own machine! 

In [14]:
# IMPORT LIBRARIES

import numpy as np
import torch
from torch import nn as nn
import torch.nn.functional as F
import torch.optim as optim
from collections import deque
import pickle

In [15]:
# CREATE A SUMTREE TO STORE THE EXPERIENCES

class SumTree():

    def __init__(self, capacity):

        self.capacity = capacity                                    # number of leaf nodes (Priorities)
        self.tree = np.zeros(capacity*2 - 1)                        # array with total number of nodes
        self.data = np.zeros(capacity, dtype=object)                # array to store the data [S, A, R, S', done]
        self.height = np.log2(capacity) + 1                         # number of layers in the tree
        self.pointer = 0                                            # will be used to select where store new data

    def add(self, p, data):
        """
        Arguments:
        p -- scalar value: priority value = abs(expected - target)
        data -- list [S, A, R, S', done]

        Return:
        Add a new priority value to the leaf node, and associate this leaf node to a list containing [S, A, R, S', done]
        """
        # add a new priority and data to the tree.
        idx = self.capacity - 1 + self.pointer
        self.update(idx, p)
        self.data[self.pointer] = data
        self.pointer = self.pointer + 1
        # if the tree is full, we will replace existing priorities, starting from the beginning (First In, First Out)
        if self.pointer > self.capacity-1:
            self.pointer = 0

    def _propagate(self, idx, change):
        """
        Arguments:
        idx -- scalar value
        change -- scalar value: difference between the new and old priority

        Return:
        propagate the change from the leaf node to the root
        """

        # calculate the parent node of this index
        parent = (idx - 1) // 2
        # add the change to this parent node
        self.tree[parent] += change

        # if the parent node is not the root, repeat the process until the root node is updated
        if parent != 0:
            idx = parent
            self._propagate(idx, change)

    def total_priority(self):
        """
        sum of all priorities, value of the root node
        """
        return self.tree[0]

    def update(self, idx, p):
        """
        Arguments:
        idx -- scalar value: index of the leaf node that we want to update
        p -- scalar: new priority value

        Return:
        propagate the difference between the new and old priority value to each parent node until we reach the root node
        """
        # calculate the difference between the new priority and the old priority (change).
        change = p - self.tree[idx]
        self.tree[idx] = p
        # propagate this change from the leaf until the root node
        self._propagate(idx, change)

    def _retrieve(self, value, parent):
        """
        Arguments:
        value -- scalar: a number drawn at random between 0 and the total priority; the search finds the leaf whose range of cumulative priority contains it.
        parent -- start with  0 (root node)

        Return:
        list with [index , priority, [S, A, R, Next S, done]]
        """

        left_node = parent*2 + 1
        right_node = left_node + 1

        if value < self.tree[left_node]:
            if left_node >= self.capacity-1:
                return [left_node, self.tree[left_node], self.data[left_node - self.capacity+1]]
            else:
                parent = left_node
                return self._retrieve(value, parent)

        else:
            if right_node >= self.capacity-1:
                return [right_node, self.tree[right_node], self.data[right_node - self.capacity+1]]
            else:
                parent = right_node
                value = value - self.tree[left_node]
                return self._retrieve(value, parent)
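
To make the array layout concrete, here is a minimal standalone sketch of the same idea using plain Python lists and loops instead of recursion. `TinySumTree` and its `find` method are hypothetical names used only for this illustration; they are not part of the notebook's `SumTree` class.

```python
import random


class TinySumTree:
    """Array-backed sum tree: leaves hold priorities, inner nodes hold sums."""

    def __init__(self, capacity):
        self.capacity = capacity                 # number of leaves
        self.tree = [0.0] * (2 * capacity - 1)   # inner nodes first, then leaves

    def update(self, leaf, priority):
        idx = leaf + self.capacity - 1           # position of the leaf in the array
        change = priority - self.tree[idx]
        self.tree[idx] = priority
        while idx != 0:                          # add the change to every ancestor
            idx = (idx - 1) // 2
            self.tree[idx] += change

    def find(self, value):
        idx = 0
        while idx < self.capacity - 1:           # stop once idx points at a leaf
            left = 2 * idx + 1
            if value < self.tree[left]:
                idx = left                       # the value falls in the left subtree
            else:
                value -= self.tree[left]         # skip the left subtree's mass
                idx = left + 1
        return idx - (self.capacity - 1)         # leaf number, 0-based


tree = TinySumTree(capacity=4)
for leaf, priority in enumerate([1.0, 2.0, 3.0, 4.0]):
    tree.update(leaf, priority)

total = tree.tree[0]                             # the root holds the sum of all priorities
random.seed(0)
counts = [0, 0, 0, 0]
for _ in range(10000):
    counts[tree.find(random.uniform(0, total))] += 1
print(total, counts)
```

Each leaf is returned with a frequency roughly proportional to its priority, which is the property that prioritized sampling relies on.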

In [16]:
# CREATE PRIORITIZED EXPERIENCE REPLAY

# SUMTREE will be used to store the experiences

class PER():

    def __init__(self, capacity, epsilon_per, a_per, b_per, increment_b, absolute_error):

        self.capacity = capacity                            # maximum number of priorities (leaf nodes)
        self.epsilon_per = epsilon_per                      # added to the error so that no priority is zero
        self.a_per = a_per                                  # [0,1]  0 = uniform sampling, 1 = fully prioritized
        self.b_per = b_per                                  # [0,1]  exponent of the Importance Sampling Weights (IS_Weights)
        self.increment_b = increment_b                      # how fast b_per moves towards 1
        self.absolute_error = absolute_error                # maximum priority value
        # Create a SumTree
        self.sumtree = SumTree(self.capacity)

    def add(self, data):
        """
        Arguments:
        data -- list [S, A, R, S', done]  each element of the list can be anything that you want

        Return
        it will include the data in the Sum Tree and add a new priority value to the leaf node
        """
        # a new experience has no error yet, so it gets the current maximum priority (or absolute_error if the tree is empty)
        p_max = max(self.sumtree.tree[self.capacity-1:])
        if p_max == 0:
            p = self.absolute_error
        else:
            p = p_max
        self.sumtree.add(p, data)

    def sample(self, n):
        """
        Arguments:
        n -- number of elements you want to retrieve from the Sum Tree

        Return:
        idxs -- list with "n" indexes related to the samples [index1, index2, etc...] - scalar
        mini_batches -- list with "n" experiences [[S1, A1, R1, S1', done1], [S2, A2, R2, S2', done2], etc...] - tensors
        IS_Weights -- list with "n" IS_weights  [IS_Weight_1, IS_Weight_2, etc.. ]  - scalar
        """

        # create a list to store the values
        idxs = []
        mini_batches = []
        IS_Weights = []

        # divide the total priority into "n" equal ranges. Select one experience from each range using a uniform distribution
        total_priority = self.sumtree.total_priority()
        priority_segment = total_priority/n

        # calculate Max Weight
        # the priorities stored in the tree are already (error + epsilon_per) ** a_per

        # if there are empty leaf nodes, some priorities are equal to 0
        if min(self.sumtree.tree[self.capacity - 1:]) == 0:
            # we only want to look for the min priority among the leaf nodes that has an experience
            start = self.capacity - 1
            end = start + self.sumtree.pointer
            p_min = min(self.sumtree.tree[start: end] / total_priority)
        # if the tree is full
        else:
            p_min = min(self.sumtree.tree[self.capacity - 1:] / total_priority)

        max_weight = (1 / (p_min * n)) ** self.b_per


        for i in range(n):

            # random select a number in each range of priorities
            min_value, max_value = i*priority_segment, (i+1)*priority_segment
            value = np.random.uniform(min_value, max_value)
            # get priority and [S,A,R,S', done] from the SumTree
            idx, priority, mini_batch = self.sumtree._retrieve(value, parent=0)

            # calculate IS Weight
            probability = priority / total_priority
            # normalize the IS Weight
            IS_Weight = ((1 / (probability*n))**self.b_per) / max_weight

            # store the index, mini_batch, IS Weight
            idxs.append(idx)
            mini_batches.append(mini_batch)
            IS_Weights.append(IS_Weight)

        # update b_per each time we sample experiences to train the Network
        self.b_per = min(self.b_per + self.increment_b, 1)


        return idxs, mini_batches, IS_Weights


    def update(self, idx, error):
        """
        Arguments:
        idx -- list of indexes [idx_1, idx_2, ... idx_n]
        error -- list of scalar values [error_1, error_2,.., error_n]: absolute errors abs(expected - target)

        Return:
        It will convert each error into a priority, store it in the SumTree and recalculate the sums from the leaf to the root
        """
        error = np.array(error)
        priority = error + self.epsilon_per
        priority = priority ** self.a_per
        priorities = np.minimum(priority, self.absolute_error)
        for i, j in list(zip(idx, priorities)):
            self.sumtree.update(i, j)
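
The arithmetic in `update` and `sample` can be checked on a few numbers. The sketch below is a standalone illustration in plain Python, not the notebook's code: `priorities_from_errors` and `is_weights` are hypothetical helper names, and `n` is taken to be the number of sampled experiences, as in `sample` above.

```python
def priorities_from_errors(errors, epsilon=1e-4, alpha=1.0, max_priority=1.0):
    """Priority p_i = min((|error_i| + epsilon) ** alpha, max_priority)."""
    return [min((abs(e) + epsilon) ** alpha, max_priority) for e in errors]


def is_weights(priorities, sampled, beta):
    """Importance-sampling weights, divided by the largest possible weight."""
    total = sum(priorities)
    n = len(sampled)
    p_min = min(priorities) / total              # smallest sampling probability
    max_weight = (1.0 / (p_min * n)) ** beta     # weight of the rarest experience
    weights = []
    for i in sampled:
        probability = priorities[i] / total
        weights.append(((1.0 / (probability * n)) ** beta) / max_weight)
    return weights


priorities = priorities_from_errors([0.1, 0.4, 2.0])   # the last error is clipped to 1.0
weights = is_weights(priorities, sampled=[0, 1, 2], beta=0.5)
print(priorities)
print(weights)
```

The experience with the smallest priority gets weight 1, and experiences that are sampled more often get smaller weights, which compensates for the bias introduced by non-uniform sampling.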


In [18]:
# CREATE A DUELING DQN

class Network(nn.Module):
    def __init__(self):
        super(Network, self).__init__()
        self.fc1 = nn.Linear(in_features=37, out_features=64)
        self.fc2 = nn.Linear(in_features=64, out_features=64)
        self.fc3 = nn.Linear(in_features=64, out_features=64)
        self.fc4 = nn.Linear(in_features=64, out_features=64)
        self.fc5 = nn.Linear(in_features=64, out_features=1)
        self.fc6 = nn.Linear(in_features=64, out_features=4)

    def forward(self, t):
        t = self.fc1(t)
        t = F.relu(t)

        t = self.fc2(t)
        t = F.relu(t)

        t = self.fc3(t)
        t = F.relu(t)
        
        t = self.fc4(t)
        t = F.relu(t)
        
        # Value of the State
        v = self.fc5(t)

        # Advantage of taking each Action
        a = self.fc6(t)
        average_a = a.mean(dim=1).unsqueeze(1)

        # Q value =  V + A - Average_A
        q = torch.add(v, torch.add(a, - average_a))

        return q
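
The aggregation step at the end of `forward` can be illustrated without PyTorch. The following is a small sketch for a single state with made-up numbers; `dueling_q` is a hypothetical helper, not a function from the notebook.

```python
def dueling_q(value, advantages):
    """Combine a state value and per-action advantages: Q = V + A - mean(A)."""
    mean_advantage = sum(advantages) / len(advantages)
    return [value + a - mean_advantage for a in advantages]


q = dueling_q(value=5.0, advantages=[1.0, 3.0, 0.0, 4.0])
print(q)                   # [4.0, 6.0, 3.0, 7.0]
print(sum(q) / len(q))     # 5.0
```

Subtracting the mean advantage keeps the ranking of the actions unchanged while making the average Q value equal to the state value, so the two heads cannot trade a constant offset back and forth.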

In [19]:
# CREATE AN AGENT TO INTERACT WITH THE ENVIRONMENT

class Agent():
    """ Create an Agent that will interact with the environment and decide the best action to take in order to receive
    more rewards. """

    def __init__(self, env, state_size, action_size, epsilon, epsilon_decay, epsilon_min, gamma, tau, batch_size,
                 update_every, learning_rate, capacity, epsilon_per, a_per, b_per, increment_b, absolute_error,
                 terminal_state):

        # Environment
        self.env = env                                    # instance of an environment 
        self.state_size = state_size                      # state size -- input of the DQN (number of features)
        self.action_size = action_size                    # number of action -- output DQN

        # Exploration Rate
        self.epsilon = epsilon                            # exploration rate
        self.epsilon_decay = epsilon_decay                # factor to decrease the exploration rate at each episode
        self.epsilon_min = epsilon_min                    # min exploration rate

        # Train DQN
        self.gamma = gamma                                # discount rate of the future rewards
        self.tau = tau                                    # learning rate to update the parameters of the DQN_Target
        self.batch_size = batch_size                      # number of examples the DQN will train at the same time
        self.update_every = update_every                  # number of time steps we need to take, before update DQN and DQN_Target
        self.learning_rate = learning_rate                # learning rate to train the DQN
        self.counter = 0                                  # count number of steps in order to update the DQN and DQN_Target
        self.terminal_state = terminal_state              # Q Value of the Terminal State

        # Prioritize Experience Replay - PER
        self.capacity = capacity                          # maximum number of experiences (leaf nodes)
        self.epsilon_per = epsilon_per                    # add to the priority, avoid priority equal zero
        self.a_per = a_per                                # [0,1]  0 uniform distribution
        self.b_per = b_per                                # [0,1]  calculate Importance Sampling Weights  IS_Weights
        self.increment_b = increment_b                    # how fast move b_per to 1.
        self.absolute_error = absolute_error              # maximum Priority value

        # Create a Prioritize Experience Replay - PER
        # Store Experiences (S, A, R, S', done)
        self.per = PER(self.capacity, self.epsilon_per, self.a_per, self.b_per, self.increment_b, self.absolute_error)


    def create_Dueling_DQN(self):
        # Create a Dueling  DQN

        # Create a DQN to calculate the Q(state,action)
        self.DQN = Network()

        # Create a DQN Target to calculate the Q(next_state, action)
        self.DQN_Target = Network()

        # set DQN_Target parameters equal to DQN
        self.DQN_Target.load_state_dict(self.DQN.state_dict())


    def update_DQN_Target(self):
        # Soft update the parameters of the DQN_Target:
        # p_target = (1-tau)*p_target + tau*p_dqn

        
        for p_dqn, p_target in zip(self.DQN.parameters(), self.DQN_Target.parameters()):
            p_target.data.copy_((1-self.tau)*p_target.data + self.tau*p_dqn.data)

        
    def train_DQN(self, idxs, mini_batches, IS_Weights):
        """
        Arguments:
        idxs -- list with the indexes of the samples (leaf nodes of the SumTree)
        mini_batches -- list of tensors [State, Action, Reward, Next State, done]
        IS_Weights -- list with [IS_Weights]

        Return:
        idxs -- list with the indexes of the samples (leaf nodes of the SumTree)
        error -- list of scalar values abs(expected - target), used to update the priorities in the SumTree
        loss -- value of the loss function
        """

        # CALCULATE THE LOSS

        # retrieve experience  [S, A, R, S', done]  from mini_batches
        # Concatenate the values
        states, actions, rewards, next_states, dones = list(zip(*mini_batches))
        states = torch.cat(states, 0)
        next_states = torch.cat(next_states, 0)
        actions = torch.cat(actions, 0)
        rewards = torch.cat(rewards, 0)
        dones = torch.cat(dones, 0)


        # Importance Sampling Weight - Prioritize Experience Replay-PER:   shape (batch_size, 1)
        IS_Weights = torch.tensor(IS_Weights, dtype=torch.float32).unsqueeze(1)


        # Find the expected Q(state, action).  shape (batch_size, 1)
        expected = self.DQN(states).gather(1, actions)


        # DOUBLE DQN / FIXED TARGET  
        
        # Using DQN -- Find the  best action using as input Next_State: Q(Next_State, all actions)
        _, action_DQN = self.DQN(next_states).max(1)
        action_DQN = action_DQN.unsqueeze(1)

        # Using DQN_Target -- Find the Q Value:    Q(Next_State, best action found using Next State on DQN)
        Q_Next = self.DQN_Target(next_states).detach().gather(1, action_DQN)


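        # Q(next) = 0 if it is a terminal state and the game is over (only applied when terminal_state == 0);
        # with any other value, e.g. terminal_state=None, the target always bootstraps from Q_Next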
        if self.terminal_state == 0:
            target = (rewards + (self.gamma * Q_Next * (1 - dones))).detach()
        else:
            target = (rewards + (self.gamma * Q_Next)).detach()


        # calculate the error - absolute value ==> convert to a list with scalar values
        error1 = torch.abs(expected - target).detach()
        error = [i.item() for i in error1.view(-1, 1)]
            
        # calculate the Loss using the Importance Sampling Weights  -  PER
        loss = torch.sum(IS_Weights * (expected - target) ** 2)

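        # define the optimizer (plain SGD here; it keeps no internal state, so re-creating it on every call is harmless,
        # but a stateful optimizer such as Adam should be created only once, e.g. in create_Dueling_DQN)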
        self.optimizer = optim.SGD(self.DQN.parameters(), lr=self.learning_rate)

        # reset the gradients; PyTorch accumulates them across backward() calls
        self.optimizer.zero_grad()

        # Calculate the gradients.
        loss.backward()

        # update parameters
        self.optimizer.step()


        return idxs, error, loss.item()


    def acc(self, state):

        state = torch.tensor(state, dtype=torch.float32).unsqueeze(0)

        # Exploration x Exploitation
        if np.random.rand() < self.epsilon:
            action = np.random.randint(self.action_size)
            return action

        # Greedy Policy, take the best action with the highest Q value
        else:
            action = self.DQN(state).argmax(dim=1).item()
            return action


    def update_epsilon(self):

        # after each episode the exploration rate should decrease and the agent will exploit more the best actions
        self.epsilon = max(self.epsilon_min, self.epsilon_decay*self.epsilon)


    def step(self, state, action, reward, next_state, done):
        """
        Arguments (values from the environment; converted below to tensors with the shapes shown):
        state -- tensor  torch.Size([1, number of features])
        action -- tensor torch.Size([1, 1])
        reward -- tensor torch.Size([1, 1])
        next_state -- tensor  torch.Size([1, number of features])
        done --  tensor  torch.Size([1, 1])    False=0, True=1
        """

        # Convert to Tensor (1, number of features)
        state = torch.tensor(state, dtype=torch.float32).unsqueeze(0)
        action = torch.tensor(action).unsqueeze(0).unsqueeze(0)
        reward = torch.tensor(reward, dtype=torch.float32).unsqueeze(0).unsqueeze(0)
        next_state = torch.tensor(next_state, dtype=torch.float32).unsqueeze(0)
        done = torch.tensor(done, dtype=torch.float32).unsqueeze(0).unsqueeze(0)

        # add new experiences to PER
        self.per.add([state, action, reward, next_state, done])


        # Count the number of experiences added on the SumTree
        self.counter = self.counter + 1


        # Check if we have at least as many experiences as the mini_batch size
        if self.counter >= self.batch_size:

            # Only train the DQN after we add "x" number of new experiences
            if self.counter % self.update_every == 0:


                # retrieve samples from PER
                # list with Index [scalar], mini_batches[tensors] and IS_Weights [scalar]
                idxs, mini_batches, IS_Weights = self.per.sample(self.batch_size)


                # Train DQN and calculate the Loss
                idxs, error, loss = self.train_DQN(idxs, mini_batches, IS_Weights)

                # update the priority value: PER
                self.per.update(idxs, error)


                # update DQN_Target Parameters - Soft Update
                self.update_DQN_Target()
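
The two update rules used by the agent, the Double DQN target and the soft update of the target network, can be written out for a single transition. This is a sketch in plain Python with made-up numbers; `double_dqn_target`, `soft_update` and the `mask_terminal` flag are hypothetical names for illustration only.

```python
def double_dqn_target(reward, gamma, q_online_next, q_target_next, done,
                      mask_terminal=True):
    """The online network picks the next action; the target network scores it."""
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    q_next = q_target_next[best_action]
    if mask_terminal and done:
        q_next = 0.0                              # no future reward after a terminal state
    return reward + gamma * q_next


def soft_update(target_params, online_params, tau):
    """p_target = (1 - tau) * p_target + tau * p_online, element by element."""
    return [(1 - tau) * t + tau * o for t, o in zip(target_params, online_params)]


target = double_dqn_target(reward=1.0, gamma=0.5,
                           q_online_next=[0.2, 0.9, 0.1, 0.4],
                           q_target_next=[2.0, 4.0, 8.0, 1.0],
                           done=False)
print(target)                                     # 3.0
print(soft_update([0.0, 1.0], [1.0, 1.0], tau=0.25))   # [0.25, 1.0]
```

A plain DQN target would take the maximum of the target network's values (8.0 here) and give 5.0; decoupling action selection from evaluation is what reduces that overestimation.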


In [20]:
def DQN_Model(env, num_episodes):

    # save the last 100 scores
    scores = deque(maxlen=100)
    all_scores=[]
    
    # Create an Agent
    agent = Agent(env, state_size=37, action_size=4, epsilon=1, epsilon_decay=0.9985, epsilon_min=0.01, gamma=0.97, tau=0.01,
                  batch_size=64, update_every=4, learning_rate=0.01, capacity=2**14, epsilon_per=0.0001, a_per=1, b_per=0.5,
                  increment_b=0.0000025, absolute_error=1, terminal_state=None)

    
    # Create Networks: DQN and DQN_Target
    agent.create_Dueling_DQN()


    # loop over the episodes
    for i in range(1, num_episodes + 1):
        score = 0
        r = []
        
        # reset the environment
        env_info = env.reset(train_mode=True)[brain_name]
        state = env_info.vector_observations[0]

        while True:
            # Choose an action
            action = agent.acc(state)

            # Take an action on the environment
            env_info = env.step(action)[brain_name]
            reward = env_info.rewards[0]
            next_state = env_info.vector_observations[0]
            done = env_info.local_done[0]

            # Train the model
            agent.step(state, action, reward, next_state, done)

            # the next_state becomes the current state for the following step
            state = next_state

            # calculate the total score in 1 episode
            score = score + reward
            r.append(reward)
            if done:
                break

        # save the score of each episode on the list
        scores.append(score)

        if i >= 0:
            all_scores.append(np.average(scores))
            print('Episode:{}......Average Score:{:.2f}......Epsilon:{:.2f}....b_per:{:.2f}.....experience:{:.2f}'.format(i, np.average(scores), agent.epsilon, agent.per.b_per, agent.counter))
            print('blue bananas', r.count(-1), 'nothing', r.count(0), 'yellow bananas', r.count(1))
            print()
            if np.average(scores) >= 13:
                print('Congratulations, problem solved!!!')
                break
        
        # After each episode, update the exploration rate
        agent.update_epsilon()
    
    # Save Parameters
    print('saving parameters...')
    torch.save(agent.DQN.state_dict(), 'parameters3.pt')
    
    # Save All Scores - Average of 100 episodes
    print('saving scores...')
    with open('all_scores3.pickle', 'wb') as file_out:
        pickle.dump(all_scores, file_out)

    return all_scores
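
To get a feel for the exploration schedule, the multiplicative decay in `update_epsilon` can be simulated on its own. The sketch below uses the values passed to `Agent` above (`epsilon=1`, `epsilon_decay=0.9985`, `epsilon_min=0.01`); the helper names are hypothetical.

```python
def epsilon_after(episodes, epsilon=1.0, decay=0.9985, epsilon_min=0.01):
    """Exploration rate after a given number of per-episode updates."""
    for _ in range(episodes):
        epsilon = max(epsilon_min, decay * epsilon)
    return epsilon


def episodes_until_min(epsilon=1.0, decay=0.9985, epsilon_min=0.01):
    """Number of per-episode updates needed before epsilon reaches its floor."""
    episodes = 0
    while epsilon > epsilon_min:
        epsilon = max(epsilon_min, decay * epsilon)
        episodes += 1
    return episodes


print(epsilon_after(2500))
print(episodes_until_min())
```

With these settings the floor is only reached after roughly three thousand episodes, so a 2500-episode run never becomes fully greedy during training.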

In [28]:
all_scores = DQN_Model(env, num_episodes=2500)

Episode:1......Average Score:0.00......Epsilon:1.00....b_per:0.50.....experience:300.00
blue bananas 1 nothing 298 yellow bananas 1

Episode:2......Average Score:0.00......Epsilon:1.00....b_per:0.50.....experience:600.00
blue bananas 0 nothing 300 yellow bananas 0

Episode:3......Average Score:-0.67......Epsilon:1.00....b_per:0.50.....experience:900.00
blue bananas 2 nothing 298 yellow bananas 0

Episode:4......Average Score:-0.25......Epsilon:1.00....b_per:0.50.....experience:1200.00
blue bananas 0 nothing 299 yellow bananas 1

Episode:5......Average Score:-0.20......Epsilon:0.99....b_per:0.50.....experience:1500.00
blue bananas 1 nothing 298 yellow bananas 1

Episode:6......Average Score:-0.17......Epsilon:0.99....b_per:0.50.....experience:1800.00
blue bananas 0 nothing 300 yellow bananas 0

Episode:7......Average Score:-0.43......Epsilon:0.99....b_per:0.50.....experience:2100.00
blue bananas 3 nothing 296 yellow bananas 1

Episode:8......Average Score:-0.50......Epsilon:0.99....b_pe

In [30]:
with open('all_scores3.pickle', 'rb') as file_in:
    a = pickle.load(file_in)
print(a[-10:])

[12.859999999999999, 12.869999999999999, 12.92, 12.92, 12.970000000000001, 12.98, 12.98, 12.970000000000001, 12.98, 13.0]
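
The values above are running averages over the most recent episodes, which is how the training loop decides that the problem is solved. A minimal sketch of that check, with a small window and made-up scores so it can be followed by hand (`first_solved_episode` is a hypothetical helper):

```python
from collections import deque


def first_solved_episode(scores, window=100, threshold=13.0):
    """Return the first episode whose running average reaches the threshold."""
    recent = deque(maxlen=window)        # the oldest score is dropped automatically
    for episode, score in enumerate(scores, start=1):
        recent.append(score)
        if sum(recent) / len(recent) >= threshold:
            return episode
    return None


print(first_solved_episode([10, 12, 14, 16, 18], window=3))   # 4
```

As in the training loop, the average is taken over fewer than `window` scores during the first episodes, so an early lucky score could trigger the check before a full window has been collected.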


In [38]:
# TEST the MODEL

scores = deque(maxlen=100)
all_scores=[]

# CREATE Dueling DQN
DUELING_DQN = Network()

# LOAD Trained Parameters
DUELING_DQN.load_state_dict(torch.load('parameters3.pt'))

# PLAY 100 Episodes
for i in range(1, 101):
    env_info = env.reset(train_mode=True)[brain_name]  
    state = env_info.vector_observations[0]            
    score = 0         
    
    while True:
        state_tensor = torch.tensor(state, dtype=torch.float32).unsqueeze(0)
        action = DUELING_DQN(state_tensor).argmax(dim=1).item()  
        env_info = env.step(action)[brain_name]        
        next_state = env_info.vector_observations[0]   
        reward = env_info.rewards[0]                   
        done = env_info.local_done[0]                  
        score += reward                                
        state = next_state                             
        if done:
            print('episode:', i, '.....score:', score)
            scores.append(score)
            break
            
print('average score 100 episodes:', np.average(scores))

episode: 1 .....score: 14.0
episode: 2 .....score: 14.0
episode: 3 .....score: 15.0
episode: 4 .....score: 14.0
episode: 5 .....score: 0.0
episode: 6 .....score: 13.0
episode: 7 .....score: 25.0
episode: 8 .....score: 6.0
episode: 9 .....score: 17.0
episode: 10 .....score: 14.0
episode: 11 .....score: 18.0
episode: 12 .....score: 15.0
episode: 13 .....score: 17.0
episode: 14 .....score: 16.0
episode: 15 .....score: 8.0
episode: 16 .....score: 17.0
episode: 17 .....score: 15.0
episode: 18 .....score: 13.0
episode: 19 .....score: 7.0
episode: 20 .....score: 8.0
episode: 21 .....score: 0.0
episode: 22 .....score: 14.0
episode: 23 .....score: 12.0
episode: 24 .....score: 19.0
episode: 25 .....score: 10.0
episode: 26 .....score: 21.0
episode: 27 .....score: 16.0
episode: 28 .....score: 15.0
episode: 29 .....score: 15.0
episode: 30 .....score: 12.0
episode: 31 .....score: 16.0
episode: 32 .....score: 17.0
episode: 33 .....score: 12.0
episode: 34 .....score: 14.0
episode: 35 .....score: 21.0
